This training course is the best preparation for the challenges faced by Hadoop developers. Participants will learn to identify which tool is the right one to use in a given situation, and will gain hands-on experience developing with those tools.


* Actual course outline may vary depending on offering center. Contact your sales representative for more information.

Learning Objectives

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning topics such as:

  • How data is distributed, stored, and processed in a Hadoop cluster
  • How to use Sqoop and Flume to ingest data
  • How to process distributed data with Apache Spark
  • How to model structured data as tables in Impala and Hive
  • How to choose the best data storage format for different data usage patterns
  • Best practices for data storage

  • Course Outline

    Introduction
    Introduction to Hadoop and the Hadoop Ecosystem
    Hadoop Architecture and HDFS
    Importing Relational Data with Apache Sqoop
    Introduction to Impala and Hive
    Modeling and Managing Data with Impala and Hive
    Data Formats
    Data Partitioning
    Capturing Data with Apache Flume
    Spark Basics
    Working with RDDs in Spark
    Writing and Deploying Spark Applications
    Parallel Programming with Spark
    Spark Caching and Persistence
    Common Patterns in Spark Data Processing
    Spark SQL and DataFrames
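As a rough illustration of what the Data Partitioning module covers, the sketch below shows how Hive- and Impala-style partitioning maps rows to key=value directories. This is a plain-Python sketch, not course material; the `partition_paths` helper, the table path, and the sample data are all hypothetical.

```python
from collections import defaultdict

def partition_paths(records, table_root, partition_col):
    """Group records by a partition column and compute the directory each
    group would be written to, mirroring Hive's key=value layout
    (e.g. /warehouse/sales/region=EMEA/)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[partition_col]].append(rec)
    return {
        f"{table_root}/{partition_col}={value}": rows
        for value, rows in groups.items()
    }

# Hypothetical sample data.
sales = [
    {"id": 1, "region": "EMEA", "amount": 120},
    {"id": 2, "region": "APAC", "amount": 75},
    {"id": 3, "region": "EMEA", "amount": 60},
]
layout = partition_paths(sales, "/warehouse/sales", "region")
# A query filtering on region now only needs to scan one directory,
# which is the pruning benefit partitioned tables provide.
```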


Hadoop developers will benefit from this course.




This course is designed for developers and engineers who have programming experience. Apache Spark examples and hands-on exercises are presented in Scala and Python, so the ability to program in one of those languages is required. Basic familiarity with the Linux command line is assumed. Basic knowledge of SQL is helpful. Prior knowledge of Hadoop is not required.


Length: 4 days (32 hours)




To request a custom delivery, please chat with an expert.