Cloudera University’s four-day course for designing and building Big Data applications prepares you to analyze and solve real-world problems using Apache Hadoop and associated tools in the enterprise data hub (EDH).


* Actual course outline may vary depending on offering center. Contact your sales representative for more information.

Learning Objectives

Skills learned in this course include:
Creating a data set with Kite SDK
Developing custom Flume components for data ingestion
Managing a multi-stage workflow with Oozie
Analyzing data with Crunch
Writing user-defined functions for Hive and Impala
Indexing data with Cloudera Search

  • Introduction

  • Application Architecture

    Scenario Explanation
    Understanding the Development Environment
    Identifying and Collecting Input Data
    Selecting Tools for Data Processing and Analysis
    Presenting Results to the Users

  • Defining & Using Datasets

    Metadata Management
    What is Apache Avro?
    Avro Schemas
    Avro Schema Evolution
    Selecting a File Format
    Performance Considerations

  • Using the Kite SDK Data Module

    What is the Kite SDK?
    Fundamental Data Module Concepts
    Creating New Data Sets Using the Kite SDK
    Loading, Accessing, and Deleting a Data Set

  • Importing Relational Data with Apache Sqoop

    What is Apache Sqoop?
    Basic Imports
    Limiting Results
    Improving Sqoop’s Performance
    Sqoop 2

  • Capturing Data with Apache Flume

    What is Apache Flume?
    Basic Flume Architecture
    Flume Sources
    Flume Sinks
    Flume Configuration
    Logging Application Events to Hadoop

  • Developing Custom Flume Components

    Flume Data Flow and Common Extension Points
    Custom Flume Sources
    Developing a Flume Pollable Source
    Developing a Flume Event-Driven Source
    Custom Flume Interceptors
    Developing a Header-Modifying Flume Interceptor
    Developing a Filtering Flume Interceptor
    Writing Avro Objects with a Custom Flume Interceptor

  • Managing Workflows with Apache Oozie

    The Need for Workflow Management
    What is Apache Oozie?
    Defining an Oozie Workflow
    Validation, Packaging, and Deployment
    Running and Tracking Workflows Using the CLI
    Hue UI for Oozie

  • Processing Data Pipelines with Apache Crunch

    What is Apache Crunch?
    Understanding the Crunch Pipeline
    Comparing Crunch to Java MapReduce
    Working with Crunch Projects
    Reading and Writing Data in Crunch
    Data Collection API Functions
    Utility Classes in the Crunch API

  • Working with Tables in Apache Hive

    What is Apache Hive?
    Accessing Hive
    Basic Query Syntax
    Creating and Populating Hive Tables
    How Hive Reads Data
    Using the RegexSerDe in Hive

  • Developing User-Defined Functions

    What are User-Defined Functions?
    Implementing a User-Defined Function
    Deploying Custom Libraries in Hive
    Registering a User-Defined Function in Hive

  • Executing Interactive Queries with Impala

    What is Impala?
    Comparing Hive to Impala
    Running Queries in Impala
    Support for User-Defined Functions
    Data and Metadata Management

  • Understanding Cloudera Search

    What is Cloudera Search?
    Search Architecture
    Supported Document Formats

  • Indexing Data with Cloudera Search

    Collection and Schema Management
    Indexing Data in Batch Mode
    Indexing Data in Near Real Time

  • Presenting Results to Users

    Solr Query Syntax
    Building a Search UI with Hue
    Accessing Impala through JDBC
    Powering a Custom Web Application with Impala and Search


This course is best suited to developers, engineers, and architects who want to use Hadoop and related tools to solve real-world problems.




Prerequisites for this course include Cloudera Developer Training for Apache Hadoop, knowledge of Java, basic familiarity with Linux, and experience with SQL.


Length: 4.0 days (32 hours)


