Cloudera University’s four-day data analyst training course focuses on Apache Pig, Apache Hive, and Cloudera Impala, and teaches you to apply traditional data analytics and business intelligence skills to big data.

* Actual course outline may vary depending on offering center. Contact your sales representative for more information.

Learning Objectives

Skills gained in this training include:
The features that Pig, Hive, and Impala offer for data acquisition, storage, and analysis
The fundamentals of Apache Hadoop and data ETL (extract, transform, load), ingestion, and processing with Hadoop
How Pig, Hive, and Impala improve productivity for typical analysis tasks
Joining diverse datasets to gain valuable business insight
Performing real-time, complex queries on datasets

1
  • Hadoop Fundamentals

  • The Motivation for Hadoop
    Hadoop Overview
    Data Storage: HDFS
    Distributed Data Processing: YARN, MapReduce, and Spark
    Data Processing and Analysis: Pig, Hive, and Impala
    Data Integration: Sqoop
    Other Hadoop Data Tools
    Exercise Scenarios Explanation

2
  • Introduction to Pig

  • What Is Pig?
    Pig's Features
    Pig Use Cases
    Interacting with Pig

3
  • Basic Data Analysis with Pig

  • Pig Latin Syntax
    Loading Data
    Simple Data Types
    Field Definitions
    Data Output
    Viewing the Schema
    Filtering and Sorting Data
    Commonly Used Functions

4
  • Processing Complex Data with Pig

  • Storage Formats
    Complex/Nested Data Types
    Grouping
    Built-In Functions for Complex Data
    Iterating Grouped Data

5
  • Multi-Dataset Operations with Pig

  • Techniques for Combining Data Sets
    Joining Data Sets in Pig
    Set Operations
    Splitting Data Sets

6
  • Pig Troubleshooting & Optimization

  • Troubleshooting Pig
    Logging
    Using Hadoop's Web UI
    Data Sampling and Debugging
    Performance Overview
    Understanding the Execution Plan
    Tips for Improving the Performance of Your Pig Jobs

7
  • Introduction to Hive & Impala

  • What Is Hive?
    What Is Impala?
    Schema and Data Storage
    Comparing Hive to Traditional Databases
    Hive Use Cases

8
  • Querying with Hive & Impala

  • Databases and Tables
    Basic Hive and Impala Query Language Syntax
    Data Types
    Differences Between Hive and Impala Query Syntax
    Using Hue to Execute Queries
    Using the Impala Shell

9
  • Data Management

  • Data Storage
    Creating Databases and Tables
    Loading Data
    Altering Databases and Tables
    Simplifying Queries with Views
    Storing Query Results

10
  • Data Storage & Performance

  • Partitioning Tables
    Choosing a File Format
    Managing Metadata
    Controlling Access to Data

11
  • Relational Data Analysis with Hive & Impala

  • Joining Datasets
    Common Built-In Functions
    Aggregation and Windowing

12
  • Working with Impala

  • How Impala Executes Queries
    Extending Impala with User-Defined Functions
    Improving Impala Performance

13
  • Analyzing Text and Complex Data with Hive

  • Complex Values in Hive
    Using Regular Expressions in Hive
    Sentiment Analysis and N-Grams
    Conclusion

14
  • Hive Optimization

  • Understanding Query Performance
    Controlling the Job Execution Plan
    Bucketing
    Indexing Data

15
  • Extending Hive

  • SerDes
    Data Transformation with Custom Scripts
    User-Defined Functions
    Parameterized Queries

16
  • Choosing the Best Tool for the Job

  • Comparing MapReduce, Pig, Hive, Impala, and Relational Databases
    Which to Choose?

Audience

This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators.

Language

English

Prerequisites

Prerequisites for this course include knowledge of SQL, basic familiarity with the Linux command line, and knowledge of at least one scripting language (e.g., Bash, Perl, Python, or Ruby).

$3,195

Length: 4.0 days (32 hours)

Course Schedule:

Tuesday, Oct 10, 9:00 AM ET - 5:00 PM ET