Geared for experienced developers, the Spark Developer | Introduction to Spark for Big Data, Hadoop & Machine Learning course provides students with a comprehensive, hands-on exploration of enterprise-grade Spark programming, working with the core components of the Spark ecosystem to craft complete data science solutions. Students will leave this course with the skills they need to begin working with Spark in a practical, real-world environment. This course is offered in support of the Python programming language but can also be offered for R or Java with advance notice and planning. Our team will work with you to coordinate the languages, tools, and environment that will work best for your organization and needs. Please inquire for details.


* Actual course outline may vary depending on offering center. Contact your sales representative for more information.

Learning Objectives

This course provides instruction in the practical use of the leading-edge data science technologies centered on Spark and related tools. Working in a hands-on learning environment, students will explore:

  • Spark ecosystem
  • Spark shell
  • Spark data structures (RDD, DataFrame, Dataset)
  • Spark SQL
  • Modern data formats and Spark
  • Spark API
  • Spark, Hadoop, and Hive
  • Spark ML overview
  • GraphX
  • Time-permitting: Spark Streaming
  • Time-permitting: Optional capstone workshop

1
  • Spark Introduction

  • Big data, Hadoop, Spark
    Spark concepts and architecture
    Spark components overview

2
  • The first look at Spark

  • Spark shell
    Spark web UIs
    Analyzing dataset - part 1

3
  • Spark Data structures

  • Partitions
    Distributed execution
    Operations: transformations and actions

4
  • Caching

  • Caching overview
    Various caching mechanisms available in Spark
    In-memory file systems
    Caching use cases and best practices

5
  • DataFrames and Datasets

  • DataFrames Intro
    Loading structured data (JSON, CSV) using DataFrames
    Using schema
    Specifying schema for DataFrames

6
  • Spark SQL

  • Spark SQL concepts and overview
    Defining tables and importing datasets
    Querying data using SQL
    Handling various storage formats: JSON, Parquet, ORC

7
  • Spark and Hadoop

  • Hadoop primer: HDFS, YARN
    Hadoop + Spark architecture
    Running Spark on Hadoop YARN
    Processing HDFS files using Spark
    Spark & Hive

8
  • Spark API

  • Overview of Spark APIs in Scala / Python
    The lifecycle of a Spark application
    Spark APIs
    Deploying Spark applications on YARN

9
  • Spark ML Overview

  • Machine Learning primer
    Machine Learning in Spark: MLlib / ML
    Spark ML overview (newer Spark 2 version)
    Algorithms overview: clustering, classification, recommendation

10
  • GraphX

  • GraphX library overview
    GraphX APIs
    Creating a graph and navigating it
    Shortest distance
    Pregel API

11
  • Time-Permitting Topic: Spark Streaming

  • Streaming concepts
    Evaluating Streaming platforms
    Spark streaming library overview
    Streaming operations
    Sliding window operations
    Structured Streaming
    Continuous streaming
    Spark & Kafka streaming

12
  • Workshop

  • Attendees will work on solving real-world data analysis problems using Spark

Audience

This foundation-level course is geared for experienced, intermediate-level developers and architects (with basic Python experience) who want to become proficient in modern development with Apache Spark in an enterprise data environment.

Language

English

Prerequisites

There are no prerequisites for this course.

$2,195

Length: 3.0 days (24 hours)

Level:


Course Schedule:

  • Wednesday, Feb 14, 10:00 AM - 6:00 PM ET
  • Wednesday, Apr 17, 10:00 AM - 6:00 PM ET
  • Wednesday, Jun 12, 10:00 AM - 6:00 PM ET