Serverless Data Processing with Dataflow Course

Price
$2,700.00 USD

Duration
3 Days

 

Delivery Methods
Virtual Instructor Led
Private Group

Course Overview

How do you build data pipelines that scale without locking yourself into one platform? With Apache Beam and Google Cloud Dataflow, you can run serverless data processing at scale—without compromising flexibility or performance.

This 3-day course teaches data engineers and analysts how to use Apache Beam with Dataflow to build resilient, scalable, and portable pipelines for batch and streaming applications. You’ll learn to optimize performance, implement secure deployments, monitor your jobs, and apply best practices across the pipeline lifecycle—from development to CI/CD.

Whether you’re processing terabytes of batch data or building real-time pipelines, this course gives you the tools to simplify operations and build faster, more cost-effective solutions with Google Cloud Dataflow.

Course Objectives

  • Demonstrate how Apache Beam and Dataflow work together to fulfill your organization’s data processing needs.
  • Summarize the benefits of the Beam Portability Framework and enable it for your Dataflow pipelines.
  • Enable Dataflow Shuffle for batch pipelines and Streaming Engine for streaming pipelines to maximize performance.
  • Enable Flexible Resource Scheduling for more cost-efficient performance.
  • Select the right combination of IAM permissions for your Dataflow job.
  • Implement best practices for a secure data processing environment.
  • Select and tune the I/O of your choice for your Dataflow pipeline.
  • Use schemas to simplify your Beam code and improve the performance of your pipeline.
  • Develop a Beam pipeline using SQL and DataFrames.
  • Perform monitoring, troubleshooting, testing and CI/CD on Dataflow pipelines.
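
For a sense of what working with Beam and Dataflow looks like in practice, here is a minimal word-count sketch in the Python SDK (not course material); the bucket paths are placeholders, and the same code runs locally on the DirectRunner or on Dataflow depending on the pipeline options you pass:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run(argv=None):
    # Pass --runner=DataflowRunner --project=... --region=... --temp_location=gs://...
    # to submit to Dataflow; with no flags it runs locally on the DirectRunner.
    options = PipelineOptions(argv)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.txt")    # placeholder path
            | "Split" >> beam.FlatMap(str.split)
            | "PairWithOne" >> beam.Map(lambda word: (word, 1))
            | "CountPerWord" >> beam.CombinePerKey(sum)
            | "Format" >> beam.MapTuple(lambda word, n: f"{word},{n}")
            | "Write" >> beam.io.WriteToText("gs://my-bucket/output/counts")  # placeholder path
        )

if __name__ == "__main__":
    run()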

Who Should Attend?

This course is designed for data engineers, as well as data analysts and data scientists looking to develop hands-on data engineering skills. It is ideal for anyone working with batch or streaming pipelines on Google Cloud.

  • Top-rated instructors: Our crew of subject matter experts has an average instructor rating of 4.8 out of 5 across thousands of reviews.
  • Authorized content: We maintain more than 35 Authorized Training Partnerships with the top players in tech, ensuring your course materials contain the most relevant and up-to-date information.
  • Interactive classroom participation: Our virtual training includes live lectures, demonstrations and virtual labs that allow you to participate in discussions with your instructor and fellow classmates to get real-time feedback.
  • Post Class Resources: Review your class content, catch up on any material you may have missed or perfect your new skills with access to resources after your course is complete.
  • Private Group Training: Let our world-class instructors deliver exclusive training courses just for your employees. Our private group training is designed to promote your team’s shared growth and skill development.
  • Tailored Training Solutions: Our subject matter experts can customize the class to specifically address the unique goals of your team.

What is Serverless Data Processing with Dataflow training?

This 3-day course teaches how to design, build, and manage scalable data pipelines using Apache Beam and Google Cloud Dataflow. You'll learn development, security, monitoring, and CI/CD techniques.

Is this course part of a certification path or credential?

While not tied to a specific exam, this course is part of the Google Cloud data engineering curriculum and supports preparation for the Google Cloud Certified – Professional Data Engineer certification.

Will this course help me develop and deploy batch or streaming pipelines?

Yes. You'll work with Apache Beam and Dataflow to build, optimize, and deploy both batch and real-time pipelines, using best practices and production-ready techniques.

Will this course help me implement CI/CD and monitoring for Dataflow jobs?

Absolutely. You’ll learn how to monitor performance, set up alerts, and integrate testing and deployment workflows using Cloud Monitoring, Beam notebooks, and Flex Templates.

Is this training worth it?

Yes—serverless pipelines reduce operational overhead and accelerate time-to-insight. This course teaches scalable, portable data engineering practices in high demand across industries.

Course Prerequisites

  • Building Batch Data Pipelines
  • Building Resilient Streaming Analytics Systems

Agenda

Introduction

  • Course objectives and overview
  • Apache Beam and Dataflow integration

Beam Portability and Compute Options

  • Beam Portability Framework and use cases
  • Custom containers and cross-language transforms
  • Shuffle, Streaming Engine, and Flexible Resource Scheduling
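
These service features are enabled through pipeline options. A hedged sketch using the Python SDK follows; flag behavior can vary by SDK version and region, and Streaming Engine and Flexible Resource Scheduling apply to streaming and batch jobs respectively (all values are placeholders):

from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",                 # placeholder
    "--region=us-central1",                 # placeholder
    "--temp_location=gs://my-bucket/tmp",   # placeholder
    "--enable_streaming_engine",            # Streaming Engine (streaming jobs)
    "--flexrs_goal=COST_OPTIMIZED",         # Flexible Resource Scheduling (batch jobs)
    # Dataflow Shuffle is the default for batch in supported regions; older SDKs
    # enabled it explicitly with "--experiments=shuffle_mode=service".
])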

IAM, Quotas, and Security

  • Selecting IAM roles and managing quotas
  • Zonal strategies for data processing
  • Best practices for secure environments
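
As an illustration, several of these security controls are expressed as pipeline options. The option names below are real Dataflow options, while every value is a placeholder for your own project:

from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",                                                  # placeholder
    "--region=us-central1",                                                  # placeholder
    "--service_account_email=df-worker@my-project.iam.gserviceaccount.com",  # worker identity
    "--subnetwork=regions/us-central1/subnetworks/private-subnet",           # run in a private subnet
    "--no_use_public_ips",                                                   # keep workers off public IPs
    "--dataflow_kms_key=projects/my-project/locations/us-central1/keyRings/kr/cryptoKeys/key",  # CMEK
])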

Beam Concepts and Streaming Foundations

  • Apache Beam review: PCollections, PTransforms, DoFn lifecycle
  • Windows, watermarks, and triggers for streaming data
  • Handling late data and defining trigger types
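
A minimal sketch of these streaming concepts in the Python SDK, combining fixed windows, an early/late trigger, and allowed lateness (the data and timestamps are placeholders):

import apache_beam as beam
from apache_beam import window
from apache_beam.transforms.trigger import AfterWatermark, AfterProcessingTime, AccumulationMode

with beam.Pipeline() as p:
    counts = (
        p
        | "Create" >> beam.Create([("user1", 1), ("user2", 1), ("user1", 1)])  # placeholder events
        | "AddTimestamps" >> beam.Map(lambda kv: window.TimestampedValue(kv, 1700000000))
        | "Window" >> beam.WindowInto(
            window.FixedWindows(60),                       # 60-second event-time windows
            trigger=AfterWatermark(
                early=AfterProcessingTime(10),             # speculative early firings
                late=AfterProcessingTime(30)),             # re-fire when late data arrives
            allowed_lateness=600,                          # accept data up to 10 minutes late
            accumulation_mode=AccumulationMode.ACCUMULATING)
        | "CountPerUser" >> beam.CombinePerKey(sum)
        | "Print" >> beam.Map(print)
    )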

Sources, Sinks, and Schemas

  • Writing and tuning I/O for performance
  • Creating custom sources and sinks with splittable DoFns (SDF)
  • Using schemas to express structured data and improve performance
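
A short sketch of schema-aware code in the Python SDK, where fields are addressed by name rather than by tuple position (field names and values are placeholders):

import apache_beam as beam

with beam.Pipeline() as p:
    totals = (
        p
        | "Create" >> beam.Create([
            beam.Row(user_id="alice", amount=9.99),   # schema inferred from beam.Row
            beam.Row(user_id="bob", amount=3.50),
            beam.Row(user_id="alice", amount=1.25),
        ])
        | "TotalPerUser" >> beam.GroupBy("user_id").aggregate_field("amount", sum, "total_spend")
        | "Print" >> beam.Map(print)
    )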

State, Timers, and Best Practices

  • When and how to use state and timer APIs
  • Choosing the right type of state for your pipeline
  • Development and design best practices
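
For illustration, here is a stateful DoFn sketch in the Python SDK that buffers values per key and flushes after a fixed count; the threshold is arbitrary, and in practice a timer (TimerSpec) would also be set to flush any leftover buffer:

import apache_beam as beam
from apache_beam.coders import VarIntCoder, FastPrimitivesCoder
from apache_beam.transforms.userstate import BagStateSpec, CombiningValueStateSpec

class BatchPerKey(beam.DoFn):
    BUFFER = BagStateSpec("buffer", FastPrimitivesCoder())
    COUNT = CombiningValueStateSpec("count", VarIntCoder(), sum)

    def process(self, element,
                buffer=beam.DoFn.StateParam(BUFFER),
                count=beam.DoFn.StateParam(COUNT)):
        key, value = element
        buffer.add(value)
        count.add(1)
        if count.read() >= 3:                  # arbitrary flush threshold
            yield key, list(buffer.read())
            buffer.clear()
            count.clear()

with beam.Pipeline() as p:
    _ = (
        p
        | "Create" >> beam.Create([("k", i) for i in range(7)])   # placeholder keyed data
        | "Batch" >> beam.ParDo(BatchPerKey())
        | "Print" >> beam.Map(print)
    )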

Developing with SQL, DataFrames, and Notebooks

  • Using Beam SQL and DataFrames to build pipelines
  • Prototyping in Beam notebooks with Beam magics
  • Launching jobs to Dataflow from notebooks
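
The DataFrame API lets you express steps in a pandas-like style while Beam (and Dataflow) execute them. A hedged sketch, with placeholder file paths and column names:

import apache_beam as beam
from apache_beam.dataframe.io import read_csv

with beam.Pipeline() as p:
    df = p | read_csv("gs://my-bucket/orders-*.csv")     # deferred Beam DataFrame (placeholder path)
    totals = df.groupby("user_id").amount.sum()          # pandas-style aggregation, executed by Beam
    totals.to_csv("gs://my-bucket/output/totals")        # placeholder output path

In Beam notebooks, the interactive runner and magics such as %%beam_sql support a similar exploratory workflow for SQL before launching the job to Dataflow.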

Monitoring, Logging, and Troubleshooting

  • Navigating the Dataflow Job Details UI
  • Setting alerts with Cloud Monitoring
  • Troubleshooting with diagnostics widgets and error reports
  • Structured debugging and common failure patterns
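
One troubleshooting technique related to these topics is instrumenting your code with custom Beam metrics, which surface in the Dataflow job UI and can drive Cloud Monitoring alerts. A small sketch (metric namespace, names, and the parsing logic are placeholders):

import apache_beam as beam
from apache_beam.metrics import Metrics

class ParseLine(beam.DoFn):
    def __init__(self):
        super().__init__()
        self.parsed_ok = Metrics.counter("parse", "parsed_ok")
        self.parse_errors = Metrics.counter("parse", "parse_errors")

    def process(self, line):
        try:
            value = int(line)            # placeholder "parsing" step
        except ValueError:
            self.parse_errors.inc()      # count bad records instead of failing the job
            return
        self.parsed_ok.inc()
        yield value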

Performance, Testing, and CI/CD

  • Performance tuning and data shape considerations
  • Testing strategies and automation
  • Streamlining CI/CD workflows for Dataflow
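
Testing in Beam typically means unit-testing transforms with the SDK's testing utilities so they can run in CI before deployment; a minimal sketch:

import unittest
import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline
from apache_beam.testing.util import assert_that, equal_to

class WordCountTest(unittest.TestCase):
    def test_counts_words(self):
        with TestPipeline() as p:
            output = (
                p
                | beam.Create(["a b", "a"])
                | beam.FlatMap(str.split)
                | beam.combiners.Count.PerElement()
            )
            assert_that(output, equal_to([("a", 2), ("b", 1)]))

if __name__ == "__main__":
    unittest.main()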

Reliability and Flex Templates

  • Designing for reliability in production pipelines
  • Using Flex Templates to standardize and reuse code
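
Flex Templates package a pipeline in a container image and expose its options as launch-time parameters. For illustration, a pipeline parameterized this way might define custom options like the hypothetical ones below; the image and template spec are then built and launched with the gcloud dataflow flex-template commands:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class MyTemplateOptions(PipelineOptions):
    # Hypothetical parameters the template would expose at launch time.
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_argument("--input_subscription", help="Pub/Sub subscription to read from")
        parser.add_argument("--output_table", help="BigQuery table to write to")

def run(argv=None):
    options = MyTemplateOptions(argv)
    with beam.Pipeline(options=options) as p:
        # Build the pipeline here using options.input_subscription and options.output_table.
        pass

if __name__ == "__main__":
    run()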

Summary

  • Course recap and next steps
 

Upcoming Class Dates and Times

Sep 10, 11, 12
8:00 AM - 4:00 PM
ENROLL $2,700.00 USD
 



Do You Have Additional Questions? Please Contact Us Below.

Contact Us about Starting Your Business Training Strategy with New Horizons