Big Data Architecture Workshop (BDAW) is a learning event that addresses advanced big data architecture topics. BDAW brings technical contributors together in a group setting to design and architect a solution to a challenging business problem. The workshop first covers big data architecture problems in general, then applies them to the design of a challenging system. Throughout the highly interactive workshop, students apply concepts to real-world examples, which leads to detailed discussions. Students learn techniques for architecting big data systems not only from Cloudera’s experience but also from the experiences of fellow students.


* Actual course outline may vary depending on offering center. Contact your sales representative for more information.

Learning Objectives

More specifically, BDAW addresses advanced big data architecture topics, including data formats; transformation; real-time, batch, and machine learning processing; scalability; fault tolerance; security and privacy; and minimizing the risk of an unsound architecture and technology selection.

  • Workshop Application Use Cases
    • Oz Metropolitan
    • Architectural questions
    • Team activity: analyze Metroz application use cases

  • Application Vertical Slice
    • Definition
    • Minimizing risk of an unsound architecture
    • Selecting a vertical slice
    • Team activity: identify an initial vertical slice for Metroz

  • Application Processing
    • Real-time and near-real-time processing
    • Batch processing
    • Data access patterns
    • Delivery and processing guarantees
    • Machine learning pipelines
    • Team activity: identify delivery and processing patterns in Metroz, characterize response time requirements, identify machine learning pipelines

  • Application Data
    • Three V’s of Big Data
    • Data lifecycle
    • Data formats
    • Transforming data
    • Team activity: Metroz data requirements

  • Scalable Applications
    • Scale up, scale out, scale to X
    • Determining if an application will scale
    • Poll: scalable airport terminal designs
    • Hadoop and Spark scalability
    • Team activity: scaling Metroz

  • Fault Tolerant Distributed Systems
    • Principles
    • Hardware vs. software redundancy
    • Tolerating disasters
    • Stateless functional fault tolerance
    • Stateful fault tolerance
    • Replication and group consistency
    • Fault tolerance in Spark and MapReduce
    • Application tolerance for failures
    • Team activity: identify Metroz component failures and requirements

  • Security and Privacy
    • Principles
    • Team activity: identify threats and security mechanisms in Metroz

  • Deployment
    • Cluster sizing and evolution
    • On-premises vs. cloud
    • Edge computing
    • Team activity: select deployment for Metroz

  • Technology Selection
    • HDFS
    • Relational database management systems
    • MapReduce
    • Spark, including streaming, SparkSQL and SparkML
    • Cloudera Search
    • Data sets and formats
    • Team activity: technologies relevant to Metroz

  • Software Architecture
    • Architecture artifacts
    • One platform or multiple, lambda architecture
    • Team activity: produce high-level architecture, selected technologies, revisit vertical slice
    • Vertical slice demonstration


This course is for Senior Executives, CIOs and CTOs, Business Intelligence Executives, Marketing Executives, Data & Business Analytics Specialists, Innovation Specialists & Entrepreneurs, Academics, and other people interested in Big Data.




To gain the most from the workshop, students should have working knowledge of technologies such as HDFS, Spark, MapReduce, Hive/Impala, data formats, and relational database management systems. Detailed API-level knowledge is not needed, as there are no programming activities.


Length: 3.0 days (24 hours)


