Data Science & Big Data Overview | Tools, Tech & Modern Roles in the Data-Driven Enterprise is an introductory-level course that introduces the entire multi-disciplinary data science team to the field's many evolving and related terms, with a focus on Big Data, Data Science, Predictive Analytics, Artificial Intelligence, Data Mining, and Data Warehousing. The overview explores the current state of the art and science, the major components of a modern data science infrastructure, team roles and responsibilities, and realistic expectations for the outcomes of your investment. The goal of this course is to give students a baseline understanding of core concepts that can serve as a platform for more in-depth training and real-world practice.

* Actual course outline may vary depending on offering center. Contact your sales representative for more information.

Learning Objectives

This course provides a high-level view of core, current data science-related technologies, strategies, skillsets, initiatives, and supporting tools common in business enterprise practice. The list covers a general range of topics current at the time of course distribution.
Students will explore:
Foundations: Grids & Virtualization; SOA, ESB / EMB, The Cloud
The Hadoop Ecosystem: HDFS; Resource Negotiators, MapReduce, Spark, Distributions
Big Data, NOSQL, and ETL
ETL: Extract, Transform, Load
Handling Data & a Survey of Useful Tools
Enterprise Integration Patterns and Message Busses
Developing in the Hadoop Ecosystem: R, Python, Java, Scala, Pig, and BPMN
Artificial Intelligence and Business Systems
Who’s on the Team? Evolving Roles and Functions in Data Science
Growing your Infrastructure

1
  • Foundations

  • Grids and Virtualization
    Service-Oriented Architecture
    Enterprise Service Bus
    Enterprise Message Bus
    The Cloud

2
  • The Hadoop Ecosystem

  • HDFS: Hadoop Distributed File System
    Resource Negotiators: YARN, Mesos, and Spark; ZooKeeper
    Hadoop MapReduce
    Spark
    Hadoop Ecosystem Distributions: Cloudera, Hortonworks, Open Source

3
  • Big Data, NOSQL, and ETL

  • Big Data vs. RDBMS
    NOSQL: Not Only SQL
    Relational Databases: Oracle, MariaDB, Db2, SQL Server, PostgreSQL
    Key/Value Databases: JBoss Infinispan, Terracotta, Dynamo, Voldemort
    Columnar Databases: Cassandra, HBase, Bigtable
    Document Databases: MongoDB, CouchDB/Couchbase
    Graph Databases: Giraph, Neo4j, GraphX
    Apache Hive
    Common Data Formats
    Leveraging SQL and SQL variants

4
  • ETL: Extract, Transform, Load

  • Data Ingestion, Transformation, and Loading
    Exporting Data
    Sqoop, Flume, Informatica, and other tools

5
  • Enterprise Integration Patterns and Message Busses

  • Enterprise Integration Patterns: Apache Camel and Spring Integration
    Enterprise Message Busses: Apache Kafka, ActiveMQ, and other tools

6
  • Developing in the Hadoop Ecosystem

  • Languages: R, Python, Java, Scala, Pig, and BPMN
    Libraries and Frameworks
    Development, Testing, and Deployment

7
  • Artificial Intelligence and Business Systems

  • Artificial Intelligence: Myths, Legends, and Reality
    The Math
    Statistics
    Probability
    Clustering Algorithms, Mahout, MLlib, scikit-learn, and MADlib
    Business Rule Systems: Drools, JRules, Pegasus

8
  • The Modern Data Team

  • Agile Data Science
    NOSQL Data Architects and Administrators
    Developers
    Grid Administrators
    Business and Data Analysts
    Management
    Evolving your Team
    Growing your Infrastructure

Audience

This introductory-level primer course is an overview for Business Analysts, Data Analysts, Data Architects, DBAs, Network (Grid) Administrators, Developers, or anyone else in the data science realm who needs a baseline understanding of the core areas of modern data science technologies, practices, and available tools. Attendees should have prior exposure to Enterprise Information Technology, as well as familiarity with Relational Databases.

Language

English

Prerequisites

Attendees should have prior exposure to Enterprise Information Technology, as well as familiarity with Relational Databases.

$795

Length: 1.0 day (8 hours)
