The Data Science & Big Data Overview | Tools, Tech & Modern Roles in the Data-Driven Enterprise is an introductory-level course that acquaints the entire multi-disciplinary Data Science team with the many evolving and related terms in the field, with a focus on Big Data, Data Science, Predictive Analytics, Artificial Intelligence, Data Mining, and Data Warehousing. The overview explores the current state of the art and science, the major components of a modern data science infrastructure, team roles and responsibilities, and the realistic outcomes you can expect from your investment. The goal of this course is to give students a conversant, baseline understanding of core concepts and technologies.

* Actual course outline may vary depending on offering center. Contact your sales representative for more information.

Learning Objectives

Throughout the session you'll explore:
Foundations: Grids & Virtualization; SOA, ESB / EMB, The Cloud
The Hadoop Ecosystem: HDFS, Resource Negotiators, MapReduce, Spark, Distributions
Big Data, NOSQL, and ETL
ETL: Extract, Transform, Load
Handling Data & a Survey of Useful Tools
Enterprise Integration Patterns and Message Busses
Developing in the Hadoop Ecosystem: R, Python, Java, Scala, Pig, and BPMN
Artificial Intelligence and Business Systems
Who’s on the Team? Evolving Roles and Functions in Data Science
Growing your Infrastructure

1
  • FOUNDATIONS

  • Grids and Virtualization

    Service-Oriented Architecture

    Enterprise Service Bus

    Enterprise Message Bus

    The Cloud


2
  • THE HADOOP ECOSYSTEM

  • HDFS: Hadoop Distributed File System

    Resource Negotiators: YARN, Mesos, and Spark; ZooKeeper

    Hadoop MapReduce

    Spark

    Hadoop Ecosystem Distributions: Cloudera, Hortonworks, Open Source


3
  • BIG DATA, NOSQL, AND ETL

  • Big Data vs. RDBMS

    NOSQL: Not Only SQL

    Relational Databases: Oracle, MariaDB, DB2, SQL Server, PostgreSQL

    Key/Value Databases: JBoss Infinispan, Terracotta, Dynamo, Voldemort

    Columnar Databases: Cassandra, HBase, Bigtable

    Document Databases: MongoDB, CouchDB/Couchbase

    Graph Databases: Giraph, Neo4j, GraphX

    Apache Hive

    Common Data Formats

    Leveraging SQL and SQL variants


4
  • EXTRACT, TRANSFORM, LOAD

  • Data Ingestion, Transformation, and Loading

    Exporting Data

    Sqoop, Flume, Informatica, and other tools


5
  • ENTERPRISE INTEGRATION PATTERNS AND MESSAGE BUSSES

  • Enterprise Integration Patterns: Apache Camel and Spring Integration

    Enterprise Message Busses: Apache Kafka, ActiveMQ, and other tools


6
  • AN OVERVIEW OF DEVELOPING IN THE HADOOP ECOSYSTEM

  • Languages: R, Python, Java, Scala, Pig, and BPMN

    Libraries and Frameworks

    Development, Testing, and Deployment


7
  • EXPLORING ARTIFICIAL INTELLIGENCE AND BUSINESS SYSTEMS

  • Artificial Intelligence: Myths, Legends, and Reality

    The Math

    Statistics

    Probability

    Clustering Algorithms, Mahout, MLlib, scikit-learn, and MADlib

    Business Rule Systems: Drools, JRules, Pegasus


Audience

This introductory-level primer course is an overview intended for Business Analysts, Data Analysts, Data Architects, DBAs, Network (Grid) Administrators, Developers, or anyone else in the data science realm who needs a baseline understanding of the core areas of modern Data Science technologies, practices, and available tools.

Language

English

Prerequisites

Attendees should have prior exposure to Enterprise Information Technology, as well as familiarity with Relational Databases.

$895

Length: 1.0 day (8 hours)

Course Schedule:

Wednesday, Oct 18, 10:00 AM ET - 6:00 PM ET
Wednesday, Nov 15, 10:00 AM ET - 6:00 PM ET