Duration of Training

3 Days

Prerequisites

  • Knowledge of data analysis, mathematics, statistics, computer science, databases and database queries.
  • Basic knowledge of Linux.

Audience

Suitable for:

  • Software developers, analysts and data scientists who need to apply data science and machine learning with Spark,
  • People who want to collect, analyze and interpret very large amounts of data,
  • People who want to use advanced analysis technologies,
  • People who work with large data sets and want to use analysis and reporting tools to collect and analyze data and identify patterns, trends and relationships in it.

Training Goals

  • Learning the basics of Python programming,
  • Learning the history of big data, Hadoop fundamentals and the core technologies of the Hadoop ecosystem,
  • Learning the basics of the Hadoop Distributed File System (HDFS), which forms the core of Hadoop, and the features and usage of YARN, which provides cluster resource management,
  • Learning the basics of Spark's SQL, DataFrame, machine learning and GraphX libraries; Spark is used to perform in-memory analysis and analytical studies on big data,
  • Learning the basics of applying machine learning algorithms with Spark.

Syllabus

  • Introduction to Python
  • Big Data Basics
  • Core Hadoop: HDFS and YARN
  • Spark Architecture
  • Spark Low-Level API (RDD)
  • Spark High-Level API (DataFrame, Dataset, SQL)
  • DataFrame and Dataset Persistence
  • Spark Streaming
  • Spark Structured Streaming
  • Spark Distributed Processing
  • Writing, Configuring, and Running Spark Applications
  • Performance Tuning
  • Spark ML
  • Deep Learning with Spark