Duration of Training | 3 Days
Prerequisites | - Knowledge of data, data analysis, mathematics, statistics, computer science, databases, and database querying.
- Basic Linux knowledge.
Audience | Suitable for software developers, analysts, and data scientists who want to:
- apply data science and machine learning with Spark,
- collect, analyze, and interpret very large volumes of data,
- use advanced analytics technologies,
- work with large data sets, using various analysis and reporting tools to identify patterns, trends, and relationships.
Training Goals | - Learn the fundamentals of Python programming,
- Learn the history of big data, the fundamentals of Hadoop, and the core technologies in the Hadoop ecosystem,
- Learn the distributed file system (HDFS) at the core of Hadoop, and the features and usage of YARN, which provides resource management,
- Learn the basics of the SQL, DataFrame, Machine Learning, and GraphX libraries in Spark, which is used for in-memory analysis and analytics on big data (a short PySpark sketch follows the syllabus),
- Learn the basics of applying machine learning algorithms with Spark.
Syllabus | - Introduction to Python
- Big Data Basics
- Core Hadoop: HDFS and YARN
- Spark Architecture
- Spark Low Level API (RDD)
- Spark High Level API (DataFrame, Dataset, SQL)
- DataFrame and Dataset Persistence
- Spark Streaming
- Spark Structured Streaming
- Spark Distributed Processing
- Writing, Configuring, and Running Spark Applications
- Performance Tuning
- Spark ML
- Deep Learning with Spark
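To give a flavor of the hands-on work in the Spark High Level API and Spark SQL modules, here is a minimal PySpark sketch. It only illustrates the style of API used in the course; the application name, the input file `sales.csv`, and the columns `region` and `amount` are illustrative assumptions, not course material.

```python
# Minimal PySpark sketch: DataFrame API and Spark SQL on the same data.
# The file path and column names below are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("training-sketch")   # hypothetical application name
    .getOrCreate()
)

# Read a CSV file into a DataFrame, inferring the schema from the data.
df = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("sales.csv")             # hypothetical input file
)

# DataFrame API: total amount per region.
totals = df.groupBy("region").agg(F.sum("amount").alias("total_amount"))
totals.show()

# Spark SQL: the same aggregation expressed as a query.
df.createOrReplaceTempView("sales")
spark.sql(
    "SELECT region, SUM(amount) AS total_amount FROM sales GROUP BY region"
).show()

spark.stop()
```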