Duration of Training | 4 Days |
Prerequisites | - A degree in a field such as Engineering, Mathematics, Statistics, or Informatics.
- Basic knowledge of Python.
- Basic knowledge of Linux.
- Familiarity with data, data analysis, mathematics, statistics, computer science, and databases, including database queries.
|
Audience | Suitable for software developers, analysts, and data scientists who need to apply data science and machine learning on Spark/Hadoop, and for anyone who wants:
- To design highly scalable distributed systems that handle big data using a range of open source tools,
- To understand how algorithms work and to build high-performance algorithms,
- To collect, parse, manage, analyze, and visualize data in complex big data projects,
- To determine the necessary hardware and software requirements and to design systems accordingly,
- To collect, analyze, and interpret extremely large volumes of data using advanced analytics technologies,
- To identify patterns, trends, and relationships in large data sets and to report on them using various analysis and reporting tools.
|
Training Goals | - Learning the history of big data, Hadoop fundamentals, and the core technologies of the Hadoop ecosystem,
- Learning the basics of HDFS, the distributed file system at the core of Hadoop, and the features and usage of YARN, which provides resource management,
- Planning a big data cluster and learning how to install, configure, and manage it with Ambari,
- Learning the usage scenarios and basic components of Kafka and NiFi, which form the basis of data-transfer technologies (a minimal Kafka sketch follows the syllabus),
- Learning the basics of Flume and Sqoop, which are used to transfer data into the Hadoop environment,
- Learning the basics of Hive, which enables running SQL-like queries over files in the distributed file system,
- Learning the basics of the Spark Streaming, SQL, DataFrame, and GraphX libraries, which are used for in-memory processing and analytics on big data (see the Spark SQL sketch after the syllabus),
- Learning the basics of the Pig Latin scripting language for data analysis,
- Learning the basics of ZooKeeper, the coordination service of the big data ecosystem, and Oozie, the workflow scheduler,
- Learning the basics of NoSQL databases and their usage,
- Learning the basics of the project life cycle: data collection, data evaluation, data transformation, and data analysis,
- Learning the fundamentals of artificial intelligence and machine learning,
- Learning the basics of running machine learning algorithms with Spark for in-memory analytics on big data (a Spark ML sketch follows the syllabus),
- Working through sample applications,
- Examining case studies of advanced analytics applications on big data,
- Examining how big data technologies and artificial intelligence can be applied to real-world problems.
|
Syllabus | - Big Data History and Basics
- Core Hadoop: HDFS and YARN
- Big Data Cluster Management: Ambari
- Data Integration: Kafka and NiFi
- Data Integration: Flume and Sqoop
- Data Analysis: Hive
- Data Processing: Spark (Streaming, SQL, DataFrame, GraphX)
- Data Analysis: Pig
- ZooKeeper and Oozie
- Data Storage: HBase
- Data Science Fundamentals
- Artificial Intelligence and Machine Learning Fundamentals
- Data Processing: Spark Machine Learning (ML)
- Spark ML Lab Study
- Advanced Analytical Applications in Big Data
- How can big data technologies and artificial intelligence be used in real-world problems?
- Application Study
|
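The data-integration module introduces Kafka's publish/subscribe model. As a rough illustration only, the Python sketch below publishes and reads back a message; it assumes a broker running at localhost:9092 and the kafka-python package, and the topic name "demo-events" is hypothetical, not part of the course material.

```python
from kafka import KafkaProducer, KafkaConsumer

# Minimal publish/subscribe sketch. Assumes a Kafka broker at
# localhost:9092 and the kafka-python package; the topic name
# "demo-events" is hypothetical.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("demo-events", b"hello big data")
producer.flush()  # block until the message is actually delivered

consumer = KafkaConsumer(
    "demo-events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",  # start from the beginning of the topic
    consumer_timeout_ms=5000,      # stop iterating after 5 s of silence
)
for message in consumer:
    print(message.value)
```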
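The Hive and Spark modules both revolve around running SQL-like queries over distributed data. As a minimal PySpark sketch of that idea, the example below registers an in-memory DataFrame as a view and queries it with Spark SQL; the sample rows and column names are invented for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-sql-demo").getOrCreate()

# Hypothetical sample data; in the course the data would live in HDFS.
df = spark.createDataFrame(
    [("alice", "click"), ("bob", "view"), ("alice", "view")],
    ["user", "action"],
)
df.createOrReplaceTempView("events")

# The same SQL-like style of query that Hive runs over HDFS files,
# executed here by Spark SQL against the registered view.
spark.sql("SELECT user, COUNT(*) AS event_count FROM events GROUP BY user").show()

spark.stop()
```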
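The Spark ML lab builds on the same DataFrame API. The sketch below shows the kind of pipeline the module covers, with a toy data set invented for the example: feature columns are assembled into a single vector column and a logistic regression model is fitted and applied.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("spark-ml-demo").getOrCreate()

# Toy training data invented for this sketch: two features and a label.
train = spark.createDataFrame(
    [(0.0, 1.0, 0.0), (1.0, 0.1, 1.0), (0.9, 0.2, 1.0), (0.1, 0.8, 0.0)],
    ["f1", "f2", "label"],
)

# Assemble the feature columns into the single vector column Spark ML
# expects, then fit a logistic regression in one pipeline.
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")
model = Pipeline(stages=[assembler, lr]).fit(train)

model.transform(train).select("f1", "f2", "label", "prediction").show()
spark.stop()
```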