Duration of Training

4 Days

Prerequisites

  • To hold a degree in a field such as Engineering, Mathematics, Statistics, or Informatics.
  • To have basic knowledge of Python.
  • To have basic Linux knowledge.
  • To have a working knowledge of data analysis, mathematics, statistics, computer science, databases, and database querying.

Audience

Suitable for people who want:

  • To design highly scalable distributed systems that handle big data using a variety of open-source tools,
  • To understand how algorithms work and to build high-performance implementations,
  • To work on processes such as collecting, parsing, managing, analyzing, and visualizing data in complex big data projects,
  • To determine the necessary hardware and software requirements and to design systems around those decisions,
  • To apply data science and machine learning on Spark / Hadoop as software developers, analysts, or data scientists,
  • To collect, analyze, and interpret extremely large amounts of data using advanced analytics technologies,
  • To identify patterns, trends, and relationships in large data sets using a variety of analysis and reporting tools.

Training Goals

  • Learning the history of big data, Hadoop fundamentals, and the core technologies in the Hadoop ecosystem,
  • Learning the basics of the Hadoop Distributed File System (HDFS) at the core of Hadoop, and the features and usage of YARN, which provides resource management (see the first sketch after this list),
  • Planning a big data cluster and learning about cluster setup, configuration, and management with Ambari,
  • Learning the usage scenarios and basic components of Kafka and NiFi, which form the basis of data transfer (a Kafka sketch follows this list),
  • Learning the basics of Flume and Sqoop, which are used to transfer data into the Hadoop environment,
  • Learning the basics of Hive, which enables running SQL-like queries over files in the distributed file system (sketched below),
  • Learning the basics of the Spark Streaming, SQL, DataFrame, and GraphX libraries, which are used for in-memory analysis and analytical work on big data (a DataFrame sketch follows this list),
  • Learning the basics of the Pig Latin scripting language for data analysis,
  • Learning the basics of ZooKeeper, which provides coordination services in the big data ecosystem, and Oozie, a workflow scheduler,
  • Learning the basics of NoSQL databases and their usage,
  • Learning the basics of the project life cycle: data collection, data evaluation, data transformation, and data analysis,
  • Learning the fundamentals of artificial intelligence and machine learning,
  • Learning the basics of applying machine learning algorithms with Spark ML for in-memory analytics on big data (the last sketch after this list),
  • Working through sample applications,
  • Examining case studies of advanced analytical applications in big data,
  • Examining how big data technologies and artificial intelligence can be applied to real-world problems.
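The goals above are easiest to grasp through small examples, so a few hedged sketches follow. First, basic HDFS file operations driven from Python through the standard hdfs dfs command-line tool. This is a minimal sketch that assumes a Hadoop client on the PATH and a running HDFS; the paths and file names are purely illustrative.

    import subprocess

    # Minimal HDFS usage via the standard 'hdfs dfs' CLI
    # (assumes a Hadoop client on PATH and a running cluster;
    # all paths and file names here are illustrative).
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", "/data/raw"], check=True)
    subprocess.run(["hdfs", "dfs", "-put", "-f", "sales.csv", "/data/raw/"], check=True)
    subprocess.run(["hdfs", "dfs", "-ls", "/data/raw"], check=True)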
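Next, a minimal Kafka producer/consumer round trip, assuming the third-party kafka-python client and a broker listening on localhost:9092; the topic name is hypothetical.

    from kafka import KafkaProducer, KafkaConsumer

    # Publish a couple of messages (broker address and topic are assumptions).
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("events", b"first message")
    producer.send("events", b"second message")
    producer.flush()

    # Read them back from the beginning of the topic.
    consumer = KafkaConsumer(
        "events",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        consumer_timeout_ms=5000,  # stop iterating once the topic is drained
    )
    for record in consumer:
        print(record.value)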
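A sketch of querying Hive tables from PySpark, assuming a Spark installation with Hive support and a configured metastore; the table name and columns are invented for illustration.

    from pyspark.sql import SparkSession

    # Hive-enabled session (requires a configured Hive metastore).
    spark = (SparkSession.builder
             .appName("hive-demo")
             .enableHiveSupport()
             .getOrCreate())

    # Table name and schema are illustrative only.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS web_logs (ts STRING, level STRING, msg STRING)
        STORED AS PARQUET
    """)
    spark.sql("SELECT level, COUNT(*) AS n FROM web_logs GROUP BY level").show()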
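A minimal Spark DataFrame sketch: loading a CSV from HDFS and aggregating it in memory. The file path and column names are assumptions.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("df-demo").getOrCreate()

    # Hypothetical CSV with 'city' and 'amount' columns, stored on HDFS.
    df = spark.read.csv("hdfs:///data/raw/sales.csv", header=True, inferSchema=True)

    # Total and count per city, largest first.
    (df.groupBy("city")
       .agg(F.sum("amount").alias("total"), F.count("*").alias("orders"))
       .orderBy(F.desc("total"))
       .show(10))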
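Finally, a toy Spark ML pipeline (feature assembly plus logistic regression) of the kind built in the lab study; the training data and column names are invented for the example.

    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("ml-demo").getOrCreate()

    # Tiny invented training set: two numeric features and a binary label.
    train = spark.createDataFrame(
        [(1.0, 2.0, 0.0), (2.0, 1.0, 0.0), (5.0, 8.0, 1.0), (6.0, 9.0, 1.0)],
        ["f1", "f2", "label"],
    )

    # Assemble features into a vector column, then fit logistic regression.
    assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol="label")
    model = Pipeline(stages=[assembler, lr]).fit(train)
    model.transform(train).select("f1", "f2", "label", "prediction").show()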

Syllabus

  • Big Data History and Basics
  • Core Hadoop: HDFS and YARN
  • Big Data Cluster Management: Ambari
  • Data Integration: Kafka and NiFi
  • Data Integration: Flume and Sqoop
  • Data Analysis: Hive
  • Data Processing: Spark (Streaming, SQL, DataFrame, GraphX)
  • Data Analysis: Pig
  • ZooKeeper and Oozie
  • Data Storage: HBase
  • Data Science Fundamentals
  • Artificial Intelligence and Machine Learning Fundamentals
  • Data Processing: Spark Machine Learning (ML)
  • Spark ML Lab Study
  • Advanced Analytical Applications in Big Data
  • How can big data technologies and artificial intelligence be used in real-world problems?
  • Application Study