Duration of Training

2 Days

Prerequisites

  • To have graduated in a field such as Engineering, Mathematics, Statistics or Informatics.
  • To have basic knowledge of Python.
  • To have basic Linux knowledge.

Audience

Suitable for people who want:

  • To design highly scalable distributed systems that handle big data using a variety of open source tools,
  • To understand how algorithms work and to create high-performance algorithms,
  • To work on processes such as collecting, parsing, managing, analyzing and visualizing data in complex big data projects,
  • To determine hardware and software design requirements and to design processes according to these decisions.

Training Goals

  • Learning the history and basics of big data, Hadoop fundamentals and the basic technologies in the Hadoop ecosystem,
  • Learning the basics of the project life cycle, data collection, data evaluation, data transformation and data analysis,
  • Learning the basics of the Hadoop Distributed File System (HDFS), which forms the core of Hadoop, and the features and usage of YARN, which provides resource management,
  • Planning a big data cluster and learning how to set it up, configure it and manage it with Ambari,
  • Learning the usage scenarios and basic components of Kafka and NiFi, which are foundational data transfer technologies,
  • Learning the basics of Flume and Sqoop, which are used to transfer data into the Hadoop environment,
  • Learning the basics of Hive, which enables running SQL-like queries on files in the distributed file system,
  • Learning the basics of the SQL, DataFrame, Machine Learning and GraphX libraries of Spark, which is used for in-memory processing and analytics on big data,
  • Learning the basics of the Pig Latin scripting language for data analysis,
  • Learning the basics of ZooKeeper, a coordination service in the big data ecosystem, and Oozie, a workflow scheduler,
  • Learning the basics of NoSQL databases and their usage.

Syllabus

  • Big Data History and Basics
  • Data Science Fundamentals
  • Core Hadoop: HDFS and YARN
  • Big Data Cluster Management: Ambari
  • Data Integration: Kafka and NiFi
  • Data Integration: Flume and Sqoop
  • Data Analysis: Hive
  • Data Processing: Spark (Streaming, SQL, DataFrame, ML, GraphX)
  • Data Analysis: Pig
  • ZooKeeper and Oozie
  • Data Storage: HBase