Big Data Training | BULUT BİLİŞİM VE BÜYÜK VERİ ARAŞTIRMA LABORATUVARI

Duration of Training	2 Days
Prerequisites	A degree in a field such as Engineering, Mathematics, Statistics, or Informatics. Basic knowledge of Python. Basic knowledge of Linux.
Audience	Suitable for people who want to: design highly scalable distributed systems for big data using different open source tools; understand how algorithms work and create high-performance algorithms; work on collecting, parsing, managing, analyzing, and visualizing data in complex big data projects; determine the necessary hardware and software design requirements and design processes accordingly.
Training Goals	Learn the history and basics of big data, Hadoop fundamentals, and the basic technologies in the Hadoop ecosystem; learn the basics of the project life cycle, data collection, data evaluation, data transformation, and data analysis; learn the basics of the distributed file system (HDFS) at the core of Hadoop and the features and usage of YARN, which provides resource management; plan a big data cluster, and learn about cluster setup, configuration, and management with Ambari; learn the usage scenarios and basic components of Kafka and NiFi, which underpin data transfer technologies; learn the basics of Flume and Sqoop, which are used to transfer data into the Hadoop environment; learn the basics of Hive, which enables running query scripts on files in the distributed file system; learn the basics of the SQL, DataFrame, Machine Learning, and GraphX libraries of Spark, which is used for in-memory analysis and analytical studies on big data; learn the basics of the Pig Latin scripting language for data analysis; learn the basics of ZooKeeper, a coordination service in the big data ecosystem, and Oozie, a workflow scheduler; learn the basics of NoSQL databases and their usage.
Syllabus	Big Data History and Basics; Data Science Fundamentals; Core Hadoop: HDFS and YARN; Big Data Cluster Management: Ambari; Data Integration: Kafka and NiFi; Data Integration: Flume and Sqoop; Data Analysis: Hive; Data Processing: Spark (Streaming, SQL, DataFrame, ML, GraphX); Data Analysis: Pig; ZooKeeper and Oozie; Data Storage: HBase