The TURKSTAT Big Data Analytics Project, carried out at the Republic of Turkey Ministry of Finance and Treasury, Turkish Statistical Institute (TURKSTAT), aims to design a system in which daily prices (labeled with category and subcategory information) and job advertisements collected from websites are stored, processed, and analyzed in the big data ecosystem as both batch and streaming data. The system will make it possible to classify positions and skills from job advertisements, visualize the results, track prices for plane tickets, bus tickets, and package tours, and perform lag analysis.

A Lambda architecture is used: data collected from the websites is transferred to the big data environment as streaming data, and the transferred data is then analyzed both in batch and as a stream. The system architecture is built from open source tools in the big data ecosystem together with the Data Quality Tool (B3DataQuality) of the Cloud Computing and Big Data Research Laboratory (B3LAB) (Figure 1). During the system development phase, a small-scale demo setup is carried out in the B3LAB Prototype Data Center located on TÜBİTAK BİLGEM's Gebze Campus.

 

Figure 1 - System architecture
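
The batch and streaming paths of a Lambda architecture can be illustrated with a minimal sketch in plain Python. It is not the project's implementation: the names `PriceEvent` and `LambdaPriceStore`, the category strings, and the choice of a running mean price as the computed view are all assumptions made for illustration, and an in-memory list stands in for the distributed storage and stream processing tools a real deployment would use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceEvent:
    """One scraped price record (hypothetical schema)."""
    category: str   # e.g. "plane", "bus", "package_tour"
    day: str        # collection date, e.g. "2020-01-15"
    price: float


class LambdaPriceStore:
    """Toy Lambda architecture: batch view + speed view, merged at query time."""

    def __init__(self):
        self.master = []                                 # append-only master dataset
        self.batch_view = {}                             # category -> (sum, count)
        self.speed_view = defaultdict(lambda: [0.0, 0])  # category -> [sum, count]

    def ingest(self, event):
        """Streaming path: store the raw event and update the speed view."""
        self.master.append(event)
        acc = self.speed_view[event.category]
        acc[0] += event.price
        acc[1] += 1

    def run_batch(self):
        """Batch path: recompute the view from the full master dataset."""
        view = defaultdict(lambda: [0.0, 0])
        for e in self.master:
            view[e.category][0] += e.price
            view[e.category][1] += 1
        self.batch_view = {k: tuple(v) for k, v in view.items()}
        # Events seen so far are now covered by the batch view.
        self.speed_view.clear()

    def mean_price(self, category):
        """Serving path: merge batch and speed views to answer a query."""
        b_sum, b_n = self.batch_view.get(category, (0.0, 0))
        s_sum, s_n = self.speed_view.get(category, (0.0, 0))
        n = b_n + s_n
        return (b_sum + s_sum) / n if n else None
```

The point of the split is that a query always sees recent events through the speed view, while the batch view can be rebuilt from the raw data at any time, for example after a processing bug is fixed.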

 

Within the scope of the project, models for job posting position and skill classification and for lag analysis will be built using machine learning and deep learning methods on batch data in the big data environment. The streaming data will then be processed with these machine learning models, and the results will be visualized in a business intelligence tool compatible with the big data environment.
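
The pattern of training on batch data and then scoring streaming data can be sketched as follows. This is a toy token-count classifier, deliberately much simpler than the machine learning and deep learning models the project describes; the function names, the labels, and the example advertisements are hypothetical.

```python
from collections import Counter, defaultdict


def train_batch(labeled_ads):
    """Batch step: count token frequencies per position label."""
    counts = defaultdict(Counter)
    for text, label in labeled_ads:
        counts[label].update(text.lower().split())
    return dict(counts)


def classify(model, text):
    """Score each label by how often it has seen the ad's tokens."""
    tokens = text.lower().split()
    scores = {label: sum(c[t] for t in tokens) for label, c in model.items()}
    return max(scores, key=scores.get)


def score_stream(model, ads):
    """Streaming step: apply the batch-trained model to ads as they arrive."""
    for ad in ads:
        yield ad, classify(model, ad)


# Hypothetical labeled batch data.
model = train_batch([
    ("python spark developer", "data_engineer"),
    ("java backend developer", "software_engineer"),
])

for ad, label in score_stream(model, ["spark python pipelines", "java backend"]):
    print(label, "<-", ad)
```

In a real deployment the generator would be replaced by a consumer reading from the streaming platform, and the predicted labels would be written to a store that the business intelligence tool can query.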