Education Data Utilization and Support Service

Industrial research: Task No. 1. Research on Data Refinement and Feature Engineering algorithms

Stage tasks:

Task 1: Development of algorithms for statistical analysis of attribute values for data purification. The aim of the task was to develop an algorithm that can identify the kind of each attribute (scalar, discrete) and, depending on its type (text, number, date, text label, etc.), deduce which values can be considered correct and which are incorrect and add noise to the dataset, which in turn degrades the quality of the ML model.

Task 2: Development of algorithms for statistical analysis of data attributes in terms of optimal coding of learning vectors. The aim of the task was to develop an algorithm that can propose an optimal coding of the learning vector to be used in the ML process and perform the appropriate conversion for each kind of attribute (scalar, discrete), depending on its type (text, number, date, text label, etc.), for example converting text to a word occurrence matrix format. It was necessary to anticipate the several conversion scenarios that are most often used in practice, drawing on the heuristic knowledge of experts.

Task 3: Developing a prototype of an automatic data cleaning and coding environment and testing the solution on samples of production data.

Industrial research: Task No. 2. Research on the meta-learning algorithm

Task 1: Review of existing meta-learning concepts and selection of algorithms for further development. The aim of the task was to analyze the state of knowledge on meta-learning in terms of the possibility of using existing research results in the project. This task was subcontracted to a scientific unit.

Task 2: Review and development of the most commonly used ML algorithms in terms of their susceptibility to hyperparameter meta-learning and the practical usefulness of the obtained models. The aim of the task was to develop a pool of basic algorithms to be used as production algorithms, i.e. the algorithms that perform the actual predictions. The hyperparameters of these algorithms were the subject of meta-learning. It was therefore necessary to develop a model of interaction between the main algorithm and the individual production algorithms. This task was subcontracted to a scientific unit.

Task 3: Development of a meta-learning algorithm for selected types of ML models. The aim of the task was to develop the main algorithm, which implements the optimization of the hyperparameters of the production models. Because the hyperparameters have a different structure depending on the specific production model, in practice the appropriate solution was to use a separate optimization algorithm for each model.

Task 4: Developing a prototype of the algorithm and testing the operation of the obtained models on production data.

Experimental development work: Task No. 3. Research on the prototype of the architecture of the platform implementation environment

Task 1: Developing the architecture of the data acquisition and storage module. The aim of the task was to develop an architecture for a scalable ETL (Extract, Transform, Load) solution for efficient implementation of the source data acquisition process (Data Ingest). Appropriate parsing algorithms and standardized encoding of data of various types (e.g. dates, numbers) were considered with a view to efficient further processing.

Task 2: Development of a module for configuring and executing data processing pipelines in a distributed architecture. Due to the high complexity of the implemented algorithms, it was necessary to develop an architecture that allows successive data processing steps to be pipelined across different machines, with the option of using a distributed architecture in a cloud and/or virtual environment. The use of existing distributed architecture concepts, such as MapReduce, was considered here.

Task 3: Development of a user interface enabling intuitive control