Efficient Matlab Programs
Public Data Portal (공공데이터포털)
Matlab has a reputation for running slowly. Here are some pointers on how to speed up computations, often to an unexpected degree. Subjects currently covered:

    Matrix Coding
    Implicit Multithreading on a Multicore Machine
    Sparse Matrices
    Sub-Block Computation to Avoid Memory Overflow

--------------------------------------------------------------------------------

Matrix Coding - 1

The Matlab documentation notes that efficient computation depends on using the matrix facilities, and that mathematically identical algorithms can have very different runtimes, but it is a bit coy about just what these differences are. A simple but telling example: the following is the core of the GD-CLS algorithm of Berry et al., copied from fig. 1 of Shahnaz et al., 2006, "Document clustering using nonnegative matrix factorization":

    for jj = 1:maxiter
        A = W'*W + lambda*eye(k);
        for ii = 1:n
            b = W'*V(:,ii);
            H(:,ii) = A \ b;
        end
        H = H .* (H>0);
        W = W .* (V*H') ./ (W*(H*H') + 1e-9);
    end

Replacing the column-wise update of H with a matrix update gives:

    for jj = 1:maxiter
        A = W'*W + lambda*eye(k);
        B = W'*V;
        H = A \ B;
        H = H .* (H>0);
        W = W .* (V*H') ./ (W*(H*H') + 1e-9);
    end

These were tested on an 8049 x 8660 sparse bag-of-words matrix V (non-zero density 0.0083), with W of size 8049 x 50, H of size 50 x 8660, maxiter = 50, lambda = 0.1, and identical initial W. They were run consecutively, multithreaded, on an 8-processor Sun server, starting at about 7:30 PM, and timed with tic/toc. Runtimes were 6586.2 and 70.5 seconds respectively, a 93:1 difference. The maximum absolute pairwise difference between the two resulting W matrices was 6.6e-14.

Similar speedups have been observed consistently in other cases. In one algorithm, combining matrix operations with efficient use of the sparse matrix facilities gave a 3600:1 speedup. For speed alone, C-style iterative programming should be avoided wherever possible.
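To check the equivalence on your own machine, the sketch below times only the H update of the two versions on a smaller random problem. The sizes, the random stand-in for V, and the names H1, H2, tLoop and tMatrix are assumptions for illustration, not the original test setup, and timings will vary with machine and Matlab version. In the matrix form a single backslash call receives all n right-hand sides, so the k x k matrix A is factored once rather than n times; that is one likely contributor to the speedup, along with avoiding n separate small products.

```matlab
% Hypothetical sizes and random data, much smaller than the test case in the text.
m = 2000;                        % rows of V and W (terms)
n = 3000;                        % columns of V and H (documents)
k = 50;                          % rank of the factorization
lambda = 0.1;
rng(0);                          % fixed seed so runs are repeatable
V = sprand(m, n, 0.0083);        % sparse nonnegative stand-in for a bag of words
W = rand(m, k);                  % nonnegative initial W
A = W'*W + lambda*eye(k);        % k x k system matrix, shared by all columns

% Column-wise H update, as in the loop version of GD-CLS
tic;
H1 = zeros(k, n);
for ii = 1:n
    b = full(W'*V(:,ii));        % right-hand side for document ii
    H1(:,ii) = A \ b;            % one k x k solve per column
end
tLoop = toc;

% Matrix H update: one backslash call with all n right-hand sides
tic;
H2 = A \ full(W'*V);
tMatrix = toc;

fprintf('loop: %.3f s, matrix: %.3f s, max |H1 - H2| = %.2e\n', ...
        tLoop, tMatrix, max(abs(H1(:) - H2(:))));
```

H1 and H2 should agree to within rounding error. The ratio tLoop/tMatrix is the number to compare with the 93:1 result, bearing in mind that this sketch leaves out the W update and the outer iteration loop.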
In addition, when a couple of lines of matrix code can substitute for an entire C-style function, program clarity is much improved.

--------------------------------------------------------------------------------

Matrix Coding - 2

Applied to integration, the speed gains are not as great, largely due to the time taken to set up the problem and deal with the boundaries. The anonymous function setup time is negligible. I demonstrate on a simple uniform-step, linearly interpolated (trapezoidal) 1-D integration of cos() from 0 to pi, which should yield zero:

    tic;
    step = .00001;
    fun = @cos;
    start = 0;
    endit = pi;
    enda = floor((endit - start)/step)*step + start;  % last grid point <= endit
    delta = (endit - enda)/step;                      % final partial step, as a fraction of step
    intF = fun(start)/2;                              % endpoint weights, including the partial last interval
    intF = intF + fun(endit)*delta/2;
    intF = intF + fun(enda)*(delta+1)/2;
    for ii = start+step:step:enda-step                % interior points, one at a time
        intF = intF + fun(ii);
    end
    intF = intF*step
    toc;

    intF = -2.910164109692914e-14
    Elapsed time is 4.091038 seconds.

Replacing the inner summation loop with its matrix equivalent still speeds things up substantially, by roughly 29:1:

    tic;
    step = .00001;
    fun = @cos;
    start = 0;
    endit = pi;
    enda = floor((endit - start)/step)*step + start;
    delta = (endit - enda)/step;
    intF = fun(start)/2;
    intF = intF + fun(endit)*delta/2;
    intF = intF + fun(enda)*(delta+1)/2;
    intF = intF + sum(fun(start+step:step:enda-step));  % all interior points in one call
    intF = intF*step
    toc;

    intF = -2.868419946011613e-14
    Elapsed time is 0.141564 seconds.

The core computation take
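For comparison, Matlab's built-in trapz applies the same linearly interpolated (trapezoidal) rule to sampled values, and integral performs adaptive quadrature on a function handle. The sketch below is an alternative to the hand-written code, not the author's method; the names a, b, x, y, I1 and I2 are illustrative. Appending the upper limit to the grid makes trapz include the short final interval that the delta terms handle in the code above.

```matlab
% Uniform-step trapezoidal integration of cos on [0, pi], vectorized.
step = 1e-5;
a = 0;
b = pi;
x = a:step:b;                 % grid points; the last one may fall short of b
if x(end) < b
    x(end+1) = b;             % add b so the final, shorter interval is included
end
y = cos(x);                   % one vectorized call instead of a loop
I1 = trapz(x, y);             % built-in trapezoidal rule, nonuniform spacing allowed
I2 = integral(@cos, a, b);    % adaptive quadrature, for comparison
fprintf('trapz: %.3e, integral: %.3e\n', I1, I2);
```

Both results should be close to zero. The trapz version replaces the explicit boundary bookkeeping with one extra grid point.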
Extreme Robotics Sp. z o.o. - Automatic Data Explorer
Industrial research: Task No. 1. Research on Data Refinement and Feature Engineering algorithms

Stage tasks:

Task 1: Development of algorithms for statistical analysis of attribute values for data purification. The aim of the task was to develop an algorithm that can identify the type of an attribute (scalar, discrete) and, depending on the type (text, number, date, text label, etc.), deduce which values can be considered correct and which are incorrect and introduce noise into the dataset, which in turn affects the quality of the ML model.

Task 2: Development of algorithms for statistical analysis of data attributes in terms of optimal coding of learning vectors. The aim of the task was to develop an algorithm that can propose an optimal coding of the learning vector to be used in the ML process and perform the appropriate conversion, depending on the type (text, number, date, text label, etc.), for each type of attribute (scalar, discrete), e.g. converting text to a word instance matrix format. It was necessary to anticipate several possible conversion scenarios that are most often used in practice, based on the heuristic knowledge of experts.

Task 3: Development of a prototype of an automatic data cleaning and coding environment and testing of the solution on samples of production data.

Industrial research: Task No. 2. Research on the meta-learning algorithm

Task 1: Review of existing meta-learning concepts and selection of algorithms for further development. The aim of the task was to analyze the state of knowledge on meta-learning with regard to the possibility of using existing research results in the project. This task was subcontracted to a scientific unit.

Task 2: Review and development of the most commonly used ML algorithms in terms of their susceptibility to hyperparameter meta-learning and the practical usefulness of the resulting models. The aim of the task was to develop a pool of basic algorithms to be used as production algorithms, i.e. the ones performing the actual predictions. The hyperparameters of these algorithms are subject to meta-learning, so it was necessary to develop a model of interaction between the main algorithm and the individual production algorithms. This task was subcontracted to a scientific unit.

Task 3: Development of a meta-learning algorithm for selected types of ML models. The aim of the task was to develop the main algorithm, which optimizes the hyperparameters of the production models. Because the hyperparameters have a different structure for each production model, the appropriate solution in practice was to use a separate optimization algorithm for each model.

Task 4: Development of a prototype of the algorithm and testing of the resulting models on production data.

Experimental development work: Task No. 3. Research on the prototype of the architecture of the platform implementation environment

Task 1: Development of the architecture of the data acquisition and storage module. The aim of the task was to develop an architecture for a scalable ETL (Extract Transform Load) solution for efficient implementation of the source data acquisition process (Data Ingest). Appropriate parsing algorithms and standardized encoding of data of various types (e.g. dates, numbers) were considered with a view to efficient further processing.

Task 2: Development of a module for configuring and executing data processing pipelines in a distributed architecture. Due to the high complexity of the implemented algorithms, it was necessary to develop an architecture that would allow pipeline processing of subsequent data processing steps on various machines, with the possibility of using a distributed architecture in a cloud and/or virtual environment. The use of existing distributed architecture concepts, such as MapReduce, was considered here.

Task 3: Development of a user interface enabling intuitive control
Teamsoft Sp.z o.o. - Research on artificial intelligence in the field of machine learning for a decision support system in the energy sector
Name: Research on artificial intelligence in the field of machine learning for a decision support system in the energy, heating, and gas sectors.

Research objective: Research aimed at developing an IT technology, a knowledge engine for a decision support system in the energy, heating, and gas sectors, using machine learning technology.

Scope of research: the industrial research carried out included:

Development of executable files implementing the developed algorithms.