Dataset Details
United States
Highly Scalable Matching Pursuit Signal Decomposition Algorithm
In this research, we propose a variant of the classical Matching Pursuit Decomposition (MPD) algorithm with significantly improved scalability and computational performance. MPD is a powerful iterative algorithm that decomposes a signal into a linear combination of dictionary elements, or “atoms”. In each iteration, the best-fit atom from an arbitrarily defined dictionary is determined through cross-correlation. The selected atom is subtracted from the signal, and the procedure is repeated on the residual in subsequent iterations until a stopping criterion is met. A sufficiently large dictionary is required for an accurate reconstruction; this in turn increases the computational burden of the algorithm, limiting its applicability and level of adoption. Our main contribution lies in improving the computational efficiency of the algorithm to allow faster decomposition while maintaining a similar level of accuracy. Two techniques, Correlation Thresholding and Multiple Atom Extraction, are proposed to decrease the computational burden. Correlation thresholds prune insignificant atoms from the dictionary, and extracting multiple atoms within a single iteration makes each iteration more effective and efficient. The proposed algorithm, entitled MPD++, was demonstrated using a real-world data set.
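The classical MPD loop described above can be sketched as follows. This is a minimal illustration of plain matching pursuit, not the MPD++ variant: it assumes the dictionary is a matrix whose columns are unit-norm atoms, and it replaces sliding cross-correlation with a single inner product per atom. The names `matching_pursuit`, `n_iter`, and `tol` are hypothetical and do not come from the dataset.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_iter=10, tol=1e-6):
    """Plain matching pursuit (illustrative sketch, not MPD++).

    Assumes each column of `dictionary` is a unit-norm atom.
    Returns the atom coefficients and the final residual.
    """
    residual = np.asarray(signal, dtype=float).copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_iter):
        # Correlate every atom with the current residual.
        correlations = dictionary.T @ residual
        best = int(np.argmax(np.abs(correlations)))
        # Stopping criterion: no atom explains the residual any further.
        if abs(correlations[best]) < tol:
            break
        # Record the best-fit atom and subtract its contribution.
        coeffs[best] += correlations[best]
        residual -= correlations[best] * dictionary[:, best]
    return coeffs, residual

# Toy example: the atoms are the standard basis vectors of R^4.
atoms = np.eye(4)
coeffs, residual = matching_pursuit(np.array([3.0, 0.0, -2.0, 0.0]), atoms)
```

Because every iteration correlates the residual with the whole dictionary, the cost per iteration grows with dictionary size, which is the burden that the pruning and multiple-atom ideas in the abstract aim to reduce.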
Data Information
Related Data
Algorithms for Spectral Decomposition with Applications
Public Data Portal
The analysis of spectral signals for features that represent physical phenomena is ubiquitous in the science and engineering communities. There are two main approaches that can be taken to extract relevant features from these high-dimensional data streams. The first set of approaches relies on extracting features using a physics-based paradigm, where the underlying physical mechanism that generates the spectra is used to infer the most important features in the data stream. We focus on a complementary methodology that uses a data-driven technique that is informed by the underlying physics but also has the ability to adapt to unmodeled system attributes and dynamics. We discuss the following four algorithms: Spectral Decomposition Algorithm (SDA), Non-Negative Matrix Factorization (NMF), Independent Component Analysis (ICA), and Principal Components Analysis (PCA), and compare their performance on a spectral emulator which we use to generate artificial data with known statistical properties. This spectral emulator mimics the real-world phenomena arising from the plume of the space shuttle main engine and can be used to validate the results of various spectral decomposition algorithms. It is very useful for situations where real-world systems have very low probabilities of fault or failure. Our results indicate that methods like SDA and NMF provide a straightforward way of incorporating prior physical knowledge, while NMF with a tuning mechanism can give superior performance on some tests. We demonstrate the use of these algorithms to detect potential system-health issues on data from a spectral emulator with tunable health parameters.
sequenceMiner algorithm
Public Data Portal
Detecting and describing anomalies in large repositories of discrete symbol sequences. **sequenceMiner has been open-sourced! Download the file below to try it out.** sequenceMiner was developed to address the problem of detecting and describing anomalies in large sets of high-dimensional symbol sequences. sequenceMiner works by performing unsupervised clustering (grouping) of sequences using the normalized longest common subsequence (LCS) as a similarity measure, followed by a detailed analysis of outliers to detect anomalies. sequenceMiner utilizes a new hybrid algorithm for computing the LCS that has been shown to outperform existing algorithms by a factor of five. sequenceMiner also includes new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence was deemed to be an outlier. This provides analysts with a coherent description of the anomalies identified in the sequence, and why they differ from more “normal” sequences. sequenceMiner was developed with funding from the NASA Aviation Safety Program. In the commercial aviation domain, sequenceMiner can be used to discover atypical behavior in airline performance data that may have operational significance for safety analysts. Because the sequenceMiner approach is general and not restricted to any particular domain, these algorithms can also be applied in other fields where anomaly detection and event mining would be useful.
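As a rough illustration of the similarity measure mentioned above, the sketch below computes the LCS length with the textbook dynamic program and normalizes it by the geometric mean of the two sequence lengths. Both choices are assumptions made for illustration: the description does not spell out the normalization, and this is not the hybrid LCS algorithm that sequenceMiner itself uses. The function names are hypothetical.

```python
from math import sqrt

def lcs_length(a, b):
    """Length of the longest common subsequence (standard O(len(a)*len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the common subsequence ending before both symbols.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop one symbol from either sequence and keep the better result.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def normalized_lcs(a, b):
    """LCS length divided by sqrt(len(a) * len(b)); an assumed normalization."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / sqrt(len(a) * len(b))

# Identical sequences score 1.0; sequences with no shared symbols score 0.0.
same = normalized_lcs("ABC", "ABC")
disjoint = normalized_lcs("ABC", "DEF")
```

A clustering step would then treat a value such as `1 - normalized_lcs(a, b)` as a distance between two sequences, so that sequences far from every cluster become outlier candidates.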
Sparse Solutions for Single Class SVMs: A Bi-Criterion Approach
Public Data Portal
In this paper we propose an innovative learning algorithm, a variation of the one-class Support Vector Machine (SVM) learning algorithm, that produces sparser solutions with much reduced computational complexity. The proposed technique returns an approximate solution, nearly as good as the solution set obtained by the classical approach, by minimizing the original risk function along with a regularization term. We introduce a bi-criterion optimization that helps guide the search towards the optimal set in much reduced time. The outcome of the proposed learning technique was compared with the benchmark one-class SVM algorithm, which more often leads to solutions with redundant support vectors. Throughout the analysis, the problem size for both optimization routines was kept consistent. We have tested the proposed algorithm on a variety of data sources under different conditions to demonstrate its effectiveness. In all cases the proposed algorithm closely preserves the accuracy of standard one-class SVMs while reducing both training time and test time by several factors.
Third International Diagnostic Competition
Public Data Portal
We present the third implementation of a framework created jointly by NASA Ames Research Center, Palo Alto Research Center, and Delft University of Technology to compare and evaluate diagnosis algorithms (DAs). This year's competition, DXC'11, introduces a software track in addition to the industrial and synthetic tracks of previous competitions. A total of eleven DAs competed in the three tracks. The paper describes the systems, diagnostic problems of the tracks, fault scenarios, evaluation metrics, participating DAs, results and analysis.
Fast Dynamic Programming for Elastic Registration of Curves
Public Data Portal
This is a software suite for computing optimal diffeomorphisms for elastic registration of curves. Algorithm adapt-DP is based on dynamic programming (DP) restricted to an adapting strip, which allows it to perform this computation in linear time. A description of Algorithm adapt-DP can be found in "Fast Dynamic Programming for Elastic Registration of Curves", Proceedings of the 2nd International Workshop on Differential Geometry in Computer Vision and Machine Learning (DIFF-CVML'16) in conjunction with Computer Vision Pattern Recognition Conference (CVPR) 2016, Las Vegas, Nevada, June 26-July 1, 2016. The zip file Fast_Dynamic_Programming.zip contains copies of the implementation of Algorithm adapt-DP as Fortran files (a Matlab Fortran mex file and a Python-compatible Fortran file) for execution with Matlab/Python, Matlab/Python test files for executing the adapt-DP Matlab Fortran mex file and the Python-compatible Fortran file, respectively, example data files, usage instructions in README files, etc.
Asynchronous Mid-Value Select in Hybrid SAL
Public Data Portal
The following SAL model is an abstraction of a module that implements a fault-tolerant mid-value select on asynchronously produced inputs. This is part of a larger system that has both discrete and continuous dynamics. Our goal is to model the full system using Hybrid SAL, and we have adapted the timed relational abstraction technique supported by Hybrid SAL to abstract asynchronous sampling of continuous signals. This approach will be fully automated in future releases of Hybrid SAL. The following model (05/14/2012) shows the resulting abstraction for the asynchronous mid-value select module and includes proofs of various properties.
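For readers unfamiliar with the term, the core of a mid-value select can be sketched in a few lines of Python. This is only an illustration of the selection step under the assumption of three redundant channels; it ignores the asynchronous sampling and timing behavior that the SAL model abstracts, and the function name and sample readings are hypothetical.

```python
def mid_value_select(a, b, c):
    """Return the median of three redundant channel readings."""
    return sorted((a, b, c))[1]

# One channel reports a wildly wrong value; the selected output
# still lies between the two healthy readings.
readings = (10.1, 9.9, 250.0)
selected = mid_value_select(*readings)
```

Taking the median means a single faulty channel cannot push the output outside the range spanned by the other two channels, which is the fault-tolerance property the selection is used for.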
Fast discriminative latent Dirichlet allocation
Public Data Portal
This is the code for fast discriminative latent Dirichlet allocation, which is an algorithm for topic modeling and text classification. The related paper is at http://www-users.cs.umn.edu/~shan/icdm09_dm.pdf
MKAD (Open Sourced Code)
Public Data Portal
The Multiple Kernel Anomaly Detection (MKAD) algorithm is designed for anomaly detection over a set of files. It combines multiple kernels into a single optimization function using the One Class Support Vector Machine (OCSVM) framework. Any kernel function can be combined in the algorithm as long as it meets the Mercer conditions; however, for the purposes of this code, the data preformatting and kernel type are specific to Flight Operations Quality Assurance (FOQA) data and have been integrated into the coding steps. For this domain, discrete binary switch sequences are used in the discrete kernel, and discretized continuous parameter features are used to form the continuous kernel. The OCSVM is trained on a set of nominal examples (in this case flights) and then evaluates test examples to determine whether they are anomalous. After completing this analysis, the algorithm reports the anomalous examples and determines whether the continuous elements, the discrete elements, or both contributed to each anomaly.
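One common way to combine several kernels is a convex combination of their precomputed Gram matrices, which remains a valid Mercer kernel whenever each input kernel is one. The sketch below shows that step only, under the assumption that this is how the kernels are merged; the weighting actually used by the MKAD code is not stated here, and `combine_kernels`, the weights, and the toy matrices are hypothetical.

```python
import numpy as np

def combine_kernels(kernels, weights):
    """Convex combination of precomputed kernel (Gram) matrices.

    Assumption for illustration: weights are non-negative and sum to 1.
    """
    weights = np.asarray(weights, dtype=float)
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * np.asarray(k, dtype=float) for w, k in zip(weights, kernels))

# Toy Gram matrices for two flights: one from a discrete (switch-sequence)
# kernel and one from a continuous-parameter kernel.
k_discrete = np.array([[1.0, 0.2], [0.2, 1.0]])
k_continuous = np.array([[1.0, 0.6], [0.6, 1.0]])
k_combined = combine_kernels([k_discrete, k_continuous], [0.5, 0.5])
```

The combined matrix could then be passed to any one-class SVM implementation that accepts a precomputed kernel.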
Anomaly Detection and Diagnosis Algorithms for Discrete Symbols
Public Data Portal
We present a set of novel algorithms, which we call sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences that arise from recordings of switch sensors in the cockpits of commercial airliners. While the algorithms we present are general and domain-independent, we focus on a specific problem that is critical to determining the system-wide health of a fleet of aircraft. The approach taken uses unsupervised clustering of sequences using the normalized length of the longest common subsequence (nLCS) as a similarity measure, followed by detailed outlier analysis to detect anomalies. In this method, an outlier sequence is defined as a sequence that is far away from the cluster centre. We present new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence is deemed to be an outlier. The algorithms provide a coherent description to an analyst of the anomalies in the sequence when compared to more normal sequences. In the final section of the paper we demonstrate the effectiveness of sequenceMiner for anomaly detection on a real set of discrete sequence data from a fleet of commercial airliners. We show that sequenceMiner discovers actionable and operationally significant safety events. We also compare our innovations with standard Hidden Markov Models, and show that our methods are superior.