Highly Scalable Matching Pursuit Signal Decomposition Algorithm
Public Data Portal
In this research, we propose a variant of the classical Matching Pursuit Decomposition (MPD) algorithm with significantly improved scalability and computational performance. MPD is a powerful iterative algorithm that decomposes a signal into a linear combination of dictionary elements, or “atoms”. At each iteration, the best-fit atom from an arbitrarily defined dictionary is determined through cross-correlation. The selected atom is subtracted from the signal, and the procedure is repeated on the residual in subsequent iterations until a stopping criterion is met. A sufficiently large dictionary is required for an accurate reconstruction; this in turn increases the computational burden of the algorithm, limiting its applicability and level of adoption. Our main contribution lies in improving the computational efficiency of the algorithm to allow faster decomposition while maintaining a similar level of accuracy. Two techniques, Correlation Thresholding and Multiple Atom Extraction, are proposed to decrease the computational burden. Correlation thresholds prune insignificant atoms from the dictionary. Extracting multiple atoms within a single iteration makes each iteration more effective and efficient. The proposed algorithm, entitled MPD++, is demonstrated on a real-world data set.
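The classical iteration described above (correlate, select the best atom, subtract, repeat on the residual) can be sketched as follows. This is a minimal illustration of plain matching pursuit only, not of MPD++: it includes neither correlation thresholding nor multiple atom extraction. The function names, the toy dictionary, and the tolerance-based stopping criterion are assumptions made for the example.

```python
import math


def normalize(vec):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def matching_pursuit(signal, dictionary, max_iters=10, tol=1e-6):
    """Greedy matching pursuit over unit-norm atoms.

    Returns (selections, residual), where selections is a list of
    (atom_index, coefficient) pairs in the order they were chosen.
    """
    residual = list(signal)
    selections = []
    for _ in range(max_iters):
        # Correlate the current residual with every atom (inner product).
        correlations = [
            sum(r * a for r, a in zip(residual, atom)) for atom in dictionary
        ]
        # The best-fit atom has the largest absolute correlation.
        best = max(range(len(dictionary)), key=lambda i: abs(correlations[i]))
        coeff = correlations[best]
        # Assumed stopping criterion: no atom explains the residual any more.
        if abs(coeff) < tol:
            break
        # Subtract the selected atom's contribution from the residual.
        residual = [r - coeff * a for r, a in zip(residual, dictionary[best])]
        selections.append((best, coeff))
    return selections, residual


# Hypothetical toy dictionary of four unit-norm atoms.
atoms = [
    normalize(v)
    for v in ([1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1])
]
signal = [3.0, 0.0, 2.0, 2.0]
selections, residual = matching_pursuit(signal, atoms)
```

Each iteration costs one inner product per atom, which is why a larger dictionary increases the computational burden and why pruning atoms or extracting several per iteration can reduce it.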
Virtual Sensors: Efficiently Estimating Missing Spectra
Various instruments are used to create images of the Earth and other objects in the universe in a diverse set of wavelength bands with the aim of understanding natural phenomena. Sometimes these instruments are built in a phased approach, with additional measurement capabilities added in later phases. In other cases, technology may mature to the point that the instrument offers new measurement capabilities that were not planned in its original design. In still other cases, high resolution spectral measurements may be too costly to perform on a large sample, and therefore lower resolution spectral instruments are used to take the majority of measurements. Many applied science questions that are relevant to the Earth science remote sensing community require analysis of enormous amounts of data generated by instruments with disparate measurement capabilities. This paper addresses this problem using Virtual Sensors: a method that uses models trained on spectrally rich (high spectral resolution) data to "fill in" unmeasured spectral channels in spectrally poor (low spectral resolution) data. The models we use in this paper are Multi-Layer Perceptrons (MLPs), Support Vector Machines (SVMs) with Radial Basis Function (RBF) kernels, and SVMs with Mixture Density Mercer Kernels (MDMK). We demonstrate this method by using models trained on data from the high spectral resolution Terra MODIS instrument to estimate what the equivalent of the MODIS 1.6 micron channel would be for the NOAA AVHRR/2 instrument. The scientific motivation for simulating the 1.6 micron channel is to improve the ability of the AVHRR/2 sensor to detect clouds over snow and ice.
Empirical Evaluation of Diagnostic Algorithm Performance Using a Generic Framework
A variety of rule-based, model-based, and data-driven techniques have been proposed for detection and isolation of faults in physical systems. However, there have been few efforts to comparatively analyze the performance of these approaches on the same system under identical conditions. One reason for this has been the lack of a standard framework to perform this comparison. In this paper we introduce a framework, called DXF, that provides a common language to represent the system description, sensor data, and the fault diagnosis results; a run-time architecture to execute the diagnosis algorithms under identical conditions and collect the diagnosis results; and an evaluation component that can compute performance metrics from the diagnosis results to compare the algorithms. We have used DXF to perform an empirical evaluation of 13 diagnostic algorithms on a hardware testbed (ADAPT) at NASA Ames Research Center and on a set of synthetic circuits typically used as benchmarks in the model-based diagnosis community. Based on these empirical data we analyze the performance of each algorithm and suggest directions for future development.
Removing Spikes While Preserving Data and Noise using Wavelet Filter Banks
Many diagnostic datasets suffer from the adverse effects of spikes that are embedded in data and noise. For example, this is true for electrical power system data, where switches, relays, and inverters are major contributors to these effects. Spikes are harmful to the analysis of data mainly because they throw off real-time detection of abnormal conditions and classification of faults. Since noise and spikes are mixed together and embedded within the data, removing the unwanted signals is not always easy and may compromise the integrity of the information carried by the data. Additionally, in some applications noise and spikes need to be filtered independently. The proposed algorithm is a multi-resolution filtering approach based on Haar wavelets that is capable of removing spikes while incurring insignificant damage to other data. In particular, noise in the data, which is a useful indicator that a sensor is healthy and not stuck, can be preserved using our approach. Presented here is the theoretical background with some examples from a realistic testbed.
Towards a Framework for Evaluating and Comparing Diagnosis Algorithms
Diagnostic inference involves the detection of anomalous system behavior and the identification of its cause, possibly down to a failed unit or to a parameter of a failed unit. Traditional approaches to solving this problem include expert/rule-based, model-based, and data-driven methods. Each approach (and each of the various techniques within it) uses a different representation of the knowledge required to perform the diagnosis. The sensor data is expected to be combined with these internal representations to produce the diagnosis result. In spite of the availability of various diagnosis technologies, there have been only minimal efforts to develop a standardized software framework to run, evaluate, and compare different diagnosis technologies on the same system. This paper presents a framework that defines a standardized representation of the system knowledge, the sensor data, and the form of the diagnosis results, and provides a run-time architecture that can execute diagnosis algorithms, send sensor data to the algorithms at appropriate time steps from a variety of sources (including the actual physical system), and collect resulting diagnoses. We also define a set of metrics that can be used to evaluate and compare the performance of the algorithms, and provide software to calculate the metrics.
Anomaly Detection and Diagnosis Algorithms for Discrete Symbols
We present a set of novel algorithms, which we call sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences that arise from recordings of switch sensors in the cockpits of commercial airliners. While the algorithms we present are general and domain-independent, we focus on a specific problem that is critical to determining the system-wide health of a fleet of aircraft. The approach uses unsupervised clustering of sequences, with the normalized length of the longest common subsequence (nLCS) as a similarity measure, followed by detailed outlier analysis to detect anomalies. In this method, an outlier sequence is defined as a sequence that is far away from the cluster centre. We present new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence is deemed to be an outlier. The algorithms provide a coherent description to an analyst of the anomalies in the sequence when compared to more normal sequences. In the final section of the paper we demonstrate the effectiveness of sequenceMiner for anomaly detection on a real set of discrete sequence data from a fleet of commercial airliners. We show that sequenceMiner discovers actionable and operationally significant safety events. We also compare our innovations with standard Hidden Markov Models, and show that our methods are superior.
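To illustrate the similarity measure mentioned above, the following sketch computes the longest common subsequence length with standard dynamic programming and normalizes it. The abstract does not state the normalization; dividing by the geometric mean of the two sequence lengths is an assumption here, and the function names are hypothetical. This shows only the similarity measure, not the clustering or outlier analysis of sequenceMiner.

```python
import math


def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)  # DP row for the previous element of a
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the common subsequence ending before both symbols.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop one symbol from either sequence.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def nlcs(a, b):
    """LCS length normalized by sqrt(len(a) * len(b)) (assumed form).

    The value is 1.0 for identical sequences and 0.0 when the sequences
    share no symbols (or when either is empty).
    """
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / math.sqrt(len(a) * len(b))


# Hypothetical switch-event sequences encoded as symbol lists.
flight_a = ["gear_up", "flaps_0", "autopilot_on", "flaps_5"]
flight_b = ["gear_up", "autopilot_on", "flaps_5", "gear_down"]
similarity = nlcs(flight_a, flight_b)
```

With a similarity of this kind, a sequence whose similarity to its cluster centre is low would be a candidate outlier.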
Key Real-World Applications of Classifier Ensembles
Broad classes of statistical classification algorithms have been developed and applied successfully to a wide range of real world domains. In general, ensuring that the particular classification algorithm matches the properties of the data is crucial in providing results that meet the needs of the particular application domain. One way in which the impact of this algorithm/application match can be alleviated is by using ensembles of classifiers, where a variety of classifiers (either different types of classifiers or different instantiations of the same classifier) are pooled before a final classification decision is made. Intuitively, classifier ensembles allow the different needs of a difficult problem to be handled by classifiers suited to those particular needs. Mathematically, classifier ensembles provide an extra degree of freedom in the classical bias/variance tradeoff, allowing solutions that would be difficult (if not impossible) to reach with only a single classifier. Because of these advantages, classifier ensembles have been applied to many difficult real world problems. In this paper, we survey select applications of ensemble methods to problems that have historically been most representative of the difficulties in classification. In particular, we survey applications of ensemble methods to remote sensing, person recognition, one vs. all recognition, and medicine.
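As a minimal sketch of pooling classifiers before a final decision, the example below combines several classifiers by plurality vote. Voting is only one of many possible combination rules and is an assumption here, as are the function name and the toy threshold classifiers; the abstract does not specify how the surveyed ensembles combine their members.

```python
from collections import Counter


def majority_vote(classifiers, x):
    """Return the label predicted by the largest number of classifiers.

    Each classifier is any callable mapping an input to a label. Ties are
    resolved in favour of the label that was voted for first.
    """
    votes = [clf(x) for clf in classifiers]
    label, _count = Counter(votes).most_common(1)[0]
    return label


# Hypothetical base classifiers: simple threshold rules on a single number,
# each with a different decision boundary.
classifiers = [
    lambda x: "pos" if x > 0.0 else "neg",
    lambda x: "pos" if x > 1.0 else "neg",
    lambda x: "pos" if x > -1.0 else "neg",
]

decision_mid = majority_vote(classifiers, 0.5)    # two of three vote "pos"
decision_low = majority_vote(classifiers, -0.5)   # two of three vote "neg"
```

The individual classifiers disagree near their boundaries, and the pooled decision reflects the consensus rather than any single rule.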