Comparison of Unsupervised Anomaly Detection Methods
Public Data Portal
Several different unsupervised anomaly detection algorithms have been applied to Space Shuttle Main Engine (SSME) data in order to develop a comprehensive suite of Integrated Systems Health Management (ISHM) tools. As the theoretical bases for these methods vary considerably, it is reasonable to conjecture that the anomalies they detect may differ significantly as well. As such, it would be useful to apply a common metric with which to compare the results. However, for such a quantitative analysis to be statistically significant, a sufficient number of examples of both nominally categorized and anomalous data must be available. Because too few examples of anomalous data are available, any statistic that relies upon a statistically significant sample of anomalous data is infeasible. Therefore, the main focus of this paper will be to compare actual examples of anomalies detected by the algorithms via the sensors in which they appear, as well as the times at which they appear. We find that there is enough overlap in detection of the anomalies among all of the different algorithms tested for them to corroborate the severity of these anomalies. In certain cases, the severity of these anomalies is supported by their categorization as failures by experts, with realistic physical explanations. For those anomalies that cannot be corroborated by at least one other method, the lack of overlap says less about the severity of the anomaly and more about the technical nuances of the methods, which will also be discussed.
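The comparison described above, matching detections by sensor and by time, can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the paper: the method names, sensor names, time values, and the `tolerance` parameter are all hypothetical, and two detections are treated as the same anomaly when they share a sensor and fall within `tolerance` time units of each other.

```python
def corroborated(detections, tolerance=1.0):
    """Map each detection to the other methods that flag the same sensor
    within `tolerance` time units.

    detections: dict of method name -> list of (sensor, time) tuples.
    Returns: dict of (method, sensor, time) -> sorted list of other methods.
    """
    support = {}
    for method, events in detections.items():
        for sensor, t in events:
            others = sorted(
                other
                for other, other_events in detections.items()
                if other != method
                and any(s == sensor and abs(t2 - t) <= tolerance
                        for s, t2 in other_events)
            )
            support[(method, sensor, t)] = others
    return support


# Hypothetical detections from three methods (illustrative values only).
detections = {
    "method_a": [("sensor_1", 10.0), ("sensor_2", 42.0)],
    "method_b": [("sensor_1", 10.5)],
    "method_c": [("sensor_1", 9.8), ("sensor_3", 70.0)],
}
support = corroborated(detections, tolerance=1.0)

# Detections with an empty list are not corroborated by any other method.
uncorroborated = [key for key, others in support.items() if not others]
```

An empty list for a detection does not by itself mean the anomaly is unimportant; as the abstract notes, it may instead reflect differences in how the methods define anomalousness.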
Anomaly Detection for Complex Systems
In performance maintenance of large, complex systems, sensor information from sub-components tends to be readily available, and can be used to make predictions about the system's health and diagnose possible anomalies. However, existing methods can only use predictions of individual component anomalies to guess at systemic problems; they cannot accurately estimate the magnitude of the problem or prescribe good solutions. Since physical complex systems usually have well-defined semantics of operation, we here propose using anomaly detection techniques drawn from data mining in conjunction with an automated theorem prover working on a domain-specific knowledge base to perform systemic anomaly detection on complex systems. For clarity of presentation, the remaining content of this submission is presented compactly in Fig. 1.
Anomaly Detection in Sequences
We present a set of novel algorithms, which we call sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences that arise from recordings of switch sensors in the cockpits of commercial airliners. While the algorithms we present are general and domain-independent, we focus on a specific problem that is critical to determining the system-wide health of a fleet of aircraft. The approach taken uses unsupervised clustering of sequences using the normalized length of the longest common subsequence (nLCS) as a similarity measure, followed by a detailed analysis of outliers to detect anomalies. In this method, an outlier sequence is defined as a sequence that is far away from a cluster. We present new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence is deemed to be an outlier. The algorithms provide a coherent description to an analyst of the anomalies in the sequence when compared to more normal sequences. The final section of the paper demonstrates the effectiveness of sequenceMiner for anomaly detection on a real set of discrete sequence data from a fleet of commercial airliners. We show that sequenceMiner discovers actionable and operationally significant safety events. We also compare our innovations with standard Hidden Markov Models, and show that our methods are superior.
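As a rough illustration of the nLCS similarity measure named above, the sketch below computes the length of the longest common subsequence by dynamic programming and normalizes it. The abstract does not state the normalization; dividing by the geometric mean of the two sequence lengths is an assumption made here, and the function names and switch symbols are hypothetical.

```python
from math import sqrt


def lcs_length(x, y):
    """Length of the longest common subsequence of two sequences,
    computed with the standard dynamic program, one row at a time."""
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def nlcs(x, y):
    """Normalized LCS similarity in [0, 1].

    Assumption: normalize by sqrt(len(x) * len(y)); identical sequences
    score 1.0 and sequences with no common symbols score 0.0.
    """
    if not x or not y:
        return 0.0
    return lcs_length(x, y) / sqrt(len(x) * len(y))


# Hypothetical switch-event sequences (symbols are illustrative only).
flight_1 = ["gear_up", "flaps_1", "flaps_5"]
flight_2 = ["gear_up", "flaps_5"]
similarity = nlcs(flight_1, flight_2)  # LCS has length 2
```

A distance for clustering could then be taken as `1 - nlcs(x, y)`, so that sequences sharing little ordered structure with a cluster's members end up far from it.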
Anomaly Detection and Diagnosis Algorithms for Discrete Symbols
We present a set of novel algorithms, which we call sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences that arise from recordings of switch sensors in the cockpits of commercial airliners. While the algorithms we present are general and domain-independent, we focus on a specific problem that is critical to determining the system-wide health of a fleet of aircraft. The approach taken uses unsupervised clustering of sequences using the normalized length of the longest common subsequence (nLCS) as a similarity measure, followed by detailed outlier analysis to detect anomalies. In this method, an outlier sequence is defined as a sequence that is far away from the cluster centre. We present new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence is deemed to be an outlier. The algorithms provide a coherent description to an analyst of the anomalies in the sequence when compared to more normal sequences. In the final section of the paper we demonstrate the effectiveness of sequenceMiner for anomaly detection on a real set of discrete sequence data from a fleet of commercial airliners. We show that sequenceMiner discovers actionable and operationally significant safety events. We also compare our innovations with standard Hidden Markov Models, and show that our methods are superior.
Distributed Anomaly Detection using 1-class SVM for Vertically Partitioned Data
There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in-situ sensors and different climate models. Similarly, huge amounts of flight operational data are downloaded for different commercial airlines. These different types of datasets need to be analyzed to find outliers. Information extraction from such rich data sources using advanced data mining methodologies is a challenging task, not only because of the massive volume of data, but also because these datasets are physically stored at different geographical locations with only a subset of features available at any location. Moving these petabytes of data to a single location may waste a lot of bandwidth. To solve this problem, we present a novel algorithm that can identify outliers in the entire data without moving all the data to a single location. The method we propose centralizes only a very small sample from the different data subsets at different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization with only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available datasets: (1) the NASA MODIS satellite images and (2) a simulated aviation dataset generated by the ‘Commercial Modular Aero-Propulsion System Simulation’ (CMAPSS).
Qualitative Event-based Diagnosis with Possible Conflicts Applied to Spacecraft Power Distribution Systems
Model-based diagnosis enables efficient and safe operation of engineered systems. In this paper, we describe two algorithms based on a qualitative event-based fault isolation framework augmented with model-based fault identification that are applied to spacecraft power distribution systems. Although based on a common framework, the fundamental difference between the two algorithms is that one uses a global model for residual generation, fault isolation, and fault identification; whereas the other uses a set of minimal submodels computed using Possible Conflicts. We describe the implementation of the two algorithms and compare their diagnosis results on a representative spacecraft power distribution system.
Unsupervised Anomaly Detection for Liquid-Fueled Rocket Prop...
Title: Unsupervised Anomaly Detection for Liquid-Fueled Rocket Propulsion Health Monitoring. Abstract: This article describes the results of applying four unsupervised anomaly detection algorithms to data from two rocket propulsion testbeds. The first testbed uses historical data from the Space Shuttle Main Engine. The second testbed uses data from an experimental rocket engine test stand located at NASA Stennis Space Center. The article describes nine anomalies detected by the four algorithms. The four algorithms use four different definitions of anomalousness. Orca uses a nearest-neighbor approach, defining a point to be an anomaly if its nearest neighbors in the data space are far away from it. The Inductive Monitoring System clusters the training data, and then uses the distance to the nearest cluster as its measure of anomalousness. GritBot learns rules from the training data, and then classifies points as anomalous if they violate these rules. One-class support vector machines map the data into a high-dimensional space in which most of the normal points are on one side of a hyperplane, and then classify points on the other side of the hyperplane as anomalous. Because of these different definitions of anomalousness, different algorithms detect different anomalies. We therefore conclude that it is useful to use multiple algorithms.
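The nearest-neighbor definition of anomalousness described above can be illustrated with a minimal sketch. This is not Orca's implementation: it is a brute-force version that scores each point by the mean Euclidean distance to its `k` nearest neighbors, and the function name, the choice of `k`, and the sample points are assumptions made for illustration.

```python
from math import dist


def knn_scores(data, k=2):
    """Score each point by the mean distance to its k nearest neighbors.

    Larger scores mean the point's nearest neighbors are far away,
    which is the nearest-neighbor notion of an anomaly.
    """
    if not 0 < k < len(data):
        raise ValueError("k must be between 1 and len(data) - 1")
    scores = []
    for i, p in enumerate(data):
        # Distances from point i to every other point, smallest first.
        distances = sorted(dist(p, q) for j, q in enumerate(data) if j != i)
        scores.append(sum(distances[:k]) / k)
    return scores


# Four points on a unit square plus one distant point (illustrative data).
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = knn_scores(data, k=2)
most_anomalous = scores.index(max(scores))  # index of the distant point
```

A clustering-based or rule-based detector would rank the same data by a different criterion, which is why the article finds that different algorithms surface different anomalies.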
Predicting Engine Parameters using the Optical Spectrum
The Optical Plume Anomaly Detection (OPAD) system is under development to predict engine anomalies and engine parameters of the Space Shuttle's Main Engine (SSME). The anomaly detection is based on abnormal metal concentrations in the optical spectrum of the rocket plume. Such abnormalities could be indicative of engine corrosion or other malfunctions. Here, we focus on the second task of the OPAD system, namely the prediction of engine parameters such as rated power level (RPL) and mixture ratio (MR). Because of the high dimensionality of the spectrum, we developed a linear algorithm to resolve the optical spectrum of the exhaust plume into a number of separate components, each with a different physical interpretation. These components are used to predict the metal concentrations and engine parameters for online support of ground-level testing of the SSME. Currently, these predictions are labor intensive and cannot be done online. We predict RPL using neural networks and give preliminary results.
Anomaly Detection in a Fleet of Systems
A fleet is a group of systems (e.g., cars, aircraft) that are designed and manufactured the same way and are intended to be used the same way. For example, a fleet of delivery trucks may consist of one hundred instances of a particular model of truck, each of which is intended for the same type of service: almost the same amount of time and distance driven every day, approximately the same total weight carried, etc. For this reason, one may imagine that data mining for fleet monitoring merely involves collecting operating data from the multiple systems in the fleet and developing some sort of model, such as a model of normal operation that can be used for anomaly detection. However, each member of the fleet will be unique in some ways: there will be minor variations in manufacturing, quality of parts, and usage. For this reason, the assumption made by typical machine learning and statistics algorithms, that all the data are independent and identically distributed, is not correct. Data from each system in the fleet must be treated as unique so that one can notice significant changes in the operation of that system.
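The argument above can be made concrete with a small sketch that compares a pooled fleet baseline with per-system baselines. It is an illustration under simple assumptions, not a method from the source: each system is summarized by a single scalar reading, deviation is measured with a z-score, and the truck names and readings are hypothetical.

```python
from statistics import mean, stdev


def zscore(value, history):
    """Standard score of `value` relative to a list of past readings."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # Constant history: any change at all is an unbounded deviation.
        return 0.0 if value == mu else float("inf")
    return (value - mu) / sigma


# Hypothetical daily readings for two trucks of the same model.
history = {
    "truck_a": [50.0, 51.0, 49.0, 50.0],
    "truck_b": [60.0, 61.0, 59.0, 60.0],
}
latest = {"truck_a": 56.0, "truck_b": 60.5}

# Pooled baseline: treats all readings as if drawn from one distribution.
pooled_history = [x for readings in history.values() for x in readings]
pooled = {sid: zscore(v, pooled_history) for sid, v in latest.items()}

# Per-system baseline: each truck is compared with its own past.
per_system = {sid: zscore(v, history[sid]) for sid, v in latest.items()}
```

In this toy data, the new reading of 56.0 for `truck_a` sits well inside the pooled fleet range, so the pooled score is small, while the per-system score is large because that truck has never been far from 50.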