Education Data Utilization and Support Service

Dataset Details

United States

Comparison of Unsupervised Anomaly Detection Methods

Several different unsupervised anomaly detection algorithms have been applied to Space Shuttle Main Engine (SSME) data to serve the purpose of developing a comprehensive suite of Integrated Systems Health Management (ISHM) tools. As the theoretical bases for these methods vary considerably, it is reasonable to conjecture that the resulting anomalies detected by them may differ quite significantly as well. As such, it would be useful to apply a common metric with which to compare the results. However, for such a quantitative analysis to be statistically significant, a sufficient number of examples of both nominally categorized and anomalous data must be available. Due to the lack of sufficient examples of anomalous data, use of any statistics that rely upon a statistically significant sample of anomalous data is infeasible. Therefore, the main focus of this paper will be to compare actual examples of anomalies detected by the algorithms via the sensors in which they appear, as well as the times at which they appear. We find that there is enough overlap in detection of the anomalies among all of the different algorithms tested for them to corroborate the severity of these anomalies. In certain cases, the severity of these anomalies is supported by their categorization as failures by experts, with realistic physical explanations. For those anomalies that cannot be corroborated by at least one other method, this overlap says less about the severity of the anomaly, and more about their technical nuances, which will also be discussed.
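The comparison described above, matching detections by the sensor and the time at which they appear, can be sketched roughly as follows. This is only an illustration under stated assumptions and not the paper's procedure: the function name `corroboration`, the `(sensor, time)` event representation, and the `tolerance` window are all hypothetical.

```python
def corroboration(detections, tolerance=1.0):
    """For every detection, list the other algorithms that flagged the same
    sensor within `tolerance` time units.

    `detections` maps an algorithm name to a list of (sensor, time) pairs.
    Both the representation and the tolerance window are assumptions made
    for this sketch.
    """
    result = {}
    for algo, events in detections.items():
        for sensor, t in events:
            # Collect every other algorithm with a matching detection.
            supporters = sorted(
                other
                for other, other_events in detections.items()
                if other != algo
                and any(s == sensor and abs(t - u) <= tolerance
                        for s, u in other_events)
            )
            result[(algo, sensor, t)] = supporters
    return result
```

A detection with an empty supporter list is one that no other method corroborates, which is the case the abstract singles out for separate discussion.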

Data Information

Data portal
United States
META URL
https://catalog.data.gov/dataset/comparison-of-unsupervised-anomaly-detection-methods
License
Not specified
Cost
Provider
National Aeronautics and Space Administration
Managing department
Data
- Landing page
- Martin-JANNAF.pdf

Related Data

Comparative Analysis of Data-Driven Anomaly Detection Methods

Public Data Portal

This paper provides a review of three different advanced machine learning algorithms for anomaly detection in continuous data streams from a ground-test firing of a subscale Solid Rocket Motor (SRM). This study compares Orca, one-class support vector machines, and the Inductive Monitoring System (IMS) for anomaly detection on the data streams. We measure the performance of the algorithms with respect to the detection horizon for situations where fault information is available. These algorithms have also been studied by the present authors (and other co-authors) as applied to liquid propulsion systems. The trade space will be explored between these algorithms for both types of propulsion systems.

Anomaly Detection for Complex Systems

Public Data Portal

In performance maintenance in large, complex systems, sensor information from sub-components tends to be readily available, and can be used to make predictions about the system's health and diagnose possible anomalies. However, existing methods can only use predictions of individual component anomalies to guess at systemic problems, not accurately estimate the magnitude of the problem, nor prescribe good solutions. Since physical complex systems usually have well-defined semantics of operation, we here propose using anomaly detection techniques drawn from data mining in conjunction with an automated theorem prover working on a domain-specific knowledge base to perform systemic anomaly detection on complex systems. For clarity of presentation, the remaining content of this submission is presented compactly in Fig. 1.

Discovering System Health Anomalies using Data Mining Techniques

Public Data Portal

We discuss a statistical framework that underlies envelope detection schemes as well as dynamical models based on Hidden Markov Models (HMM) that can encompass both discrete and continuous sensor measurements for use in Integrated System Health Management (ISHM) applications. The HMM allows for the rapid assimilation, analysis, and discovery of system anomalies. We motivate our work with a discussion of an aviation problem where the identification of anomalous sequences is essential for safety reasons. The data in this application are discrete and continuous sensor measurements and can be dealt with seamlessly using the methods described here to discover anomalous flights. We specifically treat the problem of discovering anomalous features in the time series that may be hidden from the sensor suite and compare those methods to standard envelope detection methods on test data designed to accentuate the differences between the two methods. Identification of these hidden anomalies is crucial to building stable, reusable, and cost-efficient systems. We also discuss a data mining framework for the analysis and discovery of anomalies in high-dimensional time series of sensor measurements that would be found in an ISHM system. We conclude with recommendations that describe the tradeoffs in building an integrated scalable platform for robust anomaly detection in ISHM applications.

Qualitative Event-based Diagnosis with Possible Conflicts Applied to Spacecraft Power Distribution Systems

Public Data Portal

Model-based diagnosis enables efficient and safe operation of engineered systems. In this paper, we describe two algorithms based on a qualitative event-based fault isolation framework augmented with model-based fault identification that are applied to spacecraft power distribution systems. Although based on a common framework, the fundamental difference between the two algorithms is that one uses a global model for residual generation, fault isolation, and fault identification; whereas the other uses a set of minimal submodels computed using Possible Conflicts. We describe the implementation of the two algorithms and compare their diagnosis results on a representative spacecraft power distribution system.

General Purpose Data-Driven System Monitoring for Space Operations

Public Data Portal

Modern space propulsion and exploration system designs are becoming increasingly sophisticated and complex. Determining the health state of these systems using traditional methods is becoming more difficult as the number of sensors and component interactions grows. Data-driven monitoring techniques have been developed to address these issues by analyzing system operations data to automatically characterize normal system behavior. The Inductive Monitoring System is a data-driven system health monitoring software tool that has been successfully applied to several aerospace applications. Inductive Monitoring System uses a data mining technique called clustering to analyze archived system data and characterize normal interactions between parameters. This characterization, or model, of nominal operation is stored in a knowledge base that can be used for real-time system monitoring or for analysis of archived events. Ongoing and developing Inductive Monitoring System space operations applications include International Space Station flight control, spacecraft vehicle system health management, launch vehicle ground operations, and fleet supportability. As a common thread of discussion this paper will employ the evolution of the Inductive Monitoring System data-driven technique as related to several Integrated Systems Health Management elements. Thematically, the projects listed will be used as case studies. The maturation of Inductive Monitoring System via projects where it has been deployed or is currently being integrated to aid in fault detection will be described. The paper will also explain how Inductive Monitoring System can be used to complement a suite of other Integrated System Health Management tools, providing initial fault detection support for diagnosis and recovery.

Distributed Anomaly Detection using 1-class SVM for Vertically Partitioned Data

Public Data Portal

There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in-situ sensors and different climate models. Similarly, huge amounts of flight operational data are downloaded for different commercial airlines. These different types of datasets need to be analyzed for finding outliers. Information extraction from such rich data sources using advanced data mining methodologies is a challenging task not only due to the massive volume of data, but also because these datasets are physically stored at different geographical locations with only a subset of features available at any location. Moving these petabytes of data to a single location may waste a lot of bandwidth. To solve this problem, in this paper, we present a novel algorithm which can identify outliers in the entire data without moving all the data to a single location. The method we propose only centralizes a very small sample from the different data subsets at different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization with only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available datasets: (1) the NASA MODIS satellite images and (2) a simulated aviation dataset generated by the ‘Commercial Modular Aero-Propulsion System Simulation’ (CMAPSS).

Towards a Framework for Evaluating and Comparing Diagnosis Algorithms

Public Data Portal

Diagnostic inference involves the detection of anomalous system behavior and the identification of its cause, possibly down to a failed unit or to a parameter of a failed unit. Traditional approaches to solving this problem include expert/rule-based, model-based, and data-driven methods. Each approach (and various techniques within each approach) uses different representations of the knowledge required to perform the diagnosis. The sensor data is expected to be combined with these internal representations to produce the diagnosis result. In spite of the availability of various diagnosis technologies, there have been only minimal efforts to develop a standardized software framework to run, evaluate, and compare different diagnosis technologies on the same system. This paper presents a framework that defines a standardized representation of the system knowledge, the sensor data, and the form of the diagnosis results – and provides a run-time architecture that can execute diagnosis algorithms, send sensor data to the algorithms at appropriate time steps from a variety of sources (including the actual physical system), and collect resulting diagnoses. We also define a set of metrics that can be used to evaluate and compare the performance of the algorithms, and provide software to calculate the metrics.

Anomaly Detection with Text Mining

Public Data Portal

Many existing complex space systems have a significant number of historical maintenance and problem databases that are stored in unstructured text forms. The problem that we address in this paper is the discovery of recurring anomalies and relationships between problem reports that may indicate larger systemic problems. We will illustrate our techniques on data from discrepancy reports regarding software anomalies in the Space Shuttle. These free text reports are written by a number of different people, thus the emphasis and wording vary considerably. With Mehran Sahami from Stanford University, I'm putting together a book on text mining called "Text Mining: Theory and Applications" to be published by Taylor and Francis.

Unsupervised Anomaly Detection for Liquid-Fueled Rocket Prop...

Public Data Portal

Title: Unsupervised Anomaly Detection for Liquid-Fueled Rocket Propulsion Health Monitoring. Abstract: This article describes the results of applying four unsupervised anomaly detection algorithms to data from two rocket propulsion testbeds. The first testbed uses historical data from the Space Shuttle Main Engine. The second testbed uses data from an experimental rocket engine test stand located at NASA Stennis Space Center. The article describes nine anomalies detected by the four algorithms. The four algorithms use four different definitions of anomalousness. Orca uses a nearest-neighbor approach, defining a point to be an anomaly if its nearest neighbors in the data space are far away from it. The Inductive Monitoring System clusters the training data, and then uses the distance to the nearest cluster as its measure of anomalousness. GritBot learns rules from the training data, and then classifies points as anomalous if they violate these rules. One-class support vector machines map the data into a high-dimensional space in which most of the normal points are on one side of a hyperplane, and then classify points on the other side of the hyperplane as anomalous. Because of these different definitions of anomalousness, different algorithms detect different anomalies. We therefore conclude that it is useful to use multiple algorithms.
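Two of the definitions of anomalousness described above, distance to nearest neighbors and distance to the nearest cluster, can be illustrated with a minimal sketch. This is not the actual Orca or Inductive Monitoring System implementation: the function names `knn_score` and `cluster_score` are hypothetical, Euclidean distance is assumed, and each cluster is reduced to a single center point purely for simplicity.

```python
from math import dist


def knn_score(point, reference, k=1):
    """Nearest-neighbor style score: mean Euclidean distance from `point`
    to its k nearest points in `reference`. Larger means more anomalous."""
    if not 1 <= k <= len(reference):
        raise ValueError("k must be between 1 and len(reference)")
    distances = sorted(dist(point, r) for r in reference)
    return sum(distances[:k]) / k


def cluster_score(point, centers):
    """Cluster style score: Euclidean distance from `point` to the nearest
    cluster center. Representing a cluster by one center is an assumption
    of this sketch."""
    return min(dist(point, c) for c in centers)
```

Because the two scores measure different things, the same point can rank high under one and low under the other, which is consistent with the abstract's observation that different algorithms detect different anomalies.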

Anomaly Detection in Sequences

Public Data Portal

We present a set of novel algorithms which we call sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences that arise from recordings of switch sensors in the cockpits of commercial airliners. While the algorithms we present are general and domain-independent, we focus on a specific problem that is critical to determining system-wide health of a fleet of aircraft. The approach taken uses unsupervised clustering of sequences using the normalized length of the longest common subsequence (nLCS) as a similarity measure, followed by a detailed analysis of outliers to detect anomalies. In this method, an outlier sequence is defined as a sequence that is far away from a cluster. We present new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence is deemed to be an outlier. The algorithm provides a coherent description to an analyst of the anomalies in the sequence when compared to more normal sequences. The final section of the paper demonstrates the effectiveness of sequenceMiner for anomaly detection on a real set of discrete sequence data from a fleet of commercial airliners. We show that sequenceMiner discovers actionable and operationally significant safety events. We also compare our innovations with standard Hidden Markov Models, and show that our methods are superior
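The nLCS similarity mentioned above can be sketched as follows. The abstract does not state the normalization, so dividing the LCS length by the geometric mean of the two sequence lengths is an assumption of this sketch. The function names `lcs_length` and `nlcs` are illustrative and do not come from sequenceMiner.

```python
from math import sqrt


def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b,
    computed by dynamic programming with one row of state."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def nlcs(a, b):
    """Normalized LCS similarity in [0, 1]. Normalizing by
    sqrt(len(a) * len(b)) is an assumption of this sketch."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / sqrt(len(a) * len(b))
```

Under this normalization identical sequences score 1.0 and sequences with no symbols in common score 0.0, so `1 - nlcs(a, b)` could serve as a distance for clustering.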
