교육데이터 활용•지원 서비스

로그인

데이터셋 상세

미국

MKAD (Open Sourced Code)

The Multiple Kernel Anomaly Detection (MKAD) algorithm is designed for anomaly detection over a set of files. It combines multiple kernels into a single optimization function using the One Class Support Vector Machine (OCSVM) framework. Any kernel function can be combined in the algorithm as long as it meets the Mercer conditions, however for the purposes of this code the data preformatting and kernel type is specific to the Flight Operations Quality Assurance (FOQA) data and has been integrated into the coding steps. For this domain, discrete binary switch sequences are used in the discrete kernel, and discretized continuous parameter features are used to form the continuous kernel. The OCSVM uses a training set of nominal examples (in this case flights) and evaluates test examples for anomaly detection to determine whether they are anomalous or not. After completing this analysis the algorithm reports the anomalous examples and determines whether there is a contribution from either or both continuous and discrete elements.

데이터 정보

데이터 포털
미국
META URL
https://catalog.data.gov/dataset/mkad-open-sourced-code
라이선스
notspecified
비용
제공기관
National Aeronautics and Space Administration
관리부서
데이터

연관 데이터

ARC Code TI: Multiple Kernel Anomaly Detection (MKAD) Algorithm

공공데이터포털

The Multiple Kernel Anomaly Detection (MKAD) algorithm is designed for anomaly detection over a set of files.

Multiple Kernel Learning for Heterogeneous Anomaly Detection: Algorithm and Aviation Safety Case Study

공공데이터포털

The world-wide aviation system is one of the most complex dynamical systems ever developed and is generating data at an extremely rapid rate. Most modern commercial aircraft record several hundred flight parameters including information from the guidance, navigation, and control systems, the avionics and propulsion systems, and the pilot inputs into the aircraft. These parameters may be continuous measurements or binary or categorical measurements recorded in one second intervals for the duration of the flight. Currently, most approaches to aviation safety are reactive, meaning that they are designed to react to an aviation safety incident or accident. In this paper, we discuss a novel approach based on the theory of multiple kernel learning to detect potential safety anomalies in very large data bases of discrete and continuous data from world-wide operations of commercial fleets. We pose a general anomaly detection problem which includes both discrete and continuous data streams, where we assume that the discrete streams have a causal influence on the continuous streams. We also assume that atypical sequences of events in the discrete streams can lead to off-nominal system performance. We discuss the application domain, novel algorithms, and also discuss results on real-world data sets. Our algorithm uncovers operationally significant events in high dimensional data streams in the aviation industry which are not detectable using state of the art methods.

Ensemble Approach to Building Mercer Kernels

공공데이터포털

This paper presents a new methodology for automatic knowledge driven data mining based on the theory of Mercer Kernels, which are highly nonlinear symmetric positive definite mappings from the original image space to a very high, possibly infinite dimensional feature space. We describe a new method called Mixture Density Mercer Kernels (MDMK) to learn kernel function directly from data, rather than using pre-defined kernels. These data adaptive kernels can encode prior knowledge in the kernel using a Bayesian formulation, thus allowing for physical information to be encoded in the model. Specifically, we demonstrate the use of the algorithm in situations with extremely small samples of data. We compare the results with existing algorithms on data from the Sloan Digital Sky Survey (SDSS) and demonstrate the method’s superior performance against standard methods. The results show that the Mixture Density Mercer Kernel described here outperforms tree-based classification in distinguishing high-redshift galaxies from low redshift galaxies by approximately 16% on test data, bagged trees by approximately 7%, and bagged trees built on a much larger sample of data by approximately 2%. The code for these experiments has been generated with the AutoBayes tool, which automatically generates efficient and documented C/C++ code from abstract statistical model specifications. The core of the system is a schema library which contains templates for learning and knowledge discovery algorithms like different versions of EM, or numeric optimization methods like conjugate gradient methods. The template instantiation is supported by symbolic algebraic computations, which allows AutoBayes to find closed form solutions and, where possible, to integrate them into the code.

Anomaly Detection and Diagnosis Algorithms for Discrete Symbols

공공데이터포털

We present a set of novel algorithms which we call sequenceMiner that detect and characterize anomalies in large sets of high-dimensional symbol sequences that arise from recordings of switch sensors in the cockpits of commercial airliners. While the algorithms we present are general and domain-independent, we focus on a specific problem that is critical to determining the system-wide health of a fleet of aircraft. The approach taken uses unsupervised clustering of sequences using the normalized length of the longest common subsequence (nLCS) as a similarity measure, followed by detailed outlier analysis to detect anomalies. In this method, an outlier sequence is defined as a sequence that is far away from the cluster centre. We present new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence is deemed to be an outlier. The algorithms provide a coherent description to an analyst of the anomalies in the sequence when compared to more normal sequences. In the final section of the paper we demonstrate the effectiveness of sequenceMiner for anomaly detection on a real set of discrete sequence data from a fleet of commercial airliners. We show that sequenceMiner discovers actionable and operationally significant safety events. We also compare our innovations with standard HiddenMarkov Models, and show that our methods are superior.

Theoretically Optimal Distributed Anomaly Detection

공공데이터포털

A novel general framework for distributed anomaly detection with theoretical performance guarantees is proposed. Our algorithmic approach combines existing anomaly detection procedures with a novel method for computing global statistics using local sufficient statistics. Under a Gaussian assumption, our distributed algorithm is guaranteed to perform as well as its centralized counterpart, a condition we call Ôzero information lossÕ. We further report experimental results on synthetic as well as real-world data to demonstrate the viability of our approach.

Comparative Analysis of Data-Driven Anomaly Detection Methods

공공데이터포털

This paper provides a review of three different advanced machine learning algorithms for anomaly detection in continuous data streams from a ground-test firing of a subscale Solid Rocket Motor (SRM). This study compares Orca, one-class support vector machines, and the Inductive Monitoring System (IMS) for anomaly detection on the data streams. We measure the performance of the algorithm with respect to the detection horizon for situations where fault information is available. These algorithms have been also studied by the present authors (and other co-authors) as applied to liquid propulsion systems. The trade space will be explored between these algorithms for both types of propulsion systems.

Distributed Anomaly Detection using 1-class SVM for Vertically Partitioned Data

공공데이터포털

There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in-situ sensors and different climate models. Similarly, huge amount of flight operational data is downloaded for different commercial airlines. These different types of datasets need to be analyzed for finding outliers. Information extraction from such rich data sources using advanced data mining methodologies is a challenging task not only due to the massive volume of data, but also because these datasets are physically stored at different geographical locations with only a subset of features available at any location. Moving these petabytes of data to a single location may waste a lot of bandwidth. To solve this problem, in this paper, we present a novel algorithm which can identify outliers in the entire data without moving all the data to a single location. The method we propose only centralizes a very small sample from the different data subsets at different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization with only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available datasets: (1) the NASA MODIS satellite images and (2) a simulated aviation dataset generated by the ‘Commercial Modular Aero-Propulsion System Simulation’ (CMAPSS).

Comparison of Unsupervised Anomaly Detection Methods

공공데이터포털

Several different unsupervised anomaly detection algorithms have been applied to Space Shuttle Main Engine (SSME) data to serve the purpose of developing a comprehensive suite of Integrated Systems Health Management (ISHM) tools. As the theoretical bases for these methods vary considerably, it is reasonable to conjecture that the resulting anomalies detected by them may differ quite significantly as well. As such, it would be useful to apply a common metric with which to compare the results. However, for such a quantitative analysis to be statistically significant, a sufficient number of examples of both nominally categorized and anomalous data must be available. Due to the lack of sufficient examples of anomalous data, use of any statistics that rely upon a statistically significant sample of anomalous data is infeasible. Therefore, the main focus of this paper will be to compare actual examples of anomalies detected by the algorithms via the sensors in which they appear, as well the times at which they appear. We find that there is enough overlap in detection of the anomalies among all of the different algorithms tested in order for them to corroborate the severity of these anomalies. In certain cases, the severity of these anomalies is supported by their categorization as failures by experts, with realistic physical explanations. For those anomalies that can not be corroborated by at least one other method, this overlap says less about the severity of the anomaly, and more about their technical nuances, which will also be discussed.

Anomaly Detection for Complex Systems

공공데이터포털

In performance maintenance in large, complex systems, sensor information from sub-components tends to be readily available, and can be used to make predictions about the system's health and diagnose possible anomalies. However, existing methods can only use predictions of individual component anomalies to guess at systemic problems, not accurately estimate the magnitude of the problem, nor prescribe good solutions. Since physical complex systems usually have well-defined semantics of operation, we here propose using anomaly detection techniques drawn from data mining in conjunction with an automated theorem prover working on a domain-specific knowledge base to perform systemic anomalydetection on complex systems. For clarity of presentation, the remaining content of this submission is presented compactly in Fig 1.

Mixture Density Mercer Kernels: A Method to Learn Kernels

공공데이터포털

This paper presents a method of generating Mercer Kernels from an ensemble of probabilistic mixture models, where each mixture model is generated from a Bayesian mixture density estimate. We show how to convert the ensemble estimates into a Mercer Kernel, describe the properties of this new kernel function, and give examples of the performance of this kernel on unsupervised clustering of synthetic data and also in the domain of unsupervised multispectral image understanding.

목록