Dataset Details
United States
Ensemble Approach to Building Mercer Kernels
This paper presents a new methodology for automatic knowledge-driven data mining based on the theory of Mercer Kernels, which are highly nonlinear symmetric positive definite mappings from the original image space to a very high, possibly infinite dimensional feature space. We describe a new method called Mixture Density Mercer Kernels (MDMK) to learn the kernel function directly from data, rather than using pre-defined kernels. These data-adaptive kernels can encode prior knowledge in the kernel using a Bayesian formulation, thus allowing for physical information to be encoded in the model. Specifically, we demonstrate the use of the algorithm in situations with extremely small samples of data. We compare the results with existing algorithms on data from the Sloan Digital Sky Survey (SDSS) and demonstrate the method’s superior performance against standard methods. The results show that the Mixture Density Mercer Kernel described here outperforms tree-based classification in distinguishing high-redshift galaxies from low-redshift galaxies by approximately 16% on test data, bagged trees by approximately 7%, and bagged trees built on a much larger sample of data by approximately 2%. The code for these experiments has been generated with the AutoBayes tool, which automatically generates efficient and documented C/C++ code from abstract statistical model specifications. The core of the system is a schema library which contains templates for learning and knowledge discovery algorithms, such as different versions of EM, and numeric optimization methods, such as conjugate gradient methods. The template instantiation is supported by symbolic algebraic computations, which allow AutoBayes to find closed-form solutions and, where possible, to integrate them into the code.
Data Information
Related Data
Mixture Density Mercer Kernels: A Method to Learn Kernels
Public Data Portal
This paper presents a method of generating Mercer Kernels from an ensemble of probabilistic mixture models, where each mixture model is generated from a Bayesian mixture density estimate. We show how to convert the ensemble estimates into a Mercer Kernel, describe the properties of this new kernel function, and give examples of the performance of this kernel on unsupervised clustering of synthetic data and also in the domain of unsupervised multispectral image understanding.
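The conversion from ensemble estimates to a kernel can be illustrated with a minimal sketch. It assumes one plausible construction, not necessarily the paper's exact formula: the kernel value for two points is the inner product of their posterior component-membership vectors, averaged over the ensemble. Because each term is an inner product of feature vectors, the result is symmetric and positive semidefinite. The mixture parameters in `ENSEMBLE` and the data in `POINTS` are hypothetical, and the mixtures are fixed by hand rather than estimated from data.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def posteriors(x, model):
    """Posterior P(component | x) for a mixture given as (weight, mean, std) tuples."""
    joint = [w * gaussian_pdf(x, m, s) for (w, m, s) in model]
    total = sum(joint)
    return [j / total for j in joint]

def mixture_density_kernel(points, ensemble):
    """Average, over the ensemble, of inner products of posterior vectors."""
    n = len(points)
    kernel = [[0.0] * n for _ in range(n)]
    for model in ensemble:
        post = [posteriors(x, model) for x in points]
        for i in range(n):
            for j in range(n):
                kernel[i][j] += sum(a * b for a, b in zip(post[i], post[j]))
    return [[value / len(ensemble) for value in row] for row in kernel]

# Hypothetical ensemble of three 1-D mixtures (illustrative values only).
ENSEMBLE = [
    [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)],
    [(0.4, -0.5, 1.5), (0.6, 5.5, 1.2)],
    [(0.3, 0.0, 1.0), (0.3, 2.5, 1.0), (0.4, 5.0, 1.0)],
]
POINTS = [0.1, 0.3, 4.9, 5.2]  # two points near 0, two points near 5
K = mixture_density_kernel(POINTS, ENSEMBLE)
```

In this sketch, points that the ensemble members consistently assign to the same component receive a kernel value close to 1, while points assigned to different components receive a value close to 0.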
MKAD (Open Sourced Code)
Public Data Portal
The Multiple Kernel Anomaly Detection (MKAD) algorithm is designed for anomaly detection over a set of files. It combines multiple kernels into a single optimization function using the One Class Support Vector Machine (OCSVM) framework. Any kernel function can be combined in the algorithm as long as it meets the Mercer conditions; however, for the purposes of this code the data preformatting and kernel type are specific to the Flight Operations Quality Assurance (FOQA) data and have been integrated into the coding steps. For this domain, discrete binary switch sequences are used in the discrete kernel, and discretized continuous parameter features are used to form the continuous kernel. The OCSVM uses a training set of nominal examples (in this case flights) and evaluates test examples to determine whether they are anomalous. After completing this analysis, the algorithm reports the anomalous examples and determines whether there is a contribution from the continuous elements, the discrete elements, or both.
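The idea of combining a discrete and a continuous kernel can be sketched as a convex combination of two kernel matrices. This is only an illustration under stated assumptions: the Hamming-agreement kernel, the Gaussian RBF kernel, the weight `eta`, and the `FLIGHTS` records are all stand-ins chosen here for simplicity, not the kernels or data format of the released MKAD code. The OCSVM training step is omitted; the combined matrix is what a one-class SVM with a precomputed kernel would consume.

```python
import math

def hamming_kernel(a, b):
    """Fraction of positions where two equal-length switch sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for s, t in zip(a, b) if s == t) / len(a)

def rbf_kernel(u, v, gamma=0.5):
    """Gaussian RBF kernel on continuous feature vectors."""
    squared_distance = sum((p - q) ** 2 for p, q in zip(u, v))
    return math.exp(-gamma * squared_distance)

def combined_kernel(flights, eta=0.5):
    """Convex combination eta * K_discrete + (1 - eta) * K_continuous."""
    n = len(flights)
    return [
        [
            eta * hamming_kernel(flights[i]["switches"], flights[j]["switches"])
            + (1.0 - eta) * rbf_kernel(flights[i]["features"], flights[j]["features"])
            for j in range(n)
        ]
        for i in range(n)
    ]

# Hypothetical flights: a binary switch sequence plus two continuous features.
FLIGHTS = [
    {"switches": [0, 0, 1, 1], "features": [0.1, 0.2]},
    {"switches": [0, 0, 1, 1], "features": [0.2, 0.1]},
    {"switches": [1, 0, 0, 1], "features": [3.0, 2.5]},
]
K = combined_kernel(FLIGHTS)
```

A non-negative weighted sum of Mercer kernels is itself a Mercer kernel, which is why the combination can be handed to a single kernel method. Inspecting the two terms separately for a flagged flight gives a rough sense of whether the discrete or the continuous part drives the dissimilarity.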
ARC Code TI: Multiple Kernel Anomaly Detection (MKAD) Algorithm
Public Data Portal
The Multiple Kernel Anomaly Detection (MKAD) algorithm is designed for anomaly detection over a set of files.
New Approaches To Photometric Redshift Prediction
Public Data Portal
Expanding upon the work of Way & Srivastava (2006), we demonstrate how the use of training sets of comparable size continues to make Gaussian Process Regression a competitive approach to that of Neural Networks and other least squares fitting methods. This is possible via new large-size matrix inversion techniques developed for Gaussian Processes that do not require that the kernel matrix be sparse. This development, combined with a neural-network kernel function, appears to give superior results for this problem. Our best fit results for the Sloan Digital Sky Survey Main Galaxy Sample using u,g,r,i,z give an rms error of 0.0201, while our results for the same in the Luminous Red Galaxy Sample yield 0.0220.
SmartCoop Co., Ltd. (스마트쿱㈜) - Instance Level Recognition (ILR) Data
Public Data Portal
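○ Object data that can be used to improve object detection and recognition through a systematic hierarchy design and fine-grained classification
 - For AI to learn the many subcategories within each major category, a dataset that is varied and balanced across subcategories needs to be built
 - More than 100,000 images are built, covering more than 10,000 object types across more than 300 major categories that have a systematic hierarchy
   • To detect the hierarchy of each object, image data for single objects and for multiple objects are built at a 1:1 ratio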
Multiple Kernel Learning for Heterogeneous Anomaly Detection: Algorithm and Aviation Safety Case Study
Public Data Portal
The world-wide aviation system is one of the most complex dynamical systems ever developed and is generating data at an extremely rapid rate. Most modern commercial aircraft record several hundred flight parameters including information from the guidance, navigation, and control systems, the avionics and propulsion systems, and the pilot inputs into the aircraft. These parameters may be continuous measurements or binary or categorical measurements recorded at one-second intervals for the duration of the flight. Currently, most approaches to aviation safety are reactive, meaning that they are designed to react to an aviation safety incident or accident. In this paper, we discuss a novel approach based on the theory of multiple kernel learning to detect potential safety anomalies in very large databases of discrete and continuous data from world-wide operations of commercial fleets. We pose a general anomaly detection problem which includes both discrete and continuous data streams, where we assume that the discrete streams have a causal influence on the continuous streams. We also assume that atypical sequences of events in the discrete streams can lead to off-nominal system performance. We discuss the application domain and novel algorithms, and present results on real-world data sets. Our algorithm uncovers operationally significant events in high-dimensional data streams in the aviation industry which are not detectable using state-of-the-art methods.
Ministry of Food and Drug Safety: Pill Image Data for Artificial Intelligence Development
Public Data Portal
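Information released following completion of the contracted research project "Establishment of an Electronic Information Provision System for the Safe Use of Medicines", so that information on a given medicine can be accessed more easily and conveniently by recognizing its identification markings. * For details, see the Korea Pharmaceutical Information Center website: https://www.health.kr/notice/notice_view.asp?show_idx=1001&search_value=&search_term=&paging_value=&setLine=&setCategory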
Distributed Identification of Top-l Inner Product Elements and its Application in a Peer-to-Peer Network
Public Data Portal
Inner product computation is an important primitive used in many techniques for feature dependency detection, distance computation, clustering, and correlation computation, among others. Recently, peer-to-peer networks have been receiving increasing attention in various applications involving distributed file sharing, sensor networks, and mobile ad hoc networks. Efficient identification of the top few inner product entries from the entire inner product matrix of features in a distributed peer-to-peer network is a challenging problem, since centralizing the data from all the nodes in a synchronous, communication-efficient manner may not be an option. This paper deals with the problem of identifying significant inner products among features in a peer-to-peer environment where different nodes observe a different set of data. It uses an ordinal framework to develop probabilistic algorithms to find the top-l elements in the inner product matrix. These l inner product entries are important in making crucial decisions about dependency or relatedness between feature pairs, which matter for a number of data mining applications. In this paper we present experimental results demonstrating accurate and scalable performance of this algorithm for large peer-to-peer networks and also describe a real-world application for this algorithm.
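For reference, the quantity being sought can be written down as an exact, centralized computation. The sketch below is not the paper's distributed probabilistic algorithm; it only shows the target result, and it relies on the fact that when nodes hold disjoint rows, the inner product of two feature columns is the sum of the per-node partial inner products. The function names and the `NODES` data are hypothetical.

```python
import heapq
from itertools import combinations

def local_inner_products(rows, n_features):
    """Inner product of every feature pair, computed over one node's rows only."""
    return {
        (i, j): sum(row[i] * row[j] for row in rows)
        for i, j in combinations(range(n_features), 2)
    }

def top_l_pairs(node_rows, n_features, l):
    """Sum per-node partial inner products, then keep the l largest entries."""
    totals = {}
    for rows in node_rows:
        for pair, value in local_inner_products(rows, n_features).items():
            totals[pair] = totals.get(pair, 0.0) + value
    return heapq.nlargest(l, totals.items(), key=lambda item: item[1])

# Hypothetical network: two nodes, each observing different rows of 3 features.
NODES = [
    [[1.0, 2.0, 0.0], [2.0, 4.0, 1.0]],
    [[3.0, 6.0, -1.0]],
]
TOP = top_l_pairs(NODES, n_features=3, l=2)
```

This baseline requires every node's partial sums to be gathered in one place, which is exactly the centralization cost that a communication-efficient distributed method tries to avoid.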
Korean Generation-Based Commonsense Reasoning Dataset
Public Data Portal
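Morphemes are extracted from previously built AI-HUB dialogue summarization and image caption text data, using a semi-automated construction method based on a tagger and neural networks. The extracted morphemes form a concept set, and the dataset is natural language generation data in which a short sentence consistent with common sense is reconstructed from the contents of that concept set.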