Dataset details
United States
New Approaches To Photometric Redshift Prediction
Expanding upon the work of Way & Srivastava (2006), we demonstrate how the use of training sets of comparable size continues to make Gaussian Process Regression competitive with Neural Networks and other least-squares fitting methods. This is possible via new large-size matrix inversion techniques developed for Gaussian Processes that do not require the kernel matrix to be sparse. This development, combined with a neural-network kernel function, appears to give superior results for this problem. Our best-fit results for the Sloan Digital Sky Survey Main Galaxy Sample using u,g,r,i,z give an rms error of 0.0201, while our results for the same in the Luminous Red Galaxy Sample yield 0.0220.
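The abstract does not spell out its neural-network kernel. A minimal sketch, assuming it refers to the commonly used arcsine form of the neural-network covariance function with an isotropic prior variance (the function name `nn_kernel` and the parameter `sigma2` are illustrative, not taken from the paper):

```python
import math

def nn_kernel(x, x_prime, sigma2=1.0):
    """Arcsine 'neural-network' covariance between two input vectors.

    Assumption: isotropic weight prior sigma2 * I, with inputs
    augmented by a constant 1 for the bias term.
    """
    xa = [1.0] + list(x)
    xb = [1.0] + list(x_prime)

    def dot(u, v):
        return sigma2 * sum(ui * vi for ui, vi in zip(u, v))

    num = 2.0 * dot(xa, xb)
    den = math.sqrt((1.0 + 2.0 * dot(xa, xa)) * (1.0 + 2.0 * dot(xb, xb)))
    # The ratio is strictly inside (-1, 1), so asin is always defined.
    return (2.0 / math.pi) * math.asin(num / den)

# Example: covariance between two hypothetical 5-band magnitude vectors.
k = nn_kernel([19.2, 18.1, 17.5, 17.2, 17.0], [19.0, 18.0, 17.6, 17.1, 16.9])
```

Unlike a squared-exponential kernel, this covariance is not a function of the distance between inputs alone, which is one reason it may behave differently on photometric data.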
Data information
Related data
Making Predictions using Large Scale Gaussian Processes
Public Data Portal
One of the key problems that arises in many areas is to estimate a potentially nonlinear function [tex] G(x, \theta)[/tex] given input and output samples [tex] ( X,y ) [/tex] so that [tex]y \approx G(x, \theta)[/tex]. There are many approaches to this regression problem. Neural networks, regression trees, and many other methods have been developed to estimate [tex]G[/tex] given the input-output pair [tex] ( X,y ) [/tex]. One method that I have worked with is called Gaussian process regression. There are many good texts and papers on the subject. For more technical information on the method and its applications see: http://www.gaussianprocess.org/ A key problem that arises in developing these models on very large data sets is that training requires an [tex]O(N^3)[/tex] computation, where N is the number of data points in the training sample. Obviously this becomes very problematic when N is large. I discussed this problem with Leslie Foster, a mathematics professor at San Jose State University. He, along with some of his students, developed a method to address this problem based on Cholesky decomposition and pivoting. He also shows that this leads to a numerically stable result. If you're interested in some light reading, I'd suggest you take a look at his [recent paper]( ) (which was accepted in the Journal of Machine Learning Research) posted on dashlink. We've also posted code for you to try it out. Let us know how it goes. If you are interested in applications of this method in the area of prognostics, check out our [new paper](/dashlink/resources/51/) on the subject, which was published in IEEE Transactions on Systems, Man, and Cybernetics.
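To make the [tex]O(N^3)[/tex] cost concrete, here is a minimal sketch of standard Gaussian process regression on 1-D inputs, where the cubic cost comes from factoring the N-by-N kernel matrix. It uses a plain Cholesky factorization without pivoting, so it is not Foster's low-rank method; the squared-exponential kernel and all function names are illustrative assumptions.

```python
import math

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalars (illustrative choice)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Return lower-triangular L with A = L L^T; this step costs O(N^3)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict_mean(X, y, x_star, noise=1e-6):
    """Posterior mean k_*^T (K + noise * I)^{-1} y at a test input x_star."""
    K = [[rbf_kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    alpha = solve_cholesky(cholesky(K), y)
    return sum(rbf_kernel(x_star, xi) * ai for xi, ai in zip(X, alpha))

# Toy usage: fit samples of sin(x) and predict between training points.
X = [0.0, 1.0, 2.0, 3.0]
y = [math.sin(v) for v in X]
mean_at_1_5 = gp_predict_mean(X, y, 1.5)
```

Doubling N multiplies the work in `cholesky` by roughly eight, which is why approximations become necessary for survey-sized training sets.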
Novel Methods for Predicting Photometric Redshifts
Public Data Portal
We calculate photometric redshifts from the Sloan Digital Sky Survey Main Galaxy Sample, The Galaxy Evolution Explorer All Sky Survey, and The Two Micron All Sky Survey using two new training-set methods. We utilize the broad-band photometry from the three surveys alongside Sloan Digital Sky Survey measures of photometric quality and galaxy morphology. Our first training-set method draws from the theory of ensemble learning, while the second employs Gaussian process regression; both allow for the estimation of redshift along with a measure of uncertainty in the estimate. The Gaussian process models the data very effectively with small training samples of approximately 1000 points or fewer. These two methods are compared to a well-known Artificial Neural Network training-set method and to simple linear and quadratic regression. We also demonstrate the need to provide confidence bands on the error estimation made by both classes of models. Our results indicate that variations due to the optimization procedure used for almost all neural networks, combined with the variations due to the data sample, can produce models with variations in accuracy that span an order of magnitude. A key contribution of this paper is to quantify the variability in the quality of results as a function of model and training sample. We show how simply choosing the "best" model given a data set and model class can produce misleading results. We also investigate supplemental information provided by the Sloan Digital Sky Survey photometric pipeline related to photometric quality and galaxy morphology tracers. We show that, using these additional quality and morphology indicators rather than only the commonly used Sloan Digital Sky Survey broad-band u,g,r,i,z imaging data, one can improve redshift accuracy by tens of percent. Near-infrared broad-band photometry from the Two Micron All Sky Survey and near-ultraviolet and far-ultraviolet broad-band data from The Galaxy Evolution Explorer All Sky Survey are also investigated where they overlap with the Sloan Digital Sky Survey. Our results show that robust photometric redshift errors as low as 0.02 RMS can regularly be obtained. We believe these methods can be extended to other photometric surveys where sufficient redshift calibration objects exist.
Stable and Efficient Gaussian Process Calculations
Public Data Portal
The use of Gaussian processes can be an effective approach to prediction in a supervised learning environment. For large data sets, the standard Gaussian process approach requires solving very large systems of linear equations and approximations are required for the calculations to be practical. We will focus on the subset of regressors approximation technique. We will demonstrate that there can be numerical instabilities in a well known implementation of the technique. We discuss alternate implementations that have better numerical stability properties and can lead to better predictions. Our results will be illustrated by looking at an application involving prediction of galaxy redshift from broadband spectrum data.
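For orientation, the subset of regressors predictive mean can be sketched in its textbook normal-equations form, mean(x*) = k_m(x*)^T (K_mn K_nm + noise * K_mm)^{-1} K_mn y, where m inducing points stand in for the full training set. This is only an illustration of the approximation on 1-D inputs, not the implementations compared in the paper; forming K_mn K_nm explicitly squares the conditioning of the problem, which is the kind of step where numerical trouble can arise. The kernel, function names and inducing points below are assumptions.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalars (illustrative choice)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve_spd(A, b):
    """Solve A x = b for symmetric positive-definite A via Cholesky."""
    n = len(b)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def sor_predict_mean(X, y, inducing, x_star, noise=1e-2):
    """Subset-of-regressors mean using m = len(inducing) basis points."""
    n, m = len(X), len(inducing)
    K_mn = [[rbf(u, x) for x in X] for u in inducing]
    K_mm = [[rbf(u, v) for v in inducing] for u in inducing]
    # A = K_mn K_nm + noise * K_mm is only m x m, so the solve costs O(m^3).
    A = [[sum(K_mn[i][k] * K_mn[j][k] for k in range(n)) + noise * K_mm[i][j]
          for j in range(m)] for i in range(m)]
    rhs = [sum(K_mn[i][k] * y[k] for k in range(n)) for i in range(m)]
    w = solve_spd(A, rhs)
    return sum(rbf(x_star, u) * wi for u, wi in zip(inducing, w))

# Toy usage: 7 training points summarized by 3 hypothetical inducing points.
X = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [math.sin(v) for v in X]
approx = sor_predict_mean(X, y, [0.0, 1.5, 3.0], 1.0)
```

When the inducing points are the full training set, this expression reduces algebraically to the standard Gaussian process mean, which gives a simple sanity check for an implementation.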
SPATIALLY ADAPTIVE SEMI-SUPERVISED LEARNING WITH GAUSSIAN PROCESSES FOR HYPERSPECTRAL DATA ANALYSIS
Public Data Portal
Goo Jun and Joydeep Ghosh. Abstract: A semi-supervised learning algorithm for the classification of hyperspectral data, Gaussian process expectation maximization (GP-EM), is proposed. Model parameters for each land cover class are first estimated by a supervised algorithm using Gaussian process regressions to find spatially adaptive parameters, and the estimated parameters are then used to initialize a spatially adaptive mixture-of-Gaussians model. The mixture model is updated by expectation-maximization iterations using the unlabeled data, and the spatially adaptive parameters for unlabeled instances are obtained by Gaussian process regressions with soft assignments. Two sets of hyperspectral data taken from the Botswana area by the NASA EO-1 satellite are used for experiments. Empirical evaluations show that the proposed framework performs significantly better than baseline algorithms that do not use spatial information, and the results are also better than any previously reported results by other algorithms on the same data.
Detection and Prognostics on Low Dimensional Systems
Public Data Portal
This paper describes the application of known and novel prognostic algorithms to systems that can be described by low-dimensional, potentially nonlinear dynamics. The methods rely on estimating the conditional probability distribution of the output of the system at a future time given knowledge of the current state of the system. We show how to estimate these conditional probabilities using a variety of techniques, including bagged neural networks and kernel methods such as Gaussian Process Regression (GPR). The results are compared with standard methods such as the nearest neighbor algorithm. We demonstrate the algorithms on a real-world data set and a simulated data set. The real-world data set consists of the intensity of an NH3 laser. The laser data set has been shown by other authors to exhibit low-dimensional chaos with sudden drops in intensity. The simulated data set is generated from the Lorenz attractor and has known statistical characteristics. On these data sets, we show the evolution of the estimated conditional probability distribution, the way it can act as a prognostic signal, and its use as an early warning system. We also review a novel approach to performing Gaussian Process Regression with large numbers of data points.
Distributed Monitoring of the R2 Statistic for Linear Regression
Public Data Portal
The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and one or more dependent target variables. This problem becomes challenging for large-scale data in a distributed computing environment when only a subset of instances is available at individual nodes and the local data changes frequently. Data centralization and periodic model recomputation can add high overhead to tasks like anomaly detection in such dynamic settings. Therefore, the goal is to develop techniques for monitoring and updating the model over the union of all nodes' data in a communication-efficient fashion. Correctness guarantees on such techniques are also often highly desirable, especially in safety-critical application scenarios. In this paper we develop DReMo, a distributed algorithm with very low resource overhead, for monitoring the quality of a regression model in terms of its coefficient of determination (R2 statistic). When the nodes collectively determine that R2 has dropped below a fixed threshold, the linear regression model is recomputed via a network-wide convergecast and the updated model is broadcast back to all nodes. We show empirically, using both synthetic and real data, that our proposed method is highly communication-efficient and scalable, and also provide theoretical guarantees on correctness.
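The monitored quantity can be illustrated with a small sketch: R2 over the union of all nodes' data depends only on a few per-node sums, so each node can summarize its local data in four numbers. This is a plain aggregation baseline for illustration, not the DReMo algorithm, whose purpose is to avoid even this communication most of the time; the function names, the one-feature model and the threshold are assumptions.

```python
def local_stats(points, predict):
    """Summarize one node's (x, y) pairs as (n, sum y, sum y^2, sum squared residuals)."""
    n = len(points)
    sum_y = sum(y for _, y in points)
    sum_y2 = sum(y * y for _, y in points)
    sse = sum((y - predict(x)) ** 2 for x, y in points)
    return n, sum_y, sum_y2, sse

def global_r2(stats):
    """R2 = 1 - SS_res / SS_tot over the union of all nodes' data."""
    n = sum(s[0] for s in stats)
    sum_y = sum(s[1] for s in stats)
    sum_y2 = sum(s[2] for s in stats)
    sse = sum(s[3] for s in stats)
    ss_tot = sum_y2 - sum_y ** 2 / n  # total sum of squares about the global mean
    return 1.0 - sse / ss_tot

def model(x):
    """Hypothetical current model shared by all nodes: y = 2x + 2."""
    return 2.0 * x + 2.0

# Two nodes, each holding a different subset of the data.
node_a = [(0.0, 1.0), (1.0, 3.0)]
node_b = [(2.0, 5.0), (3.0, 7.0)]
r2 = global_r2([local_stats(node_a, model), local_stats(node_b, model)])
needs_recompute = r2 < 0.9  # hypothetical threshold
```

In this toy setup the data follow y = 2x + 1, so the offset in the shared model lowers R2 and would trigger a recomputation under the assumed threshold.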
Modeling non-Gaussian time-varying vector autoregressive process
Public Data Portal
We present a novel and general methodology for modeling time-varying vector autoregressive processes, which are widely used in many areas such as modeling of chemical processes, mobile communication channels and biomedical signals. In the literature, most work utilizes multivariate Gaussian models for these applications, mainly due to the lack of efficient analytical tools for modeling with non-Gaussian distributions. In this paper, we propose a particle filtering approach which can model non-Gaussian autoregressive processes having cross-correlations among them. Moreover, time-varying parameters of the process can be modeled in the most general case by using this sequential Bayesian estimation method. Simulation results support the performance of the proposed technique, which can potentially also model Gaussian processes as a sub-case.
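Electronics and Telecommunications Research Institute (ETRI) ego-vehicle and surrounding-vehicle driving trajectory training data
Public Data Portal
An AI training dataset for predicting the future motion of dynamic objects around an autonomous vehicle. Details are available at the link below, where the full dataset can also be downloaded. https://nanum.etri.re.kr/share/kimjy/Trajectory?lang=ko_KR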