Dataset Details
United States
Stable and Efficient Gaussian Process Calculations
The use of Gaussian processes can be an effective approach to prediction in a supervised learning environment. For large data sets, the standard Gaussian process approach requires solving very large systems of linear equations, and approximations are required for the calculations to be practical. We will focus on the subset of regressors approximation technique. We will demonstrate that there can be numerical instabilities in a well-known implementation of the technique. We discuss alternative implementations that have better numerical stability properties and can lead to better predictions. Our results will be illustrated by looking at an application involving prediction of galaxy redshift from broadband spectrum data.
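To make the stability issue concrete, here is a minimal sketch of subset of regressors prediction in which the weights come from an augmented least-squares problem rather than from the explicitly formed normal equations. It is not the implementation discussed in the abstract: the squared-exponential kernel, the random choice of the subset, the `jitter` term, and the names `rbf_kernel`, `sor_fit`, and `sor_predict` are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def sor_fit(X, y, X_m, noise_var=1e-2, jitter=1e-8):
    """Subset-of-regressors weights via an augmented least-squares problem.

    Up to the small jitter term, the minimizer satisfies
    (K_mn K_nm + noise_var * K_mm) w = K_mn y,
    but the product K_mn K_nm is never formed explicitly.
    """
    m = len(X_m)
    K_nm = rbf_kernel(X, X_m)
    # Jitter (an assumption of this sketch) keeps the Cholesky factor computable.
    L = np.linalg.cholesky(rbf_kernel(X_m, X_m) + jitter * np.eye(m))
    A = np.vstack([K_nm, np.sqrt(noise_var) * L.T])   # (n + m) x m
    b = np.concatenate([y, np.zeros(m)])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

def sor_predict(X_test, X_m, w):
    """Predictive mean K_*m w of the subset of regressors approximation."""
    return rbf_kernel(X_test, X_m) @ w

# Synthetic data (hypothetical): noisy samples of sin(x).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)
X_m = X[rng.choice(500, size=20, replace=False)]      # the chosen subset
X_new = np.linspace(-2.0, 2.0, 5)[:, None]
sor_mean = sor_predict(X_new, X_m, sor_fit(X, y, X_m))
```

Forming K_mn K_nm roughly squares the condition number of the problem, which is why the least-squares formulation tends to behave better when the kernel matrix is close to singular.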
Related Data
Making Predictions using Large Scale Gaussian Processes
Public Data Portal
One of the key problems that arises in many areas is to estimate a potentially nonlinear function [tex]G(x, \theta)[/tex] given input and output samples [tex](X, y)[/tex] so that [tex]y \approx G(x, \theta)[/tex]. There are many approaches to addressing this regression problem. Neural networks, regression trees, and many other methods have been developed to estimate [tex]G[/tex] given the input-output pair [tex](X, y)[/tex]. One method that I have worked with is called Gaussian process regression. There are many good texts and papers on the subject. See http://www.gaussianprocess.org/ for more technical information on the method and its applications. A key problem that arises in developing these models on very large data sets is that it ends up requiring an [tex]O(N^3)[/tex] computation, where [tex]N[/tex] is the number of data points in the training sample. Obviously this becomes very problematic when [tex]N[/tex] is large. I discussed this problem with Leslie Foster, a mathematics professor at San Jose State University. He, along with some of his students, developed a method to address this problem based on Cholesky decomposition and pivoting. He also shows that this leads to a numerically stable result. If you're interested in some light reading, I’d suggest you take a look at his [recent paper]( ) (which was accepted in the Journal of Machine Learning Research) posted on dashlink. We've also posted code for you to try it out. Let us know how it goes. If you are interested in applications of this method in the area of prognostics, check out our [new paper](/dashlink/resources/51/) on the subject, which was published in IEEE Transactions on Systems, Man, and Cybernetics.
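For readers who want to see where the cubic cost comes from, the following is a minimal sketch of exact Gaussian process regression. It is not the pivoted-Cholesky method mentioned above; the squared-exponential kernel, the hyperparameter values, and the names `rbf_kernel` and `gp_predict` are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_predict(X_train, y_train, X_test, noise_var=1e-2):
    """Exact GP posterior mean.

    Factoring the N x N kernel matrix is the O(N^3) step that becomes
    impractical for large training sets.
    """
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    L = np.linalg.cholesky(K)                                   # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # K^{-1} y
    return rbf_kernel(X_test, X_train) @ alpha

# Synthetic data (hypothetical): noisy samples of sin(x).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
X_new = np.linspace(-2.0, 2.0, 5)[:, None]
mean = gp_predict(X, y, X_new)
```

Doubling the number of training points multiplies the factorization time by roughly eight, which is the motivation for the low-rank approximations discussed here.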
Modeling non-Gaussian time-varying vector autoregressive process
Public Data Portal
We present a novel and general methodology for modeling time-varying vector autoregressive processes, which are widely used in many areas such as modeling of chemical processes, mobile communication channels, and biomedical signals. In the literature, most work utilizes multivariate Gaussian models for the mentioned applications, mainly due to the lack of efficient analytical tools for modeling with non-Gaussian distributions. In this paper, we propose a particle filtering approach which can model non-Gaussian autoregressive processes having cross-correlations among them. Moreover, time-varying parameters of the process can be modeled as the most general case by using this sequential Bayesian estimation method. Simulation results justify the performance of the proposed technique, which can potentially also model Gaussian processes as a sub-case.
ARC Code TI: Block-GP: Scalable Gaussian Process Regression
Public Data Portal
Block-GP is a Gaussian Process regression framework for multimodal data that can be an order of magnitude more scalable than existing state-of-the-art nonlinear regression algorithms. The framework builds local Gaussian Processes on semantically meaningful partitions of the data and provides higher prediction accuracy than a single global model with very high confidence.
SPATIALLY ADAPTIVE SEMI-SUPERVISED LEARNING WITH GAUSSIAN PROCESSES FOR HYPERSPECTRAL DATA ANALYSIS
Public Data Portal
Goo Jun and Joydeep Ghosh. Abstract: A semi-supervised learning algorithm for the classification of hyperspectral data, Gaussian process expectation maximization (GP-EM), is proposed. Model parameters for each land cover class are first estimated by a supervised algorithm using Gaussian process regressions to find spatially adaptive parameters, and the estimated parameters are then used to initialize a spatially adaptive mixture-of-Gaussians model. The mixture model is updated by expectation-maximization iterations using the unlabeled data, and the spatially adaptive parameters for unlabeled instances are obtained by Gaussian process regressions with soft assignments. Two sets of hyperspectral data taken from the Botswana area by the NASA EO-1 satellite are used for experiments. Empirical evaluations show that the proposed framework performs significantly better than baseline algorithms that do not use spatial information, and the results are also better than any previously reported results by other algorithms on the same data.
Modeling of non-stationary autoregressive alpha-stable processes
Public Data Portal
In the literature, impulsive signals are mostly modeled by symmetric alpha-stable processes. To represent their temporal dependencies, autoregressive models with time-invariant coefficients are usually utilized. We propose a general sequential Bayesian modeling methodology where both unknown autoregressive coefficients and distribution parameters can be estimated successfully, even when they are time-varying. In contrast to most work in the literature on signal processing with alpha-stable distributions, our work is general and also models skewed alpha-stable processes. Successful performance of our method is demonstrated by computer simulations. We support our empirical results by providing posterior Cramer–Rao lower bounds. The proposed method is also tested on a practical application where seismic data events are modeled.
Block-GP: Scalable Gaussian Process Regression for Multimodal Data
Public Data Portal
Regression problems on massive data sets are ubiquitous in many application domains including the Internet, earth and space sciences, and finance. In many cases, regression algorithms such as linear regression or neural networks attempt to fit the target variable as a function of the input variables without regard to the underlying joint distribution of the variables. As a result, these global models are not sensitive to variations in the local structure of the input space. Several algorithms, including the mixture of experts model, classification and regression trees (CART), and others have been developed, motivated by the fact that variability in the local distribution of inputs may be reflective of a significant change in the target variable. While these methods can handle the non-stationarity in the relationships to varying degrees, they are often not scalable and, therefore, not used in large-scale data mining applications. In this paper we develop Block-GP, a Gaussian Process regression framework for multimodal data that can be an order of magnitude more scalable than existing state-of-the-art nonlinear regression algorithms. The framework builds local Gaussian Processes on semantically meaningful partitions of the data and provides higher prediction accuracy than a single global model with very high confidence. The method relies on approximating the covariance matrix of the entire input space by smaller covariance matrices that can be modeled independently, and can therefore be parallelized for faster execution. Theoretical analysis and empirical studies on various synthetic and real data sets show high accuracy and scalability of Block-GP compared to existing nonlinear regression techniques.
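The idea of replacing one large covariance matrix with several small, independently modeled ones can be sketched as follows. This is an illustration of local Gaussian processes in general, not the Block-GP algorithm itself: the nearest-center partition, the fixed centers, the kernel, and the names `nearest_center`, `fit_local_gps`, and `predict_local_gps` are assumptions.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def nearest_center(X, centers):
    """Assign each row of X to its closest center (a stand-in partition rule)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.argmin(d2, axis=1)

def fit_local_gps(X, y, centers, noise_var=1e-2):
    """Fit one independent GP per block; each solve only sees its own block."""
    labels = nearest_center(X, centers)
    models = []
    for k in range(len(centers)):
        Xk, yk = X[labels == k], y[labels == k]
        K = rbf_kernel(Xk, Xk) + noise_var * np.eye(len(Xk))
        models.append((Xk, np.linalg.solve(K, yk)))
    return models

def predict_local_gps(models, centers, X_test):
    """Predict each test point with the GP of the block it falls into."""
    labels = nearest_center(X_test, centers)
    mean = np.empty(len(X_test))
    for k, (Xk, alpha) in enumerate(models):
        mask = labels == k
        if mask.any():
            mean[mask] = rbf_kernel(X_test[mask], Xk) @ alpha
    return mean

# Synthetic two-regime data (hypothetical): the target changes form at x = 0.
rng = np.random.default_rng(0)
X = rng.uniform(-4.0, 4.0, size=(400, 1))
truth = lambda x: np.where(x < 0.0, np.sin(x), 2.0 + 0.5 * x)
y = truth(X[:, 0]) + 0.05 * rng.standard_normal(400)
centers = np.array([[-2.0], [2.0]])
X_new = np.array([[-3.0], [-1.0], [1.0], [3.0]])
block_mean = predict_local_gps(fit_local_gps(X, y, centers), centers, X_new)
```

With B blocks of roughly equal size, each solve costs about (N/B)^3 instead of N^3, and the loop over blocks can be run in parallel because the blocks share no state.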
New Approaches To Photometric Redshift Prediction
Public Data Portal
Expanding upon the work of Way & Srivastava (2006), we demonstrate how the use of training sets of comparable size continues to make Gaussian Process Regression competitive with Neural Networks and other least-squares fitting methods. This is possible via new large-size matrix inversion techniques developed for Gaussian Processes that do not require that the kernel matrix be sparse. This development, combined with a neural-network kernel function, appears to give superior results for this problem. Our best-fit results for the Sloan Digital Sky Survey Main Galaxy Sample using u,g,r,i,z give an rms error of 0.0201, while our results for the same in the Luminous Red Galaxy Sample yield 0.0220.
Determining the Predictive Limit of QSAR Models
Public Data Portal
The research done to evaluate how the predictivity of models is affected by error in either the training or the test set is simple to describe conceptually. Benchmark datasets are downloaded from reputable sources. Then the datasets are split into training and test sets. Randomized error is added, and models are then created on both error-laden and native training sets. Those models are used to predict both error-laden and native test sets. Differences in standard statistics commonly used to assess predictivity are observed. This dataset is associated with the following publication: Kolmar, S., and C. Grulke. The Effect of Noise on the Predictive Limit of QSAR Models. Journal of Cheminformatics. Springer, New York, NY, USA, 13: 92, (2021).
Estimation of Time-Varying Autoregressive Symmetric Alpha Stable
Public Data Portal
In the last decade, alpha-stable distributions have become a standard model for impulsive data. The linear symmetric alpha-stable processes in particular have found applications in various fields. When the process parameters are time-invariant, various techniques are available for estimation. However, time-invariance is an important restriction given that in many communications applications channels are time-varying. For such processes, we propose a relatively new technique based on particle filters, which have achieved great success in tracking applications involving non-Gaussian signals and nonlinear systems. Since particle filtering is a sequential method, it enables us to track the time-varying autoregression coefficients of the alpha-stable processes. The method is tested for both abruptly and slowly changing autoregressive parameters of signals, where the driving noises are symmetric alpha-stable processes, and is observed to perform very well. Moreover, the method can easily be extended to skewed alpha-stable distributions.
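As a rough illustration of tracking a time-varying autoregressive coefficient with a particle filter, the sketch below handles a first-order process driven by Cauchy noise, which is the symmetric alpha-stable case with alpha = 1 and has a closed-form density. It is a plain bootstrap filter, not the method of the paper: the random-walk model for the coefficient, the known noise scale, the parameter values, and the name `track_tvar1` are all assumptions.

```python
import numpy as np

def track_tvar1(x, n_particles=1000, walk_std=0.02, noise_scale=0.1, seed=0):
    """Bootstrap particle filter for a_t in x_t = a_t * x_{t-1} + e_t.

    Assumes e_t is Cauchy with known scale and a_t follows a random walk.
    Returns the posterior-mean estimate of a_t for t = 1 .. len(x) - 1.
    """
    rng = np.random.default_rng(seed)
    particles = rng.uniform(-1.0, 1.0, n_particles)    # prior over a_0
    estimates = np.empty(len(x) - 1)
    for t in range(1, len(x)):
        # Propagate: random-walk evolution of the coefficient.
        particles = particles + walk_std * rng.standard_normal(n_particles)
        # Weight: Cauchy likelihood of the one-step residual (up to a constant).
        resid = x[t] - particles * x[t - 1]
        weights = 1.0 / (1.0 + (resid / noise_scale) ** 2)
        weights /= weights.sum()
        estimates[t - 1] = weights @ particles
        # Resample: multinomial resampling according to the weights.
        particles = rng.choice(particles, size=n_particles, p=weights)
    return estimates

# Simulated signal (hypothetical) with a slowly varying coefficient.
rng = np.random.default_rng(1)
T = 400
a_true = 0.5 + 0.3 * np.sin(2.0 * np.pi * np.arange(T) / T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = a_true[t] * x[t - 1] + 0.1 * rng.standard_cauchy()
a_hat = track_tvar1(x)    # a_hat[t - 1] estimates a_true[t]
```

Because the Cauchy density never reaches zero, the weights stay well defined even after a large impulsive jump, which is one reason heavy-tailed likelihoods pair naturally with sequential resampling.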
Detection and Prognostics on Low Dimensional Systems
Public Data Portal
This paper describes the application of known and novel prognostic algorithms on systems that can be described by low-dimensional, potentially nonlinear dynamics. The methods rely on estimating the conditional probability distribution of the output of the system at a future time given knowledge of the current state of the system. We show how to estimate these conditional probabilities using a variety of techniques, including bagged neural networks and kernel methods such as Gaussian Process Regression (GPR). The results are compared with standard methods such as the nearest neighbor algorithm. We demonstrate the algorithms on a real-world data set and a simulated data set. The real-world data set consists of the intensity of an NH3 laser. The laser data set has been shown by other authors to exhibit low-dimensional chaos with sudden drops in intensity. The simulated data set is generated from the Lorenz attractor and has known statistical characteristics. On these data sets, we show the evolution of the estimated conditional probability distribution, the way it can act as a prognostic signal, and its use as an early warning system. We also review a novel approach to perform Gaussian Process Regression with large numbers of data points.