교육데이터 활용•지원 서비스

로그인

데이터셋 상세

미국

Permutation-validated principal components analysis of microarray data

Background In microarray data analysis, the comparison of gene-expression profiles with respect to different conditions and the selection of biologically interesting genes are crucial tasks. Multivariate statistical methods have been applied to analyze these large datasets. Less work has been published concerning the assessment of the reliability of gene-selection procedures. Here we describe a method to assess reliability in multivariate microarray data analysis using permutation-validated principal components analysis (PCA). The approach is designed for microarray data with a group structure. Results We used PCA to detect the major sources of variance underlying the hybridization conditions followed by gene selection based on PCA-derived and permutation-based test statistics. We validated our method by applying it to well characterized yeast cell-cycle data and to two datasets from our laboratory. We could describe the major sources of variance, select informative genes and visualize the relationship of genes and arrays. We observed differences in the level of the explained variance and the interpretability of the selected genes. Conclusions Combining data visualization and permutation-based gene selection, permutation-validated PCA enables one to illustrate gene-expression variance between several conditions and to select genes by taking into account the relationship of between-group to within-group variance of genes. The method can be used to extract the leading sources of variance from microarray data, to visualize relationships between genes and hybridizations and to select informative genes in a statistically reliable manner. This selection accounts for the level of reproducibility of replicates or group structure as well as gene-specific scatter. Visualization of the data can support a straightforward biological interpretation.

데이터 정보

데이터 포털
미국
META URL
https://catalog.data.gov/dataset/permutation-validated-principal-components-analysis-of-microarray-data
라이선스
notspecified
비용
제공기관
U.S. Department of Health & Human Services
관리부서
데이터
- Official Government Data Source
- 랜딩 페이지

연관 데이터

How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach

공공데이터포털

Background It has been recognized that replicates of arrays (or spots) may be necessary for reliably detecting differentially expressed genes in microarray experiments. However, the often-asked question of how many replicates are required has barely been addressed in the literature. In general, the answer depends on several factors: a given magnitude of expression change, a desired statistical power (that is, probability) to detect it, a specified Type I error rate, and the statistical method being used to detect the change. Here, we discuss how to calculate the number of replicates in the context of applying a nonparametric statistical method, the normal mixture model approach, to detect changes in gene expression. Results The methodology is applied to a data set containing expression levels of 1,176 genes in rats with and without pneumococcal middle-ear infection. We illustrate how to calculate the power functions for 2, 4, 6 and 8 replicates. Conclusions The proposed method is potentially useful in designing microarray experiments to discover differentially expressed genes. The same idea can be applied to other statistical methods.

A simple method for statistical analysis of intensity differences in microarray-derived gene expression data

공공데이터포털

Background Microarray experiments offer a potent solution to the problem of making and comparing large numbers of gene expression measurements either in different cell types or in the same cell type under different conditions. Inferences about the biological relevance of observed changes in expression depend on the statistical significance of the changes. In lieu of many replicates with which to determine accurate intensity means and variances, reliable estimates of statistical significance remain problematic. Without such estimates, overly conservative choices for significance must be enforced. Results A simple statistical method for estimating variances from microarray control data which does not require multiple replicates is presented. Comparison of datasets from two commercial entities using this difference-averaging method demonstrates that the standard deviation of the signal scales at a level intermediate between the signal intensity and its square root. Application of the method to a dataset related to the β-catenin pathway yields a larger number of biologically reasonable genes whose expression is altered than the ratio method. Conclusions The difference-averaging method enables determination of variances as a function of signal intensities by averaging over the entire dataset. The method also provides a platform-independent view of important statistical properties of microarray data.

Within the fold: assessing differential expression measures and reproducibility in microarray assays

공공데이터포털

Fold-change' cutoffs have been widely used in microarray assays to identify genes that are differentially expressed. More accurate measures are required to identify high-confidence sets of genes with biologically meaningful changes in transcription. A general procedure for analyzing cDNA microarray data is proposed and validated. It is shown that pooled reference samples should be based not only on the expression of individual genes in each cell line but also on the expression levels of genes within cell lines.

A Sensitivity Analysis of Methodological Variables Associated with Microbiome Measurements

공공데이터포털

This repository provides the raw data, analysis code, and results generated during a systematic evaluation of the impact of selected experimental protocol choices on the metagenomic sequencing analysis of microbiome samples. Briefly, a full factorial experimental design was implemented varying biological sample (n=5), operator (n=2), lot (n=2), extraction kit (n=2), 16S variable region (n=2), and reference database (n=3), and the main effects were calculated and compared between parameters (bias effects) and samples (real biological differences). A full description of the effort is provided in the associated publication.

MICRON Data (2015-2016) with associated R Markdown code

공공데이터포털

This data set includes water quality data and microbial community abundance tables for periphyton samples from this project. The data set also includes extensive R markdown code used to process the data and generate the results included in the report. This dataset is associated with the following publication: Hagy, J., R. Devereux, K. Houghton, D. Beddick, T. Pierce, and S. Friedman. Developing Microbial Community Indicators of Nutrient Exposure in Southeast Coastal Plain Streams using a Molecular Approach. US EPA Office of Research and Development, Washington, DC, USA, 2018.

MICRON Data (2015-2016) with associated R Markdown code

공공데이터포털

This data set includes water quality data and microbial community abundance tables for periphyton samples from this project. The data set also includes extensive R markdown code used to process the data and generate the results included in the report. This dataset is associated with the following publication: Hagy, J., R. Devereux, K. Houghton, D. Beddick, T. Pierce, and S. Friedman. Developing Microbial Community Indicators of Nutrient Exposure in Southeast Coastal Plain Streams using a Molecular Approach. US EPA Office of Research and Development, Washington, DC, USA, 2018.

Data for pilot-scale low level hydrogen peroxide tests using humidifiers

공공데이터포털

Dataset includes data from each experiment conducted in the pilot-scale testing. Each sheet of the Excel file pertains to each test. A data dictionary is included in the first sheet. In each sheet there are microbiological data (colony forming units) for each test and positive control coupon used in the study. Also shown is the calculation of decontamination efficacy (log10 reduction). This dataset is associated with the following publication: Wood, J., W. Calfee, S. Ryan, L. Mickelsen, M. Clayton, and V. Rastogi. A Simple Decontamination Approach Using Hydrogen Peroxide Vapor for Bacillus anthracis Spore Inactivation. JOURNAL OF APPLIED MICROBIOLOGY. Blackwell Publishing, Malden, MA, USA, 121(6): 1603-1615, (2016).

Reporting of measures of accuracy in systematic reviews of diagnostic literature

공공데이터포털

Background There are a variety of ways in which accuracy of clinical tests can be summarised in systematic reviews. Variation in reporting of summary measures has only been assessed in a small survey restricted to meta-analyses of screening studies found in a single database. Therefore, we performed this study to assess the measures of accuracy used for reporting results of primary studies as well as their meta-analysis in systematic reviews of test accuracy studies. Methods Relevant reviews on test accuracy were selected from the Database of Abstracts of Reviews of Effectiveness (1994–2000), which electronically searches seven bibliographic databases and manually searches key resources. The structured abstracts of these reviews were screened and information on accuracy measures was extracted from the full texts of 90 relevant reviews, 60 of which used meta-analysis. Results Sensitivity or specificity was used for reporting the results of primary studies in 65/90 (72%) reviews, predictive values in 26/90 (28%), and likelihood ratios in 20/90 (22%). For meta-analysis, pooled sensitivity or specificity was used in 35/60 (58%) reviews, pooled predictive values in 11/60 (18%), pooled likelihood ratios in 13/60 (22%), and pooled diagnostic odds ratio in 5/60 (8%). Summary ROC was used in 44/60 (73%) of the meta-analyses. There were no significant differences in measures of test accuracy among reviews published earlier (1994–97) and those published later (1998–2000). Conclusions There is considerable variation in ways of reporting and summarising results of test accuracy studies in systematic reviews. There is a need for consensus about the best ways of reporting results of test accuracy studies in reviews.

Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application

공공데이터포털

Background A model-based analysis of oligonucleotide expression arrays we developed previously uses a probe-sensitivity index to capture the response characteristic of a specific probe pair and calculates model-based expression indexes (MBEI). MBEI has standard error attached to it as a measure of accuracy. Here we investigate the stability of the probe-sensitivity index across different tissue types, the reproducibility of results in replicate experiments, and the use of MBEI in perfect match (PM)-only arrays. Results Probe-sensitivity indexes are stable across tissue types. The target gene's presence in many arrays of an array set allows the probe-sensitivity index to be estimated accurately. We extended the model to obtain expression values for PM-only arrays, and found that the 20-probe PM-only model is comparable to the 10-probe PM/MM difference model, in terms of the expression correlations with the original 20-probe PM/MM difference model. MBEI method is able to extend the reliable detection limit of expression to a lower mRNA concentration. The standard errors of MBEI can be used to construct confidence intervals of fold changes, and the lower confidence bound of fold change is a better ranking statistic for filtering genes. We can assign reliability indexes for genes in a specific cluster of interest in hierarchical clustering by resampling clustering trees. A software dChip implementing many of these analysis methods is made available. Conclusions The model-based approach reduces the variability of low expression estimates, and provides a natural method of calculating expression values for PM-only arrays. The standard errors attached to expression values can be used to assess the reliability of downstream analysis.

Quantitative assessment of the use of modified nucleoside triphosphates in expression profiling: differential effects on signal intensities and impacts on expression ratios

공공데이터포털

Background The power of DNA microarrays derives from their ability to monitor the expression levels of many genes in parallel. One of the limitations of such powerful analytical tools is the inability to detect certain transcripts in the target sample because of artifacts caused by background noise or poor hybridization kinetics. The use of base-modified analogs of nucleoside triphosphates has been shown to increase complementary duplex stability in other applications, and here we attempted to enhance microarray hybridization signal across a wide range of sequences and expression levels by incorporating these nucleotides into labeled cRNA targets. Results RNA samples containing 2-aminoadenosine showed increases in signal intensity for a majority of the sequences. These results were similar, and additive, to those seen with an increase in the hybridization time. In contrast, 5-methyluridine and 5-methylcytidine decreased signal intensities. Hybridization specificity, as assessed by mismatch controls, was dependent on both target sequence and extent of substitution with the modified nucleotide. Concurrent incorporation of modified and unmodified ATP in a 1:1 ratio resulted in significantly greater numbers of above-threshold ratio calls across tissues, while preserving ratio integrity and reproducibility. Conclusions Incorporation of 2-aminoadenosine triphosphate into cRNA targets is a promising method for increasing signal detection in microarrays. Furthermore, this approach can be optimized to minimize impact on yield of amplified material and to increase the number of expression changes that can be detected.

목록