Education Data Utilization and Support Service

Dataset details

United States

Cluster-Rasch models for microarray gene expression data

Background: We propose two different formulations of the Rasch statistical model for the problem of relating gene expression profiles to phenotypes. One formulation allows us to investigate whether a cluster of genes with similar expression profiles is related to the observed phenotypes; this model can also be used for future prediction. The other formulation provides an alternative way of identifying genes that are over- or underexpressed from their expression levels in tissue or cell samples of a given tissue or cell type.
Results: We illustrate the methods on available datasets of a classification of acute leukemias and of 60 cancer cell lines. For tumor classification, the results are comparable to those previously obtained. For the cancer cell lines dataset, we found four clusters of genes that are related to drug response for many of the 90 drugs that we considered. In addition, for each type of cell line, we identified genes that are over- or underexpressed relative to other genes.
Conclusions: The cluster-Rasch model provides a probabilistic model for describing gene expression patterns across samples and can be used to relate gene expression profiles to phenotypes.
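For readers unfamiliar with the Rasch model, the following is a minimal sketch of the standard dichotomous Rasch model only. It is not the cluster-Rasch formulation from the paper, whose details are not given in this abstract. It assumes expression has been dichotomized (1 = expressed); the function name and the parameter values are hypothetical and chosen purely for illustration.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Standard dichotomous Rasch model.

    P(X = 1) = 1 / (1 + exp(-(ability - difficulty)))
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical parameters (assumption, not taken from the dataset):
# one "ability" per sample and one "difficulty" per gene.
sample_abilities = [-1.0, 0.0, 1.5]
gene_difficulties = [-0.5, 0.5]

# Probability that each gene is "expressed" in each sample.
probabilities = [
    [rasch_probability(a, d) for d in gene_difficulties]
    for a in sample_abilities
]

for ability, row in zip(sample_abilities, probabilities):
    print(ability, [round(p, 3) for p in row])
```

When ability equals difficulty the probability is exactly 0.5, and it rises as the ability exceeds the difficulty.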

Data information

Data portal: United States
META URL: https://catalog.data.gov/dataset/cluster-rasch-models-for-microarray-gene-expression-data
License: Not specified
Cost:
Provider: U.S. Department of Health & Human Services
Managing department:
Data:
- Official Government Data Source
- Landing page

Related data

Improved analytical methods for microarray-based genome-composition analysis

Public Data Portal

Genome-composition analysis using microarrays can be used to categorize genes into 'present' and 'divergent' categories. This involves selecting a signal value that is used as a cutoff to discriminate present and divergent genes, but this can result in the misclassification of many genes. A method is described that depends on the shape of the signal-ratio distribution and does not require empirical determination of a cutoff. Many genes previously classified as present using static methods are in fact divergent on the basis of microarray signal; this is corrected by our algorithm.

Murine microenvironment metaprofiles associate with human cancer etiology and intrinsic subtypes

Public Data Portal

We developed a mouse model that captures radiation effects on host biology by transplanting unirradiated Trp53 null mammary tissue to sham or irradiated hosts. Gene expression profiles of tumors that arose in irradiated mice are distinct from those that arose in naive hosts. Host irradiation induces a metaprofile consisting of gene modules representing stem cells, cell motility, macrophages and autophagy. Human orthologs of the host irradiation metaprofile discriminated between radiation-preceded and sporadic human thyroid cancers. An irradiated host centroid was strongly associated with estrogen receptor negative breast cancer. When applied to sporadic human breast cancers, the irradiated host metaprofile strongly associated with basal-like and claudin-low breast cancer intrinsic subtypes. Comparing host irradiation in the context of TGFB levels showed that inflammation was robustly associated with claudin-low tumors. The association of the irradiated host metaprofiles with estrogen receptor negative status and claudin-low subtype suggests that host processes similar to those induced by radiation underlie sporadic cancers. Total RNA was extracted from mammary tumors derived from transplantations of non-irradiated p53null mammary fragments into irradiated hosts. We analyzed a total of 32 p53null tumors from irradiated wild type mice: 9 from sham-irradiated hosts, and 23 from irradiated hosts. We also analyzed 24 tumors from irradiated TGFb1 heterozygote hosts: 6 from sham-irradiated hosts, and 18 from irradiated hosts.

A simple method for statistical analysis of intensity differences in microarray-derived gene expression data

Public Data Portal

Background: Microarray experiments offer a potent solution to the problem of making and comparing large numbers of gene expression measurements either in different cell types or in the same cell type under different conditions. Inferences about the biological relevance of observed changes in expression depend on the statistical significance of the changes. In lieu of many replicates with which to determine accurate intensity means and variances, reliable estimates of statistical significance remain problematic. Without such estimates, overly conservative choices for significance must be enforced.
Results: A simple statistical method for estimating variances from microarray control data which does not require multiple replicates is presented. Comparison of datasets from two commercial entities using this difference-averaging method demonstrates that the standard deviation of the signal scales at a level intermediate between the signal intensity and its square root. Application of the method to a dataset related to the β-catenin pathway yields a larger number of biologically reasonable genes whose expression is altered than the ratio method.
Conclusions: The difference-averaging method enables determination of variances as a function of signal intensities by averaging over the entire dataset. The method also provides a platform-independent view of important statistical properties of microarray data.

Provides an overview of the analysis and associated files, scripts and datasets

Public Data Portal

This dataset contains the files, scripts and data that were used to run the simulations and data analyses for the manuscript. This dataset is associated with the following publication: Ball, K., C. Grant, W. Mundy, and T. Shafer. A multivariate extension of mutual information for growing neural networks. Neural Networks. Elsevier B.V., Amsterdam, NETHERLANDS, 95: 29-43, (2017).

Gene-expression profiling of the response of peripheral blood mononuclear cells and melanoma metastases to systemic IL-2 administration

Public Data Portal

Early changes in transcriptional profiles of circulating mononuclear cells were compared with those occurring within the microenvironment of melanoma metastases following systemic IL-2 administration. The results suggest that the immediate effect of IL-2 administration on the tumor microenvironment is transcriptional activation of genes predominantly associated with monocyte cell function.

Markers for early detection of cancer: Statistical guidelines for nested case-control studies

Public Data Portal

Background: Recently many long-term prospective studies have involved serial collection and storage of blood or tissue specimens. This has spurred nested case-control studies that involve testing some specimens for various markers that might predict cancer. Until now there has been little guidance in statistical design and analysis of these studies.
Methods: To develop statistical guidelines, we considered the purpose, the types of biases, and the opportunities for extracting additional information.
Results: The following guidelines are proposed:
(1) For the clearest interpretation, statistics should be based on false and true positive rates, not odds ratios or relative risks.
(2) To avoid overdiagnosis bias, cases should be diagnosed as a result of symptoms rather than on screening.
(3) To minimize selection bias, the spectrum of control conditions should be the same in study and target screening populations.
(4) To extract additional information, criteria for a positive test should be based on combinations of individual markers and changes in marker levels over time.
(5) To avoid overfitting, the criteria for a positive marker combination developed in a training sample should be evaluated in a random test sample from the same study and, if possible, a validation sample from another study.
(6) To identify biomarkers with true and false positive rates similar to mammography, the training, test, and validation samples should each include at least 110 randomly selected subjects without cancer and 70 subjects with cancer.
Conclusion: These guidelines ensure good practice in the design and analysis of nested case-control studies of early detection biomarkers.
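As a small illustration of guideline (1), the true and false positive rates can be computed directly from the counts of a test sample. This is a minimal sketch under stated assumptions: the function name and the counts are hypothetical, and only the group sizes (70 subjects with cancer, 110 without) echo the minimums in guideline (6).

```python
def positive_rates(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Return (true positive rate, false positive rate) from a 2x2 table.

    tp, fn: marker-positive and marker-negative subjects with cancer
    fp, tn: marker-positive and marker-negative subjects without cancer
    """
    true_positive_rate = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    return true_positive_rate, false_positive_rate


# Hypothetical counts (assumption, not study data):
# 70 subjects with cancer and 110 subjects without cancer.
tpr, fpr = positive_rates(tp=56, fn=14, fp=11, tn=99)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

Reporting the two rates separately keeps the marker's performance in cases and in controls visible, which a single odds ratio would merge into one number.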
