Education Data Utilization and Support Service

Dataset Details

United States

A hybrid gene selection approach to create the S1500+ targeted gene sets for use in high-throughput transcriptomics

The U.S. Tox21 Federal collaboration, which currently quantifies the biological effects of nearly 10,000 chemicals via quantitative high-throughput screening (qHTS) in in vitro model systems, is now working to incorporate gene expression profiling into the existing battery of assays. Whole-transcriptome analyses of large numbers of samples using microarrays or RNA-Seq are currently cost-prohibitive. Accordingly, the Tox21 Program is pursuing a high-throughput transcriptomics (HTT) method that focuses on the targeted detection of gene expression for a carefully selected subset of the transcriptome. This approach could reduce the cost roughly 10-fold, allowing larger numbers of samples to be analyzed. To identify the optimal transcriptome subset, genes were sought that are (1) representative of the highly diverse biological space, (2) capable of serving as a proxy for expression changes in unmeasured genes, and (3) sufficient to provide coverage of well-described biological pathways. A hybrid method for gene selection is presented that combines data-driven and knowledge-driven concepts into one cohesive method. This dataset is associated with the following publication: Mav, D., R.R. Shah, B.E. Howard, S.S. Auerbach, P.R. Bushel, J.B. Collins, D.L. Gerhold, R. Judson, A.L. Karmaus, E.A. Maull, D.L. Mendrick, B.A. Merrick, N.S. Sipes, D. Svoboda, and R.S. Paules. A hybrid gene selection approach to create the S1500+ targeted gene sets for use in high-throughput transcriptomics. PLoS ONE. Public Library of Science, San Francisco, CA, USA, 13(2): 1-17, (2018).

Data Information

Data portal: United States
META URL: https://catalog.data.gov/dataset/a-hybrid-gene-selection-approach-to-create-the-s1500-targeted-gene-sets-for-use-in-high-th-bec6e
License: other-license-specified
Cost:
Provider: U.S. Environmental Protection Agency
Managing department:
Data:
- https://doi.org/10.1371/journal.pone.0191105
- Landing page

Related Datasets

(Archives of Toxicology) Recommended approaches in the application of toxicogenomics to derive points of departure for chemical risk assessment

Public Data Portal

To determine the best way to select predictive groups of genes, we used published microarray data from dose-response studies on six chemicals in rats exposed orally for 5, 14, 28, and 90 days. We evaluated eight approaches for selecting genes for POD derivation and three previously proposed approaches (the lowest pathway BMD, and the mean and median BMD of all genes). This dataset is not publicly accessible because: The research which produced this data was not funded by EPA. The EPA coauthor helped write the manuscript. It can be accessed through the following means: Data generated by other authors. Format: N/A. This dataset is associated with the following publication: Farmahin, R., A. Williams, B. Kuo, N.L. Chepelev, R.S. Thomas, T.S. Burton-Maclaren, I.H. Curran, A. Nong, M.G. Wade, and C.L. Yauk. (Archives of Toxicology) Recommended approaches in the application of toxicogenomics to derive points of departure for chemical risk assessment. Archives of Toxicology. Springer, New York, NY, USA, 91(5): 2045-2065, (2017).
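The three previously proposed summary approaches named above (the lowest pathway BMD, and the mean and median BMD of all genes) can be sketched in a few lines. This is only an illustration: the gene names, BMD values, and pathway memberships are hypothetical, and defining a pathway's BMD as the median BMD of its member genes is an assumption, not something stated in this abstract.

```python
from statistics import mean, median

# Hypothetical gene-level BMDs (arbitrary dose units); not taken from the study.
gene_bmd = {"GeneA": 12.0, "GeneB": 3.0, "GeneC": 30.0, "GeneD": 9.0}

# Hypothetical pathway membership.
pathways = {"PathwayX": ["GeneA", "GeneB"], "PathwayY": ["GeneC", "GeneD"]}


def summary_pods(gene_bmd, pathways):
    """Return three candidate points of departure from gene-level BMDs."""
    all_bmds = list(gene_bmd.values())

    # Assumption: a pathway's BMD is the median BMD of its modeled member genes.
    pathway_bmd = {}
    for name, genes in pathways.items():
        members = [gene_bmd[g] for g in genes if g in gene_bmd]
        if members:  # skip pathways with no modeled genes
            pathway_bmd[name] = median(members)

    return {
        "lowest_pathway_bmd": min(pathway_bmd.values()),
        "mean_gene_bmd": mean(all_bmds),
        "median_gene_bmd": median(all_bmds),
    }


pods = summary_pods(gene_bmd, pathways)
```

With real data, `gene_bmd` would come from dose-response modeling output, and the pathway definition would follow whichever annotation source the analysis uses.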

Evaluation of Existing QSAR Models and Structural Alerts and Development of New Ensemble Models for Genotoxicity Using a Newly Compiled Experimental Dataset

Public Data Portal

In this study, a major effort was undertaken to compile a large genotoxicity dataset (54,805 records for 9299 substances) from several public sources (e.g., TOXNET, COSMOS, eChemPortal). The names and outcomes of the different assays were harmonized, and assays were annotated by type: gene mutation in Salmonella bacteria (Ames assay) and chromosome mutation (clastogenicity) in vitro or in vivo (chromosome aberration, micronucleus, and mouse lymphoma Tk+/- assays). This dataset was then evaluated to assess genotoxic potential using a categorization scheme, whereby a substance was considered genotoxic if it was positive in at least one Ames or clastogen study. The categorization dataset comprised 8442 chemicals, of which 2728 were genotoxic, 5585 were not, and 129 were inconclusive. QSAR models (TEST and VEGA) and the OECD Toolbox structural alerts/profilers (e.g., OASIS DNA alerts for Ames and chromosomal aberrations) were used to make in silico predictions of genotoxicity potential. The individual QSAR tools and structural alerts achieved balanced accuracies of 57-73%. A Naïve Bayes consensus model was developed using combinations of QSAR models and structural alert predictions. The ‘best’ consensus model selected had a balanced accuracy of 81.2%, a sensitivity of 87.24% and a specificity of 75.20%. This in silico scheme offers promise as a first step in ranking thousands of substances as part of a prioritization approach for genotoxicity. This dataset is associated with the following publication: Pradeep, P., R. Judson, D. DeMarini, N. Keshava, T. Martin, J. Dean, C. Gibbons, A. Simha, S. Warren, M. Gwinn, and G. Patlewicz. An Evaluation of Existing QSAR Models and Structural Alerts and Development of New Ensemble Models for Genotoxicity Using a Newly Compiled Experimental Dataset. Computational Toxicology. Elsevier B.V., Amsterdam, NETHERLANDS, 18: 100167, (2021).
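As a rough illustration of the categorization rule and the reported metrics, the sketch below labels a substance from its study calls and recomputes balanced accuracy as the mean of sensitivity and specificity. The function names and the "positive"/"negative" call strings are hypothetical. The abstract states only the rule for a genotoxic call, so the rules used here for "non-genotoxic" and "inconclusive" are assumptions.

```python
def categorize(ames_calls, clastogen_calls):
    """Label a substance from its Ames and clastogenicity study calls."""
    calls = list(ames_calls) + list(clastogen_calls)
    # Stated in the abstract: one positive Ames or clastogen study is enough.
    if any(call == "positive" for call in calls):
        return "genotoxic"
    # Assumption: all available studies negative means non-genotoxic.
    if calls and all(call == "negative" for call in calls):
        return "non-genotoxic"
    # Assumption: anything else (no data, equivocal calls) is inconclusive.
    return "inconclusive"


def balanced_accuracy(sensitivity, specificity):
    """Balanced accuracy is the mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2.0


# Category counts reported in the abstract sum to the dataset size.
total = 2728 + 5585 + 129

# Reported sensitivity and specificity of the selected consensus model (percent).
consensus_ba = balanced_accuracy(87.24, 75.20)
```

Here `total` equals 8442 and `consensus_ba` is approximately 81.22, consistent with the chemical count and the 81.2% balanced accuracy reported above.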

GENE-TOX (Genetic Toxicology Data Bank)

Public Data Portal

GENE-TOX provided genetic toxicology (mutagenicity) test data from expert peer review of open scientific literature for more than 3,000 chemicals from the United States Environmental Protection Agency (EPA) covering the years 1991 - 1998. GENE-TOX is no longer updated.

Integrating Transcriptomic and Targeted New Approach Methodologies into a Tiered Framework for Chemical Bioactivity Screening

Public Data Portal

Dataset for Jesse Rogers et al., 'Integrating Transcriptomic and Targeted New Approach Methodologies into a Tiered Framework for Chemical Bioactivity Screening', published in Environmental Health Perspectives, Vol. 133, Issue 6, 067013, June 2025. DOI: https://doi.org/10.1289/EHP16024, PMC12165737. R scripts for reproducing all analyses are available on GitHub (https://github.com/USEPA/CompTox-HTTr-RCAS). All sequencing data are available via the Gene Expression Omnibus repository (accession numbers GSE274318 for U-2 OS and GSE284321 for HepaRG). High-throughput screening assay data are available from InvitroDB via download 29 or the USEPA CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard/). This dataset is associated with the following publication: Rogers, J., J. Bundy, J. Harrill, R. Judson, K. Friedman, and L. Everett. Integrating Transcriptomic and Targeted New Approach Methodologies into a Tiered Framework for Chemical Bioactivity Screening. ENVIRONMENTAL HEALTH PERSPECTIVES. National Institute of Environmental Health Sciences (NIEHS), Research Triangle Park, NC, USA, 133(6): 067013, (2025).

Chemical-Gene and Chemical-Pathway Interactions Predicted for Chemicals Detected in the USGS-USEPA National Streams Pilot Study Based on Effects Data in the Comparative Toxicogenomics Database (CTD)

Public Data Portal

Data from study assessing the utility of knowledgebase-leveraging of comprehensive environmental-contaminant-exposure datasets by comparing biological effects predicted on the basis of target chemical analyses with measured biological effects in corresponding split water samples.

High-throughput screening tools facilitate calculation of a combined exposure-bioactivity index for chemicals with endocrine activity

Public Data Portal

The dataset consists of high-throughput in vitro bioactivity data and exposure predictions from the U.S. EPA’s Toxicity and Exposure Forecaster (ToxCast and ExpoCast) project. This dataset is associated with the following publication: Wegner, S., C. Pinto, C. Ring, and J. Wambaugh. High-throughput screening tools facilitate calculation of a combined exposure-bioactivity index for chemicals with endocrine activity. ENVIRONMENT INTERNATIONAL. Elsevier B.V., Amsterdam, NETHERLANDS, 137: 105470, (2020).

The sensitivity of transcriptomics BMD modeling to the methods used for microarray data normalization

Public Data Portal

This dataset is a project file generated by the BMDExpress 2.2 software (Sciome, Research Triangle Park, NC). It contains gene expression data for livers of rats exposed to 4 chemicals (crude MCHM, neat MCHM, DMPT, p-toluidine) and kidneys of rats exposed to PPH. The project file includes expression data (GeneChip Rat 230 2.0 Array) normalized using 7 different pre-processing methods (RMA, GCRMA, MAS5.0, MAS5.0_noA calls, PLIER, PLIER16, and PLIER16_noA calls); differentially expressed probe sets detected by Williams' method (p < 0.05 and a minimum fold change of 1.5); and probe set-level and pathway-level BMD and BMDL values from transcriptomic dose-response modeling. This dataset is associated with the following publication: Mezencev, R., and S. Auerbach. The sensitivity of transcriptomics BMD modeling to the methods used for microarray data normalization. PLOS ONE. Public Library of Science, San Francisco, CA, USA, 15(5): e0232955, (2020).

Predict Organ Toxicity ChemResTox Data

Public Data Portal

We use a supervised machine learning strategy to systematically investigate the relative importance of study type, machine learning algorithm, and type of descriptor on predicting in vivo repeat-dose toxicity at the organ level. A total of 985 compounds were represented using chemical structural descriptors, ToxPrint chemotype descriptors, and bioactivity descriptors from ToxCast in vitro high-throughput screening assays. Using ToxRefDB, a total of 35 target organ outcomes were identified that contained at least 100 chemicals (50 positive and 50 negative). Supervised machine learning was performed using Naïve Bayes, k-nearest neighbor, random forest, classification and regression trees, and support vector classification approaches. Model performance was assessed based on F1 scores using five-fold cross-validation with balanced bootstrap replicates. Fixed effects modeling showed the variance in F1 scores was explained mostly by target organ outcome, followed by descriptor type, machine learning algorithm, and interactions between these three factors. A combination of bioactivity and chemical structure or chemotype descriptors was the most predictive. Model performance improved with more chemicals (up to a maximum of 24%) and these gains were correlated (ρ = 0.92) with the number of chemicals. This dataset is associated with the following publication: Liu, J., G. Patlewicz, A. Williams, R. Thomas, and I. Shah. (Chemical Research in Toxicology) Predicting organ toxicity using in vitro bioactivity data and chemical structure. CHEMICAL RESEARCH IN TOXICOLOGY. American Chemical Society, Washington, DC, USA, 30: 2046-2059, (2017).
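For readers unfamiliar with the metric, the F1 score used above is the harmonic mean of precision and recall. The sketch below computes it for one set of binary predictions. The function name and the example labels are hypothetical, and the sketch does not reproduce the study's five-fold cross-validation or balanced bootstrap procedure.

```python
def f1_score(y_true, y_pred):
    """F1 score for binary labels, where 1 = toxic and 0 = non-toxic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # no true positives: treat F1 as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for five chemicals on one target organ outcome.
observed = [1, 1, 1, 0, 0]
predicted = [1, 1, 0, 1, 0]
score = f1_score(observed, predicted)
```

In this example precision and recall are both 2/3, so `score` is also 2/3.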
