Dataset Details
United States
Evaluation of Existing QSAR Models and Structural Alerts and Development of New Ensemble Models for Genotoxicity Using a Newly Compiled Experimental Dataset
In this study, a major effort was undertaken to compile a large genotoxicity dataset (54,805 records for 9299 substances) from several public sources (e.g., TOXNET, COSMOS, eChemPortal). The names and outcomes of the different assays were harmonized, and assays were annotated by type: gene mutation in Salmonella bacteria (Ames assay) and chromosome mutation (clastogenicity) in vitro or in vivo (chromosome aberration, micronucleus, and mouse lymphoma Tk+/- assays). This dataset was then evaluated to assess genotoxic potential using a categorization scheme, whereby a substance was considered genotoxic if it was positive in at least one Ames or clastogen study. The categorization dataset comprised 8442 chemicals, of which 2728 were genotoxic, 5585 were not, and 129 were inconclusive. QSAR models (TEST and VEGA) and the OECD Toolbox structural alerts/profilers (e.g., OASIS DNA alerts for Ames and chromosomal aberrations) were used to make in silico predictions of genotoxicity potential. The individual QSAR tools and structural alerts achieved balanced accuracies of 57-73%. A Naïve Bayes consensus model was developed using combinations of QSAR models and structural alert predictions. The ‘best’ consensus model selected had a balanced accuracy of 81.2%, a sensitivity of 87.24%, and a specificity of 75.20%. This in silico scheme offers promise as a first step in ranking thousands of substances as part of a prioritization approach for genotoxicity. This dataset is associated with the following publication: Pradeep, P., R. Judson, D. DeMarini, N. Keshava, T. Martin, J. Dean, C. Gibbons, A. Simha, S. Warren, M. Gwinn, and G. Patlewicz. An Evaluation of Existing QSAR Models and Structural Alerts and Development of New Ensemble Models for Genotoxicity Using a Newly Compiled Experimental Dataset. Computational Toxicology. Elsevier B.V., Amsterdam, NETHERLANDS, 18: 100167, (2021).
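The categorization rule and the balanced accuracy figure in the abstract above can be illustrated with a minimal Python sketch. The function names and the outcome strings ("positive", "negative") are hypothetical, and the handling of the inconclusive case is an assumption, since the abstract only states the rule for a genotoxic call. Balanced accuracy is computed here as the mean of sensitivity and specificity, which is the usual definition and matches the reported values.

```python
from typing import Iterable


def categorize(outcomes: Iterable[str]) -> str:
    """Assign a genotoxicity category from harmonized study outcomes.

    A substance is called genotoxic if at least one Ames or clastogen
    study is positive (rule stated in the abstract). The other two
    branches are assumptions made for this illustration only.
    """
    results = list(outcomes)
    if any(r == "positive" for r in results):
        return "genotoxic"
    if results and all(r == "negative" for r in results):
        return "non-genotoxic"  # assumption: all studies negative
    return "inconclusive"  # assumption: no data or ambiguous outcomes


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity and specificity (same units as the inputs)."""
    return (sensitivity + specificity) / 2


if __name__ == "__main__":
    print(categorize(["negative", "positive", "negative"]))
    # Values reported for the selected consensus model, in percent.
    print(round(balanced_accuracy(87.24, 75.20), 2))
```

With the reported sensitivity (87.24%) and specificity (75.20%), the mean is 81.22%, consistent with the 81.2% balanced accuracy quoted above.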
Related Data
Designing QSARs for parameters of high throughput toxicokinetic models using open-source descriptors
Public Data Portal
The MS Excel file (Dawson et al S2 Supporting information.xlsx) contains multiple sheets containing the training sets, test sets, and predictions for intrinsic metabolic clearance (Clint), fraction unbound in plasma (fup), and bioactivity-exposure ratios (BER), for ToxCast and pharmaceutical-like chemicals. The Word file (Dawson et al S1 Supporting Information.docx) provides additional supporting information on assembly of the training and test sets for Clint, fup, and BER. The data dictionary describes the terms used in the supporting information, S1 and S2. This dataset is associated with the following publication: Dawson, D., B. Ingle, K. Phillips, J. Nichols, J. Wambaugh, and R. Tornero-Velez. Designing QSARs for Parameters of High-Throughput Toxicokinetic Models Using Open-Source Descriptors. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 55(9): 6505-6517, (2021).
Designing QSARs for Parameters of High-Throughput Toxicokinetic Models Using Open-Source Descriptors
Public Data Portal
Additional details used in the methods are found in the MS Word file “S1_Dawson et al._Supporting_Information.docx”. The MS Excel file “S2_Dawson et al. Supporting Information.xlsx” contains datasets and graphical results. The Excel file sheets are as follows: S2.1 illustrates Clint hepatic flow calculations; S2.2-5 include training and test data sets; S2.6-7 include figures illustrating Clint model selection criteria and assemblages of model descriptors; S2.8 includes confusion matrices for evaluating the Clint model; S2.9-10 include figures illustrating fup model selection criteria and assemblages of model descriptors (with ranges); S2.11 includes tables of model assessments of the Clint test set; S2.12 includes information relevant to BER calculations for the ToxCast test set; S2.13 includes information relevant to BER calculations for Tox21 chemicals; and S2.14 provides information on different transformations for fup. This dataset is associated with the following publication: Dawson, D., B. Ingle, K. Phillips, J. Nichols, J. Wambaugh, and R. Tornero-Velez. Designing QSARs for Parameters of High-Throughput Toxicokinetic Models Using Open-Source Descriptors. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 55(9): 6505-6517, (2021).
Predict Organ Toxicity ChemResTox Data
Public Data Portal
We use a supervised machine learning strategy to systematically investigate the relative importance of study type, machine learning algorithm, and type of descriptor on predicting in vivo repeat-dose toxicity at the organ level. A total of 985 compounds were represented using chemical structural descriptors, ToxPrint chemotype descriptors, and bioactivity descriptors from ToxCast in vitro high-throughput screening assays. Using ToxRefDB, a total of 35 target organ outcomes were identified that contained at least 100 chemicals (50 positive and 50 negative). Supervised machine learning was performed using Naïve Bayes, k-nearest neighbor, random forest, classification and regression trees, and support vector classification approaches. Model performance was assessed based on F1 scores using five-fold cross-validation with balanced bootstrap replicates. Fixed effects modeling showed the variance in F1 scores was explained mostly by target organ outcome, followed by descriptor type, machine learning algorithm, and interactions between these three factors. A combination of bioactivity and chemical structure or chemotype descriptors was the most predictive. Model performance improved with more chemicals (up to a maximum of 24%) and these gains were correlated (ρ = 0.92) with the number of chemicals. This dataset is associated with the following publication: Liu, J., G. Patlewicz, A. Williams, R. Thomas, and I. Shah. (Chemical Research in Toxicology) Predicting organ toxicity using in vitro bioactivity data and chemical structure. CHEMICAL RESEARCH IN TOXICOLOGY. American Chemical Society, Washington, DC, USA, 30: 2046-2059, (2017).
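As a rough illustration of two quantities mentioned in the abstract above, the following Python sketch computes an F1 score from confusion-matrix counts and checks whether a target organ outcome has enough chemicals to be modeled. The function names are hypothetical. The eligibility check assumes that "at least 100 chemicals (50 positive and 50 negative)" means at least 50 of each class. The sketch does not reproduce the study's cross-validation or bootstrap procedure.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    # Return 0.0 when there are no positives at all, to avoid dividing by zero.
    return 2 * tp / denominator if denominator else 0.0


def is_eligible_outcome(n_positive: int, n_negative: int, minimum: int = 50) -> bool:
    """Assumed reading of the inclusion rule: at least `minimum` chemicals per class."""
    return n_positive >= minimum and n_negative >= minimum


if __name__ == "__main__":
    # Hypothetical counts for one fold of one organ-outcome model.
    print(f1_score(tp=40, fp=10, fn=10))
    print(is_eligible_outcome(n_positive=62, n_negative=48))
```

In the first call, precision and recall are both 40/50, so the F1 score is 0.8. The second call returns False because the negative class falls short of 50 chemicals.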
Metadata Files for Structure-based QSAR models to predict repeat dose toxicity points of departure
Public Data Portal
This paper describes a model that takes a chemical structure and predicts a property (the point of departure) for a new chemical. No new data were generated. The zip file contains metadata that can be used to make a model prediction, along with all of the code and a help file describing how to run the model. This dataset is associated with the following publication: Pradeep, P., K. Paul-Friedman, and R. Judson. Structure-based QSAR Models to Predict Repeat Dose Toxicity Points of Departure. Computational Toxicology. Elsevier B.V., Amsterdam, NETHERLANDS, 16(November 2020): 100139, (2020).
A hybrid gene selection approach to create the S1500+ targeted gene sets for use in high-throughput transcriptomics
Public Data Portal
The U.S. Tox21 Federal collaboration, which currently quantifies the biological effects of nearly 10,000 chemicals via quantitative high-throughput screening (qHTS) in in vitro model systems, is now making an effort to incorporate gene expression profiling into the existing battery of assays. Whole transcriptome analyses performed on large numbers of samples using microarrays or RNA-Seq are currently cost-prohibitive. Accordingly, the Tox21 Program is pursuing a high-throughput transcriptomics (HTT) method that focuses on the targeted detection of gene expression for a carefully selected subset of the transcriptome, which can potentially reduce the cost 10-fold and allow the analysis of larger numbers of samples. To identify the optimal transcriptome subset, genes were sought that are (1) representative of the highly diverse biological space, (2) capable of serving as a proxy for expression changes in unmeasured genes, and (3) sufficient to provide coverage of well-described biological pathways. A hybrid method for gene selection is presented herein that combines data-driven and knowledge-driven concepts into one cohesive method. This dataset is associated with the following publication: Mav, D., R.R. Shah, B.E. Howard, S.S. Auerbach, P.R. Bushel, J.B. Collins, D.L. Gerhold, R. Judson, A.L. Karmaus, E.A. Maull, D.L. Mendrick, B.A. Merrick, N.S. Sipes, D. Svoboda, and R.S. Paules. A hybrid gene selection approach to create the S1500+ targeted gene sets for use in high-throughput transcriptomics. PLoS ONE. Public Library of Science, San Francisco, CA, USA, 13(2): 1-17, (2018).
QSARs for Plasma Protein Binding: Source Data and Predictions
Public Data Portal
The dataset has all of the information used to create and evaluate 3 independent QSAR models for the fraction of a chemical unbound by plasma protein (Fub) for environmentally relevant chemicals. In vitro plasma protein values for 1245 pharmaceuticals and 406 ToxCast chemicals were collected from the literature (Obach 2008, Zhu 2013, Wetmore 2012, Wetmore 2015). The 21 descriptors calculated by MOE that were used in the models are included, as is an acid/base/neutral/zwitterions classification based on ionization percentages calculated in ADMET Predictor. Finally, the dataset includes the in silico Fub predictions for each chemical from the constructed k-nearest neighbor, support vector machine, and random forest QSAR models, as well as a consensus (average) prediction. This dataset is associated with the following publication: Ingle, B., R. Tornero-Velez, J. Nichols, and B. Veber. Informing the Human Plasma Protein Binding of Environmental Chemicals by Machine Learning in the Pharmaceutical Space: Applicability Domain and Limits of Predictability. Journal of Chemical Information and Modeling. American Chemical Society, Washington, DC, USA, 56(11): 2243-2252, (2016).
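The consensus prediction described above is stated to be an average of the three model outputs. A minimal Python sketch of that step follows. The function name and the example values are hypothetical, and an unweighted arithmetic mean is assumed because the description does not specify any weighting.

```python
from statistics import mean


def consensus_fub(knn: float, svm: float, rf: float) -> float:
    """Unweighted mean of the three model predictions of fraction unbound.

    Assumes each prediction is already on the same scale (a fraction
    between 0 and 1); any transformation applied in the original models
    is not reproduced here.
    """
    predictions = [knn, svm, rf]
    for p in predictions:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"fraction unbound must lie in [0, 1], got {p}")
    return mean(predictions)


if __name__ == "__main__":
    # Hypothetical predictions for a single chemical.
    print(consensus_fub(knn=0.10, svm=0.20, rf=0.30))
```

The range check is a design choice for the sketch: it flags inputs that are not valid fractions instead of silently averaging them.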
Predictive Models for In Vitro Toxicokinetic Parameters to Inform High-throughput Risk-assessment
Public Data Portal
The data used in this analysis were obtained from published literature and are available through the high-throughput toxicokinetic (HTTK) R package. The dataset consists of 1486 chemicals that span a variety of use classes, including pharmaceuticals, food-use chemicals, pesticides, and industrial chemicals, of which 1139 had experimental human in vitro fraction unbound data and 642 had experimental human in vitro intrinsic clearance data. Structures were curated and obtained from the DSSTox database. The distribution of experimental values for fraction unbound and intrinsic clearance is shown in Supplementary Figure S1. Since the data were non-normally distributed, they were appropriately transformed before any analysis was conducted. The details of the transformation and the transformed data distribution are presented in the results section and Supplementary Figures S2 and S3. A complete list of chemicals with CAS registry numbers (CASRN), DSSTox generic substance IDs (DTXSIDs), structures, and experimental data for both parameters is included as supplemental data (1.ChemicalListData.csv and 1.ChemicalList-QSARready.sdf). This dataset is associated with the following publication: Pradeep, P., G. Patlewicz, R. Pearce, J. Wambaugh, B. Wetmore, and R. Judson. Using Chemical Structure Information to Develop Predictive Models for In Vitro Toxicokinetic Parameters to Inform High-throughput Risk-assessment. Computational Toxicology. Elsevier B.V., Amsterdam, NETHERLANDS, 16: 100136, (2020).