Dataset details
United States
Predict Organ Toxicity ChemResTox Data
We use a supervised machine learning strategy to systematically investigate the relative importance of study type, machine learning algorithm, and type of descriptor on predicting in vivo repeat-dose toxicity at the organ level. A total of 985 compounds were represented using chemical structural descriptors, ToxPrint chemotype descriptors, and bioactivity descriptors from ToxCast in vitro high-throughput screening assays. Using ToxRefDB, a total of 35 target organ outcomes were identified that contained at least 100 chemicals (50 positive and 50 negative). Supervised machine learning was performed using Naïve Bayes, k-nearest neighbor, random forest, classification and regression trees, and support vector classification approaches. Model performance was assessed based on F1 scores using five-fold cross-validation with balanced bootstrap replicates. Fixed effects modeling showed the variance in F1 scores was explained mostly by target organ outcome, followed by descriptor type, machine learning algorithm, and interactions between these three factors. A combination of bioactivity and chemical structure or chemotype descriptors was the most predictive. Model performance improved with more chemicals (up to a maximum of 24%) and these gains were correlated (ρ = 0.92) with the number of chemicals. This dataset is associated with the following publication: Liu, J., G. Patlewicz, A. Williams, R. Thomas, and I. Shah. (Chemical Research in Toxicology) Predicting organ toxicity using in vitro bioactivity data and chemical structure. CHEMICAL RESEARCH IN TOXICOLOGY. American Chemical Society, Washington, DC, USA, 30: 2046−2059, (2017).
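The evaluation scheme named in the abstract (F1 scores, five-fold cross-validation, balanced bootstrap replicates) can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the function names, the random fold assignment, and the choice to resample equal numbers of positives and negatives with replacement are all hypothetical.

```python
import random


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def balanced_bootstrap(labels, n_per_class, rng):
    """Draw indices with replacement: n_per_class positives and negatives.

    Assumption: "balanced" means equal class counts per replicate.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return rng.choices(pos, k=n_per_class) + rng.choices(neg, k=n_per_class)


def k_fold_indices(n, rng, k=5):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


rng = random.Random(0)
labels = [1] * 30 + [0] * 70              # toy, imbalanced outcome labels
sample = balanced_bootstrap(labels, 50, rng)
folds = k_fold_indices(len(sample), rng)  # five folds over one replicate
```

In a full run, each fold would serve once as the held-out set, a classifier would be trained on the other four, and `f1_score` would be averaged over folds and replicates.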
Related data
The Tox21 10K Compound Library: Part 1 - Collaborative chemistry advancing toxicology
Public Data Portal
Table S1: Tox21 IDs mapped to NCGC IDs, PubChem IDs, and DSSTox IDs, and indicating NCATS, NTP and EPA partner library associations (date stamped February 24, 2020).
Table S2: DSSTox TOX21SL list of substance IDs and structure formula, molecular weight, SMILES, InChI, and QSAR-ready SMILES (downloaded January 24, 2020).
Table S3: DSSTox TOX21SL DTXSID overlaps with EPA CompTox Dashboard lists (downloaded January 24, 2020).
Table S4: Predicted physicochemical properties and toxicities generated from OPERA, T.E.S.T, CORINA, and Derek Nexus models.
Table S5: ToxPrint (V2.0_r711) fingerprint file for the TOX21SL chemical list.
Table S6: Chemotype enrichment workflow results generated from binarized activity hit calls for ToxCast and Tox21 assay end points (aeids) obtained from EPA’s public ToxCast database, invitroDBv2.
Table S7: Tox21 binarized assay hit call matrix for stereo and salt pairs, extracted from EPA’s public ToxCast database, invitroDBv3.
This dataset is associated with the following publication: Richard, A., R. Huang, S. Waidyanatha, P. Shinn, B.J. Collins, I. Thillainadarajah, C. Grulke, A. Williams, R. Lougee, R. Judson, K. Houck, M.A. Shobair, C. Yang, J.F. Rathman, A. Yasgar, S.C. Fitzpatrick, A. Simeonov, R. Thomas, K.M. Crofton, R.S. Paules, J.R. Bucher, C.P. Austin, R.J. Kavlock, and R.R. Tice. The Tox21 10K Compound Library: Collaborative Chemistry Advancing Toxicology. CHEMICAL RESEARCH IN TOXICOLOGY. American Chemical Society, Washington, DC, USA, 34(2): 189-216, (2021).
Predicting Systemic Toxicity Effects ArchTox 2017 Data
Public Data Portal
To address a major challenge in chemical safety assessment, the need for alternative approaches for characterizing systemic effect levels, a predictive model was developed. Systemic effect levels were curated from ToxRefDB, HESS-DB and COSMOS-DB from numerous study types totaling 4382 in vivo studies for 1201 chemicals. Observed systemic effects in mammalian models are a complex function of chemical dynamics, kinetics, and inter- and intra-individual variability. To address this complexity, systemic effect levels were modeled at the study level by leveraging study covariates (e.g., study type, strain, administration route) in addition to multiple descriptor sets, including chemical (ToxPrint, PaDEL, and Physchem), biological (ToxCast), and kinetic descriptors. Using Random Forest modeling with cross-validation and external validation procedures, study-level covariates alone accounted for approximately 20% of the variance, reducing the root mean squared error (RMSE) from 0.96 log10 mg/kg/day to 0.85 log10 mg/kg/day and providing a baseline performance metric (lower expectation of model performance). A consensus model developed using a combination of study-level covariates, chemical, biological, and kinetic descriptors explained a total of 38% of the variance with an RMSE of 0.76 log10 mg/kg/day. A benchmark model (upper expectation of model performance) was also developed with an RMSE of 0.5 log10 mg/kg/day by incorporating study-level covariates and the mean effect level per chemical. To achieve a representative chemical-level prediction, the minimum study-level predicted and observed effect levels per chemical were compared, reducing the RMSE from 1.1 to 0.8 log10 mg/kg/day.
Although biological descriptors did not improve model performance, the final model was enriched for biological descriptors that indicated xenobiotic metabolism gene expression, oxidative stress, and cytotoxicity, demonstrating the importance of accounting for kinetics and non-specific bioactivity in predicting systemic effect levels. Herein, we have generated an externally predictive model of systemic effect levels for use as a safety assessment tool and have generated forward predictions for thousands of chemicals. This dataset is associated with the following publication: Truong, L., G. Ouedraogo, L. Pham, J. Clouzeau, S. Loisel-Joubert, D. Blanchet, H. Noçairi, W. Setzer, R. Judson, C. Grulke, K. Mansouri, and M. Martin. (Archives of Toxicology) Predicting In Vivo Effect Levels for Repeat Dose Systemic Toxicity using Chemical, Biological, Kinetic and Study Covariates. Archives of Toxicology. Springer, New York, NY, USA, 92(2): 587-600, (2018).
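The RMSE figures and variance percentages quoted in this abstract can be related with a short calculation. The sketch below assumes that the starting RMSE of 0.96 log10 mg/kg/day corresponds to a mean-only predictor, so that the fraction of variance explained is 1 - (RMSE_model / RMSE_baseline)^2. The abstract does not state this, and the function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def variance_explained(rmse_model, rmse_baseline):
    """Fraction of variance explained relative to a baseline RMSE.

    Assumption: the baseline RMSE is that of a mean-only predictor,
    so its square equals the variance of the observed values.
    """
    return 1.0 - (rmse_model / rmse_baseline) ** 2


# RMSE values (log10 mg/kg/day) quoted in the abstract.
baseline = 0.96
covariates_only = variance_explained(0.85, baseline)  # study covariates alone
consensus = variance_explained(0.76, baseline)        # consensus model

print(f"covariates only: {covariates_only:.3f}")
print(f"consensus model: {consensus:.3f}")
```

Under this assumption the two values come out near 0.22 and 0.37, in line with the "approximately 20%" and "38%" reported in the abstract, allowing for rounding of the published RMSEs.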
ToxCast bioactivity data for p,p'-DDD and analogues
Public Data Portal
Bioactivity data for p,p'-DDD and analogues from ToxCast assays conducted in liver cells were sourced from the EPA’s CompTox Chemistry Dashboard. The links also provide access to the ToxCast assay information and annotation data user guide. This dataset is associated with the following publication: Lizarraga, L., J. Dean, J. Kaiser, S. Wesselkamper, J. Lambert, and J. Zhao. A Case Study on the Application of An Expert-driven Read-Across Approach in Support of Quantitative Risk Assessment of p,p’-Dichlorodiphenyldichloroethane. REGULATORY TOXICOLOGY AND PHARMACOLOGY. Elsevier Science Ltd, New York, NY, USA, 103: 301-313, (2019).
ToxRefDB version 2.0: Improved utility for predictive and retrospective toxicology analyses
Public Data Portal
ToxRefDB comprises information from over fifty years of in vivo toxicity data. The database includes information for over 1000 chemicals, and is being used as a primary source of data for evaluating efforts of the ToxCast program [4,5], as well as for numerous predictive and retrospective analyses. This dataset is associated with the following publication: Watford, S., L. Pham, J. Wignall, R. Shin, M.T. Martin, and K. Friedman. ToxRefDB version 2.0: Improved utility for predictive and retrospective toxicology analyses. REPRODUCTIVE TOXICOLOGY. Elsevier Science Ltd, New York, NY, USA, 89: 145-158, (2019).
Evaluation of Existing QSAR Models and Structural Alerts and Development of New Ensemble Models for Genotoxicity Using a Newly Compiled Experimental Dataset
Public Data Portal
In this study, a major effort was undertaken to compile a large genotoxicity dataset (54,805 records for 9299 substances) from several public sources (e.g., TOXNET, COSMOS, eChemPortal). The names and outcomes of the different assays were harmonized, and assays were annotated by type: gene mutation in Salmonella bacteria (Ames assay) and chromosome mutation (clastogenicity) in vitro or in vivo (chromosome aberration, micronucleus, and mouse lymphoma Tk+/- assays). This dataset was then evaluated to assess genotoxic potential using a categorization scheme, whereby a substance was considered genotoxic if it was positive in at least one Ames or clastogen study. The categorization dataset comprised 8442 chemicals, of which 2728 chemicals were genotoxic, 5585 were not and 129 were inconclusive. QSAR models (TEST and VEGA) and the OECD Toolbox structural alerts/profilers (e.g., OASIS DNA alerts for Ames and chromosomal aberrations) were used to make in silico predictions of genotoxicity potential. The performance of the individual QSAR tools and structural alerts resulted in balanced accuracies of 57-73%. A Naïve Bayes consensus model was developed using combinations of QSAR models and structural alert predictions. The ‘best’ consensus model selected had a balanced accuracy of 81.2%, a sensitivity of 87.24% and a specificity of 75.20%. This in silico scheme offers promise as a first step in ranking thousands of substances as part of a prioritization approach for genotoxicity. This dataset is associated with the following publication: Pradeep, P., R. Judson, D. DeMarini, N. Keshava, T. Martin, J. Dean, C. Gibbons, A. Simha, S. Warren, M. Gwinn, and G. Patlewicz. An Evaluation of Existing QSAR Models and Structural Alerts and Development of New Ensemble Models for Genotoxicity Using a Newly Compiled Experimental Dataset. Computational Toxicology. Elsevier B.V., Amsterdam, NETHERLANDS, 18: 100167, (2021).
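The categorization rule and the reported performance figures in this abstract can be checked with a few lines of Python. This is a minimal sketch with hypothetical function names. It encodes only the rule the abstract states (positive in at least one Ames or clastogen study) and the standard definition of balanced accuracy as the mean of sensitivity and specificity. How the study assigned "not genotoxic" versus "inconclusive" is not described in the abstract and is not modeled here.

```python
def is_genotoxic(ames_calls, clastogen_calls):
    """True if any Ames or clastogenicity study result is positive.

    Each argument is a list of booleans, one per study (True = positive).
    """
    return any(ames_calls) or any(clastogen_calls)


def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity (same units as the inputs)."""
    return (sensitivity + specificity) / 2


# Figures quoted in the abstract for the selected consensus model (percent).
ba = balanced_accuracy(87.24, 75.20)

# Category counts quoted in the abstract.
genotoxic, non_genotoxic, inconclusive = 2728, 5585, 129
total = genotoxic + non_genotoxic + inconclusive

print(f"balanced accuracy: {ba:.2f}%")
print(f"categorized chemicals: {total}")
```

The mean of the quoted sensitivity and specificity is 81.22%, matching the reported balanced accuracy of 81.2%, and the three category counts sum to the 8442 chemicals in the categorization dataset.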