Dataset Details
United States
In Silico Prediction of Physicochemical Properties of Environmental Chemicals Using Molecular Fingerprints and Machine Learning
QSAR Model Reporting Formats; examples of R code for feature selection and regression analysis; Figure S1: data distributions of logBCF (bioconcentration factor), BP (boiling point), MP (melting point), and logVP (vapor pressure); Figures S2–S5: relationships between model complexity and prediction errors, and plots of estimated versus experimental values, for logBCF, BP, MP, and logVP, respectively; Figure S6: plots of leverage versus standardized residuals for the logBCF, BP, MP, and logVP models; Table S1: chemical product classes for the training and test sets; Tables S2–S5: regression statistics for logBCF, BP, MP, and logVP, respectively; Table S6: applicability domains for logBCF, BP, MP, and logVP; Tables S7–S12: chemicals with large prediction residuals for the six properties (PDF). Chemical names, CAS registry numbers, and SMILES, together with experimentally measured and estimated property values for the training and test sets (XLSX). This dataset is associated with the following publication: Zang, Q., K. Mansouri, A. Williams, R. Judson, D. Allen, W.M. Casey, and N.C. Kleinstreuer. In Silico Prediction of Physicochemical Properties of Environmental Chemicals Using Molecular Fingerprints and Machine Learning. Journal of Chemical Information and Modeling. American Chemical Society, Washington, DC, USA, 57(1): 36-49, (2017).
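Figure S6 plots leverage against standardized residuals (a Williams plot), a common way to check the applicability domain of a QSAR model. The sketch below shows how these quantities are typically computed. It assumes a linear model with an intercept on a numeric descriptor matrix; the function names, the warning leverage h* = 3(p + 1)/n, and the ±3 residual cutoff are conventional choices, not necessarily those used in the paper.

```python
import numpy as np

def leverages(X_train, X_query):
    """Leverage h_i = x_i^T (X^T X)^-1 x_i of each query row relative to the training set.

    Assumes a linear model with an intercept, so a column of ones is prepended.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    Xt = np.column_stack([np.ones(len(X_train)), X_train])
    Xq = np.column_stack([np.ones(len(X_query)), X_query])
    xtx_inv = np.linalg.pinv(Xt.T @ Xt)
    # Diagonal of Xq @ xtx_inv @ Xq.T, computed without building the full matrix
    return np.einsum("ij,jk,ik->i", Xq, xtx_inv, Xq)

def warning_leverage(n_train, n_descriptors):
    """Conventional Williams-plot cutoff h* = 3(p + 1) / n."""
    return 3 * (n_descriptors + 1) / n_train

def standardized_residuals(y_obs, y_pred):
    """Residuals divided by their sample standard deviation (one simple convention)."""
    r = np.asarray(y_obs, dtype=float) - np.asarray(y_pred, dtype=float)
    return r / r.std(ddof=1)

def williams_flags(h, std_resid, h_star, resid_cutoff=3.0):
    """Flag structural outliers (h > h*) and response outliers (|r| > cutoff)."""
    return (np.asarray(h) > h_star) | (np.abs(std_resid) > resid_cutoff)
```

For example, `leverages(X_train, X_train)` gives the training-set leverages; when the descriptor matrix has full column rank their sum equals the number of model parameters (p + 1), which makes a quick sanity check.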
Data Information
Related Data
Designing QSARs for Parameters of High-Throughput Toxicokinetic Models Using Open-Source Descriptors
Public Data Portal
Additional details on the methods are provided in the MS Word file “S1_Dawson et al._Supporting_Information.docx”. The MS Excel file “S2_Dawson et al. Supporting Information.xlsx” contains the datasets and graphical results, organized into the following sheets: S2.1 illustrates the intrinsic metabolic clearance (Clint) hepatic flow calculations; S2.2–S2.5 contain the training and test sets; S2.6–S2.7 contain figures illustrating the Clint model selection criteria and the assemblages of model descriptors; S2.8 contains confusion matrices for evaluating the Clint model; S2.9–S2.10 contain figures illustrating the fraction unbound in plasma (fup) model selection criteria and the assemblages of model descriptors (with ranges); S2.11 contains tables of model assessments for the Clint test set; S2.12 contains information relevant to the bioactivity-exposure ratio (BER) calculations for the ToxCast test set; S2.13 contains information relevant to the BER calculations for the Tox21 chemicals; and S2.14 describes different transformations of fup. This dataset is associated with the following publication: Dawson, D., B. Ingle, K. Phillips, J. Nichols, J. Wambaugh, and R. Tornero-Velez. Designing QSARs for Parameters of High-Throughput Toxicokinetic Models Using Open-Source Descriptors. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 55(9): 6505-6517, (2021).
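Sheet S2.1 covers Clint hepatic flow calculations. The spreadsheet's exact equations and units are not reproduced here; as an assumption, the sketch below uses the widely used well-stirred liver model, CL_h = Q_h * fu * CL_int / (Q_h + fu * CL_int), in which hepatic clearance can never exceed the blood flow delivering chemical to the liver. CL_int is assumed to be already scaled to the whole liver in the same units as Q_h, fu is treated as a single unbound fraction (blood-to-plasma corrections are omitted), and the function names are hypothetical.

```python
def hepatic_clearance_well_stirred(q_h, fu, cl_int):
    """Hepatic clearance under the well-stirred liver model.

    q_h: hepatic blood flow; cl_int: whole-liver intrinsic clearance in the same
    units as q_h (e.g. L/h); fu: unbound fraction (dimensionless, 0 to 1).
    """
    if not 0.0 <= fu <= 1.0:
        raise ValueError("fu must be between 0 and 1")
    unbound_clearance = fu * cl_int
    return q_h * unbound_clearance / (q_h + unbound_clearance)

def hepatic_extraction_ratio(q_h, fu, cl_int):
    """Fraction of chemical removed in a single pass through the liver (E = CL_h / Q_h)."""
    return hepatic_clearance_well_stirred(q_h, fu, cl_int) / q_h
```

As fu * CL_int becomes much larger than Q_h, the predicted clearance approaches Q_h (flow-limited clearance); when it is much smaller, clearance approaches fu * CL_int.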
Metadata Files for Structure-based QSAR models to predict repeat dose toxicity points of departure
Public Data Portal
This paper describes a model that takes chemical structures as input and predicts a property (the point of departure) for a new chemical. No new data were generated. This zip file contains metadata that can be used to make a model prediction. It also contains all of the code and a help file describing how to run the model. This dataset is associated with the following publication: Pradeep, P., K. Paul-Friedman, and R. Judson. Structure-based QSAR Models to Predict Repeat Dose Toxicity Points of Departure. Computational Toxicology. Elsevier B.V., Amsterdam, NETHERLANDS, 16(November 2020): 100139, (2020).
Quantitative Structure-Use Relationship Model thresholds for Model Validation, Domain of Applicability, and Candidate Alternative Selection
Public Data Portal
This file contains, for each QSUR model, the training-set confusion matrix values, a domain-of-applicability evaluation based on the structural similarity of the predicted chemicals to the training-set chemicals, and the 75th-percentile bioactivity index values. This dataset is associated with the following publication: Phillips, K., J. Wambaugh, C. Grulke, K. Dionisio, and K. Isaacs. High-throughput screening of chemicals as functional substitutes using structure-based classification models. GREEN CHEMISTRY. Royal Society of Chemistry, Cambridge, UK, 19: 1063-1074, (2017).
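The domain-of-applicability evaluation described here rests on structural similarity between predicted chemicals and the training set. A minimal sketch, assuming binary structural fingerprints and Tanimoto similarity to the nearest training chemical, is shown below. The similarity metric, the 0.5 threshold, and the function names are illustrative assumptions, not the settings used for the QSUR models. The last helper computes a 75th percentile with NumPy's default linear interpolation.

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # assumption: two all-zero fingerprints are treated as dissimilar
    return float(np.logical_and(a, b).sum() / union)

def in_domain(query_fp, training_fps, threshold=0.5):
    """Return (inside_domain, nearest_similarity) for one query chemical.

    threshold is a hypothetical cutoff, not the value used for the QSUR models.
    """
    nearest = max(tanimoto(query_fp, fp) for fp in training_fps)
    return nearest >= threshold, nearest

def bioactivity_threshold(bioactivity_index_values):
    """75th percentile of a model's bioactivity index values (linear interpolation)."""
    return float(np.percentile(bioactivity_index_values, 75))
```

Using the nearest neighbour is one simple design choice; averaging over the k most similar training chemicals is a common, less outlier-sensitive alternative.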
Datasets for manuscript "Predicting chemical end-of-life scenarios using structure-based classification models"
Public Data Portal
As described in its README.md file, the GitHub repository github.com/USEPA/PRTR-QSTR-models/tree/data-driven contains Python scripts for running Quantitative Structure–Transfer Relationship (QSTR) models, which are chemical structure-based machine learning (ML) models intended to support environmental regulatory decision-making. Using features associated with annual chemical transfer amounts, chemical generator industry sectors, environmental policy stringency, gross value added by industry sector, chemical descriptors, and chemical unit prices (as in the GitHub repository PRTR_transfers), the QSTR models developed here can predict potential end-of-life (EoL) activities for chemicals transferred to off-site locations for EoL management. This contribution also shows that QSTR models aid in estimating the mass fraction allocation of chemicals of concern transferred off-site for EoL activities. The README further describes the Python libraries required to run the code, how to use it, the output files produced by the Python scripts, and how to reproduce all manuscript figures and results. This dataset is associated with the following publication: Hernandez-Betancur, J.D., G.J. Ruiz-Mercado, and M. Martín. Predicting Chemical End-of-Life Scenarios Using Structure-Based Classification Models. ACS Sustainable Chemistry & Engineering. American Chemical Society, Washington, DC, USA, 11(9): 3594-3602, (2023).
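The mass fraction allocation mentioned here can be read as the share of a chemical's total off-site transfer mass that goes to each EoL activity. The sketch below only computes those fractions from (activity, mass) records; it is not the repository's QSTR code, and the record layout, activity labels, and function name are hypothetical.

```python
from collections import defaultdict

def eol_mass_fractions(transfers):
    """Mass fraction of one chemical's off-site transfers going to each EoL activity.

    transfers: iterable of (eol_activity, mass) pairs (hypothetical record layout).
    Returns an empty dict when the total transferred mass is zero.
    """
    totals = defaultdict(float)
    for activity, mass in transfers:
        if mass < 0:
            raise ValueError("transfer mass cannot be negative")
        totals[activity] += mass
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {activity: m / grand_total for activity, m in totals.items()}
```

For example, transfers of 40 units to recycling, 10 to energy recovery, and 50 to landfill give fractions of 0.4, 0.1, and 0.5.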
Predicting compound amenability with liquid chromatography-mass spectrometry to improve non-targeted analysis
Public Data Portal
The dataset and the experimental and predicted amenability calls are provided in the supplemental file “Supplemental_ToxCast_PhaseII.xlsx”. PaDEL descriptors were generated for each candidate chemical, and amenability predictions were calculated using both the ESI+ and ESI- downsampled models. The resulting dataset is available in the supplemental file “Supplemental_Application.xlsx”. Note that the dataset used in this demonstration is biased toward environmentally relevant chemicals, many of which appear on a large number of chemical lists on the Dashboard (see the DATA_SOURCES column in “Supplemental_Application.xlsx” for both ESI+ and ESI-). Training and test sets were constructed from the PaDEL descriptors and the ESI+ and ESI- endpoint values; these sets are provided in the supplemental file “Supplemental_train_test.xlsx”. A list of descriptors is provided in the supplemental file “Supplemental_Descriptors.xlsx”. Variable-importance plots for the ESI+ upsampled model (Figure S1) and the ESI- upsampled model (Figure S2) can be found in “Supplemental_Figures.docx”. This dataset is associated with the following publication: Lowe, C., K. Isaacs, A. McEachran, C. Grulke, J. Sobus, E. Ulrich, A. Richard, A. Chao, J. Wambaugh, and A. Williams. Predicting compound amenability with liquid chromatography-mass spectrometry to improve non-targeted analysis. Analytical and Bioanalytical Chemistry. Springer, New York, NY, USA, 413(30): 7495-7508, (2021).
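The amenability models here are described as downsampled (and, for the variable-importance plots, upsampled), which are ways of handling class imbalance between amenable and non-amenable chemicals. The sketch below shows generic random downsampling, in which every class is reduced to the size of the smallest one. The paper's exact resampling procedure and software are not described here, and the function name and class labels are hypothetical. Resampling is normally applied to the training set only, so the test set keeps its natural class balance.

```python
import numpy as np

def downsample_indices(labels, seed=0):
    """Random undersampling: keep every row of the smallest class and an
    equal-sized random subset (without replacement) of each larger class.

    Returns sorted row indices into `labels`.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    n_min = counts.min()
    keep = []
    for cls in classes:
        idx = np.flatnonzero(labels == cls)
        keep.append(rng.choice(idx, size=n_min, replace=False))
    return np.sort(np.concatenate(keep))
```

The returned indices can be used to subset both the descriptor matrix and the label vector before model training.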
Designing QSARs for parameters of high throughput toxicokinetic models using open-source descriptors
Public Data Portal
The MS Excel file (Dawson et al S2 Supporting information.xlsx) contains multiple sheets with the training sets, test sets, and predictions for intrinsic metabolic clearance (Clint), fraction unbound in plasma (fup), and bioactivity-exposure ratios (BER) for ToxCast and pharmaceutical-like chemicals. The Word file (Dawson et al S1 Supporting Information.docx) provides additional information on how the training and test sets for Clint, fup, and BER were assembled. The data dictionary describes the terms used in supporting information files S1 and S2. This dataset is associated with the following publication: Dawson, D., B. Ingle, K. Phillips, J. Nichols, J. Wambaugh, and R. Tornero-Velez. Designing QSARs for Parameters of High-Throughput Toxicokinetic Models Using Open-Source Descriptors. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 55(9): 6505-6517, (2021).
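A bioactivity-exposure ratio compares a dose associated with bioactivity to an estimated exposure. In the general high-throughput toxicokinetics approach, an in vitro AC50 (µM) is converted to an administered equivalent dose (AED, mg/kg/day) by dividing it by the steady-state plasma concentration (Css) predicted for a 1 mg/kg/day dose; Css in turn depends on Clint and fup, which is why QSAR predictions of those parameters feed into the BER. The sketch below assumes that linear reverse-dosimetry relationship. Which bioactivity and exposure values (for example, percentiles or bounds) the paper used is not stated here, and the function names are hypothetical.

```python
def administered_equivalent_dose(ac50_um, css_um_per_mg_kg_day):
    """Oral dose (mg/kg/day) expected to give a steady-state plasma concentration
    equal to the in vitro AC50, assuming Css scales linearly with dose."""
    if css_um_per_mg_kg_day <= 0:
        raise ValueError("Css must be positive")
    return ac50_um / css_um_per_mg_kg_day

def bioactivity_exposure_ratio(aed_mg_kg_day, exposure_mg_kg_day):
    """BER = AED / exposure; smaller values mean a narrower margin between
    bioactive doses and estimated exposure."""
    if exposure_mg_kg_day <= 0:
        raise ValueError("exposure must be positive")
    return aed_mg_kg_day / exposure_mg_kg_day
```

For instance, an AC50 of 10 µM with a Css of 2 µM per 1 mg/kg/day gives an AED of 5 mg/kg/day; against an exposure estimate of 0.05 mg/kg/day, the BER is 100.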
Predictive Structure-Based Toxicology Approaches To Assess the Androgenic Potential of Chemicals
Public Data Portal
Chemical structures, RMSD values, docking scores, additional tables and figures, and methodological details (PDF); additional information on the starting data set, EPA-ARDB.csv (CSV); additional information on V1, V1_SI.csv (CSV); additional information on V2, V2_SI.csv (CSV); and additional information on V3, V3_SI.csv (CSV). This dataset is associated with the following publication: Trisciuzzi, D., D. Alberga, K. Mansouri, R. Judson, E. Novellino, G.F. Mangiatordi, and O. Nicolotti. Predictive structure-based toxicology approaches to assess the androgenic potential of chemicals. Journal of Chemical Information and Modeling. American Chemical Society, Washington, DC, USA, 57(11): 2874-2884, (2017).
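The supporting information lists RMSD values, which typically measure how far a docked ligand pose lies from a reference (for example, crystallographic) pose. The sketch below computes a plain RMSD between two poses that share the same atom ordering, with no superposition, a common convention when both poses sit in the same receptor frame. The docking software used in the paper may apply symmetry corrections or other options not shown here, and the function name is hypothetical.

```python
import numpy as np

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two poses with matched atom order.

    Both inputs are (n_atoms, 3) coordinate arrays; no alignment is performed.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected two (n_atoms, 3) arrays of equal shape")
    # Mean of squared per-atom distances, then square root
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))
```

A pose within about 2 Å RMSD of the reference is often treated as a successful redocking, although the cutoff used in any given study may differ.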
Quantitative Structure-Use Relationship (QSUR) Model Descriptors
Public Data Portal
This dataset contains ToxPrint fingerprints for all chemicals in FUse that had QSAR-ready SMILES strings, as well as selected physicochemical properties from the Estimation Program Interface Suite (EPI Suite). This dataset is associated with the following publication: Phillips, K., J. Wambaugh, C. Grulke, K. Dionisio, and K. Isaacs. High-throughput screening of chemicals as functional substitutes using structure-based classification models. GREEN CHEMISTRY. Royal Society of Chemistry, Cambridge, UK, 19: 1063-1074, (2017).