Dataset details
United States
data for aromatase 3D qsar modeling
Computational chemistry data (very complex; expert knowledge is needed to understand and use it). This dataset is associated with the following publication: Lee, S., and M. Barron. 3D-QSAR Study of Steroidal and Azaheterocyclic Human Aromatase Inhibitors using Quantitative Profile of Protein-Ligand Interactions. Journal of Cheminformatics. Springer, New York, NY, USA, 10(2): 1-13, (2018).
Related data
Designing QSARs for Parameters of High-Throughput Toxicokinetic Models Using Open-Source Descriptors
Public Data Portal
Additional details used in the methods are found in the MS Word file “S1_Dawson et al._Supporting_Information.docx”. The MS Excel file “S2_Dawson et al. Supporting Information.xlsx” contains datasets and graphical results. The Excel file sheets are as follows: S2.1 illustrates Clint hepatic flow calculations; S2.2-5 include training and test data sets; S2.6-7 include figures illustrating Clint model selection criteria and assemblages of model descriptors; S2.8 includes confusion matrices for evaluating the Clint model; S2.9-10 include figures illustrating fup model selection criteria and assemblages of model descriptors (with ranges); S2.11 includes tables of model assessments of the Clint test set; S2.12 includes information relevant to BER calculations for the ToxCast test set; S2.13 includes information relevant to BER calculations for Tox21 chemicals; and S2.14 provides information on different transformations for fup. This dataset is associated with the following publication: Dawson, D., B. Ingle, K. Phillips, J. Nichols, J. Wambaugh, and R. Tornero-Velez. Designing QSARs for Parameters of High-Throughput Toxicokinetic Models Using Open-Source Descriptors. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 55(9): 6505-6517, (2021).
Designing QSARs for parameters of high throughput toxicokinetic models using open-source descriptors
Public Data Portal
The MS Excel file (Dawson et al S2 Supporting information.xlsx) contains multiple sheets containing the training sets, test sets, and predictions for intrinsic metabolic clearance (Clint), fraction unbound in plasma (fup), and bioactivity-exposure ratios (BER), for ToxCast and pharmaceutical-like chemicals. The Word file (Dawson et al S1 Supporting Information.docx) provides additional supporting information on assembly of the training and test sets for Clint, fup, and BER. The data dictionary describes the terms used in the supporting information, S1 and S2. This dataset is associated with the following publication: Dawson, D., B. Ingle, K. Phillips, J. Nichols, J. Wambaugh, and R. Tornero-Velez. Designing QSARs for Parameters of High-Throughput Toxicokinetic Models Using Open-Source Descriptors. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 55(9): 6505-6517, (2021).
QSARs for Plasma Protein Binding: Source Data and Predictions
Public Data Portal
The dataset has all of the information used to create and evaluate 3 independent QSAR models for the fraction of a chemical unbound by plasma protein (Fub) for environmentally relevant chemicals. In vitro plasma protein values for 1245 pharmaceuticals and 406 ToxCast chemicals were collected from the literature (Obach 2008, Zhu 2013, Wetmore 2012, Wetmore 2015). The 21 descriptors calculated by MOE that were used in the models are included, as is an acid/base/neutral/zwitterions classification based on ionization percentages calculated in ADMET Predictor. Finally, the dataset includes the in silico Fub predictions for each chemical from the constructed k-nearest neighbor, support vector machine, and random forest QSAR models, as well as a consensus (average) prediction. This dataset is associated with the following publication: Ingle, B., R. Tornero-Velez, J. Nichols, and B. Veber. Informing the Human Plasma Protein Binding of Environmental Chemicals by Machine Learning in the Pharmaceutical Space: Applicability Domain and Limits of Predictability. Journal of Chemical Information and Modeling. American Chemical Society, Washington, DC, USA, 56(11): 2243-2252, (2016).
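The consensus (average) prediction mentioned above can be illustrated with a minimal Python sketch. It assumes the consensus is the plain arithmetic mean of the three model outputs and that Fub is expressed as a fraction between 0 and 1; the function name and the range check are hypothetical and are not taken from the dataset or the publication.

```python
from statistics import mean


def consensus_fub(knn: float, svm: float, rf: float) -> float:
    """Return the arithmetic mean of three Fub predictions.

    Assumption: each prediction is a fraction unbound in [0, 1].
    The function name and the range check are illustrative only.
    """
    predictions = (knn, svm, rf)
    for value in predictions:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"Fub prediction out of range [0, 1]: {value}")
    return mean(predictions)


if __name__ == "__main__":
    # Hypothetical predictions for a single chemical, not values from the dataset.
    print(consensus_fub(0.2, 0.4, 0.6))
```

A simple mean weights the three models equally; whether the published consensus applies any weighting or transformation is not stated in this description.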
Quantitative structure activity relationships (QSARs) and machine learning models for abiotic reduction of organic compounds by an aqueous Fe(II) complex
Public Data Portal
Due to the increasing diversity of organic contaminants discharged into anoxic water environments, reactivity prediction is necessary for chemical persistence evaluation for water treatment and risk assessment purposes. Almost all quantitative structure activity relationships (QSARs) that describe rates of contaminant transformation apply only to narrowly-defined, relatively homogenous families of reactants (e.g., dechlorination of alkyl halides). In this work, we develop predictive models for abiotic reduction of 60 organic compounds with diverse reducible functional groups, including nitroaromatic compounds (NACs), aliphatic nitro-compounds (ANCs), aromatic N-oxides (ANOs), isoxazoles (ISXs), polyhalogenated alkanes (PHAs), sulfoxides and sulfones (SOs), and others. Rate constants for their reduction were measured using a model reductant system, Fe(II)-tiron. Qualitatively, the rates followed the order NACs > ANOs ≈ ISXs ≈ PHAs > ANCs > SOs. To develop QSARs, both conventional chemical descriptor-based and machine learning (ML)-based approaches were investigated. Conventional univariate QSARs based on a molecular descriptor ELUMO (energy of the lowest-unoccupied molecular orbital) gave good correlations within classes. Multivariate QSARs combining ELUMO with Abraham descriptors for physico-chemical properties gave slightly improved correlations within classes for NCs and NACs, but little improvement in correlation within other classes or among classes. The ML model obtained covers reduction rates for all classes of compounds and all of the conditions studied with the prediction accuracy similar to those of the conventional QSARs for individual classes (r² = 0.41-0.98 for univariate QSARs, 0.71-0.94 for multivariate QSARs, and 0.83 for the ML model). Both approaches required a scheme for a priori classification of the compounds for model training.
This work offers two alternative modelling approaches to comprehensive abiotic reactivity prediction for persistence evaluation of organic compounds in anoxic water environments. This dataset is associated with the following publication: Gao, Y., S. Zhong, T. Torralba-Sanchez, P. Tratnyek, E. Weber, Y. Chen, and H. Zhang. Quantitative structure activity relationships (QSARs) and machine learning models for abiotic reduction of organic compounds by an aqueous Fe(II) complex. WATER RESEARCH. Elsevier Science Ltd, New York, NY, USA, 192: 116843, (2021).
Rapid Experimental Estimates of Physicochemical Properties
Public Data Portal
We have performed high-throughput experimental estimates of five physicochemical properties for a set of 200 chemicals to evaluate the consistency with previous measurements, factors impacting consistency and experimental success, and the applicability domain of the new data in relation to previously measured data and predictive models. This dataset is associated with the following publication: Nicolas, C., K. Mansouri, K. Phillips, C. Grulke, A. Richard, A. Williams, J. Rabinowitz, K. Isaacs, A. Yau, and J. Wambaugh. (ENVIRONMENTAL SCIENCE and TECHNOLOGY) Rapid Experimental Estimates of Physicochemical Properties to Inform Models and Testing. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 636: 901-909, (2018).
A Workflow for Identifying Metabolically Active Chemicals to Complement in vitro Toxicity Screening
Public Data Portal
This dataset includes metabolite predictions for in vitro inactive chemicals, predictions of those metabolites' estrogen receptor binding activity, in vitro and in silico information regarding parent compound binding activities, linking of metabolite structures and routes to parent compounds, and estimates of binding activity obtained from literature when possible. This dataset is associated with the following publication: Leonard, J., C. Stevens, K. Mansouri, D. Chang, H. Pudukodu, S. Smith, and C. Tan. A Workflow for Identifying Metabolically Active Chemicals to Complement in vitro Toxicity Screening. Computational Toxicology. Elsevier B.V., Amsterdam, NETHERLANDS, 6: 71-83, (2018).
Judson Mansouri Automated Chemical Curation QSAREnvRes Data
Public Data Portal
Here we describe the development of an automated KNIME workflow to curate and correct errors in the structure and identity of chemicals using the publicly available PHYSPROP physico-chemical properties and environmental fate datasets. The workflow first assembles structure-identity pairs using up to four provided chemical identifiers, including chemical name, CASRNs, SMILES, and MolBlock. Problems detected included errors and mismatches in chemical structure formats, identifiers, and various structure validation issues, including hypervalency and stereochemistry descriptions. Subsequently, a machine learning procedure was applied to evaluate the impact of this curation process. The performance of QSAR models built on only the highest quality subset of the original dataset was compared to the larger curated and corrected data set. The latter showed statistically improved predictive performance. The final workflow was used to curate the full list of PHYSPROP datasets, and is being made publicly available for further usage and integration by the scientific community. This dataset is associated with the following publication: Mansouri, K., C. Grulke, A. Richard, R. Judson, and A. Williams. (SAR AND QSAR IN ENVIRONMENTAL RESEARCH) An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modeling. SAR AND QSAR IN ENVIRONMENTAL RESEARCH. Taylor & Francis, Inc., Philadelphia, PA, USA, 27(11): 911-937, (2016).
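One kind of identifier error that a curation step can catch is a malformed CASRN. The Python sketch below applies the standard CAS Registry Number check-digit rule as an illustration. It is not taken from the KNIME workflow, and the description above does not say whether the workflow performs this particular check; the function name is hypothetical.

```python
import re

# CASRN layout: 2 to 7 digits, hyphen, 2 digits, hyphen, 1 check digit.
_CASRN_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_casrn(casrn: str) -> bool:
    """Check the format and check digit of a CAS Registry Number.

    Illustrative helper only; not part of the published workflow.
    """
    match = _CASRN_PATTERN.match(casrn.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight digits 1, 2, 3, ... starting from the rightmost digit
    # before the check digit, then take the weighted sum modulo 10.
    total = sum(
        int(digit) * weight
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == check_digit


if __name__ == "__main__":
    print(is_valid_casrn("7732-18-5"))  # water
    print(is_valid_casrn("7732-18-4"))  # same digits with a wrong check digit
```

A passing check digit only shows that the number is well formed. It does not confirm that the CASRN matches the accompanying name, SMILES, or MolBlock, which is the mismatch problem the workflow description refers to.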
Evaluating structure-based activity in a high-throughput assay for steroid biosynthesis
Public Data Portal
Dataset from Foster, M.J., et al., Evaluating structure-based activity in a high-throughput assay for steroid biosynthesis, Computational Toxicology, Vol 24, No. 100245, Nov 2022, DOI https://doi.org/10.1016/j.comtox.2022.100245. The work described herein was conducted in R software (version 4.0.2) and is available at GitHub (https://github.com/USEPA/CompTox-HTH295R-SAR) and the US EPA Clowder repository, as well as in Supplemental File 1 as an HTML file including code and resultant output. This dataset is associated with the following publication: Foster, M., G. Patlewicz, I. Shah, D. Haggard, R. Judson, and K. Friedman. Evaluating structure-based activity in a high-throughput assay for steroid biosynthesis. Computational Toxicology. Elsevier B.V., Amsterdam, NETHERLANDS, 24: 100245, (2022).
Metadata Files for Structure-based QSAR models to predict repeat dose toxicity points of departure
Public Data Portal
This paper describes a model that takes chemical structures and predicts a property (the point of departure) for a new chemical. No new data were generated. The zip file contains metadata that can be used to make a model prediction. It also contains all of the code and a help file describing how to run the model. This dataset is associated with the following publication: Pradeep, P., K. Paul-Friedman, and R. Judson. Structure-based QSAR Models to Predict Repeat Dose Toxicity Points of Departure. Computational Toxicology. Elsevier B.V., Amsterdam, NETHERLANDS, 16(November 2020): 100139, (2020).