Predict_Organ_Toxicity_ChemResTox_Data

We use a supervised machine learning strategy to systematically investigate the relative importance of study type, machine learning algorithm, and type of descriptor on predicting in vivo repeat-dose toxicity at the organ level. A total of 985 compounds were represented using chemical structural descriptors, ToxPrint chemotype descriptors, and bioactivity descriptors from ToxCast in vitro high-throughput screening assays. Using ToxRefDB, a total of 35 target organ outcomes were identified that contained at least 100 chemicals (50 positive and 50 negative). Supervised machine learning was performed using Naïve Bayes, k-nearest neighbor, random forest, classification and regression trees, and support vector classification approaches. Model performance was assessed based on F1 scores using five-fold cross-validation with balanced bootstrap replicates. Fixed effects modeling showed that the variance in F1 scores was explained mostly by target organ outcome, followed by descriptor type, machine learning algorithm, and interactions between these three factors. A combination of bioactivity and chemical structure or chemotype descriptors was the most predictive. Model performance improved with more chemicals (up to a maximum of 24%), and these gains were correlated (ρ = 0.92) with the number of chemicals.
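The evaluation scheme described above (F1 scores from five-fold cross-validation with balanced bootstrap replicates) can be illustrated with a minimal sketch. This is not the authors' code. It assumes one plausible reading of the procedure: chemicals are split into five folds, and within each fold the training portion is resampled with replacement into equal numbers of positives and negatives before fitting. The toy one-feature classifier `fit_midpoint`, the helper names, and the synthetic data are all hypothetical stand-ins for the real descriptors and algorithms.

```python
import random


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 when there are no positives at all."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def balanced_bootstrap(labels, n_per_class, rng):
    """Draw n_per_class positive and n_per_class negative indices, with replacement."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return ([rng.choice(pos) for _ in range(n_per_class)]
            + [rng.choice(neg) for _ in range(n_per_class)])


def k_fold_indices(n, k, rng):
    """Shuffle range(n) and split it into k disjoint, nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_midpoint(xs, ys):
    """Hypothetical toy classifier: threshold halfway between the class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > threshold else 0


def cross_validated_f1(xs, ys, k=5, n_replicates=10, n_per_class=50, seed=0):
    """Mean F1 over k folds x n_replicates balanced bootstrap training samples."""
    rng = random.Random(seed)
    scores = []
    for test_idx in k_fold_indices(len(ys), k, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(ys)) if i not in held_out]
        train_labels = [ys[i] for i in train_idx]
        for _ in range(n_replicates):
            # Resample only the training portion, so held-out chemicals
            # never appear in the data used to fit the model.
            sample = [train_idx[j]
                      for j in balanced_bootstrap(train_labels, n_per_class, rng)]
            model = fit_midpoint([xs[i] for i in sample], [ys[i] for i in sample])
            preds = [model(xs[i]) for i in test_idx]
            scores.append(f1_score([ys[i] for i in test_idx], preds))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Synthetic, imbalanced toy data: 60 positives and 140 negatives.
    data_rng = random.Random(42)
    ys = [1] * 60 + [0] * 140
    xs = [data_rng.gauss(1.0 if y else -1.0, 1.0) for y in ys]
    print("mean F1:", round(cross_validated_f1(xs, ys), 3))
```

Resampling inside each training fold, rather than before splitting, is a design choice made here to keep duplicated bootstrap draws from leaking into the held-out fold; the source does not state which order the study used.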

This dataset is associated with the following publication: Liu, J., G. Patlewicz, A. Williams, R. Thomas, and I. Shah. Predicting organ toxicity using in vitro bioactivity data and chemical structure. Chemical Research in Toxicology. American Chemical Society, Washington, DC, USA, 30: 2046-2059, (2017).

Data and Resources

Field Value
accessLevel public
bureauCode {020:00}
catalog_conformsTo https://project-open-data.cio.gov/v1.1/schema
identifier https://doi.org/10.23719/1407008
license https://pasteur.epa.gov/license/sciencehub-license.html
modified 2017-09-28
programCode {020:095}
publisher U.S. EPA Office of Research and Development (ORD)
publisher_hierarchy U.S. Government > U.S. Environmental Protection Agency > U.S. EPA Office of Research and Development (ORD)
references {https://doi.org/10.1021/acs.chemrestox.7b00084}
resource-type Dataset
source_datajson_identifier true
source_hash bcc4bd45587e4e7a387a983436fe09e4357d47a5
source_schema_version 1.1
Groups
  • AmeriGEOSS
  • National Provider
  • North America
Tags
  • amerigeo
  • amerigeoss
  • bioactivity
  • chemotypes
  • ckan
  • geo
  • geoss
  • high-throughput-screening
  • high-throughput-toxicology
  • machine-learning
  • national
  • north-america
  • qsar
  • toxcast
  • toxrefdb
  • united-states
isopen False
license_id other-license-specified
license_title other-license-specified
maintainer Keith Houck
maintainer_email houck.keith@epa.gov
metadata_created 2025-11-22T17:22:44.677106
metadata_modified 2025-11-22T17:22:44.677110
num_resources 1
num_tags 16
title Predict_Organ_Toxicity_ChemResTox_Data