Determining the Predictive Limit of QSAR Models

The research evaluating how the predictivity of models is affected by error in either the training or the test set is simple to describe conceptually. Benchmark datasets are downloaded from reputable sources and split into training and test sets. Randomized error is added, and models are then built on both the error-laden and the native training sets. Those models are used to predict both the error-laden and the native test sets, and differences in the standard statistics commonly used to assess predictivity are observed.
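The workflow above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code. A synthetic one-descriptor dataset stands in for the downloaded benchmarks. Gaussian noise with a fixed standard deviation `sigma` stands in for the randomized error. Ordinary least squares stands in for the QSAR learners; the dataset is tagged with Gaussian processes, which this sketch does not use. All function names (`make_synthetic_dataset`, `add_gaussian_error`, `fit_linear`, `run_experiment`) are hypothetical. RMSE and R² are used as examples of standard predictivity statistics.

```python
import math
import random
import statistics


def make_synthetic_dataset(n, seed):
    """Hypothetical stand-in for a benchmark dataset: one descriptor x, one endpoint y."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 for x in xs]  # noise-free ("native") endpoint values
    return xs, ys


def add_gaussian_error(ys, sigma, seed):
    """Assumed error model: add Gaussian noise with standard deviation sigma."""
    rng = random.Random(seed)
    return [y + rng.gauss(0.0, sigma) for y in ys]


def fit_linear(xs, ys):
    """Ordinary least squares y = a*x + b (placeholder for a real QSAR learner)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def rmse(obs, pred):
    return math.sqrt(statistics.fmean((o - p) ** 2 for o, p in zip(obs, pred)))


def r_squared(obs, pred):
    mean_obs = statistics.fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def run_experiment(sigma=2.0, n=200, test_fraction=0.2, seed=0):
    """Train on native and noisy training sets; score each model on native and noisy test sets."""
    xs, ys = make_synthetic_dataset(n, seed)
    idx = list(range(n))
    random.Random(seed + 1).shuffle(idx)
    n_test = int(n * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    x_train = [xs[i] for i in train_idx]
    y_train = [ys[i] for i in train_idx]
    x_test = [xs[i] for i in test_idx]
    y_test = [ys[i] for i in test_idx]

    # Error-laden copies of both splits (separate seeds so the noise is independent).
    y_train_noisy = add_gaussian_error(y_train, sigma, seed + 2)
    y_test_noisy = add_gaussian_error(y_test, sigma, seed + 3)

    results = {}
    for train_label, y_tr in (("native", y_train), ("noisy", y_train_noisy)):
        a, b = fit_linear(x_train, y_tr)
        preds = [a * x + b for x in x_test]
        for test_label, y_te in (("native", y_test), ("noisy", y_test_noisy)):
            results[(train_label, test_label)] = {
                "RMSE": rmse(y_te, preds),
                "R2": r_squared(y_te, preds),
            }
    return results


if __name__ == "__main__":
    for (train, test), stats in sorted(run_experiment().items()):
        print(f"train={train:6s} test={test:6s} "
              f"RMSE={stats['RMSE']:.3f} R2={stats['R2']:.3f}")
```

The four train/test combinations correspond to the four comparisons described above. Because the error is injected by the sketch itself, the "native" values act as a known ground truth. This makes it possible to compare a model's statistics against the noisy test set with its statistics against the true values.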

This dataset is associated with the following publication: Kolmar, S., and C. Grulke. The Effect of Noise on the Predictive Limit of QSAR Models. Journal of Cheminformatics 13: 92 (2021). Springer, New York, NY, USA.


Field Value
accessLevel public
bureauCode {020:00}
catalog_conformsTo https://project-open-data.cio.gov/v1.1/schema
identifier https://doi.org/10.23719/1524279
license https://pasteur.epa.gov/license/sciencehub-license.html
modified 2021-06-21
programCode {020:000}
publisher U.S. EPA Office of Research and Development (ORD)
publisher_hierarchy U.S. Government > U.S. Environmental Protection Agency > U.S. EPA Office of Research and Development (ORD)
references {https://doi.org/10.1186/s13321-021-00571-7}
resource-type Dataset
source_datajson_identifier true
source_hash 980bb136a083e64c00da59d6bbf83b90c67b3e31
source_schema_version 1.1
Groups
  • AmeriGEOSS
  • National Provider
  • North America
Tags
  • amerigeo
  • amerigeoss
  • ckan
  • error
  • gaussian-process
  • geo
  • geoss
  • model-evaluation
  • national
  • north-america
  • prediction-error
  • united-states
isopen False
license_id other-license-specified
license_title other-license-specified
maintainer Scott Kolmar
maintainer_email kolmar.scott@epa.gov
metadata_created 2025-11-22T19:45:21.458979
metadata_modified 2025-11-22T19:45:21.458983
num_resources 1
num_tags 12