BUTTER - Empirical Deep Learning Dataset

The BUTTER Empirical Deep Learning Dataset represents an empirical study of deep learning phenomena on dense fully connected networks, scanning across thirteen datasets, eight network shapes, fourteen depths, twenty-three network sizes (number of trainable parameters), four learning rates, six minibatch sizes, four levels of label noise, and fourteen levels each of L1 and L2 regularization. Multiple repetitions (typically 30, sometimes 10) of each combination of hyperparameters were performed, and statistics including training and test loss (using an 80% / 20% shuffled train-test split) were recorded at the end of each training epoch. In total, this dataset covers 178 thousand distinct hyperparameter settings ("experiments"), 3.55 million individual training runs (an average of 20 repetitions of each experiment), and a total of 13.3 billion training epochs (most runs covered three thousand epochs). Accumulating this dataset consumed 5,448.4 CPU core-years, 17.8 GPU-years, and 111.2 node-years.
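The headline totals above can be cross-checked with a little arithmetic. The following is a minimal Python sketch, not part of the dataset's own tooling: the variable and function names are hypothetical, and the inputs are the rounded figures quoted in the description, so the results are approximate.

```python
# Rounded headline figures as quoted in the dataset description.
EXPERIMENTS = 178_000            # distinct hyperparameter settings
RUNS = 3_550_000                 # individual training runs
EPOCHS = 13_300_000_000          # total training epochs


def mean_repetitions(runs: int, experiments: int) -> float:
    """Average number of training runs per experiment."""
    return runs / experiments


def mean_epochs_per_run(epochs: int, runs: int) -> float:
    """Average number of training epochs per run."""
    return epochs / runs


reps = mean_repetitions(RUNS, EXPERIMENTS)
epochs_per_run = mean_epochs_per_run(EPOCHS, RUNS)

print(f"mean repetitions per experiment: {reps:.1f}")
print(f"mean epochs per run: {epochs_per_run:.0f}")
```

The first ratio comes out just under 20, matching the stated average number of repetitions. The second is somewhat above three thousand, which is consistent with most runs covering three thousand epochs only if some runs were longer; that reading is an inference from the rounded totals, not something the description states.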

Data and Resources

Field Value
DOI 10.25984/1872441
accessLevel public
bureauCode {019:20}
catalog_@context https://openei.org/data.json
catalog_@id https://openei.org/data.json
catalog_conformsTo https://project-open-data.cio.gov/v1.1/schema
catalog_describedBy https://project-open-data.cio.gov/v1.1/schema/catalog.json
dataQuality true
identifier https://data.openei.org/submissions/5708
issued 2022-05-20T06:00:00Z
landingPage https://data.openei.org/submissions/5708
license https://creativecommons.org/licenses/by/4.0/
modified 2022-06-15T21:08:28Z
old-spatial {"type":"Polygon","coordinates":[[[-180,-83],[180,-83],[180,83],[-180,83],[-180,-83]]]}
programCode {019:023}
projectNumber GO0028308
projectTitle National Renewable Energy Laboratory (NREL) Lab Directed Research and Development (LDRD)
publisher National Renewable Energy Laboratory
resource-type Dataset
source_datajson_identifier true
source_hash 806470c3ea21696ef21cde9a408c32c3f3269274
source_schema_version 1.1
spatial {"type":"Polygon","coordinates":[[[-180,-83],[180,-83],[180,83],[-180,83],[-180,-83]]]}
Groups
  • AmeriGEOSS
  • National Provider
  • North America
Tags
  • amerigeo
  • amerigeoss
  • batch-size
  • benchmark
  • ckan
  • deep-learning
  • depth
  • empirical
  • empirical-deep-learning
  • empirical-machine-learning
  • epoch
  • geo
  • geoss
  • label-noise
  • learning-rate
  • machine-learning
  • minibatch-size
  • national
  • network-shape
  • network-topology
  • neural-architecture-search
  • neural-networks
  • north-america
  • regularization
  • shape
  • topology
  • training
  • training-epoch
  • united-states
isopen True
license_id cc-by
license_title Creative Commons Attribution
license_url http://www.opendefinition.org/licenses/cc-by
maintainer Charles Edison Tripp
maintainer_email charles.tripp@nrel.gov
metadata_created 2025-11-19T20:20:04.917312
metadata_modified 2025-11-19T20:20:04.917318
num_resources 3
num_tags 29
title BUTTER - Empirical Deep Learning Dataset