NIST Statistical Reference Datasets - SRD 140

The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software. Currently, datasets and certified values are provided for assessing the accuracy of software for univariate statistics, linear regression, nonlinear regression, and analysis of variance.

The collection includes both generated and 'real-world' data of varying levels of difficulty. Generated datasets are designed to challenge specific computations. These include the classic Wampler datasets for testing linear regression algorithms and the Simon & Lesage datasets for testing analysis of variance algorithms. Real-world data include challenging datasets such as the Longley data for linear regression, and more benign datasets such as the Daniel & Wood data for nonlinear regression. Certified values are 'best-available' solutions. The certification procedure is described in the web pages for each statistical method.

Datasets are ordered by level of difficulty (lower, average, and higher). Strictly speaking, the level of difficulty of a dataset depends on the algorithm, so these levels are provided only as rough guidance for the user. Producing correct results on all datasets of higher difficulty does not imply that your software will pass all datasets of average or even lower difficulty. Similarly, producing correct results for all datasets in this collection does not imply that your software will do the same for your particular dataset. It does, however, provide some degree of assurance, since your package will have produced correct results for datasets known to yield incorrect results in some software. The Statistical Reference Datasets project is also supported by the Standard Reference Data Program.
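
As an illustration of how a computed result might be scored against a certified value, the sketch below uses the log relative error (LRE), a measure commonly used for this kind of comparison that roughly counts the number of agreeing significant digits. The LRE itself, the function names, and the three data values are assumptions made for this example; the data are hypothetical, chosen so that the exact standard deviation is known to be 1, and are not taken from the collection.

```python
import math


def log_relative_error(computed, certified):
    """Approximate count of significant digits on which the two values agree."""
    if computed == certified:
        return float("inf")
    if certified == 0:
        # Relative error is undefined; fall back to the absolute error.
        return -math.log10(abs(computed))
    return -math.log10(abs(computed - certified) / abs(certified))


def std_one_pass(xs):
    """Sample standard deviation via the textbook sum-of-squares shortcut.

    Subtracting two nearly equal large numbers makes this prone to
    catastrophic cancellation in floating point.
    """
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    # Clamp at zero so rounding error cannot produce a negative variance.
    return math.sqrt(max(total_sq - total * total / n, 0.0) / (n - 1))


def std_two_pass(xs):
    """Sample standard deviation computed from deviations about the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


# Hypothetical data: a large constant offset with a tiny spread.
data = [1e9 + 1, 1e9 + 2, 1e9 + 3]
exact_std = 1.0  # known analytically for this hypothetical example

for name, func in [("one-pass", std_one_pass), ("two-pass", std_two_pass)]:
    value = func(data)
    print(name, value, log_relative_error(value, exact_std))
```

On data like these, the two formulas are algebraically identical but behave very differently in double precision, which is the kind of failure a dataset designed to challenge a specific computation is meant to expose. For a real evaluation, the data and the certified value would come from the dataset pages themselves.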

Data and Resources

DOI Access to NIST Statistical Reference Datasets - SRD 140 (HTML)

Field	Value
accessLevel	public
accrualPeriodicity	irregular
bureauCode	{006:55}
catalog_@context	https://project-open-data.cio.gov/v1.1/schema/data.json
catalog_conformsTo	https://project-open-data.cio.gov/v1.1/schema
catalog_describedBy	https://project-open-data.cio.gov/v1.1/schema/catalog.json
identifier	FF429BC178718B3EE0431A570681E858224
landingPage	https://data.nist.gov/od/id/FF429BC178718B3EE0431A570681E858224
language	{en}
license	https://www.nist.gov/open/license
modified	2003-11-20 00:00:00
programCode	{006:052}
publisher	National Institute of Standards and Technology
references	{http://www.itl.nist.gov/div898/strd/general/howto.html,http://www.itl.nist.gov/div898/strd/general/faq.html}
resource-type	Dataset
source_datajson_identifier	true
source_hash	7d8c1919bb71c2594abfb8a621b90a22562974ea
source_schema_version	1.1
theme	{"Standards:Reference data"}
Groups	AmeriGEOSS National Provider North America
Tags	AmeriGEO AmeriGEOSS CKAN GEO GEOSS National North America United States algorithms anovas averages bayes-theorems bayesian bayesian-computations bayesian-statistics benchmark-data benchmarks computational-accuracies least-squares linear-regressions mcmc nonlinear-regressions numerical-accuracies numerical-analysis round-off rounding-errors roundings software-evaluations standard-deviations statistical-reference-datasets statistical-software summary-statistics variance-analysis
isopen	False
license_id	other-license-specified
license_title	other-license-specified
maintainer	William F. Guthrie
maintainer_email	william.guthrie@nist.gov
metadata_created	2025-09-23T14:52:35.283715
metadata_modified	2025-09-23T14:52:35.283725
num_resources	1
num_tags	33
title	NIST Statistical Reference Datasets - SRD 140