SDNist: Benchmark data and evaluation tools for data synthesizers.
Data and Resources
- Taxi 2016 (BIN)
- Taxi 2016 schema (JSON): taxi2016.json
- Taxi 2020 data (BIN)
- Taxi 2020 schema (JSON): taxi2020.json
- Taxi data (BIN)
- Taxi schema (JSON): taxi.json
- DOI Access for SDNist: Benchmark data and...
- K-marginal report template (BIN): A Jinja2 report template to help humans read the k-marginal data
- Census IL_OH schema (JSON): IL_OH_10Y_PUMS.json
- Census NY-PA data (BIN)
- Census NY_PA schema (JSON): NY_PA_10Y_PUMS.json
- SDNist software repository at GitHub (Python 3.8 module): SDNist: Benchmark data and evaluation tools for synthetic data generators
- Datasets for 'Census' evaluation in CSV format (CSV): Three compressed CSV files to run the 'Census'-related functions in SDNist.
- Taxi datasets in CSV format (CSV): Three compressed CSV files to run the 'Taxi'-related functions in SDNist.
- Census GA_NC_SC data (BIN)
- Census GA_NC_SC schema (JSON): GA_NC_SC_10Y_PUMS.json
- Census IL-OH data (BIN)

| Field | Value |
|---|---|
| accessLevel | public |
| bureauCode | {006:55} |
| catalog_@context | https://project-open-data.cio.gov/v1.1/schema/data.json |
| catalog_conformsTo | https://project-open-data.cio.gov/v1.1/schema |
| catalog_describedBy | https://project-open-data.cio.gov/v1.1/schema/catalog.json |
| identifier | ark:/88434/mds2-2515 |
| issued | 2021-12-28 |
| landingPage | https://data.nist.gov/od/id/mds2-2515 |
| language | {en} |
| license | https://www.nist.gov/open/license |
| modified | 2021-12-06 00:00:00 |
| programCode | {006:045} |
| publisher | National Institute of Standards and Technology |
| resource-type | Dataset |
| source_datajson_identifier | true |
| source_hash | d7cdf4595dc7357337a23e4b2d6f4debc32fa998 |
| source_schema_version | 1.1 |
| theme | {"Information Technology:Artificial Intelligence","Information Technology:Privacy","Public Safety:Public safety communications research"} |
| Groups | |
| Tags | |
| isopen | False |
| license_id | other-license-specified |
| license_title | other-license-specified |
| maintainer | Gary Howarth II |
| maintainer_email | gary.howarth@nist.gov |
| metadata_created | 2025-11-22T17:04:07.863256 |
| metadata_modified | 2025-11-22T17:04:07.863260 |
| notes | SDNist is a set of benchmark data and metrics for evaluating synthetic data generators on structured tabular data. These benchmarks are distributed as a simple open-source Python package to allow standardized and reproducible comparison of synthetic generator models on real-world data and use cases. These data and metrics were developed for and vetted through the NIST PSCR Differential Privacy Temporal Map Challenge, where the evaluation tools, k-marginal and Higher Order Conjunction, proved effective in distinguishing competing models in the competition environment. SDNist is available via `pip` install: `pip install sdnist` for Python >=3.6 or on the [USNIST GitHub](https://github.com/usnistgov/SDNist/). The sdnist Python module will download data from NIST as necessary, and users are not required to download data manually. |
| num_resources | 17 |
| num_tags | 13 |
| title | SDNist: Benchmark data and evaluation tools for data synthesizers. |
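
The notes above name k-marginal as one of the evaluation tools. As a rough illustration of the general idea only, the sketch below compares the joint frequencies of every k-column subset in a real and a synthetic table and averages the total-variation distance. It is not SDNist's implementation: the function names, the toy `hour`/`zone` columns, and the 0-to-1 distance scale are all assumptions made for this example, and the package itself may bin, sample, or scale differently.

```python
from collections import Counter
from itertools import combinations


def marginal_density(rows, columns):
    """Return the normalized frequency of each value combination in `columns`."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def k_marginal_distance(real, synthetic, k=2):
    """Mean total-variation distance over all k-column marginals.

    Hypothetical helper for illustration: 0.0 means the marginals match
    exactly, 1.0 means they never overlap.
    """
    columns = sorted(real[0].keys())
    distances = []
    for subset in combinations(columns, k):
        p = marginal_density(real, subset)
        q = marginal_density(synthetic, subset)
        keys = set(p) | set(q)
        l1 = sum(abs(p.get(key, 0.0) - q.get(key, 0.0)) for key in keys)
        distances.append(l1 / 2)  # total-variation distance is half the L1 gap
    return sum(distances) / len(distances)


# Toy records with made-up columns (not the actual Taxi or Census schema).
real = [
    {"hour": 8, "zone": "A"},
    {"hour": 8, "zone": "B"},
    {"hour": 17, "zone": "A"},
    {"hour": 17, "zone": "B"},
]
# A poor "synthetic" table that collapses every record to one cell.
shifted = [{"hour": 8, "zone": "A"} for _ in range(4)]

print(k_marginal_distance(real, real))
print(k_marginal_distance(real, shifted))
```

A lower distance means the synthetic table reproduces the real table's k-way marginals more closely. Enumerating every column subset is only practical for small schemas, which is why this sketch should be read as a teaching aid rather than a substitute for the packaged metric.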