Challenging Medically-Relevant Genes Benchmark Set

CMRG v1.00 is a small variant benchmark and structural variant benchmark focused on 273 challenging medically relevant genes for the Genome in a Bottle (GIAB) sample HG002 (also known as the Ashkenazi son). These benchmarks were generated from a trio-based hifiasm v0.11 (https://doi.org/10.1038/s41592-020-01056-5) diploid assembly of HG002, built from PacBio HiFi reads for HG002 and partitioned into phased haplotypes using Illumina reads from the parents, HG003 and HG004. The benchmark contains VCFs for small and structural variants, along with corresponding benchmark BED files. Any position inside the BED regions that has no variant in the VCF is homozygous reference. We extensively curated the variant calls, excluding any found to be questionable or erroneous. This benchmark helps measure performance in important challenging regions, including challenging segmental duplications, regions with complex variants, regions with structural variants, and regions affected by false duplications in GRCh37 or GRCh38. This benchmark is described in https://doi.org/10.1101/2021.06.07.444885.
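
The relationship between the benchmark BED regions and the VCF can be sketched in a few lines of Python. This is a minimal illustration, not part of the GIAB tooling: the function names (parse_bed, in_benchmark, classify) and the coordinates are hypothetical, and it assumes standard conventions (BED intervals are 0-based half-open, VCF positions are 1-based) and non-overlapping BED intervals. It also simplifies each variant to its single POS, which ignores the span of deletions and structural variants.

```python
from bisect import bisect_right


def parse_bed(lines):
    """Parse BED lines into {chrom: sorted [(start, end), ...]} (0-based, half-open)."""
    regions = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header lines
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    for intervals in regions.values():
        intervals.sort()
    return regions


def in_benchmark(regions, chrom, pos):
    """Return True if the 1-based position lies inside a benchmark region.

    Assumes the intervals on each chromosome do not overlap.
    """
    intervals = regions.get(chrom, [])
    zero_based = pos - 1
    # Index of the last interval whose start is <= zero_based.
    i = bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= zero_based < intervals[i][1]


def classify(regions, variant_sites, chrom, pos):
    """Interpret a 1-based site against the benchmark.

    variant_sites is a set of (chrom, pos) pairs taken from the VCF
    (simplification: only POS is considered, not the variant's full span).
    """
    if not in_benchmark(regions, chrom, pos):
        return "unassessed"  # outside the BED: no claim is made
    if (chrom, pos) in variant_sites:
        return "variant"
    return "hom_ref"  # inside the BED with no variant: homozygous reference


# Hypothetical toy data, not real benchmark coordinates.
bed_lines = ["chr1\t100\t200", "chr1\t500\t600"]
regions = parse_bed(bed_lines)
variant_sites = {("chr1", 150), ("chr1", 300)}

print(classify(regions, variant_sites, "chr1", 150))  # variant
print(classify(regions, variant_sites, "chr1", 101))  # hom_ref
print(classify(regions, variant_sites, "chr1", 300))  # unassessed
```

In practice, comparisons against a benchmark like this are normally run with dedicated benchmarking tools rather than hand-written code. The sketch only shows why the BED file matters: outside its regions, the absence of a variant in the VCF says nothing about the true genotype.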

Field Value
accessLevel public
bureauCode {006:55}
catalog_@context https://project-open-data.cio.gov/v1.1/schema/data.json
catalog_conformsTo https://project-open-data.cio.gov/v1.1/schema
catalog_describedBy https://project-open-data.cio.gov/v1.1/schema/catalog.json
identifier ark:/88434/mds2-2475
landingPage https://data.nist.gov/od/id/mds2-2475
language {en}
license https://www.nist.gov/open/license
modified 2021-09-29 00:00:00
programCode {006:045}
publisher National Institute of Standards and Technology
references {https://doi.org/10.1101/2021.06.07.444885,https://doi.org/10.1038/s41592-020-01056-5}
resource-type Dataset
source_datajson_identifier true
source_hash fccb0ba3473527651ea6bee234fda01d467c1c32
source_schema_version 1.1
theme {Bioscience:Genomics}
Groups
  • AmeriGEOSS
  • National Provider
  • North America
Tags
  • amerigeo
  • amerigeoss
  • bioinformatics
  • ckan
  • dna-sequencing
  • geo
  • geoss
  • human-genomics
  • medical-genomics
  • national
  • north-america
  • reference-materials
  • united-states
isopen False
license_id other-license-specified
license_title other-license-specified
maintainer Nathanael David Olson
maintainer_email nathanael.olson@nist.gov
metadata_created 2025-11-22T23:09:35.978125
metadata_modified 2025-11-22T23:09:35.978130
num_resources 65
num_tags 13