Challenging Medically-Relevant Genes Benchmark Set
Data and Resources
- GIAB FTP Site: NCBI-hosted Genome in a Bottle FTP site.
- Code Repository: GitHub repository with code used to generate benchmark sets.
- Code for Manuscript Analysis Repository: GitHub repository with code used to generate figures and perform analysis for...
- README.md (Text File, TEXT)
- HG002_CHM13v1.0_CMRG_smallvar_v1.00_draft.bed (Web Resource, BIN)
- HG002_CHM13v1.0_CMRG_smallvar_v1.00_draft.vcf.gz (Web Resource, GZ)
- HG002_CHM13v1.0_CMRG_smallvar_v1.00_draft.vcf.gz.tbi (Web Resource, BIN)
- HG002_CHM13_CMRG_smallvar_v1.00_GRCh38-equiv-regions_draft.bed (Web Resource, BIN)
- HG002v11-align2-CHM13v1.0.dip.bed (Web Resource, BIN)
- HG002v11-align2-CHM13v1.0.dip.vcf.gz (Web Resource, GZ)
- HG002v11-align2-CHM13v1.0.dip.vcf.gz.tbi (Web Resource, BIN)
- HG002v11-align2-CHM13v1.0.hap1.bam (Web Resource, BIN)
- HG002v11-align2-CHM13v1.0.hap1.bam.bai (Web Resource, BIN)
- HG002v11-align2-CHM13v1.0.hap2.bam (Web Resource, BIN)
- HG002v11-align2-CHM13v1.0.hap2.bam.bai (Web Resource, BIN)
- HG002_GRCh37_CMRG_smallvar_v1.00.bed (Web Resource, BIN)
- HG002_GRCh37_CMRG_smallvar_v1.00.vcf.gz (Web Resource, GZ)
- HG002_GRCh37_CMRG_smallvar_v1.00.vcf.gz.tbi (Web Resource, BIN)
- HG002_GRCh37_CMRG_SV_v1.00.bed (Web Resource, BIN)
- HG002_GRCh37_CMRG_SV_v1.00.vcf.gz (Web Resource, GZ)
- HG002_GRCh37_CMRG_SV_v1.00.vcf.gz.tbi (Web Resource, BIN)
- GRCh37_CMRG_benchmark_gene_coordinates.bed (Web Resource, BIN)
- HG002v11-align2-GRCh37.hap2.bam (Web Resource, BIN)
- HG002v11-align2-GRCh37.hap2.bam.bai (Web Resource, BIN)
- HG002_GRCh38_CMRG_smallvar_v1.00.bed (Web Resource, BIN)
- HG002_GRCh38_CMRG_smallvar_v1.00.vcf.gz (Web Resource, GZ)
- HG002v11-align2-GRCh37.dip.bed (Web Resource, BIN)
- HG002v11-align2-GRCh37.dip.vcf.gz (Web Resource, GZ)
- HG002v11-align2-GRCh37.dip.vcf.gz.tbi (Web Resource, BIN)
- HG002v11-align2-GRCh37.hap1.bam (Web Resource, BIN)
- HG002v11-align2-GRCh37.hap1.bam.bai (Web Resource, BIN)
- ... (Web Resource, BIN)
- GRCh38_hifiasm_error.bed (Web Resource, BIN)
- GRCh38_mrg_full_gene.bed (Web Resource, BIN)
- ... (Web Resource, BIN)
- HG002_GRCh38_CMRG_smallvar_v1.00.vcf.gz.tbi (Web Resource, BIN)
- HG002_GRCh38_CMRG_SV_v1.00.bed (Web Resource, BIN)
- HG002_GRCh38_CMRG_SV_v1.00.vcf.gz (Web Resource, GZ)
- HG002_GRCh38_CMRG_SV_v1.00.vcf.gz.tbi (Web Resource, BIN)
- GRCh38_CMRG_benchmark_gene_coordinates.bed (Web Resource, BIN)
- HG002v11-align2-GRCh38.dip.bed (Web Resource, BIN)
- HG002v11-align2-GRCh38.dip.vcf.gz (Web Resource, GZ)
- HG002v11-align2-GRCh38.dip.vcf.gz.tbi (Web Resource, BIN)
- HG002v11-align2-GRCh38.hap1.bam (Web Resource, BIN)
- HG002v11-align2-GRCh38.hap1.bam.bai (Web Resource, BIN)
- HG002v11-align2-GRCh38.hap2.bam (Web Resource, BIN)
- ... (Web Resource, BIN)
- ... (Text File, TEXT)
- HG002v11-align2-GRCh38.hap2.bam.bai (Web Resource, BIN)
- chksum.md5 (Web Resource, BIN)
- GRCh37_MRG_GAPs.bed (Web Resource, BIN)
- GRCh37_curation_medicalgene_SV_errorsorunsure.bed (Web Resource, BIN)
- ... (Web Resource, BIN)
- GRCh37_hifiasm_error.bed (Web Resource, BIN)
- GRCh37_mrg_full_gene.bed (Web Resource, BIN)
- GRCh38_CD4_gaps.bed (Web Resource, BIN)
- GRCh38_CD4_gaps_slop50.bed (Web Resource, BIN)
- GRCh38_MRG_GAPs.bed (Web Resource, BIN)
- GRCh38_curation_medicalgene_SV_errorsorunsure_repeatexpanded.bed (Web Resource, BIN)
- ... (Tab Separated Values File, TSV)
- HG002-v0.11.mat.fa.gz (Web Resource, GZ)
- HG002-v0.11.mat.gff.gz (Web Resource, GZ)
- HG002-v0.11.pat.fa.gz (Web Resource, GZ)
- HG002-v0.11.pat.gff.gz (Web Resource, GZ)
- DOI Access for Challenging Medically-Relevant...

| Field | Value |
|---|---|
| accessLevel | public |
| bureauCode | {006:55} |
| catalog_@context | https://project-open-data.cio.gov/v1.1/schema/data.json |
| catalog_conformsTo | https://project-open-data.cio.gov/v1.1/schema |
| catalog_describedBy | https://project-open-data.cio.gov/v1.1/schema/catalog.json |
| identifier | ark:/88434/mds2-2475 |
| landingPage | https://data.nist.gov/od/id/mds2-2475 |
| language | {en} |
| license | https://www.nist.gov/open/license |
| modified | 2021-09-29 00:00:00 |
| programCode | {006:045} |
| publisher | National Institute of Standards and Technology |
| references | {https://doi.org/10.1101/2021.06.07.444885,https://doi.org/10.1038/s41592-020-01056-5} |
| resource-type | Dataset |
| source_datajson_identifier | true |
| source_hash | fccb0ba3473527651ea6bee234fda01d467c1c32 |
| source_schema_version | 1.1 |
| theme | {Bioscience:Genomics} |
| Groups | |
| Tags | |
| isopen | False |
| license_id | other-license-specified |
| license_title | other-license-specified |
| maintainer | Nathanael David Olson |
| maintainer_email | nathanael.olson@nist.gov |
| metadata_created | 2025-11-22T23:09:35.978125 |
| metadata_modified | 2025-11-22T23:09:35.978130 |
| notes | CMRG v1.00 comprises a small variant benchmark and a structural variant benchmark focused on 273 challenging medically relevant genes for the Genome in a Bottle (GIAB) sample HG002 (also known as the Ashkenazi son). These benchmarks were generated from a trio-based hifiasm v0.11 (https://doi.org/10.1038/s41592-020-01056-5) diploid assembly of HG002: PacBio HiFi reads from HG002 were used for assembly, and Illumina reads from the parents, HG003 and HG004, were used to partition the assembly into phased haplotypes. The benchmark contains VCFs for small and structural variants, along with corresponding benchmark BED files; positions inside the BED regions that have no variant in the VCF are homozygous reference. We extensively curated the variant calls, excluding any found to be questionable or erroneous. This benchmark helps measure performance in important challenging regions, including challenging segmental duplications, regions with complex variants, regions with structural variants, and regions affected by false duplications in GRCh37 or GRCh38. This benchmark is described in https://doi.org/10.1101/2021.06.07.444885. |
| num_resources | 65 |
| num_tags | 13 |
| title | Challenging Medically-Relevant Genes Benchmark Set |
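
The notes above state that a position inside the benchmark BED regions with no variant in the VCF is homozygous reference. The following is a minimal Python sketch of that rule, not the method used to build or evaluate the benchmark. The function names (`read_bed`, `read_vcf_sites`, `in_regions`, `classify`) and the toy coordinates are hypothetical and were invented for illustration; the sketch also assumes non-overlapping BED intervals and treats every base covered by a REF allele as a variant site.

```python
import bisect
import io


def read_bed(handle):
    """Return {chrom: sorted [(start, end), ...]} from BED text (0-based, half-open)."""
    regions = {}
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    for intervals in regions.values():
        intervals.sort()
    return regions


def read_vcf_sites(handle):
    """Return the set of (chrom, 1-based position) covered by any REF allele."""
    covered = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        for offset in range(len(ref)):
            covered.add((chrom, pos + offset))
    return covered


def in_regions(regions, chrom, pos):
    """True if the 1-based position falls inside a BED interval (assumes no overlaps)."""
    intervals = regions.get(chrom, [])
    # Find the last interval whose start is <= the 0-based coordinate.
    i = bisect.bisect_right(intervals, (pos - 1, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos - 1 < intervals[i][1]


def classify(regions, variant_sites, chrom, pos):
    """Apply the benchmark rule to a single 1-based position."""
    if not in_regions(regions, chrom, pos):
        return "outside_benchmark"  # no claim is made about this position
    if (chrom, pos) in variant_sites:
        return "variant"
    return "hom_ref"


# Toy, made-up data for illustration only (not taken from the CMRG files).
toy_bed = io.StringIO("chr1\t100\t200\n")
toy_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHG002\n"
    "chr1\t150\t.\tA\tG\t50\tPASS\t.\tGT\t0|1\n"
)
regions = read_bed(toy_bed)
sites = read_vcf_sites(toy_vcf)
for pos in (120, 150, 250):
    print(pos, classify(regions, sites, "chr1", pos))
```

To try this on a downloaded file, the `io.StringIO` objects could be replaced with handles from the standard library, for example `gzip.open(path, "rt")` for a `.vcf.gz` file. Per-position lookups like this are only meant to make the BED/VCF relationship concrete; comparing a call set against the benchmark is normally done with a dedicated benchmarking tool.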