Cacao Genome Database

Not only is cacao the basic ingredient in the world’s favorite confection, chocolate, but it provides a livelihood for over 6.5 million farmers in Africa, South America and Asia and ranks as one of the top ten agriculture commodities in the world. Historically, cocoa production has been plagued by serious losses due to pests and diseases. The release of the cacao genome sequence will provide researchers with access to the latest genomic tools, enabling more efficient research and accelerating the breeding process, thereby expediting the release of superior cacao cultivars. The sequenced genotype, Matina 1-6, is representative of the genetic background most commonly found in the cacao producing countries, enabling results to be applied immediately and broadly to current commercial cultivars.  Matina 1-6 is highly homozygous which greatly reduces the complexity of the sequence assembly process. While the sequence provided is a preliminary release, it already covers 92% of the genome, with approximately 35,000 genes. We will continue to refine the assembly and annotation, working toward a complete finished sequence. Updates will be made available via the main project website.

Data and Resources

Field Value
accessLevel public
bureauCode {005:18}
catalog_@context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
catalog_conformsTo https://project-open-data.cio.gov/v1.1/schema
catalog_describedBy https://project-open-data.cio.gov/v1.1/schema/catalog.json
identifier 035e14e3-e69c-4e9f-a734-2247b19d6cd6
license https://creativecommons.org/publicdomain/zero/1.0/
modified 2019-08-05
programCode {005:040}
publisher Agricultural Research Service
resource-type Dataset
source_datajson_identifier true
source_hash c174176f52c29c8790b39430bc9ae58f17ff3079
source_schema_version 1.1
Groups
  • AmeriGEOSS
  • National Provider
  • North America
Tags
  • amerigeo
  • amerigeoss
  • ckan
  • geo
  • geoss
  • national
  • north-america
  • np301
  • united-states
isopen True
license_id cc-zero
license_title Creative Commons CCZero
license_url http://www.opendefinition.org/licenses/cc-zero
maintainer Gutierrez, Osman
maintainer_email osman.gutierrez@ars.usda.gov
metadata_created 2025-11-20T21:52:11.890693
metadata_modified 2025-11-20T21:52:11.890697
notes <p>Not only is cacao the basic ingredient in the world’s favorite confection, chocolate, but it provides a livelihood for over 6.5 million farmers in Africa, South America and Asia and ranks as one of the top ten agriculture commodities in the world. Historically, cocoa production has been plagued by serious losses due to pests and diseases. The release of the cacao genome sequence will provide researchers with access to the latest genomic tools, enabling more efficient research and accelerating the breeding process, thereby expediting the release of superior cacao cultivars. The sequenced genotype, Matina 1-6, is representative of the genetic background most commonly found in the cacao producing countries, enabling results to be applied immediately and broadly to current commercial cultivars.  Matina 1-6 is highly homozygous which greatly reduces the complexity of the sequence assembly process. While the sequence provided is a preliminary release, it already covers 92% of the genome, with approximately 35,000 genes. We will continue to refine the assembly and annotation, working toward a complete finished sequence. Updates will be made available via the main project website.</p>
num_resources 1
num_tags 9
title Cacao Genome Database