Patent Application Publication Data/XML (2001 - Present)

Contains the full text, images/drawings, and complex work units (tables, mathematical expressions, genetic sequence data, and chemical structures) of each patent application publication (non-provisional utility and plant), published weekly on Thursdays from March 15, 2001 to the present. The files are eXtensible Markup Language (XML) conforming to the U.S. Patent Application Document Type Definitions (DTDs), Versions 1.5, 1.6, 4.0 International Common Element (ICE), 4.1 ICE, 4.2 ICE, 4.3 ICE, and 4.4 ICE. Tables and sequence data are included using CALS markup. Mathematical expressions are included using MathML markup and external Mathematica Notebook (NB) files. Chemical structures are represented by external CambridgeSoft Corp. ChemDraw (CDX) files and MDL Information Systems (MOL) files. Drawings, mathematical expressions, and chemical structures are also included as external Tagged Image File Format (TIFF) Revision 6.0 image files with CCITT Group 4 compression. Each weekly file contains approximately 5,000 published patent applications and is approximately 1.5 GB in size. An optional weekly Supplemental ZIP file may accompany it, containing lengthy genetic sequence listings (anything over 300 pages) or lengthy tables (anything over 200 pages). http://patents.reedtech.com/parbfti.php
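
Because each weekly file holds thousands of applications, a consumer usually needs to separate the individual documents before parsing. The following is a minimal Python sketch, not an official USPTO tool. It assumes (this record does not state it) that a weekly file is a plain concatenation of standalone XML documents, each beginning with its own `<?xml ...?>` declaration. The element names `us-patent-application` and `doc-number`, and the sample numbers, are illustrative assumptions; check them against the DTD version of the file being read.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional

# Zero-width match just before each XML declaration (requires Python 3.7+).
# ASSUMPTION: every document in the weekly file starts with "<?xml ".
XML_DECL = re.compile(r"(?=<\?xml\s)")


def split_documents(text: str) -> Iterator[str]:
    """Yield each standalone XML document found in a concatenated file."""
    for chunk in XML_DECL.split(text):
        chunk = chunk.strip()
        if chunk:
            yield chunk


def first_doc_numbers(text: str) -> List[Optional[str]]:
    """Return the first <doc-number> text of every document, or None if absent.

    ASSUMPTION: "doc-number" is the element that carries the publication
    number; the name is illustrative and may differ between DTD versions.
    """
    numbers: List[Optional[str]] = []
    for doc in split_documents(text):
        root = ET.fromstring(doc.encode("utf-8"))
        node = root.find(".//doc-number")
        numbers.append(node.text if node is not None else None)
    return numbers


# Invented two-document sample; the numbers are placeholders, not real data.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<us-patent-application><doc-number>20150000001</doc-number>"
    "</us-patent-application>\n"
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<us-patent-application><doc-number>20150000002</doc-number>"
    "</us-patent-application>\n"
)

doc_numbers = first_doc_numbers(SAMPLE)
```

For a real file of roughly 1.5 GB, reading the whole text into memory may not be practical; the same splitting idea can be applied line by line while streaming.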

Field Value
accessLevel public
accrualPeriodicity R/P1W
bureauCode {006:51}
catalog_@context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
catalog_conformsTo https://project-open-data.cio.gov/v1.1/schema
catalog_describedBy https://project-open-data.cio.gov/v1.1/schema/catalog.json
dataQuality true
describedBy http://www.uspto.gov/products/cis/updates/patents_xml.jsp
describedByType application/xml
identifier EIP-5400P-OL
issued 2001-03-15
license http://creativecommons.org/publicdomain/mark/1.0
modified 2015-03-19
programCode {006:070}
publisher US Patent and Trademark Office, Department of Commerce
references {http://www.uspto.gov/products/cis/updates/patents_xml.jsp}
resource-type Dataset
source_datajson_identifier true
source_hash 7e15c9fcd5567d0d7a626521af2f13d1d6434e67
source_schema_version 1.1
temporal 2001-03-15/2015
theme {"Patent Application Publications"}
Groups
  • AmeriGEOSS
  • National Provider
  • North America
Tags
  • amerigeo
  • amerigeoss
  • application
  • cals-markup
  • chemical-structures
  • ckan
  • complex
  • external-cambridgesoft-corp-chemdraw-cdx-files
  • genetic-sequence-data
  • geo
  • geoss
  • images
  • mathematica-notebook-nb-files
  • mathematical-expressions
  • mathml-markup
  • mdl-information-systems-mol-files
  • national
  • north-america
  • patent
  • tables
  • tiff
  • united-states
  • uspto
  • xml
isopen False
license_id other-license-specified
license_title other-license-specified
maintainer Christopher Leithiser
maintainer_email Chris.Leithiser@uspto.gov
metadata_created 2025-11-20T11:59:00.792815
metadata_modified 2025-11-20T11:59:00.792820
num_resources 1
num_tags 24
title Patent Application Publication Data/XML (2001 - Present)
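
The accrualPeriodicity value R/P1W is an ISO 8601 repeating interval with a one-week period, which matches the Thursday publication schedule in the description. As a rough sketch, the expected publication dates in a given range can be enumerated from the issued date, 2001-03-15. This assumes an unbroken weekly cadence with no holiday shifts or skipped weeks, which this record does not confirm; the function name `publication_dates` is hypothetical.

```python
from datetime import date, timedelta
from typing import Iterator

# "issued" field of this record; 2001-03-15 falls on a Thursday.
FIRST_PUBLICATION = date(2001, 3, 15)


def publication_dates(start: date, end: date) -> Iterator[date]:
    """Yield expected weekly publication dates within [start, end].

    ASSUMPTION: publications occur every 7 days counted from
    FIRST_PUBLICATION, with no exceptions to the schedule.
    """
    if start <= FIRST_PUBLICATION:
        current = FIRST_PUBLICATION
    else:
        # Ceiling division: whole weeks needed to reach a date >= start.
        weeks = -(-(start - FIRST_PUBLICATION).days // 7)
        current = FIRST_PUBLICATION + timedelta(weeks=weeks)
    while current <= end:
        yield current
        current += timedelta(weeks=1)


# Expected publication dates for March 2001.
march_2001 = list(publication_dates(date(2001, 3, 1), date(2001, 3, 31)))
```

Under the same assumption, the modified date in this record, 2015-03-19, also lands on the weekly grid, exactly 731 weeks after the first publication.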