Bulk Data Product FAQs
-- BACKGROUND --
1. How are the USPTO data products organized?
USPTO electronic data products are generally organized by patents and trademarks and by issue date or publication date.
Patent products include patent grants and patent application publications in image, text, text-and-image, and bibliographic forms (some in XML), as well as additional information such as patent assignments and maintenance fee events.
Trademark products include registration images, application text, assignment text, and Trademark Trial and Appeal Board (TTAB) text.
2. What is the intent of publishing patent and trademark information in bulk format?
The bulk format allows users to obtain data sets in bulk rather than per patent or trademark application or registration. These data sets are likely to be of interest to researchers, commercial vendors, academics and consultants, and not to most members of the public. If you just want to look up an individual patent or trademark, you can get that information without using bulk data. USPTO data is viewable through various search interfaces on the USPTO Web site, as well as through some free and commercial Web sites. To search the patent databases, see Search for Patents. For pending and registered trademarks, see Trademark Electronic Search System (TESS). Additional search options are available from the USPTO Home page.
3. Why is there a fee for some data products?
The USPTO plans to eventually provide all data products online at no charge. Most data products are already available from the USPTO at no charge. A few products are available from the USPTO for a fee, either because they are provided on physical media or because of bandwidth considerations. The USPTO has also made these products available online at no charge through Google.
4. How up-to-date are the datasets?
Data products obtained directly from the USPTO are available on the date of publication.
5. How large are the bulk data sets?
Individual bulk data files generally range in size from a few megabytes to several gigabytes. Collections of data can total several terabytes.
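Because a single file can run to several gigabytes, it is safer to stream a download to disk than to read it into memory first. The following is a minimal Python sketch using only the standard library; the function name `download` is hypothetical, the URL in the usage line is a placeholder rather than a real USPTO link, and the `.part` rename step is just one reasonable way to avoid leaving a half-written file under the final name.

```python
import shutil
import urllib.request
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # copy in 1 MiB pieces so memory use stays flat


def download(url: str, dest: Path) -> int:
    """Stream the resource at url into dest and return the number of bytes written.

    Data is written to a temporary ".part" file and renamed only after the copy
    finishes, so an interrupted download never looks like a complete file.
    """
    partial = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out, CHUNK_SIZE)
    partial.replace(dest)
    return dest.stat().st_size


# Usage (hypothetical URL; substitute the link of the product you want):
# download("https://example.com/bulk/sample.zip", Path("sample.zip"))
```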
6. Are all types of patents included in the bulk data sets?
Patents data sets include:
- Design Patents
- Plant Patents
- Reexamination Certificates (available only in Patent Grant Image files)
- Reissue Patents
- Statutory Invention Registration (SIR) documents
- Utility Patents
7. Are there any restrictions on the bulk data sets?
-- XML-FORMATTED DATA SETS --
8. What is Extensible Markup Language (XML)?
XML is a standard way of storing structured data. It is hierarchical and can be applied to many situations (here, to patent grant and published application information). In general, XML files are designed to be processed by programmers using specialized tools. For background information, a good reference is the XML article on Wikipedia.
9. How do I view bulk data sets in XML?
The individual documents in the bulk data sets can be read with an XML reader. A generic XML reader can extract the XML element structure; to do useful automated processing of the documents, however, a program needs specific knowledge of the XML schema used, which the USPTO has documented online.
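As an illustration of what a generic reader can do without the schema, the Python sketch below parses one individual XML document and counts every root-to-element path it contains. The function name `element_paths` is hypothetical. The output tells you which elements occur and where, but not what they mean; interpreting them still requires the USPTO documentation.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_paths(document: bytes) -> Counter:
    """Count each root-to-element tag path in a single XML document.

    No schema knowledge is used: this only reports which elements occur where.
    """
    root = ET.fromstring(document)
    counts: Counter = Counter()

    def walk(element: ET.Element, parent_path: str) -> None:
        path = f"{parent_path}/{element.tag}" if parent_path else element.tag
        counts[path] += 1
        for child in element:
            walk(child, path)

    walk(root, "")
    return counts


# Usage: list the ten most frequent element paths in one document.
# for path, n in element_paths(doc).most_common(10):
#     print(n, path)
```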
It is important to understand that the files inside the ZIP archives, although they have the file extension “XML,” are not standard XML files: each one contains many XML documents concatenated together, so an ordinary XML parser cannot read it directly. The file must first be broken into individual XML documents by splitting it at each XML declaration and/or DOCTYPE declaration.
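The splitting step can be sketched in Python as follows. It assumes, as a simplification, that every document starts on a new line with either an XML declaration or, when the declaration is absent, a DOCTYPE declaration; check this against the files you actually download. ZIP members are read line by line, so a multi-gigabyte file is never held in memory at once. The function names `split_documents` and `iter_zip_documents` are hypothetical.

```python
import zipfile
from typing import Iterable, Iterator, List


def split_documents(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each XML document from a concatenated file, one at a time.

    Assumption: every document begins on a new line with an XML declaration
    (<?xml ...?>) or, if it has no declaration, with a <!DOCTYPE ...> line.
    """
    buffer: List[bytes] = []
    has_content = False        # buffer holds more than blank lines
    after_declaration = False  # last non-blank line was an XML declaration

    for line in lines:
        text = line.strip()
        starts_with_declaration = text.startswith(b"<?xml")
        # A DOCTYPE right after an XML declaration belongs to the same document.
        starts_with_bare_doctype = text.startswith(b"<!DOCTYPE") and not after_declaration
        if (starts_with_declaration or starts_with_bare_doctype) and has_content:
            yield b"".join(buffer).strip()
            buffer, has_content = [], False
        buffer.append(line)
        if text:
            has_content = True
            after_declaration = starts_with_declaration

    if has_content:
        yield b"".join(buffer).strip()


def iter_zip_documents(zip_path: str) -> Iterator[bytes]:
    """Stream every XML document out of the .xml members of a ZIP archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xml"):
                # Iterating over the opened member yields its lines as bytes.
                with archive.open(name) as member:
                    yield from split_documents(member)
```

Each yielded value is a standalone document that an ordinary parser can handle, for example `xml.etree.ElementTree.fromstring(doc)`.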
10. Where can one find documentation on the XML formats used in patent data?
Documentation for patent application publications and grants is available on the following USPTO web page: http://www.uspto.gov/products/cis/patents_xml.jsp
The documentation includes machine-readable Document Type Definitions (DTD) and human-readable documentation for the XML formats suitable for use by an XML programmer who wishes to extract information from the XML files.
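To show what schema-specific knowledge looks like in practice, here is a Python sketch that pulls a few bibliographic fields from one patent grant document. The element paths are assumptions modeled on the version 4.x grant DTD (root element `us-patent-grant`); they are not taken from this FAQ and they differ between DTD versions, so confirm them in the documentation before use. The names `GRANT_FIELD_PATHS` and `extract_grant_fields` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# ASSUMED element paths, modeled on the patent grant DTD (root <us-patent-grant>,
# version 4.x). Verify them against the USPTO documentation for the DTD version
# your files actually use.
GRANT_FIELD_PATHS = {
    "doc_number": "us-bibliographic-data-grant/publication-reference/document-id/doc-number",
    "kind": "us-bibliographic-data-grant/publication-reference/document-id/kind",
    "date": "us-bibliographic-data-grant/publication-reference/document-id/date",
    "title": "us-bibliographic-data-grant/invention-title",
}


def extract_grant_fields(document: bytes) -> Dict[str, Optional[str]]:
    """Pull a few bibliographic fields out of one patent grant XML document."""
    root = ET.fromstring(document)
    if root.tag != "us-patent-grant":
        raise ValueError(f"expected <us-patent-grant>, found <{root.tag}>")
    fields: Dict[str, Optional[str]] = {}
    for name, path in GRANT_FIELD_PATHS.items():
        element = root.find(path)  # first match, or None if the path is absent
        # itertext() also collects text nested in inline markup such as <i>.
        fields[name] = "".join(element.itertext()).strip() if element is not None else None
    return fields
```

Returning `None` for a missing path, rather than raising, makes it easy to spot documents whose structure differs from the assumed version.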
11. What other XML resources are available at the USPTO?
Additional information, such as links to older versions of the documentation, may be found at: http://www.uspto.gov/products/cis/updates/patents_xml.jsp
The USPTO generally does not update old files when it migrates to a new XML version, so users accessing data from several different years may need to use more than one version of DTDs.
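Since one collection may mix DTD versions, a practical first step is to tally which DTD each document declares. The Python sketch below reads the root element name and DTD system identifier from a document's DOCTYPE declaration, assuming it uses the common `SYSTEM "..."` or `PUBLIC "..." "..."` forms and appears within the first few kilobytes; documents without a DOCTYPE are counted under `None`. The names `dtd_of` and `count_dtds` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

PROLOG_BYTES = 4096  # assumption: the DOCTYPE appears near the start of a document

# Matches <!DOCTYPE root SYSTEM "id"> or <!DOCTYPE root PUBLIC "public-id" "id">,
# capturing the root element name and the DTD's system identifier.
DOCTYPE_RE = re.compile(
    rb'<!DOCTYPE\s+([^\s>\[]+)\s+(?:SYSTEM\s+"([^"]*)"|PUBLIC\s+"[^"]*"\s+"([^"]*)")'
)


def dtd_of(document: bytes) -> Optional[Tuple[str, str]]:
    """Return (root element, DTD system identifier) from a document's DOCTYPE, if any."""
    match = DOCTYPE_RE.search(document, 0, PROLOG_BYTES)
    if match is None:
        return None
    system_id = match.group(2) if match.group(2) is not None else match.group(3)
    return match.group(1).decode("utf-8"), system_id.decode("utf-8")


def count_dtds(documents: Iterable[bytes]) -> Counter:
    """Tally how many documents declare each DTD."""
    return Counter(dtd_of(doc) for doc in documents)
```

The resulting counts show which versions of the documentation you need before writing version-specific extraction code.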
12. Who do I contact for additional information?
Questions and suggestions can be directed to email@example.com.