Patent Claims Research Dataset

The Patent Claims Research Dataset contains detailed information on claims from U.S. patents granted between 1976 and 2014 and U.S. patent applications published between 2001 and 2014. The dataset is derived from the Patent Application Publication Full-Text and Patent Grant Full Text files, available at https://bulkdata.uspto.gov/. The Office of the Chief Economist (OCE) applied a Python algorithm to these files to identify individual claims and the dependency relationships between them. From the parsed claims text, OCE created six data files containing individually parsed claims, claim-level statistics, and document-level statistics, including newly developed measures of patent scope.
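
The OCE parsing code is not reproduced here. As a rough illustration of what identifying dependency relationships between claims can involve, the following Python sketch looks for phrases such as "of claim 1" or "claims 1-3 and 5" in a claim's text. The function name `find_dependencies`, the regular expressions, and the sample claims are all assumptions made for this example; they are not OCE's algorithm and will miss many real-world phrasings.

```python
import re

# Assumption: a dependent claim refers to its parent(s) with wording such as
# "of claim 1", "according to claims 2 or 3", or "as in claims 1-3 and 5".
_CLAIM_REF = re.compile(
    r"\bclaims?\s+(\d+(?:\s*(?:,|-|to|or|and|through)\s*\d+)*)",
    re.IGNORECASE,
)
_RANGE = re.compile(r"(\d+)\s*(?:-|to|through)\s*(\d+)", re.IGNORECASE)


def find_dependencies(claim_text):
    """Return the sorted claim numbers that a claim's text refers to."""
    refs = set()
    for match in _CLAIM_REF.finditer(claim_text):
        span = match.group(1)
        # Expand ranges such as "1-3" into 1, 2, 3.
        for low, high in _RANGE.findall(span):
            refs.update(range(int(low), int(high) + 1))
        # Add every number that is listed explicitly.
        refs.update(int(n) for n in re.findall(r"\d+", span))
    return sorted(refs)


# Hypothetical claims, written for this example only.
claims = {
    1: "A method comprising heating a widget.",
    2: "The method of claim 1, wherein the widget is blue.",
    3: "The method according to claims 1 or 2, further comprising cooling.",
    4: "The method as in any of claims 1-3, wherein heating lasts one hour.",
}

dependencies = {number: find_dependencies(text) for number, text in claims.items()}
independent = [number for number, deps in dependencies.items() if not deps]
```

In this sketch a claim with no detected references is treated as independent. A production parser would also need to handle wording such as "any preceding claim", which names no claim number.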

A working paper describing the motivation behind the patent scope measures and the trends in them is available and can be cited as: Marco, Alan C. and Sarnoff, Joshua D. and deGrazia, Charles, Patent Claims and Patent Scope (October 2016). USPTO Economic Working Paper 2016-04. Available at SSRN: https://ssrn.com/abstract=2844964

For questions, please email EconomicsData@uspto.gov

Documentation

Patent Claims Research Dataset Documentation

Data Files

Download full set of 2014 data files [.dta format (11.2 GB)] [.csv format (9.32 GB)]

Download individual data files:

File Name (2014)           DTA        CSV
patent_claims_fulltext     5.45 GB    4.41 GB
patent_claims_stats        821 MB     452 MB
patent_document_stats      119 MB     90.3 MB
pgpub_claims_fulltext      4.21 GB    3.79 GB
pgpub_claims_stats         570 MB     530 MB
pgpub_document_stats       81.6 MB    75 MB
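
Because the full-text CSV files run to several gigabytes, it can help to stream them row by row instead of loading a whole file into memory. The following Python sketch shows one way to do that with the standard library. The helper `count_rows_by_key` and the column names `pat_no` and `claim_no` are assumptions made for this example; check the dataset documentation for the actual variable names.

```python
import csv
import io
from collections import Counter


def count_rows_by_key(lines, key_column):
    """Stream CSV rows and count how many share each value of key_column."""
    counts = Counter()
    # DictReader pulls one row at a time, so memory use stays small
    # even when the input is a multi-gigabyte file.
    for row in csv.DictReader(lines):
        counts[row[key_column]] += 1
    return counts


# Tiny in-memory stand-in for a large file. The column names here are
# assumptions, not taken from the dataset documentation.
sample = io.StringIO("pat_no,claim_no\n4000001,1\n4000001,2\n4000002,1\n")
claims_per_patent = count_rows_by_key(sample, "pat_no")
```

With a downloaded file, the same function could be given an open file handle, for example `open(path, newline="")`, where `path` is wherever the CSV was saved and the file's text encoding is confirmed first.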

The direct download page is here.

Note: The DTA (Stata dataset) files are saved in the Stata 13 data file format.

Note: The code used to parse the Patent Application Publication Full-Text and Patent Grant Full Text files and generate these datasets will be made available soon.