Patent Claims Research Dataset

The Patent Claims Research Dataset contains detailed information on claims from U.S. patents granted between 1976 and 2014 and U.S. patent applications published between 2001 and 2014. The dataset is derived from the Patent Application Publication Full-Text and Patent Grant Full Text files, available at https://bulkdata.uspto.gov/. The Office of the Chief Economist (OCE) applied a Python algorithm to these files to identify individual claims and the dependency relationships between claims. From the parsed claim text, OCE created six data files containing individually parsed claims, claim-level statistics, and document-level statistics, including newly developed measures of patent scope.
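The idea of claim dependency can be illustrated with a deliberately simplified sketch. It is not OCE's parsing algorithm. It assumes that a dependent claim names an earlier claim with a phrase like "claim 3" and that any claim without such a reference is independent. It also computes the word count of the shortest independent claim as one example of a document-level statistic. The regular expression and all function names are hypothetical. Real claim language also uses forms such as "claims 1-5" and "any preceding claim", which this sketch does not handle.

```python
import re

# Hypothetical pattern: matches "claim 3", "Claim 12", "claims 1", etc.
# It does not handle ranges ("claims 1-5") or phrases like "any preceding claim".
_CLAIM_REF = re.compile(r"\bclaims?\s+(\d+)", re.IGNORECASE)


def parent_claims(claim_text):
    """Return the sorted, de-duplicated claim numbers referenced in claim_text."""
    return sorted({int(n) for n in _CLAIM_REF.findall(claim_text)})


def classify_claims(claims):
    """Map claim number -> (is_independent, parent claim numbers, word count).

    `claims` is a dict {claim_number: claim_text}. As a simplifying assumption,
    only references to earlier-numbered claims count as dependencies.
    """
    result = {}
    for number, text in claims.items():
        parents = [p for p in parent_claims(text) if p < number]
        result[number] = (not parents, parents, len(text.split()))
    return result


def shortest_independent_length(claims):
    """Word count of the shortest independent claim, or None if there is none."""
    lengths = [words for independent, _, words in classify_claims(claims).values()
               if independent]
    return min(lengths) if lengths else None
```

For example, given claim 1 "A widget comprising a base and a lever." and claim 2 "The widget of claim 1, wherein the lever is steel.", the sketch treats claim 1 as independent (8 words) and claim 2 as dependent on claim 1.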

A document describing the motivation behind the patent scope measures, and the trends they show, is available and can be cited as: Marco, Alan C., Joshua D. Sarnoff, and Charles deGrazia, Patent Claims and Patent Scope (October 2016). USPTO Economic Working Paper 2016-04. Available at SSRN: https://ssrn.com/abstract=2844964

OCE developed these data files for public use and encourages users to identify fixes and improvements. Please send all feedback to EconomicsData@uspto.gov.

Documentation

Patent Claims Research Dataset Documentation

Data Files

Download the full set of 2014 data files: .dta format (11.2 GB) or .csv format (9.32 GB)

Download individual data files:

File name (2014)          DTA        CSV
patent_claims_fulltext    5.45 GB    4.41 GB
patent_claims_stats       821 MB     452 MB
patent_document_stats     119 MB     90.3 MB
pgpub_claims_fulltext     4.21 GB    3.79 GB
pgpub_claims_stats        570 MB     530 MB
pgpub_document_stats      81.6 MB    75 MB
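Because the full-text CSV files are several gigabytes, it can help to stream them row by row instead of loading them into memory at once. The sketch below uses only Python's standard csv module to count independent claims per document. The column names (pat_no for the document number and ind_flg for an independent-claim flag coded "1") are assumptions made for illustration. Check the dataset documentation for the actual headers and codings before use.

```python
import csv


def independent_claim_counts(csv_lines, doc_col="pat_no", flag_col="ind_flg"):
    """Count independent claims per document by streaming CSV rows.

    `csv_lines` is any iterable of text lines, such as an open file.
    The default column names and the "1" flag value are hypothetical.
    """
    counts = {}
    for row in csv.DictReader(csv_lines):  # reads one row at a time
        doc = row[doc_col]
        counts.setdefault(doc, 0)  # keep documents with zero independent claims
        if row[flag_col].strip() == "1":
            counts[doc] += 1
    return counts
```

A typical call opens the file with `open("patent_claims_fulltext.csv", newline="", encoding="utf-8")` and passes the file object. If an unusually long claim exceeds the csv module's default field size limit, the limit can be raised with `csv.field_size_limit`.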

The direct download page is here.

Note: The DTA (Stata dataset) files are saved in the Stata-13 data file format.
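Stata 13 introduced .dta format release 117. That release begins with an XML-like header containing a tag such as `<release>117</release>`, while older releases store the release number in the first byte of the file. The following standard-library sketch reads that number, which lets you check a downloaded file before opening it in Stata or in a reader such as `pandas.read_stata`. The function names are hypothetical, and the sketch only inspects the header. It does not validate the rest of the file.

```python
import re

# Release 117 (Stata 13) and later start with "<stata_dta><header><release>NNN</release>".
_RELEASE_TAG = re.compile(rb"<release>(\d{3})</release>")


def dta_release_from_header(head):
    """Return the .dta release number found in the first bytes of a file, or None."""
    match = _RELEASE_TAG.search(head)
    if match:
        return int(match.group(1))
    # Releases before 117 (Stata 12 and earlier) store the release in byte 0.
    return head[0] if head else None


def dta_release(path):
    """Read the start of a .dta file and return its release number."""
    with open(path, "rb") as f:
        return dta_release_from_header(f.read(64))
```

A file saved in Stata 13 format should report release 117.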

Note: The code used to parse the Patent Application Publication Full-Text and Patent Grant Full Text files and to generate the datasets listed above will be made available soon.