Patent Claims Research Dataset

The Patent Claims Research Dataset contains detailed information on claims from U.S. patents granted between 1976 and 2014 and U.S. patent applications published between 2001 and 2014. The dataset is derived from the Patent Application Publication Full-Text and Patent Grant Full Text files. The Office of Chief Economist (OCE) applied a Python algorithm to these files to identify individual claims and the dependency relationships between them. From the parsed claim text, OCE created six data files containing individually parsed claims, claim-level statistics, and document-level statistics, including newly developed measures of patent scope.
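The claim identification and dependency step described above can be illustrated with a small sketch. This is not OCE's algorithm, which is not reproduced here. It is a minimal example that assumes each claim starts on its own line with a number and a period, and that a dependent claim names its parent with a phrase such as "of claim 1". The names `split_claims`, `find_dependencies`, and `SAMPLE` are hypothetical.

```python
import re

# Assumption: each claim begins on a new line as "<number>. <text>".
_CLAIM_START = re.compile(r"^\s*(\d+)\s*\.\s+", re.MULTILINE)
# Assumption: a dependent claim refers to its parent as "claim N" or "claims N".
_CLAIM_REF = re.compile(r"\bclaims?\s+(\d+)", re.IGNORECASE)


def split_claims(text):
    """Split a claims section into {claim_number: claim_text}."""
    starts = list(_CLAIM_START.finditer(text))
    claims = {}
    for i, match in enumerate(starts):
        # A claim runs until the next numbered claim, or to the end of the text.
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        claims[int(match.group(1))] = text[match.end():end].strip()
    return claims


def find_dependencies(claims):
    """Map each claim number to the earlier claims it refers to.

    Limitation: for a phrase like "claims 1 or 2", only the first
    number is captured by this simple pattern.
    """
    deps = {}
    for number, body in claims.items():
        refs = {int(n) for n in _CLAIM_REF.findall(body)}
        # Keep only references to earlier claims; an empty list means independent.
        deps[number] = sorted(n for n in refs if n < number)
    return deps


# Made-up claim text for illustration only.
SAMPLE = """1. A widget comprising a frame and a lever.
2. The widget of claim 1, wherein the lever is steel.
3. The widget of claim 2, further comprising a spring.
4. A method of making a widget, comprising casting a frame."""

dependencies = find_dependencies(split_claims(SAMPLE))
# dependencies == {1: [], 2: [1], 3: [2], 4: []}
```

In this sketch, claims 1 and 4 are independent and claims 2 and 3 are dependent. Real claim text is far less regular, so a production parser would need to handle multiple dependencies, renumbering, and formatting noise.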

A working paper describing the motivation for the patent scope measures and the trends in them is available and can be cited as: Marco, Alan C. and Sarnoff, Joshua D. and deGrazia, Charles, Patent Claims and Patent Scope (October 2016). USPTO Economic Working Paper 2016-04. Available at SSRN.

The OCE developed these data files for public use and encourages users to identify fixes and improvements. Please provide all feedback to:


Patent Claims Research Dataset Documentation

Data Files

Download the full set of 2014 data files: [.dta format (11.2 GB)] [.csv format (9.32 GB)]

Download individual data files:

File Name    2014 (.dta)    2014 (.csv)
             5.45 GB        4.41 GB
             821 MB         452 MB
             119 MB         90.3 MB
             4.21 GB        3.79 GB
             570 MB         530 MB
             81.6 MB        75 MB

The direct download page is here.

Note: The DTA (Stata dataset) files are saved in the Stata-13 data file format.
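Before loading a multi-gigabyte file, it can help to confirm which .dta release it uses. The sketch below assumes the Stata-13 format corresponds to dta release 117, whose files begin with a tagged header of the form `<stata_dta><header><release>117</release>`. The function name `dta_release` and any path passed to it are hypothetical.

```python
import re

# Tagged .dta files (release 117 and later) start with this header.
_RELEASE = re.compile(rb"<stata_dta><header><release>(\d{3})</release>")


def dta_release(path):
    """Return the release number from a tagged .dta header, or None.

    Older .dta releases use a binary header without these tags,
    so this function returns None for them.
    """
    with open(path, "rb") as handle:
        head = handle.read(64)  # the release tag sits in the first bytes
    match = _RELEASE.match(head)
    return int(match.group(1)) if match else None
```

A result of 117 indicates the Stata-13 format. Readers that support this release, such as `pandas.read_stata`, should then be able to open the file; a value of None suggests an older format or a file that is not a Stata dataset.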

Note: The code used to parse the Patent Application Publication Full-Text and Patent Grant Full Text files and generate the datasets below will be made available soon.