Historical Patent Data Files

Patent classification systems are largely designed for administrative purposes, which limits their value for most research. To address this deficiency, Hall, Jaffe, and Trajtenberg (2001) developed a higher-level classification for the National Bureau of Economic Research (NBER) Patent Citation Data File by aggregating U.S. Patent Classification (USPC) classes into economically relevant technology categories. While this NBER classification scheme has proven valuable for researchers investigating U.S. patent grants, comparable information on patent applications remained unavailable. For that reason, the Office of the Chief Economist (OCE) developed a probability-matching algorithm to apply NBER classifications to patent applications as well as in-force and expired patents. From the matched data, we construct the USPTO Historical Patent Data Files, four research datasets containing time series and micro-level data by NBER sub-category on applications, grants, and in-force patents spanning two centuries of innovation. Our hope is that researchers will make use of these data, which for the first time enable detailed study of the complex dynamics between new filings, pendency, and abandonment, and which put into context recent trends in patenting activity, litigation, and technological change.

The USPTO Historical Patent Data Files include four datasets:

  • The annual dataset contains counts of in-force and issued patents from 1840 to 2014 by NBER sub-category.
  • The monthly file contains monthly counts of applications, issued patents, and in-force patents by application status, disposal type (abandoned, issued, or pending), and NBER sub-category from 1981 to 2014.
  • The monthly_disposal dataset contains counts of applications by disposal type for each monthly application cohort by NBER sub-category from 1981 to 2014.
  • The historical_masterfile contains micro-level application, NBER sub-category, and prosecution data on 2.2 million patent applications filed from 1981 to 2014 and 8.9 million patents issued through 2014.  

Three intermediate files (orders, orders_class, and orders_subclass) used to generate the four datasets are also available for download.
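
As an illustration of how cohort counts like those described for monthly_disposal might be used, the following minimal Python sketch computes the share of each disposal type within a monthly application cohort and NBER sub-category. The field names (cohort, nber_subcat, disposal, count) and all numbers are hypothetical placeholders, not the actual variable names or values in the data files; consult the accompanying documentation for the real layout.

```python
from collections import defaultdict

# Hypothetical rows: one count per (application cohort month, NBER
# sub-category, disposal type). Field names and values are illustrative
# assumptions, not the actual contents of monthly_disposal.
rows = [
    {"cohort": "1981-01", "nber_subcat": 11, "disposal": "issued", "count": 60},
    {"cohort": "1981-01", "nber_subcat": 11, "disposal": "abandoned", "count": 30},
    {"cohort": "1981-01", "nber_subcat": 11, "disposal": "pending", "count": 10},
    {"cohort": "1981-02", "nber_subcat": 11, "disposal": "issued", "count": 45},
    {"cohort": "1981-02", "nber_subcat": 11, "disposal": "abandoned", "count": 55},
]


def disposal_shares(rows):
    """Return {(cohort, sub-category): {disposal type: share of cohort}}."""
    totals = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for r in rows:
        key = (r["cohort"], r["nber_subcat"])
        counts[key][r["disposal"]] += r["count"]
        totals[key] += r["count"]
    # Skip cohorts with no applications to avoid dividing by zero.
    return {
        key: {d: n / totals[key] for d, n in by_type.items()}
        for key, by_type in counts.items()
        if totals[key] > 0
    }


shares = disposal_shares(rows)
```

With real data, the rows would instead come from the downloaded file (for example, a Stata DTA file read with a statistics package), and the same grouping logic would apply once the actual variable names are substituted.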

A document describing these data is available and can be cited as: Marco, Alan C. and Carley, Michael and Jackson, Steven and Myers, Amanda F., The USPTO Historical Patent Data Files: Two Centuries of Innovation (June 1, 2015). SSRN working paper, available at http://ssrn.com/abstract=2616724

The OCE developed these data files for public use and encourages users to identify fixes and improvements. Please provide all feedback to: EconomicsData@uspto.gov

Data Files

File name                Format   File sizes listed (2014)

Output files:
annual                   DTA      105 KB, 117 KB
historical_masterfile    DTA      758 MB, 228 MB
monthly                  DTA      280 KB, 425 KB
monthly_disposal         DTA      22.8 MB, 37.7 MB

Intermediate files:
orders                   DTA      49.8 KB, 169 KB
orders_class             DTA      75 MB, 15.6 MB
orders_subclass          DTA      3.96 MB, 5.34 MB

Direct download here.

* Note: the files marked with an asterisk have been compressed into a ZIP archive.