Artificial Intelligence Patent Dataset

***The AIPD dataset links are currently disabled. OCE uncovered an issue that affects a small number of phase 2 predictions for documents published in 2019 and 2020. The issue is being fixed, and OCE anticipates that the data will be updated and available again by mid-August 2021.***

To assist researchers and policymakers focusing on the determinants and impacts of artificial intelligence (AI) invention, the USPTO's Office of the Chief Economist (OCE) released two data files, collectively called the Artificial Intelligence Patent Dataset (AIPD). The first data file identifies United States (U.S.) patents issued between 1976 and 2020 and pre-grant publications (PGPubs) published through 2020 that contain one or more of eight AI technology components: machine learning, natural language processing, computer vision, speech, knowledge processing, AI hardware, evolutionary computation, and planning and control. OCE generated this data file using a machine learning (ML) approach that analyzed patent text and citations to identify AI in U.S. patent documents (Abood and Feltenberger 2018; Toole et al. 2020). OCE’s approach is based on the methodology of Abood and Feltenberger (2018), but it also analyzes patent claims to better identify AI contained in the technical and legal scope of the invention. The second data file contains the patent documents used to train the ML models.

A working paper describing the dataset is available and can be cited as Giczy, A., Pairolero, N., and Toole, A. 2021. Identifying artificial intelligence (AI) invention: A novel AI patent dataset. USPTO Economic Working Paper Series No. 2021-2. Available at SSRN: https://ssrn.com/abstract=3866793.

This effort was made possible through cross-business-unit collaboration among OCE, the Office of Policy and International Affairs, the Patents Business Unit, and the Office of the Chief Information Officer. The AIPD was used in the USPTO report “Inventing AI: Tracing the diffusion of artificial intelligence with U.S. patents.”

For questions, please email EconomicsData@uspto.gov.

Data files

Download full set of 2020 data files [.dta format (512 MB)] [.tsv format (1.03 GB)]

Download individual data files:

File name (2020*)                    DTA        TSV
ai_model_predictions                 496 MB     1.02 GB
ai_model_training_doc_seedgroups     16.2 MB    14.3 MB

* Note: the 2020 .dta files are saved in the Stata 14 format.
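
Because the .tsv version of ai_model_predictions is about 1 GB, it may be more practical to read it row by row than to load it all at once. The following is a minimal sketch, using only the Python standard library, of counting how many documents are flagged in one or more prediction columns. It rests on assumptions that are not stated on this page: that the file has a header row, that flags are stored as 0/1 values, and that the column names passed in exist. The function name `count_flagged` and the column names shown in the comments are hypothetical; check the working paper or the file header for the actual schema.

```python
import csv
from collections import Counter
from typing import Iterable, TextIO


def count_flagged(tsv_file: TextIO, flag_columns: Iterable[str]) -> Counter:
    """Count rows whose value is 1 in each requested flag column.

    Rows are streamed one at a time, so the whole file never has to
    fit in memory. Assumes a header row and 0/1 flag values.
    """
    flag_columns = list(flag_columns)
    reader = csv.DictReader(tsv_file, delimiter="\t")

    # Fail early if a requested column is not in the header.
    header = reader.fieldnames or []
    missing = [col for col in flag_columns if col not in header]
    if missing:
        raise KeyError(f"columns not found in header: {missing}")

    counts = Counter({col: 0 for col in flag_columns})
    for row in reader:
        for col in flag_columns:
            # Accept "1" or "1.0"; treat empty or missing cells as not flagged.
            if (row[col] or "").strip() in {"1", "1.0"}:
                counts[col] += 1
    return counts


# Example call (file name and column name are assumptions):
# with open("ai_model_predictions.tsv", newline="", encoding="utf-8") as f:
#     print(count_flagged(f, ["predict50_any_ai"]))
```

For the .dta files, a statistics package or a library that reads the Stata 14 format would be needed instead; the streaming approach above applies only to the tab-separated version.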