Recent research shows that patent citations suffer from significant noise, measurement error, and weakening informational content (Gambardella, Harhoff and Verspagen 2008; Roach and Cohen 2013; Kuhn, Younge and Marco 2017), casting doubt on the accuracy of citation-based measurements of technology flows, patent value, complexity, and the various economic mechanisms in patent thickets (Egan and Teece 2015). The recently released USPTO Office Action dataset (Lu, Myers and Beliveau 2017) allows for further assessment of the informational content of patent citations.
The figure above displays the textual similarity between the claims from a focal patent and the claims contained in the focal patent’s citations (what we call invention similarity). The dashed blue line represents patent claim similarity using only those citations contained in a USPTO examiner’s Office Action, while the solid red line represents patent claim similarity to all other citations to the focal patent. In the figure, the solid red curve lies above the dashed blue curve at low levels of claim similarity. This tells us that many of the citations not used in Office Action rejections are not technologically similar to the focal patent, judging by claim language. While citations contained in examiner Office Actions generally have higher technological similarity, some of the non-Office Action citations still contain valuable information, for instance those with a similarity index of 0.2 or higher.
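To make the idea of claim-to-claim similarity concrete, here is a minimal sketch that scores each citation against a focal patent and keeps those at or above the 0.2 level mentioned above. It assumes a plain bag-of-words cosine similarity; the working paper's actual similarity measure may differ, and the function names and sample claim texts are hypothetical. Because common words are not filtered out, raw scores from this sketch will tend to run higher than a more careful measure would give.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the raw term-frequency vectors of two texts.

    Assumption: a simple bag-of-words model stands in for the (unspecified)
    similarity measure used in the working paper.
    """
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no defined direction
    return dot / (norm_a * norm_b)


def similar_citations(focal_claims, cited_claims, threshold=0.2):
    """Return the ids of citations whose claim similarity meets the threshold.

    cited_claims maps a (hypothetical) citation id to its claim text.
    """
    return [
        cid
        for cid, text in cited_claims.items()
        if cosine_similarity(focal_claims, text) >= threshold
    ]


# Hypothetical claim texts, for illustration only.
focal = "a widget comprising a gear and a lever"
cited = {
    "X": "a widget comprising a gear",
    "Y": "method of brewing coffee",
}
print(similar_citations(focal, cited))
```

Running the same scoring separately over Office Action citations and all other citations would give the two distributions that a figure like the one above compares.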
OCE economists and data scientists used the textual similarity of patent claims in citations to improve existing citation-based measures of technological complexity. Please see the working paper for further details, along with the full references cited here.