CPC G06N 3/045 (2023.01) [G06N 3/044 (2023.01); G06N 3/084 (2013.01)] | 14 Claims |
1. A computer-implemented method comprising:
identifying, using at least one processor, a set of one or more entity mentions in an electronic document, an entity mention to be linked to a page of a plurality of candidate pages in a knowledge base;
representing each entity mention as a plurality of word sequences capturing a context or topic of the entity mention at multiple granularities in the electronic document;
for each entity mention in the electronic document, identifying a set of target candidate pages in the knowledge base that potentially refer to the entity mention in the document;
applying a scoring function to obtain a relevance score for each said target candidate page of the corpus for each mention, said applying a scoring function comprising:
running a CNN model using the plurality of word sequences of the entity mention and a candidate target page of the knowledge base to compute a first score representing a local similarity score between each entity mention and candidate target page, said running a CNN model further comprising forward linking of each entity mention to identified target candidate pages, and ranking forward links based on the first score; and
running an RNN model that simultaneously models an interdependence among the other entity mentions in the document and other candidate pages to compute a second score, said running an RNN model comprising a backward linking of the entity mentions to identified target candidate pages by traversing, using RNN model operations, the entity mentions from an end to the beginning of the electronic document, wherein second scores are computed for all the target candidate pages of all the entity mentions in each document simultaneously, while preserving the order of the entity mentions from the beginning to the end of an input document;
creating a combined linking score by adding the first computed score of a forward linked target candidate page for an entity mention and the second computed score of the backward linked target candidate page for that entity mention;
ranking said target candidate pages based on their combined linking score for the entity mention; and
providing a link for linking the entity mention to the target candidate page of the knowledge base based on a highest combined linking score for the entity mention.
|
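The final three steps of claim 1 (adding the first and second scores, ranking the target candidate pages, and linking to the highest-scoring page) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function name `combine_and_link`, the mention strings, the page names, and every score value are hypothetical, and the two score tables are hard-coded stand-ins for what the CNN model and the RNN backward pass would produce.

```python
from typing import Dict, List, Tuple

# mention -> candidate page -> score
Scores = Dict[str, Dict[str, float]]
Ranking = List[Tuple[str, float]]


def combine_and_link(first: Scores, second: Scores) -> Dict[str, Tuple[str, Ranking]]:
    """Add the two scores per candidate page, rank, and pick the top page.

    `first` stands in for the CNN local similarity scores and `second`
    for the RNN backward-linking scores (both assumed precomputed).
    """
    result: Dict[str, Tuple[str, Ranking]] = {}
    for mention, candidates in first.items():
        # Combined linking score = first score + second score.
        combined = {
            page: score + second[mention][page]
            for page, score in candidates.items()
        }
        # Rank candidate pages from highest to lowest combined score.
        ranking = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
        # The link goes to the page with the highest combined score.
        result[mention] = (ranking[0][0], ranking)
    return result


# Hypothetical values for illustration only.
first_scores: Scores = {
    "Jordan": {"Michael_Jordan": 0.6, "Jordan_(country)": 0.7},
    "Bulls": {"Chicago_Bulls": 0.9, "Bull": 0.2},
}
second_scores: Scores = {
    "Jordan": {"Michael_Jordan": 0.8, "Jordan_(country)": 0.1},
    "Bulls": {"Chicago_Bulls": 0.7, "Bull": 0.1},
}

links = combine_and_link(first_scores, second_scores)
for mention, (best_page, _ranking) in links.items():
    print(mention, "->", best_page)
```

In this toy input the first score alone would favor the country page for "Jordan"; adding the second score, which in the claim reflects the other mentions in the document, moves the person page to the top of the ranking.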