US 7,590,628 B2
Determining document subject by using title and anchor text of related documents
Shubin Zhao, Jersey City, N.J. (US)
Assigned to Google, Inc., Mountain View, Calif. (US)
Filed on Mar. 31, 2006, as Appl. No. 11/394,610.
Prior Publication US 2007/0240031 A1, Oct. 11, 2007
Int. Cl. G06F 17/00 (2006.01)
U.S. Cl. 707/6 [715/205] 14 Claims
 
1. A computer-implemented method of determining the subject of a target document, comprising: using a computer processor to perform:
(a) identifying a plurality of peer documents within the same domain as the target document, each of the plurality of peer documents being associated with the target document and being stored in a computer memory;
(b) for each of the plurality of peer documents,
(i) identifying a plurality of linking documents having links that link to the peer document, each link having an anchor text,
(ii) selecting an anchor text from the anchor texts of the links, and
(iii) identifying a first pattern common to the title of the peer document and the selected anchor text;
(c) identifying a second pattern from the first patterns associated with the plurality of peer documents, based at least in part on the number of peer documents associated with the first patterns; and
(d) identifying a subject for the target document based on the second pattern and a title of the target document.
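
Steps (a) through (d) can be illustrated with a minimal Python sketch. It is one possible reading of the claim, not the patented implementation. The following are assumptions made only for illustration: the selected anchor text is the most frequent one, a "first pattern" is the (prefix, suffix) pair surrounding the anchor text inside the peer title, the "second pattern" is the first pattern shared by the most peer documents, and the subject is what remains of the target title once that prefix and suffix are stripped. All function names and sample titles are hypothetical.

```python
from collections import Counter

def select_anchor(anchor_texts):
    # Step (b)(ii), assumed rule: pick the most frequent anchor text.
    return Counter(anchor_texts).most_common(1)[0][0]

def first_pattern(title, anchor):
    # Step (b)(iii), assumed form: the (prefix, suffix) of the title
    # around the anchor text; None if the anchor is empty or absent.
    idx = title.find(anchor) if anchor else -1
    if idx < 0:
        return None
    return title[:idx], title[idx + len(anchor):]

def second_pattern(peers):
    # Step (c): the first pattern associated with the most peer documents.
    # `peers` is a list of (title, anchor_texts) pairs from step (a).
    counts = Counter()
    for title, anchor_texts in peers:
        if not anchor_texts:
            continue
        pattern = first_pattern(title, select_anchor(anchor_texts))
        if pattern is not None:
            counts[pattern] += 1
    return counts.most_common(1)[0][0] if counts else None

def subject(target_title, pattern):
    # Step (d): strip the pattern's prefix and suffix from the target title.
    if pattern is None:
        return target_title
    prefix, suffix = pattern
    fits = (target_title.startswith(prefix)
            and target_title.endswith(suffix)
            and len(target_title) >= len(prefix) + len(suffix))
    if not fits:
        return target_title
    return target_title[len(prefix):len(target_title) - len(suffix)]

# Hypothetical peer documents from one domain: (title, anchor texts of inbound links).
peers = [
    ("Example Store: Blue Widget", ["Blue Widget", "Blue Widget", "click here"]),
    ("Example Store: Red Gadget", ["Red Gadget", "Red Gadget", "here"]),
    ("Green Gizmo | Example Store", ["Green Gizmo"]),
]
pattern = second_pattern(peers)
result = subject("Example Store: Yellow Doohickey", pattern)
print(pattern, "->", result)
```

In this sketch two of the three peer titles share the prefix "Example Store: ", so that pattern is chosen and applied to the target title, even though the target document itself needs no inbound anchor text.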