US 7,533,092 B2
Link-based spam detection
Pavel Berkhin, Sunnyvale, Calif. (US); Zoltan Istvan Gyongyi, Stanford, Calif. (US); and Jan Pedersen, Los Altos Hills, Calif. (US)
Assigned to Yahoo! Inc., Sunnyvale, Calif. (US)
Filed on Aug. 04, 2005, as Appl. No. 11/198,471.
Claims priority of provisional application 60/623295, filed on Oct. 28, 2004.
Prior Publication US 2006/0095416 A1, May 04, 2006
Int. Cl. G06F 7/00 (2006.01); G06F 17/30 (2006.01)
U.S. Cl. 707—5  [707/7; 707/102] 10 Claims
 
1. A computer implemented method of ranking search hits in a search result set, the method comprising:
receiving a query from a user;
generating a list of hits related to the query,
wherein each of the hits has a relevance to the query,
wherein at least one hit is pointed to by a link in a boosting document, and
wherein the link in the boosting document artificially elevates the relevance of the at least one hit to the query;
determining a first measure for said at least one hit, wherein the first measure is a link-based popularity measure for said at least one hit;
determining a second measure for said at least one hit, wherein the second measure is a trustworthiness measure for said at least one hit indicative of the likelihood that said at least one hit is a reputable document;
generating a metric for said at least one hit, based at least in part on a discrepancy between the first measure and the second measure;
wherein the metric is representative of the number of boosting documents that contain links, to said at least one hit, which artificially elevate the relevance of said at least one hit to the query;
comparing a threshold value to a value that is based, at least in part, on the metric;
processing the list of hits to form a modified list based in part on the comparing, wherein said at least one hit is either excluded from said modified list, or is presented in said modified list with a lower relevance than was attributed to said at least one hit in said list of hits; and
transmitting the modified list to the user as a response to said query.
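
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that both measures are already computed and expressed on the same scale, and that the metric is the relative discrepancy between them; the claim does not fix a formula. The names Hit, spam_metric, and filter_hits, and the threshold, penalty, and exclude parameters, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    url: str
    relevance: float   # relevance to the query in the original list of hits
    popularity: float  # first measure: link-based popularity (assumed precomputed)
    trust: float       # second measure: trustworthiness (assumed same scale)


def spam_metric(hit: Hit) -> float:
    """Assumed metric: share of popularity not backed by trust, in [0, 1]."""
    if hit.popularity <= 0:
        return 0.0
    return max(0.0, hit.popularity - hit.trust) / hit.popularity


def filter_hits(hits, threshold=0.5, penalty=0.1, exclude=False):
    """Form the modified list: suspect hits are either dropped or demoted."""
    modified = []
    for hit in hits:
        suspect = spam_metric(hit) > threshold  # the comparing step
        if suspect and exclude:
            continue  # excluded from the modified list
        # Otherwise keep the hit, with lower relevance if it is suspect.
        relevance = hit.relevance * penalty if suspect else hit.relevance
        modified.append((hit.url, relevance))
    # Present the modified list in descending order of adjusted relevance.
    modified.sort(key=lambda pair: pair[1], reverse=True)
    return modified
```

With exclude=False a suspect hit stays in the list at a reduced relevance; with exclude=True it is removed. These correspond to the two alternatives recited in the processing step. The penalty factor of 0.1 and the threshold of 0.5 are arbitrary illustrative values.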