US 7,548,929 B2
System and method for determining semantically related terms
Robert J. Collins, Carlsbad, Calif. (US); Graham Harris, Altadena, Calif. (US); Jesse Harris, Van Nuys, Calif. (US); Grant Kushida, Los Angeles, Calif. (US); Lance Riedel, Altadena, Calif. (US); Mohammad Sabah, North Hollywood, Calif. (US); Shaji Sebastian, Pasadena, Calif. (US); Jeff Yuan, Pasadena, Calif. (US); and Yiping Zhou, Sunnyvale, Calif. (US)
Assigned to Yahoo! Inc., Sunnyvale, Calif. (US)
Filed on May 11, 2006, as Appl. No. 11/432,266.
Claims priority of provisional application 60/703904, filed on Jul. 29, 2005.
Prior Publication US 2007/0027864 A1, Feb. 01, 2007
Int. Cl. G06F 17/30 (2006.01)
U.S. Cl. 707/101
23 Claims
 
1. A method for determining semantically related terms, comprising:
receiving one or more seed terms;
searching a first index to determine a plurality of webpages associated with the seed terms, the first index comprising a plurality of terms and for each term of the plurality of terms, an association between one or more webpages and the term;
searching a second index to determine a plurality of potential terms associated with the plurality of webpages associated with the seed terms, the second index comprising a plurality of identifiers for webpages and for each webpage of the plurality of identifiers for webpages, an association between one or more terms and the webpage;
sending at least one term of the plurality of potential terms to a user to suggest the at least one term of the plurality of potential terms to the user;
receiving an indication of relevance of at least one suggested term to the user;
modifying with a processor the terms which comprise the seed terms based at least in part on the received indication of relevance;
receiving an indication that a first term is relevant to the user; and
modifying with a processor the seed terms to comprise the first term as a positive seed term;
wherein receiving one or more seed terms comprises:
receiving a location of a webpage;
retrieving with a processor the content of the webpage from the location of the webpage;
stripping code from the content of the webpage with a processor;
pulling one or more terms from the content of the webpage; and
weighting each term of the one or more terms pulled from the content of the webpage with a processor based on a location of where the term was located on the webpage.
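Claim 1 recites two index lookups, which a minimal Python sketch can illustrate. The sketch makes these assumptions:
- In-memory dictionaries stand in for the first index (term to webpages) and the second index (webpage identifier to terms).
- The sample data and the function name potential_terms are hypothetical.
- Candidates are ranked by how many seed-matched pages they appear on. This ranking is also hypothetical, because the claim does not say how potential terms are ordered.

```python
from collections import Counter

# Hypothetical toy data; the claim does not specify index contents or storage.
# First index: term -> webpages associated with that term (an inverted index).
TERM_TO_PAGES = {
    "hiking boots": {"page1", "page2"},
    "trail maps": {"page2", "page3"},
}

# Second index: webpage identifier -> terms associated with that webpage.
PAGE_TO_TERMS = {
    "page1": {"hiking boots", "waterproof socks", "backpacks"},
    "page2": {"hiking boots", "trail maps", "backpacks"},
    "page3": {"trail maps", "campgrounds"},
}


def potential_terms(seed_terms, term_to_pages, page_to_terms):
    """Return candidate terms found on pages associated with the seed terms."""
    # Search the first index for webpages associated with any seed term.
    pages = set()
    for term in seed_terms:
        pages |= term_to_pages.get(term, set())

    # Search the second index for terms associated with those webpages,
    # counting how many matched pages each non-seed term appears on.
    counts = Counter()
    for page in pages:
        for term in page_to_terms.get(page, set()):
            if term not in seed_terms:
                counts[term] += 1

    # Assumed ordering: highest count first, ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked]
```

For example, potential_terms({"hiking boots"}, TERM_TO_PAGES, PAGE_TO_TERMS) reaches page1 and page2 and returns "backpacks" first, because it appears on both pages.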
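The feedback steps of claim 1 might be modeled as below. In these steps, the user gives a relevance indication for a suggested term, the seed terms are modified, and a relevant term is added as a positive seed term. The SeedTerms class and the apply_feedback method are hypothetical names. Recording terms marked irrelevant as negative seed terms is also an assumption, since the claim itself recites only adding a relevant term as a positive seed term.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class SeedTerms:
    """Hypothetical seed container with positive and (assumed) negative seeds."""
    positive: Set[str] = field(default_factory=set)
    negative: Set[str] = field(default_factory=set)

    def apply_feedback(self, term: str, relevant: bool) -> None:
        """Modify the seed terms based on a user's relevance indication."""
        if relevant:
            # A term the user marks as relevant becomes a positive seed term.
            self.negative.discard(term)
            self.positive.add(term)
        else:
            # Assumption: an irrelevant term is kept as a negative seed so
            # that later searches can exclude or down-weight it.
            self.positive.discard(term)
            self.negative.add(term)
```

After each round of feedback, the updated positive seeds could be passed back into the index lookups to produce a new set of suggestions.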
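The wherein clause of claim 1 derives seed terms from a webpage location. A sketch of it can use only the Python standard library, under these assumptions:
- Terms are single lowercase words.
- The contents of script and style elements are the code to strip.
- The title and heading tags carry hypothetical weights. The claim says only that each term is weighted by where it was located on the webpage.

```python
import re
import urllib.request
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical location weights; not specified by the claim.
LOCATION_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 1.5}
DEFAULT_WEIGHT = 1.0
SKIPPED_TAGS = {"script", "style"}  # code to strip rather than read as text


class TermExtractor(HTMLParser):
    """Strips markup and accumulates a location-based weight per term."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.weights = defaultdict(float)

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed void tags.
        if tag in self.open_tags:
            while self.open_tags and self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if any(tag in SKIPPED_TAGS for tag in self.open_tags):
            return  # ignore script and style contents
        # Use the strongest weight among the enclosing elements.
        weight = max(
            (LOCATION_WEIGHTS.get(tag, DEFAULT_WEIGHT) for tag in self.open_tags),
            default=DEFAULT_WEIGHT,
        )
        for term in re.findall(r"[a-z0-9]+", data.lower()):
            self.weights[term] += weight


def weight_terms(html_text):
    """Pull terms from page content and weight them by location."""
    parser = TermExtractor()
    parser.feed(html_text)
    parser.close()
    return dict(parser.weights)


def seed_terms_from_url(url):
    """Retrieve a webpage from its location, then weight its terms."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html_text = response.read().decode(charset, errors="replace")
    return weight_terms(html_text)
```

A term that appears in several locations accumulates weight from each occurrence. In a page with "Boots" in the title, an h1, and a paragraph, "boots" would score 3.0 + 2.0 + 1.0. A practical version would likely also drop stop words such as "for".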