US 7,542,958 B1
Methods for determining the similarity of content and structuring unstructured content from heterogeneous sources
David S. Warren, Stony Brook, N.Y. (US); Terrance L. Swift, Great Falls, Va. (US); Tatyana Vidrevich, Port Jefferson, N.J. (US); Iv Ramakrishnan, Setauket, N.Y. (US); L. Robert Pokorny, Calverton, N.Y. (US); Alex Beggs, Setauket, N.Y. (US); Christopher Rued, East Setauket, N.Y. (US); Michael Epstein, East Northport, N.Y. (US); Harpreet Singh, Elmont, N.Y. (US); and Hasan Davulcu, Tempe, Ariz. (US)
Assigned to XSB, Inc., Stony Brook, N.Y. (US)
Filed on Sep. 11, 2003, as Appl. No. 10/660,305.
Claims priority of provisional application 60/410,684, filed on Sep. 13, 2002.
Int. Cl. G06N 5/02 (2006.01); G06F 17/21 (2006.01)
U.S. Cl. 706—48  [706/61] 36 Claims
1. A collection of software tools embodied on a tangible computer readable medium coupled to a processor for acquiring unstructured data from diverse sources and structuring the data and/or determining similarity of content for the purpose of product information management, said collection comprising:
two or more tools selected from the group consisting of
a web agent creator having means for creating a web agent to seek out and acquire product information on the world wide web,
a web agent created by the web agent creator, the web agent having means for acquiring product information from the world wide web,
a web agent manager having means for managing said web agent,
an ontology-directed classifier having means for classifying product information,
an ontology-directed extractor having means for extracting product information from content contained in unstructured textual product descriptions, and
an ontology-directed matcher having means for matching product information extracted by the extractor through matching product categories and attributes,
the tools providing a tangible result selected from the group consisting of
a web agent having means for acquiring product information from the world wide web,
classified product information,
product information extracted from content contained in unstructured textual product descriptions, and
information matched with product categories and attributes.
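
For orientation only, the classifier, extractor, and matcher recited in claim 1 can be pictured with a small sketch. This is not the patented implementation, and the claim does not disclose one at this level of detail. The toy ontology, the keyword-count classifier, the regular-expression extractor, the equality-based matcher, and every name below (ONTOLOGY, classify, extract, structure, match) are assumptions made purely for illustration.

```python
import re

# Hypothetical toy ontology (an assumption, not from the patent):
# each product category lists indicative keywords and attribute patterns.
ONTOLOGY = {
    "resistor": {
        "keywords": {"resistor", "ohm"},
        "attributes": {
            "resistance_ohm": re.compile(r"(\d+(?:\.\d+)?)\s*ohm", re.I),
            "tolerance_pct": re.compile(r"(\d+(?:\.\d+)?)\s*%"),
        },
    },
    "capacitor": {
        "keywords": {"capacitor", "uf"},
        "attributes": {
            "capacitance_uf": re.compile(r"(\d+(?:\.\d+)?)\s*uf", re.I),
            "voltage_v": re.compile(r"(\d+(?:\.\d+)?)\s*v\b", re.I),
        },
    },
}


def classify(text):
    """Return the category whose keywords occur most often, or None."""
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = {
        category: sum(token in spec["keywords"] for token in tokens)
        for category, spec in ONTOLOGY.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def extract(text, category):
    """Pull the attributes that the ontology defines for this category."""
    values = {}
    for name, pattern in ONTOLOGY[category]["attributes"].items():
        found = pattern.search(text)
        if found:
            values[name] = float(found.group(1))
    return values


def structure(text):
    """Turn an unstructured description into a category plus attributes."""
    category = classify(text)
    attributes = extract(text, category) if category else {}
    return {"category": category, "attributes": attributes}


def match(a, b):
    """Treat two records as the same product if the categories agree and
    every attribute present in both records has the same value."""
    if a["category"] is None or a["category"] != b["category"]:
        return False
    shared = set(a["attributes"]) & set(b["attributes"])
    return bool(shared) and all(
        a["attributes"][key] == b["attributes"][key] for key in shared
    )


first = structure("Carbon film resistor, 470 ohm, 5% tolerance")
second = structure("RESISTOR 470 Ohm 5 %")
third = structure("Ceramic capacitor 10uF 50V")
print(first, match(first, second), match(first, third))
```

In this sketch the ontology drives all three steps: it supplies the vocabulary for classification, the attribute patterns for extraction, and the category and attribute keys on which records from different sources are compared.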