US 7,584,188 B2
System and method for searching and matching data having ideogrammatic content
Anthony Scriffignano, West Caldwell, N.J. (US); Kevin Nedd, Long Valley, N.J. (US); Peihsin Shao, Taipei (Taiwan); Simpeng Gan, Shanghai (Singapore); Sarah Lu, Shanghai (China); Masayuki Okada, Tokyo (Japan); Mayako Kasai, Tokyo (Japan); Julian N. N. Prower, Oxone (United Kingdom); Nicholas Teoh, Kuala Lumpur (Malaysia); Jeremy Sy, Glen Waverley (Australia); and Warwick Matthews, Victoria (Australia)
Assigned to Dun and Bradstreet, Short Hills, N.J. (US)
Filed on Nov. 22, 2006, as Appl. No. 11/603,413.
Claims priority of provisional application 60/739270, filed on Nov. 23, 2005.
Prior Publication US 2007/0162445 A1, Jul. 12, 2007
Int. Cl. G06F 7/00 (2006.01)
U.S. Cl. 707/6 [707/3; 707/4; 707/10; 704/8]
16 Claims
 
1. A computerized method of searching and matching input data to stored data held in a memory, the method comprising:
receiving the input data comprising a search string having a plurality of elements, at least some of the elements forming part of an ideogrammatic writing system;
converting a subset of the plurality of elements to a set of terms using at least one method selected from the group consisting of polylogogrammatic semantic disambiguation, hanzee acronym expansion, kanji acronym expansion, and business word recognition;
wherein the converting step comprises normalizing traditional and simple versions of the ideogrammatic writing system;
generating a plurality of keys from the set of terms;
determining from the stored data (a) optimization of said plurality of keys, thus yielding optimized keys, and (b) candidates that share a commonality with said optimized keys, thus yielding key intersections and a quantity for said key intersections;
generating a cost function for said key intersections;
prioritizing said key intersections according to said cost function, thus yielding cost-prioritized key intersections;
retrieving match candidates in order of said cost-prioritized key intersections, and bounded by a pre-determined threshold and said quantity;
wherein the retrieving step further comprises generating a matchgrade, a confidence code, and a match data profile for each match candidate based on a degree of match; and
selecting a best match from the match candidates.
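
The steps recited in claim 1 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the patented implementation: the lookup tables `TRAD_TO_SIMP` and `ACRONYMS`, the use of character bigrams as keys, the choice of candidate count as the cost function, the `threshold` and `max_candidates` parameters, and the similarity-based matchgrade and confidence code are all hypothetical stand-ins for the corresponding claim elements.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical lookup tables; a production system would use far larger resources.
TRAD_TO_SIMP = str.maketrans({"國": "国", "銀": "银", "學": "学", "華": "华"})
ACRONYMS = {"中行": "中国银行", "北大": "北京大学"}

# Hypothetical stored data: record id -> business name.
RECORDS = {1: "中国银行上海分行", 2: "中国银行北京分行", 3: "北京大学", 4: "上海汽车"}


def normalize(text):
    """Fold traditional characters to simplified, then expand known acronyms."""
    text = text.translate(TRAD_TO_SIMP)
    for short, full in ACRONYMS.items():
        text = text.replace(short, full)
    return text


def keys_of(term):
    """Generate keys from a term (assumed here: character bigrams)."""
    return {term[i:i + 2] for i in range(len(term) - 1)}


def build_index(records):
    """Map each key to the set of record ids whose normalized name contains it."""
    index = {}
    for rec_id, name in records.items():
        for key in keys_of(normalize(name)):
            index.setdefault(key, set()).add(rec_id)
    return index


def grade(rec_id, term, name):
    """Score one candidate; the grading scale is an assumption of this sketch."""
    score = SequenceMatcher(None, term, normalize(name)).ratio()
    matchgrade = "A" if score == 1.0 else "B" if score >= 0.6 else "F"
    return {
        "id": rec_id,
        "score": score,
        "matchgrade": matchgrade,
        "confidence": max(1, round(score * 10)),  # 1 (weak) to 10 (exact)
        "profile": {"name": matchgrade},          # per-field breakdown
    }


def find_best_match(query, records, threshold=2, max_candidates=5):
    index = build_index(records)
    term = normalize(query)
    # "Optimized" keys (assumed): keep only keys that occur in the stored data.
    optimized = sorted(k for k in keys_of(term) if k in index)
    # Key intersections: records sharing a pair of keys, plus their quantity.
    intersections = []
    for a, b in combinations(optimized, 2):
        shared = index[a] & index[b]
        if shared:
            intersections.append((len(shared), (a, b), shared))
    # Cost function (assumed): how many records the intersection would retrieve.
    intersections.sort(key=lambda item: (item[0], item[1]))
    candidates = []
    for cost, _, shared in intersections:
        if cost > threshold or len(candidates) >= max_candidates:
            break  # list is sorted by cost, so the rest are no cheaper
        for rec_id in sorted(shared):
            if rec_id not in candidates:
                candidates.append(rec_id)
    graded = [grade(r, term, records[r]) for r in candidates[:max_candidates]]
    return max(graded, key=lambda g: g["score"]) if graded else None


best = find_best_match("中行上海分行", RECORDS)
print(best["id"], best["matchgrade"], best["confidence"])
```

In this sketch the acronym query "中行上海分行" is expanded before key generation, so it shares every bigram with record 1 and is graded as an exact match. A query with no keys in the index yields no intersections and the function returns `None`.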