| US 7,584,188 B2 | ||
| System and method for searching and matching data having ideogrammatic content | ||
| Anthony Scriffignano, West Caldwell, N.J. (US); Kevin Nedd, Long Valley, N.J. (US); Peihsin Shao, Taipei (Taiwan); Simpeng Gan, Shanghai (Singapore); Sarah Lu, Shanghai (China); Masayuki Okada, Tokyo (Japan); Mayako Kasai, Tokyo (Japan); Julian N. N. Prower, Oxone (United Kingdom); Nicholas Teoh, Kuala Lumpur (Malaysia); Jeremy Sy, Glen Waverley (Australia); and Warwick Matthews, Victoria (Australia) | ||
| Assigned to Dun and Bradstreet, Short Hills, N.J. (US) | ||
| Filed on Nov. 22, 2006, as Appl. No. 11/603,413. | ||
| Claims priority of provisional application 60/739,270, filed on Nov. 23, 2005. | ||
| Prior Publication US 2007/0162445 A1, Jul. 12, 2007 | ||
| Int. Cl. G06F 7/00 (2006.01) | ||
| U.S. Cl. 707—6 [707/3; 707/4; 707/10; 704/8] | 16 Claims |

| 1. A computerized method of searching and matching input data to stored data stored in a memory, the method comprising:
receiving the input data comprising a search string having a plurality of elements, at least some of the elements forming
part of an ideogrammatic writing system;
converting a subset of the plurality of elements to a set of terms using at least one method selected from the group consisting
of polylogogrammatic semantic disambiguation, hanzee acronym expansion, kanji acronym expansion, and business word recognition;
wherein the converting step comprises normalizing traditional and simple versions of the ideogrammatic writing system;
generating a plurality of keys from the set of terms;
determining from the stored data (a) optimization of said plurality of keys, thus yielding optimized keys, and (b) candidates
that share a commonality with said optimized keys, thus yielding key intersections and a quantity for said key intersections;
generating a cost function for said key intersections;
prioritizing said key intersections according to said cost function, thus yielding cost-prioritized key intersections;
retrieving match candidates in order of said cost-prioritized key intersections, and bounded by a pre-determined threshold
and said quantity;
wherein the retrieving step further comprises generating a matchgrade, a confidence code, and a match data profile for each
match candidate based on a degree of match; and
selecting a best match from the match candidates.
|
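
The flow recited in claim 1 (normalize, generate keys, intersect keys with stored data, prioritize by cost, retrieve within bounds, select a best match) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the patent: the four-character `TRAD_TO_SIMP` table, character bigrams as keys, posting-list size as the cost function, a Jaccard score standing in for the degree of match, and the `max_candidates` and `min_score` bounds. The matchgrade, confidence code, and match data profile are not modeled.

```python
from collections import defaultdict

# Hypothetical traditional-to-simplified table; a real system needs a full mapping.
TRAD_TO_SIMP = {"國": "国", "際": "际", "銀": "银", "電": "电"}


def normalize(text):
    """Map traditional characters to simplified ones and drop whitespace."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text if not ch.isspace())


def generate_keys(text):
    """Return the set of character bigrams (an assumed key scheme)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def build_index(records):
    """Build an inverted index from key to the ids of records containing it."""
    index = defaultdict(set)
    for rec_id, name in records.items():
        for key in generate_keys(normalize(name)):
            index[key].add(rec_id)
    return index


def match(query, records, index, max_candidates=10, min_score=0.3):
    """Return (score, record id) pairs, best first."""
    q_keys = generate_keys(normalize(query))
    # Keep only keys that occur in the stored data; cost = posting-list size,
    # so the most selective keys are visited first.
    costed = sorted((len(index[k]), k) for k in q_keys if k in index)
    ordered_ids = []
    for _cost, key in costed:
        for rec_id in sorted(index[key]):
            if rec_id not in ordered_ids:
                ordered_ids.append(rec_id)
    scored = []
    # Retrieval is bounded by a candidate quantity and a score threshold.
    for rec_id in ordered_ids[:max_candidates]:
        r_keys = generate_keys(normalize(records[rec_id]))
        score = len(q_keys & r_keys) / len(q_keys | r_keys)
        if score >= min_score:
            scored.append((score, rec_id))
    scored.sort(key=lambda c: (-c[0], c[1]))
    return scored


records = {1: "中國國際銀行", 2: "中国电力公司", 3: "日本電気"}
index = build_index(records)
result = match("中国国际银行", records, index)
best = result[0][1] if result else None
```

In this toy data set the simplified query shares all of its keys with record 1, which is stored in traditional characters, so normalization is what makes the two comparable. Record 2 shares only one key and falls below the assumed threshold.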