| US 7,610,189 B2 | ||
| Method and apparatus for efficient segmentation of compound words using probabilistic breakpoint traversal | ||
| Andrew William Mackie, Los Gatos, Calif. (US) | ||
| Assigned to Nuance Communications, Inc., Burlington, Mass. (US) | ||
| Filed on Oct. 18, 2001, as Appl. No. 10/42,528. | ||
| Prior Publication US 2003/0097252 A1, May 22, 2003 | ||
| Int. Cl. G06F 17/28 (2006.01) | ||
| U.S. Cl. 704/9 | 10 Claims |

| 10. A method for segmenting compound words in an unrestricted natural-language input, the method comprising:
receiving a natural-language input consisting of a plurality of characters;
constructing a set of breakpoints in the natural-language input;
combining weights of tetragraph contexts that precede and follow each breakpoint to assign a weight to the breakpoint in the natural-language input;
traversing substrings of the natural-language input in an order determined by the weights assigned to the breakpoints;
identifying a plurality of linkable components by the traversal of substrings wherein a linkable component is identified by locating the component in a lexicon; and
returning a segmented string consisting of a plurality of linkable components spanning the natural-language input, wherein the segmented string is interpreted as a compound word.
|
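The steps of claim 10 can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names `breakpoint_weights` and `segment`, the hypothetical tables `PRE_WEIGHTS` and `POST_WEIGHTS`, and the example lexicon are all invented for illustration. Several design choices are also assumptions of this sketch: the rule of multiplying the preceding and following tetragraph weights, the default weight for unseen contexts, and the depth-first backtracking traversal that tries the end of the input first. The weight values are made up, not trained.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, Optional, Tuple


def breakpoint_weights(text: str,
                       pre_weights: Dict[str, float],
                       post_weights: Dict[str, float],
                       default: float = 0.01) -> Dict[int, float]:
    """Weight each interior breakpoint i (between text[i-1] and text[i]).

    The tetragraph ending at the breakpoint and the one starting at it are
    looked up separately and multiplied (an assumed combining rule). Near
    the edges of the input the contexts are shorter than four characters.
    """
    weights = {}
    for i in range(1, len(text)):
        before = text[max(0, i - 4):i]
        after = text[i:i + 4]
        weights[i] = (pre_weights.get(before, default)
                      * post_weights.get(after, default))
    return weights


def segment(text: str,
            lexicon: FrozenSet[str],
            weights: Dict[int, float]) -> Optional[Tuple[str, ...]]:
    """Return lexicon components spanning text, or None if none exist.

    Breakpoints are tried from highest to lowest weight; the end of the
    input is always tried first (an assumption of this sketch).
    """
    n = len(text)
    # sorted() is stable even with reverse=True, so ties keep left-to-right order.
    order = sorted(weights, key=weights.get, reverse=True)

    @lru_cache(maxsize=None)
    def solve(start: int) -> Optional[Tuple[str, ...]]:
        if start == n:
            return ()  # the whole input has been consumed
        for end in [n] + [i for i in order if i > start]:
            piece = text[start:end]
            if piece in lexicon:  # a linkable component
                rest = solve(end)
                if rest is not None:  # the remainder also segments
                    return (piece,) + rest
        return None  # dead end: the caller backtracks

    return solve(0)


# Hypothetical tetragraph weights, invented for illustration only.
PRE_WEIGHTS = {"chuh": 0.9}   # a boundary is likely right after "...chuh"
POST_WEIGHTS = {"fach": 0.8}  # a boundary is likely right before "fach..."
LEXICON = frozenset({"hand", "schuh", "handschuh", "fach", "schuhfach"})

if __name__ == "__main__":
    word = "handschuhfach"  # German: Handschuh ("glove") + Fach ("compartment")
    w = breakpoint_weights(word, PRE_WEIGHTS, POST_WEIGHTS)
    print(segment(word, LEXICON, w))  # ('handschuh', 'fach')
```

With these invented weights, breakpoint 9 (between `handschuh` and `fach`) outranks every other breakpoint. The traversal therefore returns `('handschuh', 'fach')` even though `('hand', 'schuhfach')` also spans the input. With empty weight tables, every breakpoint ties, and left-to-right order yields `('hand', 'schuhfach')` instead. Memoizing `solve` ensures that each start position is searched at most once.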