| US 7,587,307 B2 | ||
| Method and apparatus for evaluating machine translation quality | ||
| Nicola Cancedda, Grenoble (France); and Kenji Yamada, Grenoble (France) | ||
| Assigned to Xerox Corporation, Norwalk, Conn. (US) | ||
| Filed on Dec. 18, 2003, as Appl. No. 10/737,972. | ||
| Prior Publication US 2005/0137854 A1, Jun. 23, 2005 | ||
| Int. Cl. G06F 17/27 (2006.01); G06F 17/28 (2006.01) | ||
| U.S. Cl. 704—2 [704/9] | 22 Claims |

| 1. A method for computing machine translation performance, comprising:
receiving a sequence of natural language data in a first language;
translating the sequence of natural language data to a second language to define a machine translation of the sequence of natural language data comprising symbols;
receiving a reference translation of the sequence of natural language data in the second language comprising symbols;
with a computer processor, computing a sequence kernel that provides a similarity measure between the machine translation and the reference translation based on occurrences of subsequences that are shared by the machine translation and the reference translation, for a selected subsequence length, including performing an inner product in a feature space of all possible subsequences of a selected subsequence length;
outputting a signal indicating the similarity measure;
wherein the similarity measure accounts for non-contiguous occurrences of subsequences of the selected subsequence length that are shared between the machine translation and the reference translation, in which the non-contiguous subsequences share symbols and comprise a gap of at least one symbol which has been determined not to match a symbol in the gap of the other.
|
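The sequence kernel recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It builds the feature space explicitly by enumerating every subsequence of the selected length, then takes the inner product of the two feature vectors. The function names, the `decay` gap penalty, and the normalization step are assumptions made for illustration and do not appear in the claim. Explicit enumeration is only practical for short sequences.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def subsequence_features(symbols, n, decay=0.5):
    """Map a symbol sequence to weighted counts of its length-n subsequences.

    Each occurrence, contiguous or not, is weighted by decay ** span, where
    span is the number of positions it covers in the sequence (an assumed
    weighting; gaps make the span longer and the weight smaller).
    """
    if n < 1:
        raise ValueError("subsequence length n must be at least 1")
    features = Counter()
    for idx in combinations(range(len(symbols)), n):
        span = idx[-1] - idx[0] + 1
        features[tuple(symbols[i] for i in idx)] += decay ** span
    return features


def sequence_kernel(machine, reference, n, decay=0.5):
    """Inner product of the two feature vectors over shared subsequences."""
    fm = subsequence_features(machine, n, decay)
    fr = subsequence_features(reference, n, decay)
    return sum(weight * fr[u] for u, weight in fm.items() if u in fr)


def similarity(machine, reference, n, decay=0.5):
    """Kernel value normalized so that identical sequences score 1.0."""
    norm = sqrt(sequence_kernel(machine, machine, n, decay)
                * sequence_kernel(reference, reference, n, decay))
    return sequence_kernel(machine, reference, n, decay) / norm if norm else 0.0


if __name__ == "__main__":
    machine = ["the", "cat"]
    reference = ["the", "black", "cat"]
    # ("the", "cat") is contiguous in the first sequence and spans a
    # one-symbol gap in the second, so it still contributes to the score.
    print(sequence_kernel(machine, reference, 2))  # 0.03125
```

With `decay=0.5` and `n=2`, the pair ("the", "cat") has weight 0.5² in the two-word sequence and 0.5³ in the three-word sequence, so the kernel value is 0.5⁵ = 0.03125. The gap symbol "black" has no counterpart in the other sequence, which is the non-contiguous case the final clause of the claim describes.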