US 7,587,307 B2
Method and apparatus for evaluating machine translation quality
Nicola Cancedda, Grenoble (France); and Kenji Yamada, Grenoble (France)
Assigned to Xerox Corporation, Norwalk, Conn. (US)
Filed on Dec. 18, 2003, as Appl. No. 10/737,972.
Prior Publication US 2005/0137854 A1, Jun. 23, 2005
Int. Cl. G06F 17/27 (2006.01); G06F 17/28 (2006.01)
U.S. Cl. 704—2 [704/9]
22 Claims
 
1. A method for computing machine translation performance, comprising:
receiving a sequence of natural language data in a first language;
translating the sequence of natural language data to a second language to define a machine translation of the sequence of natural language data comprising symbols;
receiving a reference translation of the sequence of natural language data in the second language comprising symbols;
with a computer processor, computing a sequence kernel that provides a similarity measure between the machine translation and the reference translation based on occurrences of subsequences that are shared by the machine translation and the reference translation, for a selected subsequence length, including performing an inner product in a feature space of all possible subsequences of a selected subsequence length;
outputting a signal indicating the similarity measure;
wherein the similarity measure accounts for non-contiguous occurrences of subsequences of the selected subsequence length that are shared between the machine translation and the reference translation, in which the non-contiguous subsequences share symbols and comprise a gap of at least one symbol which has been determined not to match a symbol in the gap of the other.
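The similarity computation recited in claim 1 can be illustrated with a minimal sketch, offered as one possible reading rather than the patented implementation. It assumes the symbols are whitespace-separated word tokens and uses the standard gap-weighted subsequence kernel recursion, in which each shared subsequence of length n contributes the decay factor raised to the total span it covers in both sequences. The function names, the `decay` parameter and its value of 0.5, and the cosine-style normalization are assumptions introduced here for illustration; none of them appears in the claim text above.

```python
import math


def sequence_kernel(s, t, n, decay=0.5):
    """Gap-weighted subsequence kernel of order n between sequences s and t.

    Equivalent to an inner product in the feature space indexed by all
    length-n symbol sequences, where each (possibly non-contiguous)
    occurrence is weighted by decay ** (span of the occurrence).
    The decay weighting is an assumption of this sketch.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    ls, lt = len(s), len(t)
    if min(ls, lt) < n:
        return 0.0

    # kp[i][a][b] holds the auxiliary kernel K'_i on prefixes s[:a], t[:b].
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0

    for i in range(1, n):
        for a in range(i, ls + 1):
            kpp = 0.0  # running K''_i for the current prefix of s
            for b in range(i, lt + 1):
                kpp *= decay
                if s[a - 1] == t[b - 1]:
                    kpp += decay * decay * kp[i - 1][a - 1][b - 1]
                kp[i][a][b] = decay * kp[i][a - 1][b] + kpp

    # Close each shared subsequence on a matching final symbol pair.
    total = 0.0
    for a in range(1, ls + 1):
        for b in range(1, lt + 1):
            if s[a - 1] == t[b - 1]:
                total += decay * decay * kp[n - 1][a - 1][b - 1]
    return total


def normalized_similarity(s, t, n, decay=0.5):
    """Kernel value scaled to [0, 1] by the self-similarities (an assumption)."""
    denom = math.sqrt(sequence_kernel(s, s, n, decay) * sequence_kernel(t, t, n, decay))
    return sequence_kernel(s, t, n, decay) / denom if denom > 0 else 0.0


# Hypothetical example data: the only shared bigram is "the ... sat",
# which is non-contiguous, and the gap symbols ("cat" vs. "dog") differ.
machine_translation = "the cat sat".split()
reference_translation = "the dog sat".split()

print(sequence_kernel(machine_translation, reference_translation, n=2))  # 0.015625
print(normalized_similarity(machine_translation, reference_translation, n=2))
```

In this example the two sequences share no contiguous word pair, so a measure restricted to contiguous n-grams of length 2 would score them as zero, while the kernel still credits the shared pair "the ... sat" at a weight reduced by the gaps.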