US 7,454,340 B2
Voice recognition performance estimation apparatus, method and program allowing insertion of an unnecessary word
Masaru Sakai, Kawasaki (Japan); and Hiroshi Kanazawa, Ebina (Japan)
Assigned to Kabushiki Kaisha Toshiba, Tokyo (Japan)
Filed on Sep. 02, 2004, as Appl. No. 10/931,998.
Claims priority of application No. 2003-312747 (JP), filed on Sep. 04, 2003.
Prior Publication US 2005/0086055 A1, Apr. 21, 2005
Int. Cl. G10L 15/00 (2006.01); G10L 13/00 (2006.01)
U.S. Cl. 704—251  [704/231; 704/258; 704/260] 6 Claims
 
1. A voice recognition estimating apparatus for a voice recognition apparatus, comprising:
a voice data property generator that generates properties of voice data used to determine, based on an estimation item, a feature of synthetic voice data, the estimation item being used to estimate a performance of the voice recognition apparatus;
a parameter generator that generates a parameter used to generate the synthetic voice data corresponding to the properties of the voice data;
a synthetic voice generator that generates the synthetic voice data based on the parameter;
an output unit configured to output the synthetic voice data to the voice recognition apparatus;
an acquisition unit configured to acquire a recognition result from the voice recognition apparatus, the recognition result being obtained when the voice recognition apparatus recognizes the synthetic voice data; and
an estimation unit configured to estimate the performance of the voice recognition apparatus with reference to the estimation item and the recognition result,
wherein the voice data property generator includes
another acquisition unit configured to acquire vocabulary data and unnecessary word data as the estimation item, the vocabulary data being used to make the synthetic voice data correspond to an actual voice indicating a word, the unnecessary word data indicating an unnecessary word inserted in the vocabulary data and an insertion position of the unnecessary word;
a voice quality storage that stores a plurality of voice quality data items;
a selector that selects several voice quality data items of the voice quality data items from the voice quality storage in accordance with the estimation item; and
a generator that generates the properties of the voice data, the properties of the voice data including the selected voice quality data items, the vocabulary data and the unnecessary word data.
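
The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: every name below (UnnecessaryWord, VoiceProperties, generate_properties, synthesize, estimate, toy_recognizer) is hypothetical, the selector simply takes the first few stored voice quality items, the synthetic voice data is stood in for by a (voice quality, utterance text) pair rather than audio, and the performance measure is assumed to be a per-unnecessary-word recognition rate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class UnnecessaryWord:
    """Hypothetical record: an unnecessary word and where it is inserted."""
    text: str
    position: str  # "before" or "after" the vocabulary word


@dataclass(frozen=True)
class VoiceProperties:
    """Properties of the voice data: voice quality, vocabulary word, unnecessary word."""
    voice_quality: str
    word: str
    filler: UnnecessaryWord


# Stand-in for synthetic voice data: (voice quality, utterance text), not real audio.
SyntheticVoice = Tuple[str, str]


def generate_properties(vocabulary: List[str],
                        fillers: List[UnnecessaryWord],
                        voice_storage: List[str],
                        n_voices: int) -> List[VoiceProperties]:
    """Select several voice quality items and combine them with the estimation item."""
    selected = voice_storage[:n_voices]  # assumed selection rule: first n items
    return [VoiceProperties(v, w, f)
            for v in selected for w in vocabulary for f in fillers]


def synthesize(props: VoiceProperties) -> SyntheticVoice:
    """Insert the unnecessary word at its position and tag the voice quality."""
    if props.filler.position == "before":
        utterance = f"{props.filler.text} {props.word}"
    else:
        utterance = f"{props.word} {props.filler.text}"
    return (props.voice_quality, utterance)


def estimate(recognizer: Callable[[SyntheticVoice], str],
             properties: List[VoiceProperties]) -> Dict[Tuple[str, str], float]:
    """Feed each synthetic utterance to the recognizer and score it per unnecessary word."""
    totals: Dict[Tuple[str, str], int] = {}
    hits: Dict[Tuple[str, str], int] = {}
    for p in properties:
        result = recognizer(synthesize(p))
        key = (p.filler.text, p.filler.position)
        totals[key] = totals.get(key, 0) + 1
        hits[key] = hits.get(key, 0) + int(result == p.word)
    return {key: hits[key] / totals[key] for key in totals}


def toy_recognizer(voice: SyntheticVoice) -> str:
    """Deliberately naive recognizer under test: returns the last token it hears."""
    _, utterance = voice
    return utterance.split()[-1]


if __name__ == "__main__":
    props = generate_properties(
        vocabulary=["tokyo", "osaka"],
        fillers=[UnnecessaryWord("uh", "before"), UnnecessaryWord("please", "after")],
        voice_storage=["male", "female", "child"],
        n_voices=2,
    )
    print(estimate(toy_recognizer, props))
```

Because the estimation unit in this sketch groups results by unnecessary word and insertion position, it shows which insertions a recognizer tolerates; a real apparatus would substitute a speech synthesizer and an actual voice recognition apparatus for the two stand-ins.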