US 7,590,540 B2
Method and system for statistic-based distance definition in text-to-speech conversion
Wei Z W Zhang, Beijing (China); Xi Jun Ma, Beijing (China); Ling Jin, Beijing (China); and Hai Xin Chai, Beijing (China)
Assigned to Nuance Communications, Inc., Burlington, Mass. (US)
Filed on Sep. 29, 2005, as Appl. No. 11/239,500.
Claims priority of application No. 2004 1 0085186 (CN), filed on Sep. 30, 2004.
Prior Publication US 2006/0074674 A1, Apr. 06, 2006
Int. Cl. G10L 13/08 (2006.01)
U.S. Cl. 704/260 [704/258; 704/266]
18 Claims
 
1. A method comprising the steps of:
analyzing text that is to be subjected to text-to-speech conversion to obtain text with descriptive prosody annotation;
performing clustering for samples in the obtained text through the use of a decision tree, wherein clustering comprises combining two branches of the decision tree for clustering samples if the two branches are similar for further clustering;
generating a Gaussian Mixture Model for each cluster to determine the distance between the sample and the corresponding Gaussian Mixture Model;
using electronic logic circuitry to identify a sample according to the distance; and
transforming the identified sample into synthesized speech.
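
The distance and selection steps recited in claim 1 can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the claim: each sample is reduced to a single hypothetical numeric feature (for example a pitch value), the cluster is taken as already produced by the decision tree, the Gaussian Mixture Model is fitted with plain expectation-maximization, and the distance is taken to be the negative log-likelihood of the sample under the model. The function names are illustrative only, and the patent's own distance definition may differ.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(values, k=2, iters=50, min_var=1e-3):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Returns a list of (weight, mean, variance) tuples.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic initialisation: spread the means across the sorted data.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(values) / n
    overall_var = max(sum((v - overall_mean) ** 2 for v in values) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            dens = [w * gauss_pdf(v, m, s) for w, m, s in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            var_j = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj
            variances[j] = max(var_j, min_var)  # variance floor for stability
    return list(zip(weights, means, variances))


def gmm_distance(x, gmm):
    """Assumed distance: negative log-likelihood of x under the mixture."""
    likelihood = sum(w * gauss_pdf(x, m, s) for w, m, s in gmm)
    return -math.log(max(likelihood, 1e-300))


def select_sample(candidates, gmm):
    """Identify the candidate with the smallest distance to the cluster's model."""
    return min(candidates, key=lambda c: gmm_distance(c, gmm))


if __name__ == "__main__":
    # Hypothetical feature values of the samples in one decision-tree cluster.
    cluster = [118.0, 120.0, 121.0, 119.0, 122.0, 180.0, 182.0, 179.0, 181.0, 183.0]
    model = fit_gmm_1d(cluster, k=2)
    print(select_sample([150.0, 121.0, 200.0], model))
```

In this sketch a smaller distance means the sample is more typical of its cluster, so the candidate closest to a mixture component is the one identified for synthesis.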