| US 7,590,540 B2 | ||
| Method and system for statistic-based distance definition in text-to-speech conversion | ||
| Wei Z W Zhang, Beijing (China); Xi Jun Ma, Beijing (China); Ling Jin, Beijing (China); and Hai Xin Chai, Beijing (China) | ||
| Assigned to Nuance Communications, Inc., Burlington, Mass. (US) | ||
| Filed on Sep. 29, 2005, as Appl. No. 11/239,500. | ||
| Claims priority of application No. 2004 1 0085186 (CN), filed on Sep. 30, 2004. | ||
| Prior Publication US 2006/0074674 A1, Apr. 6, 2006 | ||
| Int. Cl. G10L 13/08 (2006.01) | ||
| U.S. Cl. 704/260 [704/258; 704/266] | 18 Claims |

| 1. A method comprising the steps of:
analyzing text that is to be subjected to text-to-speech conversion to obtain text with descriptive prosody annotation;
performing clustering for samples in the obtained text through the use of a decision tree, wherein clustering comprises combining two branches of the decision tree for clustering samples if the two branches are similar for further clustering;
generating a Gaussian Mixture Model for each cluster to determine the distance between the sample and the corresponding Gaussian Mixture Model;
using electronic logic circuitry to identify a sample according to the distance; and
transforming the identified sample into synthesized speech.
|
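Claim 1 does not specify the prosodic features, the test for deciding that two decision-tree branches are "similar," or how the sample-to-model distance is computed. The Python sketch below illustrates one possible reading under explicitly assumed choices. Samples are fixed-length numeric prosodic feature vectors. Two leaves are treated as similar when the Euclidean distance between their sample means is below a hypothetical `threshold`. The "distance" is taken to be the negative log-likelihood under a diagonal-covariance GMM. The sample identified is the candidate with the lowest distance. For brevity each cluster is fitted with a one-component model instead of a multi-component EM fit. All function names, leaf labels, and numbers are hypothetical and illustrative, not the patentee's implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]  # assumed: one prosodic feature vector per sample


@dataclass
class GMM:
    """Diagonal-covariance Gaussian Mixture Model (weights assumed > 0)."""
    weights: List[float]
    means: List[Vector]
    variances: List[Vector]


def _log_gauss(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    # Log density of a diagonal-covariance Gaussian evaluated at x.
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_distance(x: Sequence[float], gmm: GMM) -> float:
    # Assumed distance: negative log-likelihood of x under the mixture,
    # combined with log-sum-exp for numerical stability.
    logs = [math.log(w) + _log_gauss(x, m, v)
            for w, m, v in zip(gmm.weights, gmm.means, gmm.variances)]
    top = max(logs)
    return -(top + math.log(sum(math.exp(l - top) for l in logs)))


def _mean(samples: List[Vector]) -> Vector:
    n = len(samples)
    return tuple(sum(s[d] for s in samples) / n for d in range(len(samples[0])))


def merge_similar_leaves(leaves: Dict[str, List[Vector]], threshold: float) -> List[List[Vector]]:
    # Hypothetical similarity test: merge two leaves whose sample means lie
    # closer than `threshold`; repeat until no pair qualifies.
    clusters = [list(v) for v in leaves.values()]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if math.dist(_mean(clusters[i]), _mean(clusters[j])) < threshold:
                    clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
                    merged = True
                    break
            if merged:
                break
    return clusters


def fit_single_gaussian(samples: List[Vector], var_floor: float = 1e-3) -> GMM:
    # Simplification: a one-component GMM fitted by maximum likelihood,
    # with a variance floor to avoid degenerate dimensions.
    mu = _mean(samples)
    n = len(samples)
    var = tuple(max(sum((s[d] - mu[d]) ** 2 for s in samples) / n, var_floor)
                for d in range(len(mu)))
    return GMM([1.0], [mu], [var])


def select_sample(candidates: List[Vector], gmm: GMM) -> Vector:
    # Identify the candidate with the smallest distance to the cluster model.
    return min(candidates, key=lambda c: gmm_distance(c, gmm))


if __name__ == "__main__":
    # Illustrative (duration, pitch) features per decision-tree leaf.
    leaves = {
        "stressed_phrase_final": [(0.30, 1.2), (0.32, 1.1), (0.31, 1.3)],
        "stressed_phrase_medial": [(0.29, 1.2), (0.30, 1.25)],
        "unstressed": [(0.10, 0.2), (0.12, 0.3), (0.11, 0.25)],
    }
    clusters = merge_similar_leaves(leaves, threshold=0.2)
    model = fit_single_gaussian(clusters[0])
    print(select_sample([(0.31, 1.22), (0.11, 0.25), (0.5, 2.0)], model))
```

With these illustrative values the two stressed leaves are merged into one cluster while the unstressed leaf stays separate. The candidate nearest the merged cluster's model is then selected. A production system would more plausibly fit several mixture components per cluster (for example with EM) and derive the similarity test from the decision tree's own splitting statistics.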