| US 7,603,278 B2 | ||
| Segment set creating method and apparatus | ||
| Toshiaki Fukada, Yokohama (Japan); Masayuki Yamada, Kawasaki (Japan); and Yasuhiro Komori, Kawasaki (Japan) | ||
| Assigned to Canon Kabushiki Kaisha, Tokyo (Japan) | ||
| Filed on Sep. 14, 2005, as Appl. No. 11/225,178. | ||
| Claims priority of application No. 2004-268714 (JP), filed on Sep. 15, 2004. | ||
| Prior Publication US 2006/0069566 A1, Mar. 30, 2006 | ||
| Int. Cl. G10L 13/08 (2006.01); G10L 13/06 (2006.01) | ||
| U.S. Cl. 704/260 [704/267; 704/245; 704/200; 704/243; 704/254; 704/236; 704/258; 345/473; 434/156] | 7 Claims |

| 1. A computer implemented segment set creating method for creating on a computer a speech segment set used for multilingual
speech synthesis, the computer implemented method comprising the steps of:
(a) obtaining a first segment set, the set including a phoneme environment, address data of each segment of respective languages,
and segment data of each segment, which are corresponding with each other;
(b) converting a plurality of sets of phoneme labels defined in each language into a common set of phoneme labels shared by
the multiple languages;
(c) converting a plurality of sets of prosody labels defined in each language into a common set of prosody labels shared by
the multiple languages;
(d) creating triphone models from a speech database for training;
(e) creating a decision tree using the triphone models and a set of questions relating to the phonological environment, the
phonological environment including a phoneme environment represented by the common set of phoneme labels and prosody environment
represented by the common set of prosody labels;
(f) performing clustering of the first segment set using the decision tree;
(g) for each cluster obtained in step (f), selecting a template segment having the maximum time length of the largest number
of pitch periods of the segments belonging to a cluster;
(h) deforming the segments belonging to the cluster to have the number of pitch periods and the pitch period length of the
template segment;
(i) generating a representative segment of a segment set belonging to the cluster by calculating an average of the deformed
segments;
(j) for each cluster, replacing segments belonging to the cluster with the representative segment and deleting segment data
of the replaced segments; and
(k) creating a second segment set as an updated set of the first segment set by replacing the address data of each replaced
segment with address data of a corresponding representative segment.
|
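
Steps (g) through (k) of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation; it rests on several assumptions that the claim does not state. A segment is modeled as a list of pitch periods, each a list of samples. The template in step (g) is taken to be the segment with the most pitch periods, with ties broken by total sample count. Deformation in step (h) uses nearest-period mapping plus linear interpolation. The names `resample`, `select_template`, `deform`, `representative`, and `compact` are hypothetical, as are the triphone-style keys in the usage lines. The clusters are assumed to come from the decision-tree clustering of step (f), which is not shown.

```python
from typing import Dict, List, Tuple

Period = List[float]      # samples of one pitch period (assumed representation)
Segment = List[Period]    # a segment as a sequence of pitch periods


def resample(values: List[float], n: int) -> List[float]:
    """Linearly interpolate a non-empty list to exactly n samples (n >= 1)."""
    if n == 1 or len(values) == 1:
        return [values[0]] * n
    out = []
    for i in range(n):
        pos = i * (len(values) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def select_template(cluster: List[Segment]) -> Segment:
    # Step (g), under one reading: most pitch periods, ties broken by length.
    return max(cluster, key=lambda s: (len(s), sum(len(p) for p in s)))


def deform(segment: Segment, template: Segment) -> Segment:
    # Step (h): match the template's period count and each period's length.
    n = len(template)
    return [
        resample(segment[i * len(segment) // n], len(t_period))
        for i, t_period in enumerate(template)
    ]


def representative(cluster: List[Segment]) -> Segment:
    # Step (i): sample-wise average of the deformed segments.
    template = select_template(cluster)
    deformed = [deform(s, template) for s in cluster]
    return [
        [sum(d[i][j] for d in deformed) / len(deformed)
         for j in range(len(template[i]))]
        for i in range(len(template))
    ]


def compact(
    addresses: Dict[str, int],
    data: Dict[int, Segment],
    clusters: List[List[str]],
) -> Tuple[Dict[str, int], Dict[int, Segment]]:
    # Steps (j)-(k): keep one representative per cluster, drop the replaced
    # segment data, and repoint each phoneme environment to the new address.
    new_data: Dict[int, Segment] = {}
    new_addresses = dict(addresses)
    for new_addr, keys in enumerate(clusters):
        new_data[new_addr] = representative([data[addresses[k]] for k in keys])
        for k in keys:
            new_addresses[k] = new_addr
    return new_addresses, new_data


# Usage: two segments in one cluster collapse to a single representative.
data = {0: [[0.0, 2.0], [4.0, 6.0]], 1: [[2.0, 2.0, 2.0]]}
addresses = {"k-a+t": 0, "g-a+t": 1}
new_addresses, new_data = compact(addresses, data, [["k-a+t", "g-a+t"]])
```

In this sketch the second segment set stores one entry per cluster, so the storage saving grows with cluster size, while every phoneme environment from the first set stays addressable.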