US 7,603,278 B2
Segment set creating method and apparatus
Toshiaki Fukada, Yokohama (Japan); Masayuki Yamada, Kawasaki (Japan); and Yasuhiro Komori, Kawasaki (Japan)
Assigned to Canon Kabushiki Kaisha, Tokyo (Japan)
Filed on Sep. 14, 2005, as Appl. No. 11/225,178.
Claims priority of application No. 2004-268714 (JP), filed on Sep. 15, 2004.
Prior Publication US 2006/0069566 A1, Mar. 30, 2006
Int. Cl. G10L 13/08 (2006.01); G10L 13/06 (2006.01)
U.S. Cl. 704/260 [704/267; 704/245; 704/200; 704/243; 704/254; 704/236; 704/258; 345/473; 434/156]
7 Claims
 
1. A computer implemented segment set creating method for creating on a computer a speech segment set used for multilingual speech synthesis, the computer implemented method comprising the steps of:
(a) obtaining a first segment set, the set including a phoneme environment, address data of each segment of respective languages, and segment data of each segment, which are corresponding with each other;
(b) converting a plurality of sets of phoneme labels defined in each language into a common set of phoneme labels shared by the multiple languages;
(c) converting a plurality of sets of prosody labels defined in each language into a common set of prosody labels shared by the multiple languages;
(d) creating triphone models from a speech database for training;
(e) creating a decision tree using the triphone models and a set of questions relating to the phonological environment, the phonological environment including a phoneme environment represented by the common set of phoneme labels and prosody environment represented by the common set of prosody labels;
(f) performing clustering of the first segment set using the decision tree;
(g) for each cluster obtained in step (f), selecting a template segment having the maximum time length of the largest number of pitch periods of the segments belonging to a cluster;
(h) deforming the segments belonging to the cluster to have the number of pitch periods and the pitch period length of the template segment;
(i) generating a representative segment of a segment set belonging to the cluster by calculating an average of the deformed segments;
(j) for each cluster, replacing segments belonging to the cluster with the representative segment and deleting segment data of the replaced segments; and
(k) creating a second segment set as an updated set of the first segment set by replacing the address data of each replaced segment with address data of a corresponding representative segment.
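
Steps (g) through (k) can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything named here is an assumption made for illustration: a segment is modeled as a list of pitch periods (each a list of samples), the first segment set is a pair of dictionaries (`address` maps a segment label to an address, `data` maps an address to segment data), and the clusters from step (f) are supplied as lists of labels that together cover every segment. Step (g) is read as "largest number of pitch periods, ties broken by total length", which is only one possible reading of the claim. The deformation in step (h) uses nearest-period mapping and linear interpolation, neither of which the claim specifies. The function names `resample`, `select_template`, `deform`, `average`, and `compact` are hypothetical.

```python
def resample(samples, n):
    """Linearly interpolate `samples` to exactly n points (assumed method)."""
    if n == 1 or len(samples) == 1:
        return [samples[0]] * n
    scale = (len(samples) - 1) / (n - 1)
    out = []
    for i in range(n):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def select_template(segments):
    """Step (g), assumed reading: most pitch periods, then longest in total."""
    return max(segments, key=lambda s: (len(s), sum(len(p) for p in s)))


def deform(segment, template):
    """Step (h): give `segment` the template's period count and period lengths."""
    n = len(template)
    out = []
    for i, tmpl_period in enumerate(template):
        # Map each template period to a source period by proportional index.
        src = segment[min(i * len(segment) // n, len(segment) - 1)]
        out.append(resample(src, len(tmpl_period)))
    return out


def average(deformed):
    """Step (i): sample-wise mean of segments that share one shape."""
    count = len(deformed)
    return [
        [sum(seg[p][k] for seg in deformed) / count
         for k in range(len(deformed[0][p]))]
        for p in range(len(deformed[0]))
    ]


def compact(address, data, clusters):
    """Steps (j) and (k): keep one representative per cluster and repoint
    every member's address to it. Assumes `clusters` covers all labels."""
    new_address = dict(address)
    new_data = {}
    for new_addr, labels in enumerate(clusters):
        members = [data[address[label]] for label in labels]
        template = select_template(members)
        new_data[new_addr] = average([deform(m, template) for m in members])
        for label in labels:
            new_address[label] = new_addr
    return new_address, new_data


# Usage with made-up triphone labels and toy sample values.
address = {"a-k+i": 0, "a-k+u": 1, "o-s+a": 2}
data = {
    0: [[0.0, 2.0], [4.0, 6.0]],  # two pitch periods of two samples each
    1: [[2.0, 4.0, 6.0]],         # one pitch period of three samples
    2: [[1.0]],
}
clusters = [["a-k+i", "a-k+u"], ["o-s+a"]]
second_address, second_data = compact(address, data, clusters)
```

In this toy run the two segments of the first cluster end up sharing one address, so the second segment set stores two segments instead of three while every label still resolves to segment data.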