| US 7,502,739 B2 | ||
| Intonation generation method, speech synthesis apparatus using the method and voice server | ||
| Takashi Saito, Tokyo-to (Japan); and Masaharu Sakamoto, Yokohama (Japan) | ||
| Assigned to International Business Machines Corporation, Armonk, N.Y. (US) | ||
| Filed on Jan. 24, 2005, as Appl. No. 10/784,044. | ||
| Prior Publication US 2005/0114137 A1, May 26, 2005 | ||
| Int. Cl. G10L 13/00 (2006.01); G10L 13/06 (2006.01) | ||
| U.S. Cl. 704—260 [704/266; 704/268] | 2 Claims |

| 1. A speech synthesis apparatus for performing a text-to-speech synthesis to generate synthesized speech, comprising:
a text analysis unit for performing linguistic analysis of input text as a processing target and acquiring language information
therefrom and providing speech output to a prosody control unit;
a first database for storing intonation patterns of actual speech;
a prosody control unit for receiving speech output from the text analysis unit and for generating a prosody comprising determining
pitch, length and intensity of a sound for each phoneme comprising said speech and a rhythm of speech with positions of pauses
for audibly outputting the text and providing the prosody to a speech generation unit; and
a speech generation unit for receiving the prosody from the prosody control unit and for generating synthesized speech based
on the prosody generated by the prosody control unit,
wherein the prosody control unit includes:
an outline estimation section for estimating an outline of an intonation for each assumed accent phrase configuring the text
based on language information acquired by the text analysis unit, wherein the outline estimation section defines the outline
of the intonation at least by a maximum value of a frequency level in a segment of the assumed accent phrase and relative
level offsets in a starting end and termination end of the segment;
a shape element selection section for selecting an intonation pattern from the database based on the outline of the intonation,
the outline having been estimated by the outline estimation section and wherein the shape element selection section selects
an intonation pattern approximate in shape to the outline of the intonation, the outline having been estimated by the outline
estimation section, among the intonation patterns of the actual speech, the intonation patterns having been accumulated in
the database; and
a shape element connection section for connecting the intonation pattern for each assumed accent phrase to the intonation
pattern for another assumed accent phrase, each intonation pattern having been selected by the shape element selection section,
to generate an intonation pattern of an entire body of the text, wherein the shape element connection section connects the
intonation pattern for each assumed accent phrase to the other, the intonation pattern having been selected by the shape element
selection section, after adjusting a frequency level of the assumed accent phrase based on the outline of the intonation,
the outline having been estimated by the outline estimation section.
|
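The three sections of the prosody control unit recited in claim 1 (outline estimation, shape element selection, shape element connection) can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on several assumptions. Frequency levels are treated as log-F0 values. Each stored actual-speech intonation pattern is a plain list of such values. "Approximate in shape" is measured as the squared distance between start/end offsets relative to each pattern's peak. The level adjustment before connection is a constant shift that aligns each selected pattern's peak with the estimated maximum. The names `Outline`, `outline_of`, `select_pattern` and `connect` are hypothetical. Estimating the outline from language information is not modeled, so the outlines are supplied directly.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical sketch of the claimed prosody control steps; not the patent's code.


@dataclass(frozen=True)
class Outline:
    """Outline of the intonation of one assumed accent phrase.

    max_level: maximum frequency level in the phrase segment (assumed log-F0).
    start_offset / end_offset: level at the starting / terminating end of the
    segment, relative to max_level (so both are <= 0).
    """
    max_level: float
    start_offset: float
    end_offset: float


def outline_of(contour: Sequence[float]) -> Outline:
    """Describe a stored actual-speech contour with the same three parameters."""
    peak = max(contour)
    return Outline(peak, contour[0] - peak, contour[-1] - peak)


def select_pattern(target: Outline, patterns: List[List[float]]) -> List[float]:
    """Shape element selection: pick the stored pattern whose shape (offsets
    relative to its own peak) is closest to the estimated outline. Absolute
    level is ignored here because it is corrected during connection."""
    def shape_distance(contour: Sequence[float]) -> float:
        o = outline_of(contour)
        return ((o.start_offset - target.start_offset) ** 2
                + (o.end_offset - target.end_offset) ** 2)
    return min(patterns, key=shape_distance)


def connect(outlines: List[Outline], patterns: List[List[float]]) -> List[float]:
    """Shape element connection: shift each selected pattern so its peak equals
    the estimated maximum level, then concatenate the phrases in order."""
    sentence: List[float] = []
    for target in outlines:
        chosen = select_pattern(target, patterns)
        shift = target.max_level - max(chosen)  # frequency-level adjustment
        sentence.extend(value + shift for value in chosen)
    return sentence


if __name__ == "__main__":
    # Hypothetical pattern database and outline estimates, for illustration only.
    database = [
        [5.0, 5.3, 5.2, 4.9],
        [5.1, 5.2, 5.0, 4.6],
        [4.8, 5.0, 4.95, 4.9],
    ]
    estimated = [Outline(5.5, -0.1, -0.6), Outline(5.0, -0.2, -0.1)]
    print([round(v, 2) for v in connect(estimated, database)])
```

Matching only on the relative offsets keeps selection independent of the phrase's absolute pitch, which is restored in the connection step from the estimated maximum. Fitting each pattern to the duration of its accent phrase is omitted here.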