US 11,682,379 B2
Learnable speed control of speech synthesis
Chengzhu Yu, Bellevue, WA (US); and Dong Yu, Bothell, WA (US)
Assigned to TENCENT AMERICA LLC, Palo Alto, CA (US)
Filed by TENCENT AMERICA LLC, Palo Alto, CA (US)
Filed on Feb. 24, 2022, as Appl. No. 17/679,790.
Application 17/679,790 is a continuation of application No. 16/807,801, filed on Mar. 3, 2020, granted, now 11,302,301.
Prior Publication US 2022/0180856 A1, Jun. 9, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 13/033 (2013.01); G10L 13/047 (2013.01); G10L 13/02 (2013.01); G10L 13/04 (2013.01); G10L 13/07 (2013.01); G10L 25/18 (2013.01); G10L 13/06 (2013.01); G10L 25/24 (2013.01)
CPC G10L 13/033 (2013.01) [G10L 13/047 (2013.01); G10L 13/06 (2013.01); G10L 25/18 (2013.01); G10L 25/24 (2013.01)]
20 Claims
 
1. A method of synthesizing speech at one or more speeds, comprising:
receiving, by a computer, a sequence of one or more phonemes, and outputting a sequence of one or more hidden states containing a sequential representation associated with the received sequence of phonemes;
aligning, by the computer, the one or more phonemes to one or more target acoustic frames based on an encoded context, based on generating one or more frame-aligned hidden states according to a rate associated with each phoneme;
recursively generating, by the computer, one or more mel-spectrogram features from the aligned phonemes and the target acoustic frames; and
synthesizing, by the computer, a voice sample at a given speed corresponding to a speaking voice using the generated mel-spectrogram features.
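
The alignment and recursive generation steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `expand_to_frames` and `toy_recursive_decoder`, the `speed` parameter, the per-phoneme frame counts, and the decay-based update rule are all hypothetical, chosen only to show how a per-phoneme rate might yield frame-aligned hidden states and how each output frame might depend on the previous one. In a real system the hidden states would come from a learned encoder and the frames from a learned decoder and vocoder.

```python
from typing import List, Sequence


def expand_to_frames(
    hidden_states: Sequence[Sequence[float]],
    frames_per_phoneme: Sequence[int],
    speed: float = 1.0,
) -> List[List[float]]:
    """Repeat each phoneme-level hidden state once per acoustic frame.

    Assumption: `frames_per_phoneme` stands in for the per-phoneme rate,
    and `speed` > 1 shortens every phoneme while `speed` < 1 lengthens it.
    """
    if len(hidden_states) != len(frames_per_phoneme):
        raise ValueError("need one frame count per phoneme")
    if speed <= 0:
        raise ValueError("speed must be positive")

    aligned: List[List[float]] = []
    for state, n_frames in zip(hidden_states, frames_per_phoneme):
        # Scale the frame count by the speed; keep at least one frame.
        count = max(1, round(n_frames / speed))
        aligned.extend(list(state) for _ in range(count))
    return aligned


def toy_recursive_decoder(
    aligned: Sequence[Sequence[float]], decay: float = 0.5
) -> List[List[float]]:
    """Toy stand-in for recursive feature generation.

    Each output frame mixes the current aligned state with the previous
    output frame. The update rule is purely illustrative.
    """
    if not aligned:
        return []
    prev = [0.0] * len(aligned[0])
    frames: List[List[float]] = []
    for state in aligned:
        frame = [decay * p + s for p, s in zip(prev, state)]
        frames.append(frame)
        prev = frame
    return frames


# Three phonemes with two-dimensional hidden states (made-up values).
states = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
durations = [2, 4, 6]

normal = expand_to_frames(states, durations, speed=1.0)
fast = expand_to_frames(states, durations, speed=2.0)
slow = expand_to_frames(states, durations, speed=0.5)
print(len(normal), len(fast), len(slow))
```

In this sketch, doubling `speed` halves the number of frames per phoneme and halving it doubles them, so the same phoneme sequence yields a shorter or longer frame sequence before any acoustic features are generated.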