US 9,811,517 B2
Method and system of adding punctuation and establishing language model using a punctuation weighting applied to Chinese speech recognized text
Haibo Liu, Shenzhen (CN); Eryu Wang, Shenzhen (CN); Xiang Zhang, Shenzhen (CN); Li Lu, Shenzhen (CN); Shuai Yue, Shenzhen (CN); Qiuge Liu, Shenzhen (CN); Bo Chen, Shenzhen (CN); Jian Liu, Shenzhen (CN); and Lu Li, Shenzhen (CN)
Assigned to TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED, Shenzhen, Guangdong Province (CN)
Filed by Tencent Technology (Shenzhen) Company Limited, Shenzhen (CN)
Filed on Jan. 6, 2014, as Appl. No. 14/148,579.
Application 14/148,579 is a continuation of application No. PCT/CN2013/086618, filed on Nov. 6, 2013.
Claims priority of application No. 2013 1 0034265 (CN), filed on Jan. 29, 2013.
Prior Publication US 2014/0214406 A1, Jul. 31, 2014
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 17/27 (2006.01); G06F 17/28 (2006.01); G10L 15/00 (2013.01); G10L 15/26 (2006.01)
CPC G06F 17/273 (2013.01) [G06F 17/2775 (2013.01); G06F 17/2785 (2013.01); G06F 17/289 (2013.01); G10L 15/265 (2013.01)]
16 Claims
 
1. A computer-implemented method of adding punctuation marks to a Chinese sentence based on a Chinese language punctuation model, wherein the Chinese language punctuation model was pre-generated from a training corpus of Chinese sentences having punctuation marks and includes multiple predefined characteristic units, each predefined characteristic unit including a series of Chinese expressions, possible punctuation marks present in the series of Chinese expressions and their respective probabilities, the method comprising:
at a computer having one or more processors and memory for storing programs to be executed by the one or more processors:
extracting the Chinese sentence from a speech input through speech recognition;
identifying a plurality of expressions in the Chinese sentence by segmenting the Chinese sentence according to their semantic features, each of the plurality of expressions including one or more Chinese characters;
grouping the plurality of expressions in the Chinese sentence into a plurality of characteristic units according to the semantic features of the plurality of expressions using one or more predefined characteristic templates;
extracting, from the Chinese language punctuation model, a plurality of possible punctuation marks appearing in the corresponding series of Chinese expressions and their respective probabilities for each of the plurality of characteristic units;
determining a punctuation mark and its weight for each of the plurality of expressions in the Chinese sentence according to the plurality of possible punctuation marks extracted from the Chinese language punctuation model;
calculating an overall weight for each possible arrangement of punctuation marks in the Chinese sentence based on the weights of punctuation marks at each of the plurality of expressions in the Chinese sentence; and
adding the punctuation marks corresponding to an arrangement of a maximum overall weight into the Chinese sentence.
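
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that speech recognition and segmentation have already produced a list of expressions, and everything named in it is hypothetical: the three templates in TEMPLATES, the toy probabilities in MODEL, and the helper names. The claim only says that weights are "based on" the extracted probabilities, so two further choices here are assumptions: the weight of a mark at an expression is the sum of its probabilities over all matched characteristic units, and the overall weight of an arrangement is the product of the per-expression weights.

```python
from itertools import product
from math import prod

NONE = ""  # stands for "no punctuation mark after this expression"

# Hypothetical characteristic templates: offsets relative to the current expression.
TEMPLATES = {"cur": (0,), "prev+cur": (-1, 0), "cur+next": (0, 1)}

# Hypothetical toy model: (template name, series of expressions) -> {mark: probability}.
MODEL = {
    ("cur", ("你好",)): {"，": 0.6, NONE: 0.4},
    ("cur+next", ("你好", "今天")): {"，": 0.8, NONE: 0.2},
    ("cur", ("今天",)): {NONE: 0.9, "，": 0.1},
    ("prev+cur", ("天气", "不错")): {"。": 0.7, NONE: 0.3},
    ("cur", ("不错",)): {"。": 0.5, NONE: 0.5},
}


def characteristic_units(expressions, i):
    """Yield one model key per template that fits inside the sentence at position i."""
    for name, offsets in TEMPLATES.items():
        idx = [i + o for o in offsets]
        if all(0 <= j < len(expressions) for j in idx):
            yield name, tuple(expressions[j] for j in idx)


def mark_weights(expressions, i):
    """Sum the model probabilities of each mark over the units matched at position i."""
    weights = {}
    for key in characteristic_units(expressions, i):
        for mark, p in MODEL.get(key, {}).items():
            weights[mark] = weights.get(mark, 0.0) + p
    # Assumption: a context unseen in the model gets no punctuation.
    return weights or {NONE: 1.0}


def punctuate(expressions):
    """Score every arrangement of marks and insert the one with maximum overall weight."""
    per_position = [mark_weights(expressions, i) for i in range(len(expressions))]
    best = max(
        product(*(w.items() for w in per_position)),
        key=lambda arrangement: prod(weight for _, weight in arrangement),
    )
    return "".join(e + mark for e, (mark, _) in zip(expressions, best))


print(punctuate(["你好", "今天", "天气", "不错"]))  # 你好，今天天气不错。
```

Because this toy scoring treats each position independently, enumerating all arrangements gives the same result as picking the best mark at each expression; the enumeration is kept only to mirror the wording of the claim. Its cost grows exponentially with sentence length, so a practical system would likely use dynamic programming instead.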