US 11,704,505 B2
Language processing method and device
Chao Xing, Beijing (CN); Xiao Chen, Hong Kong (CN); and Zhenlin Cai, Shenzhen (CN)
Assigned to HUAWEI TECHNOLOGIES CO., LTD., Shenzhen (CN)
Filed by Huawei Technologies Co., Ltd., Shenzhen (CN)
Filed on Jun. 22, 2020, as Appl. No. 16/907,783.
Application 16/907,783 is a continuation of application No. PCT/CN2018/102498, filed on Aug. 27, 2018.
Claims priority of application No. 201711411206.3 (CN), filed on Dec. 23, 2017.
Prior Publication US 2020/0320255 A1, Oct. 8, 2020
Int. Cl. G06F 40/58 (2020.01); G06F 40/51 (2020.01); G06F 40/263 (2020.01); G06F 40/55 (2020.01); G06F 40/30 (2020.01)
CPC G06F 40/58 (2020.01) [G06F 40/263 (2020.01); G06F 40/30 (2020.01); G06F 40/51 (2020.01); G06F 40/55 (2020.01)] 20 Claims
 
1. A language processing method implemented by a computer device using a model, the method comprising:
obtaining n pairs of translation sentences of a source language and a target language, wherein each of the n pairs of translation sentences comprises a source language sentence and a target language sentence that is a translation sentence of the source language sentence, wherein the computer device determines a word alignment relationship using a word alignment model, and wherein n is an integer greater than one;
extracting source language segments from n source language sentences in the n pairs of translation sentences using an extraction rule of the source language;
extracting target language segments from n target language sentences in the n pairs of translation sentences, wherein the target language segments are translations of corresponding source language segments;
generating, by the computer device, an extraction rule of the target language based on the target language segments;
detecting, by the computer device, whether the extraction rule is accurate, and when the extraction rule is inaccurate, updating the extraction rule of the target language;
applying the extraction rule of the source language to a source language corpus to obtain M source language segments, wherein M is an integer;
applying the extraction rule of the target language to a target language corpus to obtain b target language segments, wherein a quantity of source language sentences in the source language corpus is the same as a quantity of target language sentences in the target language corpus, wherein the source language sentences in the source language corpus and the target language sentences in the target language corpus are translations of each other, and wherein b is an integer;
storing a correspondence between the extraction rule of the source language and the extraction rule of the target language;
obtaining, based on the correspondence, the n pairs of translation sentences; and
updating the extraction rule of the target language when:
M and b are not equal; and
a semantic mismatch exists in a pair of a target language segment and a corresponding source language segment.
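The extraction, projection and rule-generation steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes, purely for illustration, that an extraction rule is a regular expression matched against whitespace-joined token spans and that the word alignment model's output is a list of (source index, target index) pairs. It also uses a hypothetical rule-induction step that keeps words literally and generalizes digit runs. All function names (`extract_source_spans`, `project_span`, `target_segments`, `generalize`, `induce_target_rule`) are hypothetical and do not come from the patent.

```python
from __future__ import annotations

import re
from typing import List, Optional, Tuple

# Hypothetical format for the word alignment model's output:
# (source token index, target token index) pairs.
Alignment = List[Tuple[int, int]]


def extract_source_spans(tokens: List[str], rule: re.Pattern) -> List[Tuple[int, int]]:
    """Return (start, end) token spans, end exclusive, whose text fully matches `rule`."""
    spans = []
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            if rule.fullmatch(" ".join(tokens[i:j])):
                spans.append((i, j))
    return spans


def project_span(span: Tuple[int, int], alignment: Alignment) -> Optional[Tuple[int, int]]:
    """Map a source span to the smallest target span covering its aligned target tokens."""
    start, end = span
    targets = [t for s, t in alignment if start <= s < end]
    if not targets:
        return None  # unaligned source span: nothing to project
    return min(targets), max(targets) + 1


def target_segments(
    pairs: List[Tuple[List[str], List[str], Alignment]], source_rule: re.Pattern
) -> List[str]:
    """Project every source segment of every sentence pair onto the target side."""
    segments = []
    for src_tokens, tgt_tokens, alignment in pairs:
        for span in extract_source_spans(src_tokens, source_rule):
            projected = project_span(span, alignment)
            if projected is not None:
                lo, hi = projected
                segments.append(" ".join(tgt_tokens[lo:hi]))
    return segments


def generalize(segment: str) -> str:
    """Hypothetical rule induction: escape words literally, turn digit runs into a digit pattern."""
    parts = re.split(r"(\d+)", segment)
    return "".join(r"\d+" if part.isdigit() else re.escape(part) for part in parts)


def induce_target_rule(segments: List[str]) -> re.Pattern:
    """Combine the generalized patterns of all target segments into one alternation."""
    if not segments:
        raise ValueError("cannot induce a rule from zero segments")
    patterns = sorted({generalize(s) for s in segments})
    return re.compile("|".join(f"(?:{p})" for p in patterns))


# Toy sentence pairs with hand-written alignments (illustrative data only).
pairs = [
    ("I was born in 1990".split(), "Je suis né en 1990".split(),
     [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]),
    ("She left in 2001".split(), "Elle est partie en 2001".split(),
     [(0, 0), (1, 1), (1, 2), (2, 3), (3, 4)]),
]
segments = target_segments(pairs, re.compile(r"in \d{4}"))
target_rule = induce_target_rule(segments)
```

With these two toy pairs, the projected target segments are "en 1990" and "en 2001". The induced target rule therefore also matches an unseen span such as "en 1850". The digit generalization is only a stand-in for whatever rule-generation procedure a real system would use.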
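The update condition at the end of claim 1 might be checked as in the following minimal sketch. That condition requires unequal segment counts M and b, together with a semantic mismatch in some source/target segment pair. The sketch again treats extraction rules as regular expressions and assumes two sentence-aligned corpora. It pairs segments by their order within each aligned sentence, which is an assumption. It also uses a hypothetical `same_numbers` function as a placeholder for whatever semantic comparison a real implementation would apply. The names `count_segments` and `needs_update` are likewise illustrative only.

```python
from __future__ import annotations

import re
from typing import Callable, List


def count_segments(rule: re.Pattern, corpus: List[str]) -> int:
    """Count every match of `rule` across all sentences of `corpus`."""
    return sum(1 for sentence in corpus for _ in rule.finditer(sentence))


def needs_update(
    source_rule: re.Pattern,
    target_rule: re.Pattern,
    source_corpus: List[str],
    target_corpus: List[str],
    is_semantic_match: Callable[[str, str], bool],
) -> bool:
    """Return True when M != b and at least one segment pair mismatches semantically."""
    if len(source_corpus) != len(target_corpus):
        raise ValueError("the two corpora must contain the same number of sentences")
    m = count_segments(source_rule, source_corpus)
    b = count_segments(target_rule, target_corpus)
    if m == b:
        return False
    for src_sentence, tgt_sentence in zip(source_corpus, target_corpus):
        src_segments = [match.group(0) for match in source_rule.finditer(src_sentence)]
        tgt_segments = [match.group(0) for match in target_rule.finditer(tgt_sentence)]
        # Hypothetical pairing: the i-th source segment with the i-th target segment.
        for src_seg, tgt_seg in zip(src_segments, tgt_segments):
            if not is_semantic_match(src_seg, tgt_seg):
                return True
    return False


def same_numbers(src_seg: str, tgt_seg: str) -> bool:
    """Toy semantic check (hypothetical): both segments contain the same numbers."""
    return re.findall(r"\d+", src_seg) == re.findall(r"\d+", tgt_seg)


# Toy sentence-aligned corpora (illustrative data only).
source_corpus = ["born in 1990", "left in 3 days in 2001"]
target_corpus = ["né en 1990", "partie en 3 jours en 2001"]
source_rule = re.compile(r"in \d{4}")
```

Here the source rule yields M = 2. An overly broad target rule such as `en \d+` also captures "en 3", giving b = 3. It also pairs "in 2001" with "en 3", so the sketch reports that the target rule should be updated. A tighter rule such as `en \d{4}` gives b = 2, and no update is triggered.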