US 11,816,444 B2
Pseudo parallel translation data generation apparatus, machine translation processing apparatus, and pseudo parallel translation data generation method
Kenji Imamura, Koganei (JP); Atsushi Fujita, Koganei (JP); and Eiichiro Sumita, Koganei (JP)
Assigned to NATIONAL INSTITUTE OF INFORMATION AND COMMUNICATIONS TECHNOLOGY, Tokyo (JP)
Appl. No. 16/969,619
Filed by NATIONAL INSTITUTE OF INFORMATION AND COMMUNICATIONS TECHNOLOGY, Koganei (JP)
PCT Filed Feb. 12, 2019, PCT No. PCT/JP2019/004805
§ 371(c)(1), (2) Date Aug. 13, 2020,
PCT Pub. No. WO2019/167600, PCT Pub. Date Sep. 6, 2019.
Claims priority of application No. 2018-037055 (JP), filed on Mar. 2, 2018.
Prior Publication US 2021/0027026 A1, Jan. 28, 2021
Int. Cl. G06F 40/58 (2020.01); G06N 3/08 (2023.01); G06N 3/084 (2023.01); G06F 18/211 (2023.01)
CPC G06F 40/58 (2020.01) [G06F 18/211 (2023.01); G06N 3/084 (2013.01)]
7 Claims
 
1. A pseudo parallel translation data generation apparatus comprising:
  a back-translation unit that performs a machine back-translation process on one piece of target language data obtained from a target language monolingual corpus to obtain N pieces of pseudo source language data, N being a natural number equal to or greater than two, the back-translation unit including:
    an encoder that obtains input-side hidden state data from input data; and
    a decoder that obtains output-side hidden state data from the input-side hidden state data obtained by the encoder, randomly selects data from an output word distribution represented by the obtained output-side hidden state data, and outputs word data corresponding to the selected data as output data; and
  a pseudo parallel translation data obtaining unit that pairs the one piece of target language data and each of the N pieces of pseudo source language data obtained by the back-translation unit to obtain N sets of pseudo parallel translation data.
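
The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the vocabulary `VOCAB`, the functions `encode`, `decoder_step`, `sample_back_translation`, and `generate_pseudo_parallel`, and the arithmetic used to produce the word distribution are all hypothetical stand-ins for a trained encoder-decoder model. The sketch only shows the two claimed ideas: the decoder draws each output word at random from the output word distribution (rather than always taking the most probable word), and one piece of target language data is paired with each of the N sampled pseudo source sentences.

```python
import math
import random

# Hypothetical toy source-language vocabulary; a real system would use a trained model's vocabulary.
VOCAB = ["<eos>", "the", "a", "cat", "dog", "sleeps", "runs"]


def encode(target_tokens):
    """Toy encoder: maps input tokens to 'input-side hidden state data' (a list of floats)."""
    return [(sum(ord(c) for c in tok) % 17) / 17.0 for tok in target_tokens]


def decoder_step(enc_state, prev_words):
    """Toy decoder step: returns an output word distribution over VOCAB (softmax of made-up logits)."""
    context = sum(enc_state) + len(prev_words)
    logits = [math.sin(context + i) for i in range(len(VOCAB))]
    logits[0] += 0.5 * len(prev_words) - 1.0  # make <eos> likelier as the output grows
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_back_translation(target_tokens, rng, max_len=10):
    """Randomly selects each output word from the distribution instead of taking the argmax."""
    enc_state = encode(target_tokens)
    words = []
    for _ in range(max_len):
        probs = decoder_step(enc_state, words)
        word = rng.choices(VOCAB, weights=probs, k=1)[0]
        if word == "<eos>":
            break
        words.append(word)
    return words


def generate_pseudo_parallel(target_sentence, n, seed=0):
    """Pairs one target sentence with each of n sampled pseudo source sentences."""
    if n < 2:
        raise ValueError("n must be a natural number equal to or greater than two")
    rng = random.Random(seed)
    tokens = target_sentence.split()
    pairs = []
    for _ in range(n):
        pseudo_source = " ".join(sample_back_translation(tokens, rng))
        pairs.append((pseudo_source, target_sentence))
    return pairs


if __name__ == "__main__":
    for source, target in generate_pseudo_parallel("the cat sleeps", n=3):
        print(repr(source), "->", repr(target))
```

Because the words are sampled, the N pseudo source sentences generally differ from one another, although this toy sketch does not force them to be distinct, and the claim as quoted does not state such a requirement either.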