CPC G06F 40/58 (2020.01) [G06F 18/211 (2023.01); G06N 3/084 (2013.01)] | 7 Claims |
1. A pseudo parallel translation data generation apparatus comprising:
a back-translation unit that performs a machine back-translation process on one piece of target language data obtained from a target language monolingual corpus to obtain N pieces of pseudo source language data, N being a natural number equal to or greater than two, the back-translation unit including:
an encoder that obtains input-side hidden state data from input data; and
a decoder that obtains output-side hidden state data from the input-side hidden state data obtained by the encoder, randomly selects data from an output word distribution represented by the obtained output-side hidden state data, and outputs word data corresponding to the selected data as output data; and
a pseudo parallel translation data obtaining unit that pairs the one piece of target language data and each of the N pieces of pseudo source language data obtained by the back-translation unit to obtain N sets of pseudo parallel translation data.
|
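The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the encoder and decoder below are toy stand-ins for a trained neural back-translation model, and the names `VOCAB`, `encode`, `decode_by_sampling`, and `generate_pseudo_parallel` are hypothetical. The only point it shows is that each output word is drawn at random from the output word distribution, rather than chosen by argmax, so that one target sentence can yield N pseudo source sentences, each of which is paired with that target sentence.

```python
import math
import random

# Hypothetical toy vocabulary for the pseudo source language.
VOCAB = ["<eos>", "a", "b", "c", "d"]


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def encode(target_tokens):
    """Toy encoder (assumption): one number per token as input-side hidden state."""
    return [float(len(tok)) for tok in target_tokens]


def decode_by_sampling(input_hidden, rng, max_len=10):
    """Toy decoder (assumption): samples each word from the output distribution."""
    state = sum(input_hidden)  # stand-in for output-side hidden state
    output = []
    for step in range(max_len):
        # Stand-in scores derived from the hidden state; a real model
        # would compute these with learned parameters.
        logits = [math.sin(state + step + i) for i in range(len(VOCAB))]
        probs = softmax(logits)
        # Random selection from the output word distribution (not argmax).
        word = rng.choices(VOCAB, weights=probs, k=1)[0]
        if word == "<eos>":
            break
        output.append(word)
        state += VOCAB.index(word)
    return output


def generate_pseudo_parallel(target_sentence, n, seed=0):
    """Pair one target sentence with n sampled pseudo source sentences."""
    if n < 2:
        raise ValueError("N must be a natural number equal to or greater than two")
    rng = random.Random(seed)
    hidden = encode(target_sentence.split())
    pairs = []
    for _ in range(n):
        pseudo_source = " ".join(decode_by_sampling(hidden, rng))
        pairs.append((pseudo_source, target_sentence))
    return pairs


if __name__ == "__main__":
    for src, tgt in generate_pseudo_parallel("this is a test", n=3):
        print(repr(src), "->", repr(tgt))
```

Because the words are sampled, the N pseudo source sentences may differ from one another; the fixed seed in the sketch is there only to make runs reproducible.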