US 11,809,820 B2
Language characteristic extraction device, named entity extraction device, extraction method, and program
Kuniko Saito, Tokyo (JP); Nozomi Kobayashi, Tokyo (JP); and Junji Tomita, Tokyo (JP)
Assigned to NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP)
Appl. No. 17/049,939
Filed by NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP)
PCT Filed Apr. 22, 2019, PCT No. PCT/JP2019/017049 § 371(c)(1), (2) Date Oct. 22, 2020, PCT Pub. No. WO2019/208507, PCT Pub. Date Oct. 31, 2019.
Claims priority of application No. 2018-083500 (JP), filed on Apr. 24, 2018.
Prior Publication US 2021/0097237 A1, Apr. 1, 2021
Int. Cl. G06F 40/00 (2020.01); G06F 40/20 (2020.01); G06F 40/58 (2020.01); G06F 40/51 (2020.01); G06F 40/263 (2020.01); G06F 40/295 (2020.01); G06F 40/30 (2020.01); G06F 40/268 (2020.01)

CPC G06F 40/20 (2020.01) [G06F 40/263 (2020.01); G06F 40/295 (2020.01); G06F 40/30 (2020.01); G06F 40/51 (2020.01); G06F 40/58 (2020.01); G06F 40/268 (2020.01)]

20 Claims

1. A computer-implemented method for processing a text, the method comprising:

receiving an input text, wherein the input text relates to a target language;

generating a morphological analysis result of the input text;

selecting, based on characteristics of the target language of the input text, a first rule from a set of abstract rules common across a plurality of languages, wherein the plurality of languages include the target language;

determining, based on the selected first rule, a second rule for the target language, wherein the second rule relates to a rule for a language-specific characteristic extraction, and wherein the rule for a language-specific characteristic extraction includes a method for a feature extraction and a condition for output specific to the target language;

determining, based on the second rule for the target language, a feature of the input text, wherein the second rule includes extracting a characteristic of a representation or a part of speech in the morphological analysis result; and

providing the feature of the input text as a result of extracting language characteristics of the input text.
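
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: every name in it (Morpheme, LanguageRule, TOY_LEXICON, CHARACTERISTICS, the abstract rule names "char_type" and "pos", and the made-up language code "xx") is an assumption introduced for illustration. The morphological analyzer is replaced by a whitespace split over a tiny lexicon, so Japanese input must be pre-segmented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Morpheme:
    surface: str  # the token's representation
    pos: str      # its part of speech


@dataclass(frozen=True)
class LanguageRule:
    """Hypothetical language-specific rule: an extraction method plus an output condition."""
    extract: Callable[[Morpheme], str]
    should_output: Callable[[str], bool]


# Toy stand-in for a morphological analyzer (assumption, not a real analyzer).
TOY_LEXICON: Dict[str, str] = {"Saito": "PROPN", "Tokyo": "PROPN", "visited": "VERB"}


def analyze(text: str) -> List[Morpheme]:
    return [Morpheme(w, TOY_LEXICON.get(w, "X")) for w in text.split()]


# Assumed language characteristics; "xx" is a made-up language with none.
CHARACTERISTICS: Dict[str, FrozenSet[str]] = {
    "en": frozenset({"letter_case"}),
    "ja": frozenset({"multiple_scripts"}),
    "xx": frozenset(),
}


def select_abstract_rule(lang: str) -> str:
    """Select a first (abstract) rule from the language's characteristics."""
    if CHARACTERISTICS[lang] & {"letter_case", "multiple_scripts"}:
        return "char_type"
    return "pos"


def en_shape(m: Morpheme) -> str:
    return "Cap" if m.surface[:1].isupper() else "lower"


def ja_script(m: Morpheme) -> str:
    ch = m.surface[:1]
    if "\u3040" <= ch <= "\u309f":
        return "hiragana"
    if "\u30a0" <= ch <= "\u30ff":
        return "katakana"
    if "\u4e00" <= ch <= "\u9fff":
        return "kanji"
    return "other"


# Second (language-specific) rules, keyed by (abstract rule, language).
CONCRETE_RULES: Dict[Tuple[str, str], LanguageRule] = {
    ("char_type", "en"): LanguageRule(en_shape, lambda f: f == "Cap"),
    ("char_type", "ja"): LanguageRule(ja_script, lambda f: f != "other"),
    ("pos", "xx"): LanguageRule(lambda m: m.pos, lambda f: f == "PROPN"),
}


def extract_features(text: str, lang: str) -> List[Tuple[str, str]]:
    morphemes = analyze(text)                # morphological analysis result
    abstract = select_abstract_rule(lang)    # first rule
    rule = CONCRETE_RULES[(abstract, lang)]  # second rule
    features = []
    for m in morphemes:
        value = rule.extract(m)
        if rule.should_output(value):        # apply the output condition
            features.append((m.surface, value))
    return features
```

Under these assumptions, extract_features("Saito visited Tokyo", "en") keeps only the capitalized tokens, while the same call with "xx" falls back to the part-of-speech rule. Keying the language-specific rules by the abstract rule is one way to keep the rule set shared across languages while letting each language supply its own extraction method and output condition.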