US 11,816,165 B2
Identification of fields in documents with neural networks without templates
Stanislav Semenov, Moscow (RU)
Assigned to ABBYY Development Inc., Dover, DE (US)
Filed by ABBYY Development Inc., Dover, DE (US)
Filed on Nov. 22, 2019, as Appl. No. 16/692,169.
Claims priority of application No. RU2019137304 (RU), filed on Nov. 20, 2019.
Prior Publication US 2021/0150338 A1, May 20, 2021
Int. Cl. G06F 16/93 (2019.01); G06F 9/30 (2018.01); G06N 3/08 (2023.01); G06V 30/224 (2022.01); G06F 16/335 (2019.01); G06F 40/279 (2020.01); G06V 10/762 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G06V 30/412 (2022.01); G06F 18/23 (2023.01); G06F 18/24 (2023.01); G06F 18/232 (2023.01); G06F 18/2413 (2023.01); G06V 30/10 (2022.01)
CPC G06F 16/93 (2019.01) [G06F 9/30036 (2013.01); G06F 16/335 (2019.01); G06F 18/23 (2023.01); G06F 18/232 (2023.01); G06F 18/2413 (2023.01); G06F 18/24765 (2023.01); G06F 40/279 (2020.01); G06N 3/08 (2013.01); G06V 10/763 (2022.01); G06V 10/764 (2022.01); G06V 10/765 (2022.01); G06V 10/82 (2022.01); G06V 30/224 (2022.01); G06V 30/412 (2022.01); G06V 30/10 (2022.01)] 20 Claims
 
1. A method, comprising:
obtaining a layout of a document, the document having a plurality of fields;
identifying the document, based on the layout, as belonging to a first type of documents of a plurality of identified types of documents;
identifying a plurality of symbol sequences of the document;
processing, by a processing device, the plurality of symbol sequences of the document using a first neural network associated with the first type of documents to generate a plurality of feature vectors;
using the plurality of feature vectors to form one or more association hypotheses, wherein each of the one or more association hypotheses associates one of the plurality of fields of the document with at least one of the plurality of feature vectors;
determining, using the one or more association hypotheses, an association of a first field of the plurality of fields with a first set of one or more symbol sequences of the plurality of symbol sequences of the document; and
causing a representation of the first set of the one or more symbol sequences to be stored in a computer memory in association with a profile of the document.
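The steps of claim 1 can be traced end to end with a toy sketch. This is not the patented implementation, and every name in it (`SymbolSequence`, `TypeEncoder`, `extract_fields`, `REFERENCE_LAYOUTS`, `FIELD_EXAMPLES`, the 0.9 `threshold`) is hypothetical. It makes several loud assumptions:

- The layout is a list of symbol sequences with normalized (x, y) page positions.
- The document type is identified by the nearest reference occupancy grid.
- The per-type "first neural network" is replaced by an untrained single dense layer with tanh activation and weights seeded by the type name. A real system would use a trained network.
- Each field has one labeled example whose encoding serves as a prototype.
- Association hypotheses are (field, feature vector, cosine score) triples, resolved greedily and one-to-one.
- Each field receives a single symbol sequence, although the claim allows a set of one or more.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SymbolSequence:
    text: str
    x: float  # assumed normalized left coordinate on the page, in [0, 1]
    y: float  # assumed normalized top coordinate on the page, in [0, 1]


def layout_signature(seqs, grid=3):
    """Coarse layout: fraction of symbol sequences falling in each grid cell."""
    cells = [0.0] * (grid * grid)
    for s in seqs:
        col = min(int(s.x * grid), grid - 1)
        row = min(int(s.y * grid), grid - 1)
        cells[row * grid + col] += 1
    total = max(len(seqs), 1)
    return [c / total for c in cells]


def identify_type(seqs, reference_layouts):
    """Choose the known document type whose reference layout is nearest."""
    sig = layout_signature(seqs)
    return min(reference_layouts, key=lambda t: math.dist(sig, reference_layouts[t]))


def raw_features(s):
    """Inputs for one symbol sequence: position and character-class ratios."""
    n = max(len(s.text), 1)
    digits = sum(ch.isdigit() for ch in s.text) / n
    alphas = sum(ch.isalpha() for ch in s.text) / n
    return [s.x, s.y, digits, alphas, 1.0]  # trailing 1.0 serves as a bias input


class TypeEncoder:
    """Hypothetical stand-in for the per-type network: one untrained dense tanh layer."""

    def __init__(self, doc_type, dim=8, n_in=5):
        rng = random.Random(doc_type)  # seeded by type, so each type gets its own weights
        self.w = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(dim)]

    def __call__(self, s):
        x = raw_features(s)
        return [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in self.w]


def cosine(a, b):
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    return sum(p * q for p, q in zip(a, b)) / (na * nb) if na and nb else 0.0


def extract_fields(seqs, reference_layouts, field_examples, threshold=0.9):
    """Classify the document, encode its sequences, and greedily resolve hypotheses."""
    doc_type = identify_type(seqs, reference_layouts)
    encoder = TypeEncoder(doc_type)
    vectors = [encoder(s) for s in seqs]  # one feature vector per symbol sequence
    prototypes = {f: encoder(ex) for f, ex in field_examples[doc_type].items()}
    # Each association hypothesis pairs a field with one feature vector and a score.
    hypotheses = [(f, i, cosine(p, v)) for f, p in prototypes.items()
                  for i, v in enumerate(vectors)]
    hypotheses.sort(key=lambda h: h[2], reverse=True)
    profile, used = {"type": doc_type, "fields": {}}, set()
    for f, i, score in hypotheses:  # highest-scoring hypotheses are accepted first
        if score >= threshold and f not in profile["fields"] and i not in used:
            profile["fields"][f] = seqs[i].text  # stored in the document profile
            used.add(i)
    return profile


# Hypothetical reference data for two document types.
REFERENCE_LAYOUTS = {
    "invoice": [1/3, 0, 1/3, 0, 0, 0, 0, 0, 1/3],
    "receipt": [0, 1/3, 0, 0, 1/3, 0, 0, 1/3, 0],
}
FIELD_EXAMPLES = {
    "invoice": {"invoice_no": SymbolSequence("INV-0017", 0.8, 0.1),
                "total": SymbolSequence("9,999.99", 0.8, 0.9)},
    "receipt": {},
}

doc = [SymbolSequence("ACME Corp", 0.1, 0.1),
       SymbolSequence("INV-0042", 0.8, 0.1),
       SymbolSequence("1,250.00", 0.8, 0.9)]
profile = extract_fields(doc, REFERENCE_LAYOUTS, FIELD_EXAMPLES)
print(profile["type"])  # invoice
print(profile["fields"])
```

The example values differ in text from the document values ("INV-0017" vs "INV-0042"), but they share position and character statistics, so they produce matching feature vectors. This illustrates matching on learned-style features rather than on fixed text or a field template. Greedy one-to-one resolution is only one simple way to choose among competing hypotheses. The claim does not prescribe how the association is determined from the hypotheses.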