CPC G06F 40/186 (2020.01) [G06F 16/248 (2019.01); G06F 16/2457 (2019.01); G06F 16/93 (2019.01); G06F 40/106 (2020.01); G06F 40/117 (2020.01); G06F 40/169 (2020.01); G06F 40/289 (2020.01); G06F 40/295 (2020.01); G06F 40/30 (2020.01); G06N 20/00 (2019.01); G06V 30/414 (2022.01); G06V 30/416 (2022.01)]
20 Claims
1. A method implemented on a computer system executing instructions for analyzing and annotating documents, the method comprising:
importing documents in a document set;
performing visual extraction of the imported documents, including creating signatures for document parts;
automatically identifying a hierarchical structure of chunks within individual documents in the document set (a) based on the visual extraction, content, and contexts in the individual document; and (b) based on patterns of visual extraction and content across the documents in the document set, wherein the hierarchical structure includes small chunks comprising series of words from within individual sentences;
for at least some of the small chunks, in a process separate from identifying the small chunks, automatically selecting text from sentences surrounding the small chunks as labels for semantic roles played by the small chunks in a transaction described by the individual documents;
standardizing the labels for semantic roles across the documents in the document set; and
annotating documents in the document set wherein the annotations include locations of the identified small chunks and standardized labels for the semantic roles played by the identified small chunks at those locations.
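The last two steps of claim 1 (standardizing the labels for semantic roles across the document set, then annotating each document with chunk locations and standardized labels) can be illustrated with a minimal sketch. It is not the claimed implementation. The `SmallChunk` record, the `normalize` key, and the rule of choosing the most frequent surface form as the standard label are all assumptions made for illustration, and the sketch presumes that the small chunks and their raw labels have already been produced by the earlier steps.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SmallChunk:
    """Hypothetical record for one small chunk found in a document."""
    doc_id: str
    start: int      # character offset where the chunk begins
    end: int        # character offset just past the chunk
    text: str       # the series of words forming the chunk
    raw_label: str  # label text selected from the surrounding sentences


def normalize(label: str) -> str:
    """Assumed grouping key: lowercase, punctuation replaced by spaces."""
    cleaned = "".join(c if c.isalnum() else " " for c in label.lower())
    return " ".join(cleaned.split())


def standardize_labels(chunks):
    """Map each normalized key to its most frequent raw surface form."""
    variants = defaultdict(Counter)
    for chunk in chunks:
        variants[normalize(chunk.raw_label)][chunk.raw_label] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in variants.items()}


def annotate(chunks):
    """Return, per document, chunk locations with standardized labels."""
    standard = standardize_labels(chunks)
    annotations = defaultdict(list)
    for chunk in chunks:
        annotations[chunk.doc_id].append({
            "start": chunk.start,
            "end": chunk.end,
            "text": chunk.text,
            "label": standard[normalize(chunk.raw_label)],
        })
    return dict(annotations)
```

Under these assumptions, chunks labeled "Effective Date" in two documents and "effective date:" in a third share the key `effective date`, so `annotate` assigns all three the label "Effective Date" alongside their character offsets.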