US 11,816,184 B2
Ordering presentation of training documents for machine learning
Tohru Hasegawa, Tokyo (JP); Kaoru Ohashi, Tokyo (JP); Steven Michael Pritko, Pittsburgh, PA (US); and Aaron Santavicca, Pittsburgh, PA (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Mar. 19, 2021, as Appl. No. 17/206,980.
Prior Publication US 2022/0300762 A1, Sep. 22, 2022
Int. Cl. G06F 18/214 (2023.01); G06N 20/00 (2019.01); G06V 30/416 (2022.01); G06F 18/22 (2023.01); G06F 18/2113 (2023.01)
CPC G06F 18/2148 (2023.01) [G06F 18/2113 (2023.01); G06F 18/22 (2023.01); G06N 20/00 (2019.01); G06V 30/416 (2022.01); G06T 2207/30176 (2013.01)]
20 Claims
 
1. A method for efficient annotation of a plurality of documents, the method comprising:
selecting, by one or more processors, a plurality of documents;
calculating, by the one or more processors, a similarity between pairs of pages of a respective document of the plurality of documents;
determining, by the one or more processors, a document similarity value of the respective document of the plurality of documents, based on a quantity of the pairs of pages within the respective document with the similarity calculated to be less than a predetermined threshold; and
presenting, by the one or more processors, the plurality of documents in a descending order, based on the document similarity value of respective documents of the plurality of documents.
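
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and several details are assumptions the claim does not specify: pages are represented as plain strings, page similarity is a Jaccard overlap of word sets, every unordered pair of pages within a document is compared, the document similarity value is taken to be the raw count of pairs below the threshold, and the threshold defaults to 0.5. The function names `page_similarity`, `document_similarity_value`, and `order_documents` are hypothetical.

```python
from itertools import combinations


def page_similarity(page_a: str, page_b: str) -> float:
    """Jaccard similarity of lowercase word sets (an assumed stand-in metric)."""
    tokens_a = set(page_a.lower().split())
    tokens_b = set(page_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty pages are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def document_similarity_value(pages: list[str], threshold: float = 0.5) -> int:
    """Count the page pairs whose similarity is less than the threshold.

    Assumption: all unordered pairs of pages in the document are compared.
    """
    return sum(
        1
        for page_a, page_b in combinations(pages, 2)
        if page_similarity(page_a, page_b) < threshold
    )


def order_documents(documents: dict[str, list[str]], threshold: float = 0.5) -> list[str]:
    """Return document names in descending order of document similarity value."""
    return sorted(
        documents,
        key=lambda name: document_similarity_value(documents[name], threshold),
        reverse=True,
    )


# Hypothetical example data: three small documents, each a list of page texts.
documents = {
    "uniform": ["a b c", "a b c", "a b c"],
    "mixed": ["a b c", "x y z", "p q r"],
    "partial": ["a b c", "a b c", "x y z"],
}
ordered = order_documents(documents)
print(ordered)
```

Under these assumptions, a document whose pages differ most from one another has the most pairs below the threshold and is therefore presented first, while a document made of near-identical pages is presented last.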