US 11,816,427 B1
Automated data classification error correction through spatial analysis using machine learning
Mithun Ghosh, Bangalore (IN); and Vignesh Thirukazhukundram Subrahmaniam, Bangalore (IN)
Assigned to INTUIT, INC., Mountain View, CA (US)
Filed by INTUIT INC., Mountain View, CA (US)
Filed on Oct. 27, 2022, as Appl. No. 18/050,092.
Int. Cl. G06N 3/08 (2023.01); G06F 40/18 (2020.01); G06F 40/103 (2020.01)
CPC G06F 40/18 (2020.01) [G06F 40/103 (2020.01); G06N 3/08 (2013.01)]
13 Claims
 
1. A method for automated data classification error correction through machine learning, comprising:
receiving a set of predicted labels corresponding to a set of consecutive text strings that appear in a particular order in a document, wherein the set of consecutive text strings comprises:
a first text string corresponding to a first predicted label of the set of predicted labels;
a second text string that follows the first text string in the particular order and corresponds to a second predicted label of the set of predicted labels; and
a third text string that follows the second text string in the particular order and corresponds to a third predicted label of the set of predicted labels;
providing one or more inputs to a machine learning model based on:
the third text string;
the second text string;
the second predicted label; and
the first predicted label;
wherein the machine learning model has been trained through a supervised learning process based on training data; and
wherein the machine learning model comprises:
one or more layers that generate embeddings based on the third text string and the second text string;
encoding logic that generates encodings of the second predicted label and the first predicted label;
concatenation logic that concatenates the embeddings with the encodings to produce a concatenated result; and
a softmax layer that generates one or more probabilities based on the concatenated result;
determining a corrected third label for the third text string based on an output provided by the machine learning model in response to the one or more inputs;
replacing the third predicted label with the corrected third label for the third text string;
receiving user input related to the corrected third label; and
generating updated training data for re-training the machine learning model based on the user input and the third text string, wherein the machine learning model is re-trained through a process in which parameters of the machine learning model are iteratively adjusted based on the updated training data.
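
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and everything beyond the claim language is an assumption made for illustration: the label set `LABELS`, the hashed character-count `embed` function standing in for learned embedding layers, one-hot vectors as the "encodings", a single linear layer ahead of the softmax, and the helper names `build_features`, `Corrector`, and `apply_feedback`.

```python
import math
import random

LABELS = ["vendor", "date", "amount"]  # hypothetical label set
EMBED_DIM = 8                          # hypothetical embedding width


def embed(text, dim=EMBED_DIM):
    """Toy stand-in for learned embedding layers: normalized hashed character counts."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def one_hot(label):
    """Encoding of a predicted label (assumed here to be one-hot)."""
    return [1.0 if label == name else 0.0 for name in LABELS]


def build_features(third_text, second_text, second_label, first_label):
    """Concatenate text embeddings with label encodings."""
    return (embed(third_text) + embed(second_text)
            + one_hot(second_label) + one_hot(first_label))


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class Corrector:
    """Linear layer plus softmax over the concatenated result."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        n_in = 2 * EMBED_DIM + 2 * len(LABELS)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in LABELS]
        self.b = [0.0] * len(LABELS)

    def probabilities(self, features):
        logits = [sum(wi * x for wi, x in zip(row, features)) + bias
                  for row, bias in zip(self.w, self.b)]
        return softmax(logits)

    def correct(self, third_text, second_text, second_label, first_label):
        """Return the most probable label for the third text string."""
        probs = self.probabilities(
            build_features(third_text, second_text, second_label, first_label))
        return LABELS[probs.index(max(probs))]

    def train(self, examples, epochs=300, lr=0.1):
        """Supervised training: iteratively adjust parameters by gradient descent
        on the cross-entropy loss. Each example is
        (third_text, second_text, second_label, first_label, true_third_label)."""
        for _ in range(epochs):
            for *inputs, target in examples:
                features = build_features(*inputs)
                probs = self.probabilities(features)
                for k, name in enumerate(LABELS):
                    grad = probs[k] - (1.0 if name == target else 0.0)
                    for j, x in enumerate(features):
                        self.w[k][j] -= lr * grad * x
                    self.b[k] -= lr * grad


def apply_feedback(training_data, example_inputs, corrected_label, user_label=None):
    """Build updated training data from user input: a user-supplied label
    overrides the model's correction, otherwise the correction is confirmed."""
    final = user_label if user_label is not None else corrected_label
    return training_data + [tuple(example_inputs) + (final,)]
```

In this sketch, a caller would pass the third and second text strings together with the second and first predicted labels to `Corrector.correct`, replace the third predicted label with the returned value, and then call `apply_feedback` followed by `Corrector.train` on the returned list to re-train on the updated training data.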