CPC G06F 40/18 (2020.01) [G06F 40/103 (2020.01); G06N 3/08 (2013.01)]; 13 Claims
1. A method for automated data classification error correction through machine learning, comprising:
receiving a set of predicted labels corresponding to a set of consecutive text strings that appear in a particular order in a document, wherein the set of consecutive text strings comprises:
a first text string corresponding to a first predicted label of the set of predicted labels;
a second text string that follows the first text string in the particular order and corresponds to a second predicted label of the set of predicted labels; and
a third text string that follows the second text string in the particular order and corresponds to a third predicted label of the set of predicted labels;
providing one or more inputs to a machine learning model based on:
the third text string;
the second text string;
the second predicted label; and
the first predicted label;
wherein the machine learning model has been trained through a supervised learning process based on training data; and
wherein the machine learning model comprises:
one or more layers that generate embeddings based on the third text string and the second text string;
encoding logic that generates encodings of the second predicted label and the first predicted label;
concatenation logic that concatenates the embeddings with the encodings to produce a concatenated result; and
a softmax layer that generates one or more probabilities based on the concatenated result;
determining a corrected third label for the third text string based on an output provided by the machine learning model in response to the one or more inputs;
replacing the third predicted label with the corrected third label for the third text string;
receiving user input related to the corrected third label; and
generating updated training data for re-training the machine learning model based on the user input and the third text string, wherein the machine learning model is re-trained through a process in which parameters of the machine learning model are iteratively adjusted based on the updated training data.
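The following is a minimal Python sketch of the steps recited in claim 1, under stated assumptions. The claim does not specify the label set, the embedding layers, the label encoding, or the training algorithm, so this sketch makes several substitutions. It uses a hypothetical three-label set (`header`, `body`, `footer`) and a toy hashed bag-of-words vector in place of learned embedding layers. It encodes labels as one-hot vectors and uses a single linear layer followed by softmax. The iterative parameter adjustment is stochastic gradient descent on cross-entropy loss. All names (`embed`, `one_hot`, `features`, `CorrectionModel`) are illustrative and are not taken from the patent.

```python
import math
import zlib

# Hypothetical label set (assumption): the claim does not name any labels.
LABELS = ["header", "body", "footer"]
EMBED_DIM = 8  # assumed embedding size


def embed(text: str, dim: int = EMBED_DIM) -> list[float]:
    """Toy stand-in for the embedding layers: L2-normalized hashed bag of words."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero for empty text
    return [v / norm for v in vec]


def one_hot(label: str) -> list[float]:
    """Encoding logic for a predicted label: one-hot vector over LABELS."""
    return [1.0 if label == name else 0.0 for name in LABELS]


def features(third: str, second: str, second_label: str, first_label: str) -> list[float]:
    """Concatenation logic: [emb(third) | emb(second) | enc(second_label) | enc(first_label)]."""
    return embed(third) + embed(second) + one_hot(second_label) + one_hot(first_label)


def softmax(logits: list[float]) -> list[float]:
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class CorrectionModel:
    """A single linear layer followed by softmax over the concatenated features."""

    def __init__(self) -> None:
        n_in = 2 * EMBED_DIM + 2 * len(LABELS)
        self.weights = [[0.0] * n_in for _ in LABELS]  # untrained: all-zero parameters
        self.bias = [0.0] * len(LABELS)

    def probabilities(self, x: list[float]) -> list[float]:
        logits = [sum(w * xi for w, xi in zip(row, x)) + b
                  for row, b in zip(self.weights, self.bias)]
        return softmax(logits)

    def correct(self, x: list[float]) -> str:
        """Corrected label = most probable class (ties go to the earliest label)."""
        probs = self.probabilities(x)
        return LABELS[probs.index(max(probs))]

    def train(self, examples: list[tuple[list[float], str]],
              lr: float = 0.5, epochs: int = 50) -> None:
        """Iteratively adjust parameters by SGD on the cross-entropy loss."""
        for _ in range(epochs):
            for x, target in examples:
                probs = self.probabilities(x)
                for k, label in enumerate(LABELS):
                    # Gradient of cross-entropy w.r.t. logit k is p_k - y_k.
                    grad = probs[k] - (1.0 if label == target else 0.0)
                    self.bias[k] -= lr * grad
                    for j, xj in enumerate(x):
                        self.weights[k][j] -= lr * grad * xj


if __name__ == "__main__":
    # Supervised training on hypothetical initial training data.
    training_data = [
        (features("Page 1 of 10", "See you next time.", "body", "body"), "footer"),
        (features("Revenue grew this year.", "Annual Report", "body", "header"), "body"),
    ]
    model = CorrectionModel()
    model.train(training_data)

    strings = ["Annual Report", "Thank you for reading.", "Page 3 of 10"]
    labels = ["header", "body", "body"]  # first, second, third predicted labels
    x = features(strings[2], strings[1], labels[1], labels[0])
    labels[2] = model.correct(x)  # replace the third predicted label
    print("corrected third label:", labels[2])

    # Hypothetical user input supplies the right label; add it and re-train.
    user_label = "footer"
    training_data.append((x, user_label))
    model.train(training_data)
```

In this sketch, the only inputs to the model are the third and second text strings and the second and first predicted labels, matching the inputs recited in the claim. The first text string itself is not used. Because the corrected label is drawn from the softmax probabilities over the concatenated result, retraining on the user-supplied label shifts those probabilities toward that label for similar contexts.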