CPC G06F 40/30 (2020.01) [G06F 16/9024 (2019.01)] | 20 Claims |
1. A computer-implemented method comprising:
ingesting, by one or more computer processors, a first corpus of a plurality of text sentences;
converting, by one or more computer processors, the plurality of text sentences into a plurality of sentence vectors, wherein a sentence vector is a numerical coordinate representation of a sentence in an x-y plane;
grouping, by one or more computer processors, the plurality of sentence vectors into a plurality of sentence clusters, wherein a sentence cluster is composed of sentence vectors that are semantically similar;
receiving, by one or more computer processors, a second corpus of a plurality of text sentences;
determining, by one or more computer processors, a meaning of each sentence of the second corpus;
based on the determined meaning, assigning, by one or more computer processors, each sentence of the second corpus to a sentence cluster of the plurality of sentence clusters;
determining, by one or more computer processors, for each sentence cluster of the plurality of sentence clusters, a frequency each sentence cluster appears in the second corpus;
based on the determined frequency, calculating, by one or more computer processors, a probability associated with each sentence cluster that appears in the second corpus, wherein the probability is a total number of sentence clusters in the second corpus divided by the determined frequency; and
based on the calculated probabilities, generating, by one or more computer processors, a first sentence model.
|
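For illustration only, and not as part of the claim, the steps recited in claim 1 could be sketched in Python as follows. Everything named here is an assumption made for the sketch: `embed` is a hypothetical stand-in for a real sentence encoder (it maps a non-empty sentence to word count and mean word length, which is not semantic), the grouping step uses a small k-means routine that the claim does not specify, "determining a meaning" is approximated by finding the nearest cluster centroid, and the "total number of sentence clusters in the second corpus" is read as the number of cluster assignments, one per sentence. The ratio is computed exactly as the claim words it: total divided by frequency.

```python
from collections import Counter
from math import dist


def embed(sentence):
    # Hypothetical stand-in for a sentence encoder: returns an (x, y) point.
    # Assumes the sentence contains at least one word.
    words = sentence.split()
    return (float(len(words)), sum(len(w) for w in words) / len(words))


def nearest(point, centroids):
    # Index of the centroid closest to the point (Euclidean distance).
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points, k, iters=10):
    # Minimal k-means with deterministic initialisation (first k points).
    centroids = list(points[:k])
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group; keep the old
        # centroid if the group is empty.
        centroids = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def build_sentence_model(first_corpus, second_corpus, k=2):
    # Steps 1-3: ingest the first corpus, vectorise it, and cluster it.
    centroids = kmeans([embed(s) for s in first_corpus], k)
    # Steps 4-6: assign each sentence of the second corpus to a cluster.
    assignments = [nearest(embed(s), centroids) for s in second_corpus]
    # Step 7: frequency with which each cluster appears in the second corpus.
    frequency = Counter(assignments)
    # Step 8: ratio as worded in the claim (total divided by frequency).
    total = len(assignments)
    # Step 9: the "sentence model" here is simply a cluster -> value mapping.
    return {cluster: total / count for cluster, count in frequency.items()}
```

Calling `build_sentence_model(first_corpus, second_corpus)` with two lists of sentences returns a dictionary keyed by cluster index. Under this reading each value is the reciprocal of the cluster's relative frequency, so rarer clusters receive larger values; a different reading of "total number of sentence clusters" would change the numerator only.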