CPC G06F 40/30 (2020.01) [G06F 17/18 (2013.01); G06F 18/214 (2023.01); G06F 18/2431 (2023.01); G06F 18/251 (2023.01); G06F 40/151 (2020.01); G06N 3/04 (2013.01); G06N 3/08 (2013.01); G06N 20/10 (2019.01); G06V 10/40 (2022.01); G06V 10/764 (2022.01); G06V 10/803 (2022.01); G06V 10/82 (2022.01); G06N 20/00 (2019.01)] | 28 Claims |
1. A method comprising:
generating, by a server, textual embeddings by tokenizing text data and generating vectors to be provided to a transformer system, wherein the textual embeddings are vector representations of semantic meanings of text that is part of the text data;
averaging, by the server, the vectors for every token of the generated textual embeddings and concatenating average output activations of two layers of the transformer system;
generating, by the server, image embeddings from image data, wherein the image embeddings are vector representations of the images that are part of the image data;
combining, by the server, the textual embeddings and image embeddings to form combined embeddings to be provided to the transformer system; and
transmitting, by the server, the combined embeddings.
|
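The pooling and combining steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the per-token activations are hard-coded toy values standing in for two transformer layers, the helper names (`mean_pool`, `text_embedding`, `combine`) are hypothetical, and concatenation is assumed as the combining operator because the claim does not specify one.

```python
# Illustrative sketch only: toy activations stand in for a real transformer.
from statistics import fmean


def mean_pool(token_vectors):
    """Average a list of per-token vectors into a single vector."""
    return [fmean(column) for column in zip(*token_vectors)]


def text_embedding(layer_a, layer_b):
    """Concatenate the mean-pooled output activations of two layers."""
    return mean_pool(layer_a) + mean_pool(layer_b)


def combine(text_vec, image_vec):
    """Combine text and image embeddings.

    Concatenation is an assumption; the claim does not fix the operator.
    """
    return text_vec + image_vec


# Hypothetical activations: 3 tokens, 2 dimensions per layer.
layer_a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
layer_b = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
# Hypothetical image embedding produced by some separate image encoder.
image_vec = [0.5, 0.25]

text_vec = text_embedding(layer_a, layer_b)
combined = combine(text_vec, image_vec)
```

Here `text_vec` has twice the per-layer dimension because the two pooled layers are concatenated, and `combined` appends the image embedding to it. In a real system the activations would come from the transformer's hidden states rather than literals.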