US 11,704,490 B2
Log sourcetype inference model training for a data intake and query system
Ram Sriharsha, Oakland, CA (US); Zhaohui Wang, Walnut Creek, CA (US); and Kristal Curtis, San Francisco, CA (US)
Assigned to Splunk Inc., San Francisco, CA (US)
Filed by Splunk Inc., San Francisco, CA (US)
Filed on Jul. 31, 2020, as Appl. No. 16/945,448.
Prior Publication US 2022/0036002 A1, Feb. 3, 2022
Int. Cl. G06F 40/284 (2020.01); G06N 20/00 (2019.01); G06F 40/242 (2020.01); G06F 16/33 (2019.01); G06N 5/04 (2023.01)
CPC G06F 40/284 (2020.01) [G06F 16/3347 (2019.01); G06F 40/242 (2020.01); G06N 5/04 (2013.01); G06N 20/00 (2019.01)]
20 Claims
 
1. A computer-implemented method, comprising:
obtaining a log generated by one or more components in an information technology environment;
generating one or more tokens based on text in the log;
filtering the one or more tokens based on a language dictionary to identify a subset of the one or more tokens;
converting the subset of the one or more tokens into a vector that is labeled with an indication of a first log sourcetype; and
training, using the vector, a machine learning model to predict whether a log sourcetype of a log applied to the trained machine learning model as an input is the first log sourcetype.
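
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything in it is an assumption made for illustration: the tiny DICTIONARY word set, the reading of "filtering" as keeping only dictionary words, the bag-of-words vector, the logistic-regression model, the made-up log lines, and the sourcetype name "auth".

```python
import math
import re

# Hypothetical stand-in for a language dictionary; a real one would be far larger.
DICTIONARY = {"accepted", "failed", "for", "from", "get", "login", "password",
              "port", "request", "status", "user"}
VOCABULARY = sorted(DICTIONARY)  # fixed feature order for the vectors


def tokenize(log_text):
    """Generate lowercase alphabetic tokens from the text of a log."""
    return re.findall(r"[a-z]+", log_text.lower())


def filter_tokens(tokens, dictionary=DICTIONARY):
    """Keep only tokens found in the dictionary (assumed filter direction)."""
    return [t for t in tokens if t in dictionary]


def vectorize(tokens, vocabulary=VOCABULARY):
    """Convert the token subset into a bag-of-words count vector."""
    return [tokens.count(word) for word in vocabulary]


def train(vectors, labels, epochs=200, lr=0.5):
    """Fit a logistic regression by stochastic gradient descent.

    A label of 1 means the vector is labeled with the first sourcetype.
    """
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def is_first_sourcetype(model, log_text):
    """Predict whether a log's sourcetype is the first sourcetype."""
    weights, bias = model
    x = vectorize(filter_tokens(tokenize(log_text)))
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return z >= 0.0  # equivalent to a predicted probability of at least 0.5


# Made-up training logs; label 1 marks the hypothetical "auth" sourcetype.
TRAINING_LOGS = [
    ("Failed password for user admin from 10.0.0.5 port 22", 1),
    ("Accepted password for user bob from 10.0.0.7 port 22", 1),
    ("GET /index.html status 200", 0),
    ("GET /login status 404 request failed", 0),
]

vectors = [vectorize(filter_tokens(tokenize(text))) for text, _ in TRAINING_LOGS]
labels = [label for _, label in TRAINING_LOGS]
model = train(vectors, labels)

print(is_first_sourcetype(model, "Failed password for user carol from 10.0.0.9 port 22"))
print(is_first_sourcetype(model, "GET /home status 200"))
```

Under this reading, the dictionary filter discards variable fields such as user names, IP addresses, and paths, so the vector reflects the fixed wording of a log format rather than its per-event values.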