US 11,816,586 B2
Event identification through machine learning
Xue Feng Gao, Beijing (CN); Hui Qing Shi, Beijing (CN); James C. Thorburn, Toronto (CA); Yu Fen Yuan, Beijing (CN); and Qing Feng Zhang, Beijing (CN)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Nov. 13, 2017, as Appl. No. 15/811,573.
Prior Publication US 2019/0147354 A1, May 16, 2019
Int. Cl. G06N 5/045 (2023.01); G06N 20/00 (2019.01); G06N 7/01 (2023.01)

CPC G06N 5/045 (2013.01) [G06N 7/01 (2023.01); G06N 20/00 (2019.01)]

17 Claims

1. A method for event identification comprising:

receiving event information pertaining to events occurring with respect to a cloud computing environment over a network, each occurring event having a measurement metric, the measurement metric for each occurring event including a value attribute, a change attribute, a streak size attribute and a streak duration attribute wherein the value attribute is the original series of measurement data over one or more measurement periods, the change attribute is the change of value at a current measurement period relative to a previous measurement period, the streak size attribute is the size of continuous change in one direction as positive, negative or flat and the streak duration attribute is the number of measurement periods of continuous change in one direction as positive, negative or flat;

evaluating by a probability function the measurement metric for each occurring event to determine when any of the value attribute, the change attribute, the streak size attribute and the streak duration attribute is above a predetermined probability threshold or below the probability threshold, wherein the probability threshold is dynamically determined based on a historical distribution of measurement metric data, and wherein above the probability threshold or below the probability threshold is classified as alarm data;

training a decision tree by a training process using training data comprising:

obtaining the training data, wherein the training data includes a plurality of event information that triggers an alarm regardless of whether the alarm is above the predetermined probability threshold or below the probability threshold, the plurality of event information that triggers the alarm having an indication of whether the event information that triggered the alarm was a significant alarmed event;

creating a root node using the training data;

finding a splitting point of the value attribute, the change attribute, the streak size attribute and the streak duration attribute by determining a probability ratio of the significant alarmed event and an insignificant alarmed event for a subset of the value attribute, the change attribute, the streak size attribute and the streak duration attribute;

comparing the probability ratio of the value attribute, the change attribute, the streak size attribute and the streak duration attribute and choosing an attribute from the compared attributes as a split node that results in a maximum probability ratio between the significant alarmed event and the insignificant alarmed event;

splitting the root node using the chosen attribute as the split node and adding a new node to the decision tree; and

repeating finding the splitting point, comparing the probability ratio, and splitting the split node until all nodes terminate splitting;

processing the alarm data through the decision tree after the training process has been performed to determine based on the training data when the alarm data is significant or when the alarm data is not significant and to reduce the number of alarm data to a predetermined number of significant alarm data, wherein the alarm data that is classified as significant requires attention;

displaying the predetermined number of significant alarm data to a user;

retraining the decision tree with a second set of training data based on feedback received from the user; and

adjusting the probability threshold based in part on the second set of training data.
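
The receiving and evaluating steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the treatment of the first measurement period as flat, the use of an empirical cumulative distribution as the probability function, and the 5% tail are all assumptions made for illustration; the claim only requires that the threshold be derived from a historical distribution of measurement metric data.

```python
from bisect import bisect_right

ATTRS = ("value", "change", "streak_size", "streak_duration")


def metric_attributes(values):
    """Return one dict per measurement period with the four claimed attributes."""
    rows = []
    prev_sign, streak_size, streak_len = None, 0, 0
    for i, v in enumerate(values):
        # Assumption: the first period has no predecessor, so its change is 0 (flat).
        change = 0 if i == 0 else v - values[i - 1]
        sign = (change > 0) - (change < 0)  # +1 positive, -1 negative, 0 flat
        if sign == prev_sign:
            # Same direction as the previous period: extend the streak.
            streak_size += change
            streak_len += 1
        else:
            # Direction changed: start a new streak at this period.
            streak_size, streak_len = change, 1
        prev_sign = sign
        rows.append({"value": v, "change": change,
                     "streak_size": streak_size, "streak_duration": streak_len})
    return rows


def empirical_probability(history, x):
    """Fraction of historical observations that are <= x."""
    ordered = sorted(history)
    return bisect_right(ordered, x) / len(ordered)


def is_alarm(history, x, tail=0.05):
    """Flag x when its historical probability falls in either tail (hypothetical rule)."""
    p = empirical_probability(history, x)
    return p > 1 - tail or p < tail


def alarm_attributes(history_rows, row, tail=0.05):
    """Names of the attributes of `row` that breach the threshold."""
    return [a for a in ATTRS
            if is_alarm([h[a] for h in history_rows], row[a], tail)]


# Usage: a quiet history followed by a sudden jump in the latest period.
history = metric_attributes([50, 51, 50, 52, 51, 50, 51, 52, 51, 50] * 5)
latest = {"value": 90, "change": 40, "streak_size": 40, "streak_duration": 1}
print(alarm_attributes(history, latest))
```

Because the threshold is recomputed from whatever history is passed in, it moves as the historical distribution changes, which is one simple reading of "dynamically determined".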
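
The training and processing steps of claim 1 can be sketched in the same hedged way. The claim does not define the probability ratio precisely, so this example assumes a smoothed ratio of significant to insignificant alarmed events within each candidate subset and scores a split by the larger ratio of its two sides. The dict-based tree, the `max_depth` stopping rule and all function names are likewise hypothetical.

```python
ATTRS = ("value", "change", "streak_size", "streak_duration")


def ratio(rows):
    """Smoothed ratio of significant to insignificant alarms (assumed definition)."""
    sig = sum(1 for r in rows if r["significant"])
    return (sig + 1) / (len(rows) - sig + 1)


def best_split(rows):
    """Return (score, attribute, threshold) with the maximum ratio, or None."""
    best = None
    for attr in ATTRS:
        # Dropping the largest value keeps both sides of every split non-empty.
        for t in sorted({r[attr] for r in rows})[:-1]:
            left = [r for r in rows if r[attr] <= t]
            right = [r for r in rows if r[attr] > t]
            score = max(ratio(left), ratio(right))
            if best is None or score > best[0]:
                best = (score, attr, t)
    return best


def build(rows, depth=0, max_depth=3):
    """Recursively split until nodes are pure, unsplittable, or too deep."""
    mixed = len({r["significant"] for r in rows}) > 1
    split = best_split(rows) if mixed and depth < max_depth else None
    if split is None:
        sig = sum(1 for r in rows if r["significant"])
        return {"leaf": sig * 2 >= len(rows)}  # majority vote at the leaf
    _, attr, t = split
    return {"attr": attr, "threshold": t,
            "left": build([r for r in rows if r[attr] <= t], depth + 1, max_depth),
            "right": build([r for r in rows if r[attr] > t], depth + 1, max_depth)}


def classify(tree, row):
    """Walk the tree and return True when the alarm is judged significant."""
    while "leaf" not in tree:
        side = "left" if row[tree["attr"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]


def significant_alarms(tree, alarms, limit):
    """Keep at most `limit` alarms that the tree classifies as significant."""
    return [a for a in alarms if classify(tree, a)][:limit]


def row(value, change, size, duration, significant=None):
    return {"value": value, "change": change, "streak_size": size,
            "streak_duration": duration, "significant": significant}


# Made-up training data: alarms labelled by whether they were significant.
training = [row(95, 30, 40, 3, True), row(90, 25, 35, 3, True),
            row(92, 28, 50, 4, True), row(60, 2, 2, 1, False),
            row(55, -1, -1, 1, False), row(58, 3, 5, 2, False)]
tree = build(training)
alarms = [row(93, 27, 45, 3), row(50, 1, 1, 1), row(97, 31, 60, 4)]
print(significant_alarms(tree, alarms, limit=1))
```

Under these assumptions, retraining on user feedback amounts to calling `build` again on the training list extended with the newly labelled alarms.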