| US 7,451,210 B2 | ||
| Hybrid method for event prediction and system control | ||
| Manish Gupta, Yorktown Heights, N.Y. (US); Jose E. Moreira, Yorktown Heights, N.Y. (US); Adam J. Oliner, Cheshire, Conn. (US); and Ramendra K. Sahoo, Mohegan Lake, N.Y. (US) | ||
| Assigned to International Business Machines Corporation, Armonk, N.Y. (US) | ||
| Filed on Nov. 24, 2003, as Appl. No. 10/720,300. | ||
| Prior Publication US 2005/0114739 A1, May 26, 2005 | ||
| Int. Cl. G06F 15/173 (2006.01) | ||
| U.S. Cl. 709/224 [709/223; 709/226] | 1 Claim |

| 1. A method of predicting an occurrence of a critical event in a computer cluster having a plurality of nodes, said method
comprising steps of:
A) maintaining an event log comprising information concerning critical events that occur in the computer cluster, wherein
said critical events adversely affect performance of the cluster or one of its nodes, the maintaining step comprising:
i) aligning the information concerning the critical events;
ii) categorizing the information concerning the critical events according to time-dependency;
B) maintaining a system parameter log comprising information concerning system performance parameters for each node in the
cluster, the maintaining step comprising:
i) recording a temperature of the nodes in the cluster and a corresponding time value;
ii) recording a utilization parameter of a central processing unit of a node in the cluster and a corresponding time value;
C) filtering the event log and the system parameter log such that some critical event information and some system parameter
information is eliminated in order to reduce storage requirements of the cluster;
D) implementing a hybrid prediction system comprising rule-based prediction algorithms, time-dependent variable prediction
algorithms, and a warning window;
wherein the rule-based prediction algorithms use associative rules based upon the critical event information and the system
parameter information for predicting a probable occurrence of the critical events within a specified time window and the variables
that are likely to indicate a potential occurrence of such an event;
wherein the time-dependent variable prediction algorithms generate time-series mathematical models that predict future values
of the system performance parameters; and
wherein the warning window is formed for only those nodes in the cluster in which at least one error has occurred in order
to reduce system requirements, wherein said warning window comprises a predicted performance parameter or critical event occurrence
for the node for a predetermined future period of time;
E) for only those nodes in which an error has occurred, loading the information from the event log and the system performance
information pertaining to said error-prone nodes from the system parameter log into a Bayesian network model representing
a correspondence between the system performance parameters and occurrence of the critical events;
F) using the Bayesian network model to predict a future critical event within a specified time-limit based upon the hybrid
prediction system;
G) making future scheduling and current data migration selections based upon the hybrid prediction system; and
H) adapting the Bayesian network model by feeding the scheduling and data migration selections into said Bayesian network
model.
|
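
As an illustration only, the following minimal sketch shows one way the time-dependent variable prediction and the warning window of step D could be combined, with windows formed only for nodes that have logged at least one error. It is not the patented implementation. The names `Sample`, `Event`, `linear_forecast`, `warning_windows`, and the `horizon` parameter are hypothetical, and an ordinary least-squares trend line is assumed here as a stand-in for the "time-series mathematical models", which the claim does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One system parameter log entry (hypothetical layout)."""
    node: str
    time: float         # seconds since an arbitrary epoch
    temperature: float  # degrees Celsius
    cpu_util: float     # fraction of CPU in use, 0.0 to 1.0


@dataclass(frozen=True)
class Event:
    """One critical event log entry (hypothetical layout)."""
    node: str
    time: float
    kind: str


def linear_forecast(times, values, future_time):
    """Fit a least-squares line to (times, values) and evaluate it at future_time.

    Assumption: a linear trend stands in for the unspecified time-series model.
    """
    n = len(times)
    if n < 2:
        return values[-1]
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        return mean_v
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return mean_v + (cov / var_t) * (future_time - mean_t)


def warning_windows(samples, events, horizon):
    """Build a warning window for each node that has at least one logged event.

    Nodes without errors are skipped entirely, which is what keeps the
    amount of forecasting work small.
    """
    error_nodes = {e.node for e in events}
    windows = {}
    for node in sorted(error_nodes):
        history = sorted((s for s in samples if s.node == node),
                         key=lambda s: s.time)
        if not history:
            continue  # an error was logged but no parameters were recorded
        times = [s.time for s in history]
        target = times[-1] + horizon
        windows[node] = {
            "until": target,
            "temperature": linear_forecast(
                times, [s.temperature for s in history], target),
            "cpu_util": linear_forecast(
                times, [s.cpu_util for s in history], target),
        }
    return windows


if __name__ == "__main__":
    samples = [
        Sample("n1", 0.0, 50.0, 0.2),
        Sample("n1", 10.0, 55.0, 0.4),
        Sample("n1", 20.0, 60.0, 0.6),
        Sample("n2", 0.0, 40.0, 0.1),
        Sample("n2", 10.0, 40.0, 0.1),
    ]
    events = [Event("n1", 15.0, "fan_failure")]
    print(warning_windows(samples, events, horizon=10.0))
```

In this sketch the forecasts in each window would then be the inputs to the later steps (the Bayesian network model and the scheduling decisions), which are not modeled here.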