Hybrid method for event prediction and system control
Abstract
A hybrid method of predicting the occurrence of future critical events in a computer cluster having a series of nodes records system performance parameters and the occurrence of past critical events. A data filter filters the logged data to eliminate redundancies and decrease the data storage requirements of the system. Time-series models and rule-based classification schemes are used to associate various system parameters with the past occurrence of critical events and predict the occurrence of future critical events. Ongoing processing jobs are migrated to nodes for which no critical events are predicted and future jobs are routed to more robust nodes.
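The data filter mentioned in the abstract could, for illustration, be sketched as follows. This is a minimal Python sketch under stated assumptions, not the filter described in the specification: the `LogRecord` type, the `filter_redundant` function, and the 60-second `window` are all hypothetical, and "redundant" is assumed here to mean the same message repeated by the same node within that window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical log entry; the field names are assumptions for illustration."""
    timestamp: float  # seconds (assumed unit)
    node: str
    message: str


def filter_redundant(records, window=60.0):
    """Keep a record only if the same (node, message) pair was not
    already kept within the previous `window` seconds."""
    last_kept = {}  # (node, message) -> timestamp of the last kept record
    kept = []
    for rec in sorted(records, key=lambda r: r.timestamp):
        key = (rec.node, rec.message)
        prev = last_kept.get(key)
        if prev is None or rec.timestamp - prev >= window:
            kept.append(rec)
            last_kept[key] = rec.timestamp
    return kept
```

Dropping repeats in this way reduces the stored volume while keeping the first report of each burst, which is the record a later prediction step would most plausibly need.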
21 Claims
1. A method of predicting the occurrence of critical events in a computer cluster having a series of nodes, said method comprising:
maintaining an event log that contains information concerning critical events that have occurred in the computer cluster;
maintaining a system parameter log that contains information concerning system parameters for each node in the cluster; and
predicting a future performance of a node in the cluster based upon said event log and said system parameter log.
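One way to picture how the two logs of claim 1 might feed a prediction is the Python sketch below. It is an illustration under assumptions, not the claimed method: a least-squares trend line stands in for a time-series model, a simple event-count rule stands in for a rule-based classifier, and the names `linear_forecast` and `predict_node_at_risk` and every threshold parameter are hypothetical.

```python
def linear_forecast(samples, horizon):
    """Fit a least-squares line through (time, value) samples and
    evaluate it `horizon` time units after the latest sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one time point")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    last_t = max(t for t, _ in samples)
    return mean_v + slope * (last_t + horizon - mean_t)


def predict_node_at_risk(event_times, param_samples, now, horizon,
                         limit, event_window, max_events):
    """Flag a node if the forecast parameter value exceeds `limit`
    (trend on the system parameter log) or if at least `max_events`
    critical events were logged in the last `event_window` time units
    (rule on the event log). All thresholds are assumed values."""
    recent = sum(1 for t in event_times if now - event_window <= t <= now)
    forecast = linear_forecast(param_samples, horizon)
    return forecast > limit or recent >= max_events
```

For example, temperature samples of 50, 55, and 60 at times 0, 1, and 2 extrapolate to 80 at time 6, so a node with an assumed limit of 75 would be flagged even with an empty event log.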
9. A method of improving the performance of a computer cluster having a series of nodes comprising:
monitoring the occurrence of critical events in said nodes in said computer cluster;
monitoring system performance parameters of said nodes in said computer cluster;
creating a node representation for each node in said computer cluster based upon said monitoring;
creating a cluster representation based on said node representations;
periodically examining said node representations to predict future node performance; and
using said cluster representation to redistribute tasks among said nodes based upon said predicted node performance.
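The node and cluster representations of claim 9 can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `NodeState` type, the `failure_risk` field (taken to be supplied by some separate predictor), the `risk_limit` threshold, and the least-loaded placement rule are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class NodeState:
    """Hypothetical node representation."""
    name: str
    failure_risk: float  # predicted chance of a critical event, 0..1 (assumed)
    tasks: list = field(default_factory=list)


def redistribute(cluster, risk_limit=0.5):
    """Move every task off nodes whose predicted risk exceeds
    `risk_limit` onto the least-loaded node at or below the limit.

    `cluster` is a dict mapping node name to NodeState (the assumed
    cluster representation). Returns a list of (task, source, target).
    """
    healthy = [n for n in cluster.values() if n.failure_risk <= risk_limit]
    moves = []
    if not healthy:
        return moves  # nowhere safer to move tasks
    for node in cluster.values():
        if node.failure_risk <= risk_limit:
            continue
        while node.tasks:
            task = node.tasks.pop()
            # Prefer fewer tasks, then lower predicted risk.
            target = min(healthy, key=lambda n: (len(n.tasks), n.failure_risk))
            target.tasks.append(task)
            moves.append((task, node.name, target.name))
    return moves
```

Calling `redistribute` periodically, after the risk values have been refreshed, would correspond loosely to the "periodically examining" and "redistribute tasks" steps of the claim.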
17. An information processing system comprising:
a computer cluster having a series of nodes;
a control system for monitoring critical events that occur in said computer cluster and system parameters of said nodes;
a memory for storing information related to said occurrence of said critical events and said system parameters of said nodes; and
a Bayesian Network model for predicting a future occurrence of a critical event based upon an observed relationship between said system parameters and said occurrence of critical events.
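As a rough illustration of the idea behind claim 17, the Python sketch below estimates the conditional probability table of the smallest possible Bayesian network: one parent variable (a discretized system parameter state) and one child variable (whether a critical event followed). It is far simpler than any model the specification may describe; the `fit_cpt` name, the state labels, and the Laplace smoothing constant `alpha` are assumptions.

```python
from collections import Counter


def fit_cpt(observations, alpha=1.0):
    """Estimate P(event | state) from (state, event_occurred) pairs.

    Laplace smoothing with pseudo-count `alpha` keeps the estimate
    away from 0 and 1 when a state has few observations.
    """
    totals = Counter()  # observations per state
    events = Counter()  # observations per state where an event occurred
    for state, occurred in observations:
        totals[state] += 1
        if occurred:
            events[state] += 1
    return {s: (events[s] + alpha) / (totals[s] + 2 * alpha) for s in totals}
```

With three "high" observations, two of them followed by an event, the smoothed estimate is (2 + 1) / (3 + 2) = 0.6; a real network would add more parent variables and dependencies between them.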
Specification