Method, system and program product for detecting an operational risk of a node
First Claim
1. A method for detecting an operational risk of a node, comprising:
providing a plurality of nodes within an existing system, wherein each of the plurality of nodes is configured to perform similarly with respect to a set of operational aspects, wherein operational aspects include at least one of the following: average and peak CPU load, an average and peak I/O response time, or average and peak response time to classes of transactions;
monitoring a performance of each individual node of the plurality of nodes, wherein the monitored performance of each individual node is determined by measuring a performance value of at least one of the set of operational aspects, and wherein at least one of the operational aspects that is monitored are limited to the individual node, wherein monitoring the performance of each individual node includes providing an operational report for each of the plurality of nodes, each operational report including the measured set of performance values and an identifier for a corresponding node;
comparing the monitored individual performance of each individual node of the plurality of nodes with the monitored individual performance of a different node of the plurality of nodes, wherein the comparing is between two individual nodes and both individual nodes are within the plurality of nodes; and
detecting an operational risk if the individual monitored performance of one of the plurality of nodes varies from the individual monitored performances of a different node of the plurality of nodes by more than a current tolerance, wherein the current tolerance is initially defined as a static administrator-set tolerance and is subsequently updated with a dynamic tolerance that is computed by an operations system based on a historical trend of the monitored performance of the plurality of nodes.
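The comparing and detecting steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `OperationalReport`, `detect_risks`, and `dynamic_tolerance` are hypothetical, and so is the rule used for the dynamic tolerance (mean plus k standard deviations of historical pairwise differences). The claim only requires that the dynamic tolerance be computed from a historical trend.

```python
from dataclasses import dataclass
from itertools import combinations
from statistics import mean, pstdev


@dataclass(frozen=True)
class OperationalReport:
    """Hypothetical operational report: a node identifier plus measured values."""
    node_id: str
    values: dict[str, float]  # e.g. {"avg_cpu_load": 0.40}


def detect_risks(reports: list[OperationalReport], tolerance: float):
    """Compare every pair of nodes and flag aspects that differ by more than tolerance.

    Returns a list of (node_a, node_b, aspect, difference) tuples.
    """
    risks = []
    for a, b in combinations(reports, 2):
        # Only aspects reported by both nodes can be compared.
        for aspect in sorted(a.values.keys() & b.values.keys()):
            diff = abs(a.values[aspect] - b.values[aspect])
            if diff > tolerance:
                risks.append((a.node_id, b.node_id, aspect, diff))
    return risks


def dynamic_tolerance(historical_diffs: list[float], k: float = 3.0) -> float:
    """Assumed rule, not taken from the patent: mean + k * population std deviation."""
    return mean(historical_diffs) + k * pstdev(historical_diffs)


# Illustrative data only: three similarly configured nodes, one aspect.
reports = [
    OperationalReport("node-a", {"avg_cpu_load": 0.40}),
    OperationalReport("node-b", {"avg_cpu_load": 0.42}),
    OperationalReport("node-c", {"avg_cpu_load": 0.90}),
]

# The current tolerance starts as a static, administrator-set value.
current_tolerance = 0.25
risks = detect_risks(reports, current_tolerance)

# Later, the tolerance is replaced by one derived from historical differences.
current_tolerance = dynamic_tolerance([0.02, 0.03, 0.01, 0.02])
```

With this sample data, node-c differs from both node-a and node-b by more than the static tolerance, so both pairs are flagged. A flagged pair shows only that the two nodes diverge. Deciding which node is the outlier, and what alert or corrective action follows, is left outside this sketch.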
Abstract
Under the present invention, the performances of a plurality of similarly configured nodes are monitored and compared. If one of the nodes exhibits a performance that varies from the performances of the other nodes by more than a current tolerance, an operational risk is detected. If detected, an alert can be generated and one or more corrective actions implemented to address the operational risk.
8 Claims
Specification