Autonomous System State Tolerance Adjustment for Autonomous Management Systems
Abstract
In general, the techniques of this invention are directed to determining whether a component failure in a distributed computing system is genuine. In particular, embodiments of this invention analyze monitoring data from other application nodes in a distributed computing system to determine whether the component failure is genuine. If the component failure is not genuine, the embodiments may adjust a fault tolerance parameter that caused the component failure to be perceived.
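The flow described above (receive monitoring data, receive a failure notification, judge whether the failure is genuine, adjust the parameter if it is not) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the fault tolerance parameter is modeled as a response timeout in seconds, the monitoring data as each node's most recent response delay, and the genuineness test as a simple comparison against the median delay of the other nodes. The names `FailureNotification`, `SystemStatusManager`, and `headroom` are hypothetical.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict


@dataclass
class FailureNotification:
    """Hypothetical message sent by the autonomic management system."""
    node: str              # node perceived as failed
    observed_delay: float  # seconds the node took (or has taken) to respond


class SystemStatusManager:
    """Illustrative sketch only; the heuristic below is an assumption."""

    def __init__(self, timeout: float, headroom: float = 1.5) -> None:
        # Assumed fault tolerance parameter: a response timeout in seconds.
        self.timeout = timeout
        # Assumed safety factor applied when the timeout is relaxed.
        self.headroom = headroom

    def is_genuine(self, note: FailureNotification,
                   monitoring: Dict[str, float]) -> bool:
        # Look at the delays reported by every node except the suspect.
        peers = [d for n, d in monitoring.items() if n != note.node]
        if not peers:
            return True  # nothing to compare against, so trust the notification
        # Assumed heuristic: if the other nodes are also slow, the system as a
        # whole is loaded and the perceived failure is probably not genuine.
        return median(peers) < 0.5 * self.timeout

    def handle(self, note: FailureNotification,
               monitoring: Dict[str, float]) -> bool:
        genuine = self.is_genuine(note, monitoring)
        if not genuine:
            # Relax the parameter so the same delay is tolerated next time.
            self.timeout = max(self.timeout,
                               note.observed_delay * self.headroom)
        return genuine


manager = SystemStatusManager(timeout=2.0)

# All nodes are slow: the perceived failure of "a" is treated as not genuine.
loaded = {"a": 2.5, "b": 1.8, "c": 1.6, "d": 1.7}
first_result = manager.handle(FailureNotification("a", 2.5), loaded)

# Only "a" is slow: the perceived failure is treated as genuine.
healthy = {"a": 10.0, "b": 0.1, "c": 0.2}
second_result = manager.handle(FailureNotification("a", 10.0), healthy)
```

In this sketch the first notification relaxes the timeout because the other nodes are slow as well, while the second leaves it unchanged because only the reported node is slow. A real system status manager could use any analysis of node state; the median comparison is just one simple choice.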
17 Claims
1. A method comprising:
receiving monitoring data from a plurality of application nodes interconnected via a communication network;
receiving a failure notification from an autonomic management system when the autonomic management system perceives a failure of a first application node based on a fault tolerance parameter;
executing an autonomous system status manager to analyze the monitoring data from the application nodes to determine whether the perceived failure of the first application node is genuine; and
autonomically adjusting the fault tolerance parameter with the autonomous system status manager when the perceived failure of the first application node is determined not to be genuine.
Dependent claims: 2, 3, 4, 5, 6, 7.

8. A distributed computing system comprising:
a plurality of application nodes interconnected via a communications network;
an autonomic management system to provide autonomic control of the application nodes, wherein the autonomic management system monitors the application nodes to perceive a failure of a first node of the application nodes based on a fault tolerance parameter; and
a system status manager to autonomously adjust the fault tolerance parameter based on an analysis of a state of all of the application nodes.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16.

17. A computer-readable medium comprising instructions, the instructions causing a processor to:
receive monitoring data from a plurality of application nodes interconnected via a communication network;
receive a failure notification from an autonomic management system when the autonomic management system perceives a failure of a first application node based on a fault tolerance parameter;
analyze the monitoring data from all of the application nodes to determine whether the perceived failure of the first application node is genuine; and
adjust the fault tolerance parameter when the perceived failure of the first application node is not genuine.

Specification