Managing faults in a high availability system
First Claim
1. A method of managing a failure of a critical high availability (HA) component, the method comprising the steps of:
a computer identifying a plurality of critical HA components of a HA system;
the computer receiving and assigning categories to the plurality of critical HA components;
the computer receiving and assigning weights to the categories;
the computer obtaining a current value indicating a performance of a critical HA component included in the plurality of critical HA components, the current value obtained by periodically monitoring the plurality of critical HA components;
the computer receiving a reference value for the performance of the critical HA component;
the computer determining a deviation between the current value and the reference value;
based on the deviation, the computer determining the critical HA component has failed; and
based in part on the failed critical HA component, the categories, and the weights, the computer determining a health index in real-time, the health index indicating in part how much the critical HA component having failed affects a measure of health of the HA system.
Abstract
An approach is provided for managing a failure of a critical high availability (HA) component in a HA system. Critical HA components are identified. Categories are assigned to the identified components and weights are assigned to the categories. A current value indicating a performance of a component included in the identified components is obtained by periodically monitoring the components. A reference value for the performance of the component is received. A deviation between the current value and the reference value is determined. Based on the deviation, the component is determined to have failed. Based in part on the failed component, the categories, and the weights, a health index is determined in real-time. The health index indicates in part how much the component having failed affects a measure of health of the HA system.
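The steps in the abstract can be illustrated with a minimal Python sketch. The source does not give a formula for the deviation test or for the health index, so everything specific here is an assumption made for illustration: the relative-deviation threshold `tolerance`, the 0 to 100 scale, the weighted-fraction formula, and the names `Component`, `has_failed`, and `health_index`.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    category: str      # category assigned to the component
    reference: float   # reference value for its performance (assumed nonzero)
    current: float     # latest value obtained by periodic monitoring


def has_failed(component: Component, tolerance: float = 0.2) -> bool:
    """Assumed rule: failed when the relative deviation exceeds `tolerance`."""
    deviation = abs(component.current - component.reference) / abs(component.reference)
    return deviation > tolerance


def health_index(components: list[Component], weights: dict[str, float],
                 tolerance: float = 0.2) -> float:
    """Assumed formula: percentage of total category weight not lost to failures."""
    total = sum(weights[c.category] for c in components)
    lost = sum(weights[c.category] for c in components if has_failed(c, tolerance))
    return 100.0 * (total - lost) / total


# Hypothetical categories, weights, and readings.
weights = {"storage": 5.0, "network": 3.0, "application": 2.0}
components = [
    Component("db-volume", "storage", reference=100.0, current=95.0),
    Component("heartbeat-link", "network", reference=10.0, current=25.0),
    Component("app-server", "application", reference=200.0, current=210.0),
]

failed = [c.name for c in components if has_failed(c)]
index = health_index(components, weights)
print(failed, index)
```

In this sketch only the network component deviates beyond the threshold, so the index drops by the share of total weight carried by the "network" category. A heavier category weight makes the same failure reduce the index more, which is one way to read "how much the component having failed affects a measure of health of the HA system".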
15 Citations
14 Claims
9. A computer program product comprising:
a computer-readable, tangible storage device; and
a computer-readable program code stored in the computer-readable, tangible storage device, the computer-readable program code containing instructions that are executed by a central processing unit (CPU) of a computer system to implement a method of managing a failure of a critical high availability (HA) component, the method comprising the steps of:
the computer system identifying a plurality of critical HA components of a HA system;
the computer system receiving and assigning categories to the plurality of critical HA components;
the computer system receiving and assigning weights to the categories;
the computer system obtaining a current value indicating a performance of a critical HA component included in the plurality of critical HA components, the current value obtained by periodically monitoring the plurality of critical HA components;
the computer system receiving a reference value for the performance of the critical HA component;
the computer system determining a deviation between the current value and the reference value;
based on the deviation, the computer system determining the critical HA component has failed; and
based in part on the failed critical HA component, the categories, and the weights, the computer system determining a health index in real-time, the health index indicating in part how much the critical HA component having failed affects a measure of health of the HA system.
Specification