Reliable fault resolution in a cluster
Abstract
A method and system for localizing and resolving a fault in a cluster environment. The cluster is configured with at least one multi-homed node and at least one gateway for each network interface. Heartbeat messages are sent between peer nodes and the gateway at predefined periodic intervals. In the event of loss of a heartbeat message by any node or gateway, an ICMP echo is issued to each node and gateway in the cluster for each network interface. If neither a node loss nor a network loss is validated in response to the ICMP echo, an application level ping is issued to determine if the fault associated with the absence of the heartbeat message is a transient error condition or an application software fault.
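The decision sequence in the abstract can be illustrated with a short sketch. This is not the patented implementation: the names `Fault`, `EchoResult`, `localize_fault`, and `app_ping` are hypothetical, and the two classification rules (a peer silent on every interface is a node fault; any other unanswered echo is a network fault on the interface it was sent through) are simplifying assumptions made here for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List, Tuple


class Fault(Enum):
    NODE = "node fault"
    NETWORK = "network fault"
    TRANSIENT = "transient error"
    APPLICATION = "application software fault"


@dataclass(frozen=True)
class EchoResult:
    interface: str    # local interface the ICMP echo was sent through
    target: str       # peer node or gateway the echo was addressed to
    is_gateway: bool
    replied: bool     # True if a response echo came back


def localize_fault(
    results: Iterable[EchoResult],
    app_ping: Callable[[], bool],
) -> Tuple[Fault, List[str]]:
    """Classify a validated heartbeat loss from per-interface echo results."""
    results = list(results)
    peers = {r.target for r in results if not r.is_gateway}
    alive = {r.target for r in results if r.replied}

    # Assumed rule: a peer that answers on no interface, while something
    # else still answers, is treated as a failed node.
    dead_nodes = sorted(peers - alive)
    if dead_nodes and alive:
        return Fault.NODE, dead_nodes

    # Assumed rule: any other unanswered echo points at the network path
    # behind the interface it was sent through.
    silent_interfaces = sorted({r.interface for r in results if not r.replied})
    if silent_interfaces:
        return Fault.NETWORK, silent_interfaces

    # Every echo was answered, so fall back to the application-level ping.
    return (Fault.TRANSIENT if app_ping() else Fault.APPLICATION), []
```

Here `app_ping` stands in for the application level ping: it is assumed to return `True` when the peer application answers, in which case the missed heartbeat is classed as transient.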
11 Claims
1. A method for resolving a fault in a computer system comprising:
determining a heartbeat loss in a cluster configured with a gateway for a network interface;
validating said heartbeat loss, including sending an ICMP echo to all peer nodes in said cluster and said gateway through said network interface; and
localizing said heartbeat loss including differentiating between a node fault and a network fault by analyzing a response echo.
(Dependent claim: 2)
3. An article comprising:
a computer-readable medium having computer-readable instructions stored thereon executable by a processor, said computer-readable instructions comprising:
instructions to determine a heartbeat loss in a cluster configured with a gateway for a network interface;
instructions to validate said heartbeat loss, including sending an ICMP echo to all peer nodes in said cluster and said gateway through said network interface; and
instructions to localize said loss including differentiating between a node fault and a network fault through analysis of a response echo.
(Dependent claim: 4)
5. A computer system, comprising:
a cluster of nodes, each node having at least two network interfaces;
a gateway in communication with said cluster and configured for a network interface;
an operating system ICMP echo adapted to be issued to all peer nodes in said cluster and to said gateway through said network interface in response to a heartbeat loss detection; and
a response from said echo adapted to be analyzed for location of a fault in said cluster including differentiating between a node fault and a network fault and determination of an intended recipient of said echo.
(Dependent claims: 6, 7, 8, 9, 10, 11)
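The "operating system ICMP echo" issued "through said network interface" can be pictured as one OS `ping` invocation per target per interface. The sketch below only builds those command lines and sends nothing. It assumes the Linux iputils `ping` options `-c` (packet count), `-W` (reply timeout in seconds), and `-I` (outgoing interface); the helper `build_echo_commands`, the `InterfaceTargets` type, and the interface names and addresses in any call are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InterfaceTargets:
    gateway: str        # address of the gateway configured for this interface
    peers: List[str]    # addresses of peer nodes reachable on this interface


def build_echo_commands(
    topology: Dict[str, InterfaceTargets],
    timeout_s: int = 1,
) -> List[List[str]]:
    """Return one single-packet ping command per (interface, target) pair."""
    commands = []
    for interface, targets in topology.items():
        # Probe every peer first, then the gateway, through this interface.
        for address in [*targets.peers, targets.gateway]:
            commands.append(
                ["ping", "-c", "1", "-W", str(timeout_s), "-I", interface, address]
            )
    return commands
```

On a system with iputils, each command could be run with `subprocess.run`, taking an exit status of 0 to mean a response echo was received. Because every command names both the interface and the address, each result stays tied to its intended recipient.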
Specification