Reliable fault resolution in a cluster
First Claim
1. A method for fault resolution in a computer system, comprising:
configuring a cluster with a gateway for a network interface;
issuing an operating system ICMP echo to peer nodes in said cluster and to said gateway through said network interface in response to a heartbeat loss detection; and
analyzing a response from said echo to determine location of a fault in said cluster.
Abstract
A method and system for localizing and resolving a fault in a cluster environment. The cluster is configured with at least one multi-homed node, and at least one gateway for each network interface. Heartbeat messages are sent between peer nodes and the gateway in predefined periodic intervals. In the event of loss of a heartbeat message by any node or gateway, an ICMP echo is issued to each node and gateway in the cluster for each network interface. If neither a node loss nor a network loss is validated in response to the ICMP echo, an application level ping is issued to determine if the fault associated with the absence of the heartbeat message is a transient error condition or an application software fault.
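The decision flow in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `Fault`, `localize_fault`, `echo_replies`, `gateways`, and `app_ping_ok` are hypothetical, and the classification rules are assumptions made for illustration (a network is treated as lost when nothing on that interface replies, a node is treated as lost when it replies on no otherwise-working interface, and a successful application level ping is read as a transient error). The ICMP echo results are passed in as data so the logic can be followed without network access.

```python
from enum import Enum


class Fault(Enum):
    NETWORK_LOSS = "network loss"
    NODE_LOSS = "node loss"
    APPLICATION_FAULT = "application software fault"
    TRANSIENT = "transient error"


def localize_fault(echo_replies, gateways, app_ping_ok):
    """Classify a fault from ICMP echo results (illustrative rules only).

    echo_replies: dict mapping (interface, target) -> True if a reply arrived.
    gateways:     dict mapping interface -> name of that interface's gateway.
    app_ping_ok:  callable returning True if the application level ping succeeds.
    """
    findings = []

    # Assumption: an interface's network is lost when no target on it replies.
    lost_networks = set()
    for iface in gateways:
        replies = [ok for (i, _), ok in echo_replies.items() if i == iface]
        if not any(replies):
            lost_networks.add(iface)
            findings.append((Fault.NETWORK_LOSS, iface))

    # Assumption: a peer is lost when it replies on none of the working interfaces.
    gateway_names = set(gateways.values())
    peers = sorted({t for (_, t) in echo_replies if t not in gateway_names})
    live_ifaces = [i for i in gateways if i not in lost_networks]
    for peer in peers:
        answered = any(echo_replies.get((i, peer), False) for i in live_ifaces)
        if live_ifaces and not answered:
            findings.append((Fault.NODE_LOSS, peer))

    if findings:
        return findings

    # Neither a node loss nor a network loss was validated: fall back to the
    # application level ping (assumption: success means a transient error).
    return [(Fault.TRANSIENT if app_ping_ok() else Fault.APPLICATION_FAULT, None)]
```

For example, with a two-interface node where only `eth1` is silent, the sketch reports a network loss on `eth1` and never calls the application level ping; when every echo is answered, the result depends only on `app_ping_ok`.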
43 Citations
27 Claims
1. A method for fault resolution in a computer system, comprising:
configuring a cluster with a gateway for a network interface;
issuing an operating system ICMP echo to peer nodes in said cluster and to said gateway through said network interface in response to a heartbeat loss detection; and
analyzing a response from said echo to determine location of a fault in said cluster.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A multi-node computer system, comprising:
a cluster with a gateway configured for a network interface;
an operating system ICMP echo adapted to be issued to peer nodes in a cluster and to said gateway through said network interface in response to a heartbeat loss detection; and
a response from said echo adapted to be analyzed for location of a fault in said cluster.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. An article comprising:
a computer-readable medium having computer-readable instructions stored thereon executable by a processor, said computer-readable instructions comprising:
computer-readable instructions for issuing an operating system ICMP echo to a peer node in a cluster and to a configured cluster gateway through said network interface in response to heartbeat loss detection; and
computer-readable instructions for analyzing a response message from said echo to determine location of a fault in said cluster.
Dependent claims: 18, 19, 20, 21, 22
23. A method for localizing a fault in a computer system, comprising:
sending periodic heartbeat messages to peer nodes in a network;
issuing an operating system ICMP echo to said peer nodes and a gateway through a network interface in response to a heartbeat loss; and
determining a location of a fault in said cluster through a response echo.
Dependent claims: 24, 25, 26, 27
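The heartbeat loss that triggers the ICMP echo in the claims can be sketched as a simple timeout check. This is an illustrative assumption, not the detection method of the specification: the class name `HeartbeatMonitor`, the `missed_limit` threshold, and the injectable `clock` parameter are all hypothetical. The sketch records when each peer or gateway was last heard from and reports those that have been silent for longer than `missed_limit` heartbeat intervals.

```python
import time


class HeartbeatMonitor:
    """Track the last heartbeat per peer and report peers that have gone silent."""

    def __init__(self, peers, interval_s, missed_limit=3, clock=time.monotonic):
        self.interval_s = interval_s      # predefined heartbeat period, in seconds
        self.missed_limit = missed_limit  # assumption: intervals tolerated before a loss
        self.clock = clock                # injectable so the logic can be tested
        now = clock()
        self.last_seen = {peer: now for peer in peers}

    def record(self, peer):
        """Call when a heartbeat message arrives from a peer or gateway."""
        self.last_seen[peer] = self.clock()

    def lost(self):
        """Return the peers whose heartbeats are overdue, sorted by name."""
        deadline = self.interval_s * self.missed_limit
        now = self.clock()
        return sorted(p for p, seen in self.last_seen.items() if now - seen > deadline)
```

A non-empty result from `lost()` would be the point at which a node issues the ICMP echo to its peers and gateway through each network interface.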
Specification