Method for detecting the quick restart of liveness daemons in a distributed multinode data processing system
First Claim
1. A method for facilitating correct group membership by detecting the quick restart of liveness daemons in a distributed, multinode data processing system in which nodes communicate liveness indicia in the form of heartbeat signals via adapters coupled to each node, said method comprising:
experiencing at one node of a membership group a failure and a quick restart of the one node of a membership group;
wherein the failure and quick restart deletes locally stored membership group information at said one node;
subsequent to said failure and quick restart of the one node of a membership group, receiving a heartbeat signal at the one node from at least one other node of the membership group;
responsive to receipt of the heartbeat signal at the one node, sending, from the one node to the at least one other node, a first message which includes an indication of the quick restart at the one node; and
determining, at the at least one other node, from said indication of the quick restart in the first message and from locally stored membership group information indicating prior membership of the one node in the membership group, the occurrence of a quick restart at said one node, and responding thereto by sending a second message from the at least one other node to another node of the membership group which indicates that said one node is to be expelled from the membership group;
wherein the quick restart at the one node occurs prior to a detection of the failure by at least one other node and expulsion of the one node from the membership group due to the failure.
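The message flow recited in the claim can be illustrated with a small in-memory simulation. This is a minimal sketch under stated assumptions, not the patented implementation: the `Node` class, the message kinds `HEARTBEAT`, `RESTARTED`, and `EXPEL`, the `outbox` list, and the `run` helper are hypothetical names chosen for the example. In particular, the sketch assumes that a restarted node recognizes its situation by receiving a heartbeat from a sender absent from its (now empty) membership record.

```python
from dataclasses import dataclass, field

# Hypothetical message kinds. The claim speaks only of a heartbeat signal,
# a "first message" and a "second message".
HEARTBEAT = "HEARTBEAT"
RESTARTED = "RESTARTED"  # first message: carries the quick-restart indication
EXPEL = "EXPEL"          # second message: the named node is to be expelled


@dataclass
class Node:
    name: str
    members: set = field(default_factory=set)   # locally stored membership
    outbox: list = field(default_factory=list)  # (dest, kind, subject) tuples

    def quick_restart(self):
        # The failure and quick restart delete the local membership record.
        self.members = set()

    def receive(self, sender, kind, subject=None):
        if kind == HEARTBEAT and sender not in self.members:
            # Assumption: a heartbeat from an unknown sender means this node
            # lost its state, so it answers with a restart indication.
            self.outbox.append((sender, RESTARTED, self.name))
        elif kind == RESTARTED and sender in self.members:
            # The peer still lists the sender as a member, so the sender must
            # have restarted before its failure was detected.
            self.members.discard(sender)
            for peer in sorted(self.members - {self.name}):
                self.outbox.append((peer, EXPEL, sender))
        elif kind == EXPEL:
            self.members.discard(subject)


def run(nodes, initial):
    """Deliver queued messages until no node has anything left to send."""
    queue = [initial]
    while queue:
        sender, dest, kind, subject = queue.pop(0)
        node = nodes[dest]
        node.receive(sender, kind, subject)
        queue.extend((dest, d, k, s) for d, k, s in node.outbox)
        node.outbox.clear()


group = {"A", "B", "C"}
nodes = {n: Node(n, set(group)) for n in group}
nodes["B"].quick_restart()                 # B fails and restarts quickly
run(nodes, ("A", "B", HEARTBEAT, None))    # A's next heartbeat reaches B
```

After the run, nodes `A` and `C` both hold the membership `{"A", "C"}`, so the restarted node `B` has been expelled consistently and can rejoin through whatever join procedure the system uses.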
Abstract
In distributed multinode data processing systems, mechanisms are employed to ensure that the nodes are properly informed about the liveness of the other nodes in node groups in the network. In particular, the present invention employs group membership indicia as part of a mechanism for detecting that a node and/or its adapter have failed and have been recently restarted. Once this situation is detected, the group membership inconsistencies that it could otherwise engender are avoided.
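Why a quick restart can escape ordinary failure detection is easiest to see with a little arithmetic. The sketch below assumes a hypothetical detector that declares a node dead only after a fixed number of consecutive missed heartbeats; the function name, the heartbeat interval, and the threshold are illustrative values and are not taken from the patent.

```python
def failure_detected(downtime_s, heartbeat_interval_s, missed_threshold):
    """Return True if peers would time the node out before it restarts.

    Simplified model: one heartbeat is missed for every full interval the
    node spends down.
    """
    missed = int(downtime_s // heartbeat_interval_s)
    return missed >= missed_threshold


# With 1-second heartbeats and a threshold of 4 missed beats:
slow_restart = failure_detected(6.0, 1.0, 4)   # down long enough to be noticed
quick_restart = failure_detected(2.0, 1.0, 4)  # back before anyone notices
```

In the second case the peers never observe the failure, yet the restarted node has lost its membership record, which is the inconsistency the abstract refers to.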
52 Citations
18 Claims
1. A method for facilitating correct group membership by detecting the quick restart of liveness daemons in a distributed, multinode data processing system in which nodes communicate liveness indicia in the form of heartbeat signals via adapters coupled to each node, said method comprising:
experiencing at one node of a membership group a failure and a quick restart of the one node of a membership group;
wherein the failure and quick restart deletes locally stored membership group information at said one node;
subsequent to said failure and quick restart of the one node of a membership group, receiving a heartbeat signal at the one node from at least one other node of the membership group;
responsive to receipt of the heartbeat signal at the one node, sending, from the one node to the at least one other node, a first message which includes an indication of the quick restart at the one node; and
determining, at the at least one other node, from said indication of the quick restart in the first message and from locally stored membership group information indicating prior membership of the one node in the membership group, the occurrence of a quick restart at said one node, and responding thereto by sending a second message from the at least one other node to another node of the membership group which indicates that said one node is to be expelled from the membership group;
wherein the quick restart at the one node occurs prior to a detection of the failure by at least one other node and expulsion of the one node from the membership group due to the failure.
Dependent claims: 2, 3, 5, 6, 7
4. A multinode data processing system comprising:
a plurality of data processing nodes connected in a network capable of transmitting messages between nodes;
storage means within said nodes containing program code for experiencing at one node of a membership group, a failure and a quick restart of the one node of a membership group;
wherein the failure and quick restart deletes locally stored membership group information at said one node;
subsequent to said failure and quick restart of the one node of a membership group, receiving a heartbeat signal at the one node from at least one other node of the membership group;
responsive to receipt of the heartbeat signal at the one node, sending, from the one node to the at least one other node, a first message which includes an indication of the quick restart at the one node; and
determining, at the at least one other node, from said indication of the quick restart in the first message and from locally stored membership group information indicating prior membership of the one node in the membership group, the occurrence of a quick restart at said one node, and responding thereto by sending a second message from the at least one other node to another node of the membership group which indicates that said one node is to be expelled from the membership group;
wherein the quick restart at the one node occurs prior to a detection of the failure by at least one other node and expulsion of the one node from the membership group due to the failure.
Dependent claims: 8, 9, 10, 11, 12, 18
13. At least one program storage device readable by at least one computer, tangibly embodying at least one program of instructions executable by the at least one computer to perform a method of facilitating correct group membership by detecting quick restart of liveness daemons in a distributed, multinode data processing system in which nodes communicate liveness indicia in the form of heartbeat signals via adapters coupled to each other, said method comprising:
experiencing at one node of a membership group, a failure and a quick restart of the one node of a membership group;
wherein the failure and quick restart deletes locally stored membership group information at said one node;
subsequent to said failure and quick restart of the one node of a membership group, receiving a heartbeat signal at the one node from at least one other node of the membership group;
responsive to receipt of the heartbeat signal at the one node, sending, from the one node to the at least one other node, a first message which includes an indication of the quick restart at the one node; and
determining at the at least one other node, from said indication of the quick restart in the first message and from locally stored membership group information indicating prior membership of the one node in the membership group, the occurrence of a quick restart at said one node, and responding thereto by sending a second message from the at least one other node to another node of the membership group which indicates that said one node is to be expelled from the membership group;
wherein the quick restart at the one node occurs prior to a detection of the failure by at least one other node and expulsion of the one node from the membership group due to the failure.
Dependent claims: 14, 15, 16, 17
Specification