Method for determination of remote adapter and/or node liveness
Abstract
The determination of node and/or adapter liveness in a distributed network data processing system is carried out via one messaging protocol that can be assisted by a second messaging protocol which is significantly less susceptible to delay, especially memory blocking delays encountered by daemons running on other nodes. The switching of protocols is accompanied by controlled grace periods for needed responses. This messaging protocol flexibility is also adapted for use as a mechanism for controlling the deliberate activities of node addition (birth) and node deletion (death).
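As an illustration only, the protocol switch described in the abstract can be sketched as a small state machine on the monitoring node. The class name, the three status strings, the `missed_limit` and `grace` parameters, and the `send_echo` hook are all hypothetical and do not come from the patent text. The sketch also assumes that a timely echo reply restarts the heartbeat window, which is one possible reading of the "controlled grace periods".

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LivenessMonitor:
    """Hypothetical view of one remote node, kept by a monitoring node."""
    period: float                   # expected interval between heartbeats (seconds)
    missed_limit: int               # assumed number of missed heartbeats before probing
    grace: float                    # assumed grace period for the echo reply (seconds)
    send_echo: Callable[[], None]   # hypothetical hook that sends the second message
    last_heartbeat: float = 0.0
    echo_sent_at: Optional[float] = None
    status: str = "alive"

    def on_heartbeat(self, now: float) -> None:
        # First protocol: the remote daemon's periodic message arrived.
        self.last_heartbeat = now
        self.echo_sent_at = None
        self.status = "alive"

    def on_echo_reply(self, now: float) -> None:
        # Second protocol: a program less prone to delay answered in time,
        # so the node is treated as alive even though its daemon is late.
        if self.echo_sent_at is not None and now - self.echo_sent_at <= self.grace:
            self.echo_sent_at = None
            self.last_heartbeat = now  # assumption: restart the heartbeat window
            self.status = "alive"

    def tick(self, now: float) -> str:
        if self.status == "dead":
            return self.status
        missed = int((now - self.last_heartbeat) // self.period)
        if self.echo_sent_at is None:
            if missed >= self.missed_limit:
                # Enough heartbeats are missing: switch to the second protocol.
                self.send_echo()
                self.echo_sent_at = now
                self.status = "suspect"
        elif now - self.echo_sent_at > self.grace:
            # No echo reply within the grace period.
            self.status = "dead"
        return self.status
```

The caller supplies the clock value and the echo hook, so the sketch can be exercised without a network; in a real system the hook might send something like an ICMP echo request, but that choice is an assumption here.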
18 Claims
1. A method for determining node status in a network of connected data processing nodes, said method comprising the steps of:
periodically sending, from one of said nodes, to at least one other node in said network, a first message, said first message being directed to a daemon program running on said at least one other node;
determining, at said at least one other node, that a specified number of first message transmissions have not been received;
sending a second message to said one node from said at least one other node, said second message being directed to an other program, running on said one node that is less susceptible than said daemon to being delayed in providing a response.
9. A method for adding new members to a group of nodes in a connected data processing network, said method comprising the steps of:
periodically sending, from a first group leader node a proclaim message to select nodes in the network configuration but which are not currently part of the group;
responding to said proclaim message from said first group leader node with a join message from a group leader of an other group having said lower network address, said join message including a membership list for any joining groups;
transmitting a prepare-to-commit message to said nodes within any of said joining groups;
receiving, at said first group leader node, an acknowledgment of said prepare to commit message, from at least one node;
sending a commit message with updated membership list to all nodes on the updated list;
following said periodic sending, determining that a specified number of said prepare-to-commit message transmissions have not been received from a potentially failed node; and
sending an echo-request message to said potentially failed node, said echo request message being directed to an other program running on said potentially failed node that is less susceptible than said prepare-to-commit message to being delayed.
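The join sequence recited in claim 9 (proclaim, join, prepare-to-commit, acknowledgment, commit) can be sketched as a single in-memory function. This is a simplified illustration, not the claimed method: the function name, its parameters, and the `ack_from` callback are hypothetical. It assumes that only a group leader with a lower network address than the proclaiming leader answers with a join message, and that nodes which fail to acknowledge are left off the committed list; the claim text does not spell out either point.

```python
from ipaddress import ip_address
from typing import Callable, List, Optional


def merge_groups(
    leader: str,
    leader_members: List[str],
    other_leader: str,
    other_members: List[str],
    ack_from: Callable[[str], bool],
) -> Optional[List[str]]:
    """Return the committed membership list, or None if no merge happens."""
    # Proclaim: the first group leader announces itself to nodes outside its group.
    # Assumption: only a leader with a lower address replies with a join message.
    if ip_address(other_leader) >= ip_address(leader):
        return None

    # Join: the reply carries the membership list of the joining group.
    join_msg = {"type": "join", "members": list(other_members)}

    # Prepare-to-commit: sent to every node in the joining group.
    # ack_from(node) is a hypothetical stand-in for waiting on that node's reply.
    acked = [node for node in join_msg["members"] if ack_from(node)]
    if not acked:
        return None  # the claim requires an acknowledgment from at least one node

    # Commit: the updated membership list goes to all nodes on that list.
    # Assumption: nodes that did not acknowledge are omitted.
    return sorted(set(leader_members) | set(acked), key=ip_address)
```

In this sketch a node that never acknowledges the prepare-to-commit message is simply dropped; under the claim, such a node would first be probed with an echo-request message before being treated as failed.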
17. An apparatus for data processing comprising:
a connected network of data processing nodes having operating systems for controlling said nodes together with programs at each node for controlling the interconnection of said nodes into groups of nodes;
first program means within one of said nodes for periodically sending from said one node, to at least one other node in said network, a first message which is also expected to be periodically received at said at least one other node, said first message being sent by a daemon program running on said one node; and
second program means within said at least one other node for sending a second message to said one node after having failed to receive said first message within a certain number of said periods, said second message being directed to an other program running on said one node, said other program being less susceptible than said first program means to being delayed.
18. A computer program product stored within or on a machine readable medium containing program means for use in an interconnected network of data processing nodes said program means being operative:
to periodically send, from one of said nodes, to at least one other node in said network, a first message, said first message being directed to a daemon program running on said at least one other node;
to determine, at said at least one other node, that a specified number of first message transmissions have not been received; and
then to send a second message to said one node, said second message being directed to an other program, running on said one other node that is less susceptible than said daemon program to being delayed.
Specification