Dynamic rate heartbeating for inter-node status updating
First Claim
1. A computer system comprising:
a processing cluster including a plurality of physical or virtual processing nodes, the computer system comprising at least one processor for executing program instructions; and
at least one memory coupled to the processor for executing the program instructions, wherein the program instructions are program instructions for determining node operating status among a cluster of the physical or virtual processing nodes, the program instructions comprising
program instructions for transmitting messages periodically according to a heartbeat rate among nodes in the cluster of nodes;
program instructions for receiving the messages at the nodes and storing indications of communications delays for the messages determined from times of the receiving of the messages at the nodes;
program instructions for computing statistics of the communications delay for the messages from the indications of communications delay;
program instructions for adjusting parameters for node status monitoring according to the computed statistics; and
program instructions for monitoring, at the nodes, the operational status of other ones of the nodes according to the indications of communications delay and the parameters.
Abstract
A scheme for monitoring node operational status according to communications transmits messages periodically among the nodes according to a heartbeat rate. The messages, which may be gossip messages containing the status of other nodes, are received at the nodes, and indications of the communications delays of the received messages are stored and used to compute statistics of those delays. Parameters of the node status monitoring, which are used to determine the operational status of the nodes, are adjusted according to the statistics. The adjustment may include changing the heartbeat rate, the maximum wait time before a message is considered missed, and/or the maximum number of missed messages (e.g., the sequence number deviation) before a node is considered non-operational (down).
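The adjustment loop described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The class names `MonitorParams` and `PeerMonitor`, the sliding-window size, and the rule "maximum wait = mean gap + k standard deviations" are all assumptions chosen for illustration. The sketch uses the gap between successive receive times as the stored indication of communications delay, and adjusts only the maximum wait time; the heartbeat rate and the missed-message limit could be adapted in a similar way.

```python
import statistics
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MonitorParams:
    """Monitoring parameters named in the abstract (default values are hypothetical)."""
    max_wait: float = 2.0   # seconds of silence before one message counts as missed
    max_missed: int = 3     # missed messages before the peer is considered down


@dataclass
class PeerMonitor:
    """Tracks heartbeat arrivals from one peer and adapts max_wait (illustrative only)."""
    params: MonitorParams = field(default_factory=MonitorParams)
    window: int = 50        # assumption: number of recent gaps kept for statistics
    k: float = 4.0          # assumption: standard-deviation multiplier
    gaps: List[float] = field(default_factory=list)
    last_seen: Optional[float] = None

    def on_message(self, received_at: float) -> None:
        """Store the inter-arrival gap as a delay indication, then re-tune parameters."""
        if self.last_seen is not None:
            self.gaps.append(received_at - self.last_seen)
            self.gaps = self.gaps[-self.window:]
            self._adjust()
        self.last_seen = received_at

    def _adjust(self) -> None:
        """Compute statistics of the stored gaps and derive a new maximum wait time."""
        if len(self.gaps) < 2:
            return  # stdev needs at least two samples
        mean = statistics.fmean(self.gaps)
        std = statistics.stdev(self.gaps)
        self.params.max_wait = mean + self.k * std

    def missed(self, now: float) -> int:
        """Number of whole max_wait periods that have elapsed since the last message."""
        if self.last_seen is None:
            return 0
        return int((now - self.last_seen) // self.params.max_wait)

    def is_down(self, now: float) -> bool:
        """True once the missed-message count reaches the configured limit."""
        return self.missed(now) >= self.params.max_missed
```

With perfectly regular one-second heartbeats the computed wait time settles at one second, so a peer is reported down after three seconds of silence. A jittery link widens the wait time automatically, which is the behaviour the abstract attributes to statistics-driven adjustment.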
48 Citations
14 Claims
8. A computer program product comprising a non-transitory computer-readable storage device storing program instructions for execution within a computer system, the computer system comprising a processing cluster including a plurality of physical or virtual processing nodes, wherein the program instructions are program instructions for determining node operating status among a cluster of the physical or virtual processing nodes, the program instructions comprising program instructions for:
transmitting messages periodically according to a heartbeat rate among nodes in the cluster of nodes;
receiving the messages at the nodes and storing indications of communications delays for the messages determined from times of the receiving of the messages at the nodes;
computing statistics of the communications delay for the messages from the indications of communications delay;
adjusting parameters for node status monitoring according to the computed statistics; and
monitoring, at the nodes, the operational status of other ones of the nodes according to the indications of communications delay and the parameters.
Specification