Inter-node communication scheme for node status sharing
First Claim
1. A method for determining node operating status among a cluster of nodes of a computer system, the method comprising:
transmitting gossip messages directly between node pairs in the cluster of nodes, wherein the gossip messages contain an indication of operational status of other nodes in the cluster of nodes, wherein the other nodes are nodes other than the nodes in the node pairs;
receiving the gossip messages at the node pairs;
responsive to the receiving, at the nodes, updating operating status of other nodes according to the received gossip messages, wherein the status of a particular one of the other nodes is set to a non-operating status if the receiving has not received a gossip message from the particular node during a predetermined time period;
responsive to setting the status of the particular one of the other nodes to a non-operating status, transmitting a node down message indicating the non-operating status of the particular node to the other nodes in the cluster;
at a first node other than the particular node, receiving the node down message;
responsive to receiving the node down message, determining whether or not the first node has received a gossip message from the particular node during the predetermined time period; and
responsive to determining that the first node has received the gossip message from the particular node during the predetermined time period, transmitting a node alive message from the first node indicating that the particular node is operating.
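The timeout and reply steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `Node`, the tuple message format, the use of seconds for the predetermined time period, and the method names are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Illustrative node; all names and message shapes here are assumptions."""
    name: str
    timeout: float  # the "predetermined time period" (seconds assumed)
    last_heard: dict = field(default_factory=dict)  # peer -> time of last gossip received
    status: dict = field(default_factory=dict)      # peer -> "up" or "down"

    def on_gossip(self, sender, now):
        # Record that a gossip message arrived directly from `sender`.
        self.last_heard[sender] = now
        self.status[sender] = "up"

    def check_timeouts(self, now):
        # Mark peers as down when no gossip arrived within the period,
        # and return the node down messages to send to the cluster.
        messages = []
        for peer, seen in self.last_heard.items():
            if self.status.get(peer) == "up" and now - seen > self.timeout:
                self.status[peer] = "down"
                messages.append(("NODE_DOWN", self.name, peer))
        return messages

    def on_node_down(self, target, now):
        # If this node heard from `target` within the period, answer with
        # a node alive message; otherwise send nothing.
        seen = self.last_heard.get(target)
        if seen is not None and now - seen <= self.timeout:
            return ("NODE_ALIVE", self.name, target)
        return None
```

In this sketch, a node that last heard from a peer longer ago than `timeout` emits a `NODE_DOWN` tuple, and any other node that heard from that peer more recently answers with `NODE_ALIVE`. How the original sender reacts to the node alive message is not covered by the claim text above and is left out.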
Abstract
A gossiping scheme for sharing node status in a cluster of nodes provides a robust mechanism for determining node status within the cluster. Nodes transmit gossip messages to each other, the gossip messages listing other nodes in the cluster that are operational. When a node does not receive a gossip message from a particular node within a predetermined time period, then the node transmits messages to the other nodes indicating that the particular node is down. However, if another node has received a packet from the particular node within the predetermined time period and receives the node down message, then the other node responds with a node alive message.
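As a rough illustration of what a gossip message might carry, the Python sketch below builds the list of operational nodes that one node sends to a peer and shows how the peer could fold that list into its own view. The function names, the boolean status map, and the rule that unlisted nodes are left unchanged are assumptions for illustration; the text does not specify a message format or a merge rule.

```python
def build_gossip(status, sender, receiver):
    """Return the nodes the sender believes are operational.

    `status` maps node name -> True (operational) or False (assumed shape).
    The sender and receiver are excluded, since the message reports on
    nodes other than the pair exchanging it.
    """
    return sorted(
        node for node, up in status.items()
        if up and node not in (sender, receiver)
    )


def apply_gossip(status, operational):
    """Return a copy of `status` with every listed node marked operational.

    Assumption: nodes that are not listed keep their current status.
    """
    updated = dict(status)
    for node in operational:
        updated[node] = True
    return updated
```

For example, if node A holds `{"B": True, "C": True, "D": False}` and gossips to B, the message lists only C; B then marks C as operational in its own map.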
6 Claims
Specification