Inter-node communication scheme for node status sharing
First Claim
1. A computer system comprising a processing cluster including a plurality of physical or virtual processing nodes, the computer system comprising at least one processor for executing program instructions and at least one memory coupled to the processor for executing the program instructions, wherein the program instructions are program instructions for determining node operating status among a cluster of the physical or virtual processing nodes, the program instructions comprising program instructions for:
first transmitting gossip messages directly between node pairs in the cluster of nodes, wherein the gossip messages contain an indication of operational status of other nodes in the cluster of nodes, wherein the other nodes are nodes other than the nodes in the node pairs;
receiving the gossip messages at the node pairs;
responsive to the receiving, at the individual nodes, locally updating a local operating status of the other nodes according to the received gossip messages, wherein the local status kept by the individual nodes indicates the status of a particular one of the other nodes as a non-operating status if the receiving by the individual nodes has not received a gossip message from the particular one of the other nodes during a predetermined time period;
responsive to setting the local status of the particular one of the other nodes as kept by the individual nodes to a non-operating status, second transmitting a node down message indicating the non-operating status of the particular node to the other nodes in the cluster, and
repeating the first transmitting, receiving, updating and second transmitting at each of the nodes in the node pairs, so that the local status kept by each of the nodes reflects the status of each of the other nodes in the cluster.
Abstract
A gossiping scheme for sharing node status in a cluster of nodes provides a robust mechanism for determining node status within the cluster. Nodes transmit gossip messages to each other, the gossip messages listing the other nodes in the cluster that are operational. When a node does not receive a gossip message from a particular node within a predetermined time period, the node transmits messages to the other nodes indicating that the particular node is down. However, if another node receives the node down message and has itself received a packet from the particular node within the predetermined time period, that other node responds with a node alive message.
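The exchange described in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the `Node` class, the dictionary message layout, the `TIMEOUT` value, and the behavior on receiving a node alive message are all hypothetical choices made for this example, and time is passed in explicitly so the run is deterministic.

```python
from dataclasses import dataclass, field

TIMEOUT = 3.0  # assumed value for the "predetermined time period", in seconds


@dataclass
class Node:
    name: str
    last_heard: dict = field(default_factory=dict)  # peer -> time of last gossip from it
    status: dict = field(default_factory=dict)      # peer -> "up" or "down"

    def make_gossip(self):
        """Gossip message listing the nodes this node currently considers operational."""
        up = sorted(p for p, s in self.status.items() if s == "up")
        return {"type": "gossip", "from": self.name, "up": up}

    def on_gossip(self, msg, now):
        """The sender is evidently alive; also start tracking listed nodes not yet known."""
        sender = msg["from"]
        self.last_heard[sender] = now
        self.status[sender] = "up"
        for peer in msg["up"]:
            if peer != self.name:
                self.status.setdefault(peer, "up")
                self.last_heard.setdefault(peer, now)

    def check_timeouts(self, now):
        """Mark peers silent for longer than TIMEOUT as down; return node down messages."""
        downs = []
        for peer, seen in self.last_heard.items():
            if self.status.get(peer) == "up" and now - seen > TIMEOUT:
                self.status[peer] = "down"
                downs.append({"type": "node_down", "from": self.name, "node": peer})
        return downs

    def on_node_down(self, msg, now):
        """Reply with node alive if the suspect was heard from within TIMEOUT."""
        suspect = msg["node"]
        seen = self.last_heard.get(suspect)
        if suspect == self.name or (seen is not None and now - seen <= TIMEOUT):
            return {"type": "node_alive", "from": self.name, "node": suspect}
        self.status[suspect] = "down"  # assumption: otherwise accept the report
        return None

    def on_node_alive(self, msg, now):
        """Assumption: a node alive reply restores the suspect to operational."""
        self.status[msg["node"]] = "up"
        self.last_heard[msg["node"]] = now


# Example run: A loses contact with C, but B has heard from C recently.
a, b, c = Node("A"), Node("B"), Node("C")
a.on_gossip(c.make_gossip(), now=0.0)      # A last hears from C at t=0
b.on_gossip(c.make_gossip(), now=4.0)      # B still hears from C at t=4
a.on_gossip(b.make_gossip(), now=4.0)      # B gossips to A, listing C as up
downs = a.check_timeouts(now=5.0)          # A has been without gossip from C for 5 s
reply = b.on_node_down(downs[0], now=5.0)  # B contradicts A with node alive
a.on_node_alive(reply, now=5.0)
```

In this run A declares C down after the timeout, B answers with a node alive message because it heard from C one second earlier, and A restores C to operational status. A real cluster would send these messages over a network and use a clock rather than explicit `now` arguments.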
14 Claims
1. A computer system comprising a processing cluster including a plurality of physical or virtual processing nodes, the computer system comprising at least one processor for executing program instructions and at least one memory coupled to the processor for executing the program instructions, wherein the program instructions are program instructions for determining node operating status among a cluster of the physical or virtual processing nodes, the program instructions comprising program instructions for:
first transmitting gossip messages directly between node pairs in the cluster of nodes, wherein the gossip messages contain an indication of operational status of other nodes in the cluster of nodes, wherein the other nodes are nodes other than the nodes in the node pairs;
receiving the gossip messages at the node pairs;
responsive to the receiving, at the individual nodes, locally updating a local operating status of the other nodes according to the received gossip messages, wherein the local status kept by the individual nodes indicates the status of a particular one of the other nodes as a non-operating status if the receiving by the individual nodes has not received a gossip message from the particular one of the other nodes during a predetermined time period;
responsive to setting the local status of the particular one of the other nodes as kept by the individual nodes to a non-operating status, second transmitting a node down message indicating the non-operating status of the particular node to the other nodes in the cluster, and
repeating the first transmitting, receiving, updating and second transmitting at each of the nodes in the node pairs, so that the local status kept by each of the nodes reflects the status of each of the other nodes in the cluster.
(Dependent claims: 2, 3, 4, 5, 6, 7)
8. A computer program product comprising a computer-readable storage media that is not a signal, the storage media storing program instructions for execution within a computer system, the computer system comprising a processing cluster including a plurality of physical or virtual processing nodes, wherein the program instructions are program instructions for determining node operating status among a cluster of the physical or virtual processing nodes, the program instructions comprising program instructions for:
first transmitting gossip messages directly between node pairs in the cluster of nodes, wherein the gossip messages contain an indication of operational status of other nodes in the cluster of nodes, wherein the other nodes are nodes other than the nodes in the node pairs;
receiving the gossip messages at the node pairs;
responsive to the receiving, at the individual nodes, locally updating a local operating status of the other nodes according to the received gossip messages, wherein the local status kept by the individual node indicates the status of a particular one of the other nodes as a non-operating status if the receiving by the individual nodes has not received a gossip message from the particular one of the other nodes during a predetermined time period;
responsive to setting the local status of the particular one of the other nodes as kept by the individual nodes to a non-operating status, second transmitting a node down message indicating the non-operating status of the particular node to the other nodes in the cluster; and
repeating the first transmitting, receiving, updating and second transmitting at each of the nodes in the node pairs, so that the local status kept by each of the nodes reflects the status of each of the other nodes in the cluster.
(Dependent claims: 9, 10, 11, 12, 13, 14)
Specification