System and method for dynamic cluster adjustment to node failures in a distributed data system
First Claim
Patent Images
1. A method, comprising:
- a particular node detecting a node failure in a plurality of cluster nodes connected together to form a distributed data cluster having a topology order;
the particular node updating local topology data after said detecting to reflect the node failure;
the particular node determining between the particular node'"'"'s previous node and the particular node'"'"'s next node as to which one is the failed node;
if the failed node corresponding to the node failure is the particular node'"'"'s previous node, the particular node initiating a sequential propagation of a node dead message to other nodes of the distributed data cluster according to a first sequential ordering of nodes, wherein the first sequential ordering comprises the particular node followed by the next node of the particular node; and
if the failed node is the particular node'"'"'s next node,the particular node initiating a sequential propagation of a node dead message to other nodes of the distributed data cluster according to a second sequential ordering of nodes, wherein the second sequential ordering comprises the particular node followed by the previous node of the particular node; and
the particular node transitioning to a reconnecting state to begin reconnecting to a new next node.
2 Assignments
0 Petitions
Accused Products
Abstract
A distributed system provides for separate management of dynamic cluster membership and distributed data. Nodes of the distributed system may include a state manager and a topology manager. A state manager handles data access from the cluster. A topology manager handles changes to the dynamic cluster topology. The topology manager enables operation of the state manager by handling topology changes, such as new nodes to join the cluster and node members to exit the cluster. A topology manager may follow a static topology description when handling cluster topology changes. Data replication and recovery functions may be implemented, for example to provide high availability.
187 Citations
26 Claims
-
1. A method, comprising:
-
a particular node detecting a node failure in a plurality of cluster nodes connected together to form a distributed data cluster having a topology order; the particular node updating local topology data after said detecting to reflect the node failure; the particular node determining between the particular node'"'"'s previous node and the particular node'"'"'s next node as to which one is the failed node; if the failed node corresponding to the node failure is the particular node'"'"'s previous node, the particular node initiating a sequential propagation of a node dead message to other nodes of the distributed data cluster according to a first sequential ordering of nodes, wherein the first sequential ordering comprises the particular node followed by the next node of the particular node; and if the failed node is the particular node'"'"'s next node, the particular node initiating a sequential propagation of a node dead message to other nodes of the distributed data cluster according to a second sequential ordering of nodes, wherein the second sequential ordering comprises the particular node followed by the previous node of the particular node; and the particular node transitioning to a reconnecting state to begin reconnecting to a new next node. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A method, comprising:
-
a particular node in a cluster of a plurality of nodes receiving a node dead message from one of the plurality of cluster nodes, wherein the plurality of nodes are connected together to form a distributed data cluster having a topology order; the particular node updating local topology data after said receiving to reflect topology data included in the node dead message; the particular node determining between its previous node and its next node as to which one sent the node dead message; if its previous node sent the node dead message, the particular node initiating a sequential propagation of a node dead message to other nodes of the distributed data cluster according to a first sequential ordering of nodes, wherein the first sequential ordering comprises the particular node followed by the next node of the particular node; and if its next node sent the node dead message, the particular node initiating a sequential propagation of a node dead message to other nodes of the distributed data cluster according to a second sequential ordering of nodes, wherein the second sequential ordering comprises the particular node followed by the previous node of the particular node. - View Dependent Claims (9, 10)
-
-
11. A method, comprising:
-
a particular node in a cluster of a plurality of nodes receiving a node dead message from one of the plurality of cluster nodes, wherein the plurality of nodes are connected together to form a distributed data cluster having a topology order, wherein the node dead message includes topology data identifying a given node as a failed node; if the given node is the particular node'"'"'s previous node, the particular node checking whether the previous node has failed, and if the previous node has not failed, the particular node sending a connect reject message to one of the plurality of cluster nodes, wherein the connect reject message indicates that a failure was incorrectly declared; if the given node is the particular node'"'"'s next node, the particular node checking whether the next node has failed and if the next node has not failed, the particular node sending a connect reject message to one of the plurality of cluster nodes, wherein the connect reject message indicates that a failure was incorrectly declared; and otherwise, the particular node updating local topology data to reflect a node failure.
-
-
12. A computer system, comprising a processor and memory including instructions executable by the processor for:
-
a particular node detecting a node failure in a plurality of cluster nodes connected together to form a distributed data cluster having a topology order; the particular node updating local topology data after said detecting to reflect the node failure; the particular node determining between the particular node'"'"'s previous node and the particular node'"'"'s next node as to which one is the failed node; if the failed node corresponding to the node failure is the particular node'"'"'s previous node, the particular node initiating a sequential propagation of a node dead message to other nodes of the distributed data cluster according to a first sequential ordering of nodes, wherein the first sequential ordering comprises the particular node followed by the next node of the particular node; and if the failed node is the particular node'"'"'s next node, the particular node initiating a sequential propagation of a node dead message to other nodes of the distributed data cluster according to a second sequential ordering of nodes, wherein the second sequential ordering comprises the particular node followed by the previous node of the particular node; and the particular node transitioning to a reconnecting state to begin reconnecting to a new next node. - View Dependent Claims (13, 14, 15, 16, 17, 18)
-
-
19. A computer system, comprising a processor and memory including instructions executable by the processor for:
-
a particular node in a cluster of a plurality of nodes receiving a node dead message from one of the plurality of cluster nodes, wherein the plurality of nodes are connected together to form a distributed data cluster having a topology order; the particular node updating local topology data after said receiving to reflect topology data included in the node dead message; the particular node determining between its previous node and its next node as to which one sent the node dead message; if its previous node sent the node dead message, the particular node continuing a sequential propagation of the node dead message to other nodes of the distributed data cluster according to a first sequential ordering of nodes, wherein the first sequential ordering comprises the particular node followed by the next node of the particular node; and if its next node sent the node dead message, the particular node continuing a sequential propagation of the node dead message to other nodes of the distributed data cluster according to a second sequential ordering of nodes, wherein the second sequential ordering comprises the particular node followed by the previous node of the particular node. - View Dependent Claims (20, 21)
-
-
22. A computer system comprising a processor and memory including instructions executable by the processor for:
-
a particular node in a cluster of a plurality of nodes receiving a node dead message from one of the plurality of cluster nodes, wherein the plurality of nodes are connected together to form a distributed data cluster having a topology order, wherein the node dead message includes topology data identifying a given node as a failed node; if the given node is the particular node'"'"'s previous node, the particular node checking whether the previous node has failed, and if the previous node has not failed, the particular node sending a connect reject message to one of the plurality of cluster nodes, wherein the connect reject message indicates that a failure was incorrectly declared; and if the given node is the particular node'"'"'s next node, the particular node checking whether the next node has failed, and if the next node has not failed, the particular node sending a connect reject message to one of the plurality of cluster nodes, wherein the connect reject message indicates that a failure was incorrectly declared.
-
-
23. A method, comprising:
-
a first node and a second node detecting a node failure in a plurality of cluster nodes connected together to form a distributed data cluster having a topology order, wherein the first node is a failed node'"'"'s previous node and the second node is the failed node'"'"'s next node; the first node and the second updating local topology data after said detecting to reflect the node failure; the first node sending a node dead message to its previous node and transitioning to a reconnecting state to begin reconnecting to the second node; the second node sending a node dead message to its next node; the first node in the reconnecting state connecting to the second node; after said connecting the first node transitioning to a joining state and sending a connect request message to the second node; the first node waiting in the joining state to receive a connect complete message; the second node receiving the connect request message from the first node; after receiving the connect request message, the second node transitioning to a transient state and sending a node joined message to its next node including data indicating that the first node as the second node'"'"'s previous node; the second node waiting in the transient state to receive a connect complete message from the first node; the first node'"'"'s previous node receiving the node joined message and sending the first node a connect complete message; upon receiving the connect complete message from its previous node, the first node sending a connect complete message to the second node and transitioning to a joined state as a member of the distributed data cluster; and upon receiving the connect complete message from the first node, the second node transitioning to the joined state wherein the second node is connected to the first node as its previous node in the cluster topology order; wherein in the joined state the first node is a member of the distributed data cluster in the topology order between its previous node and its next node.
-
-
24. A computer-readable storage medium, comprising program instructions, wherein the instructions are computer-executable to:
-
at a particular node, detect a node failure in a plurality of cluster nodes connected together to form a distributed data cluster having a topology order; update local topology data at the particular node to reflect the node failure; determine between the particular node'"'"'s previous node and the particular node'"'"'s next node as to which one is the failed node; if the failed node corresponding to the node failure is the particular node'"'"'s previous node, initiate a sequential propagation of a node dead message from the particular node to other nodes of the distributed data cluster according to a first sequential ordering of nodes, wherein the first sequential ordering comprises the particular node followed by the next node of the particular node; and if the failed node is the particular node'"'"'s next node, initiate a sequential propagation of a node dead message from the particular node to other nodes of the distributed data cluster according to a second sequential ordering of nodes, wherein the second sequential ordering comprises the particular node followed by the next node of the particular node.
-
-
25. A method, comprising:
-
a particular node in a cluster of a plurality of nodes receiving a node dead message from one of the plurality of cluster nodes, wherein the plurality of nodes are connected together to form a distributed data cluster having a topology order, and wherein the node dead message includes data indicating the one of the plurality of cluster nodes that sent the node dead message to the particular node; the particular node updating local topology data after said receiving to reflect topology data included in the node dead message; the particular node determining between its previous node and its next node as to which one sent the node dead message; if its previous node sent the node dead message, the particular node sending a node dead message to the next node of the particular node; and if its next node sent the node dead message, the particular node sending a node dead message to the previous node of the particular node.
-
-
26. A computer system, comprising a processor and memory including instructions executable by the processor for:
-
a particular node in a cluster of a plurality of nodes receiving a node dead message from one of the plurality of cluster nodes, wherein the plurality of nodes are connected together to form a distributed data cluster having a topology order, and wherein the node dead message includes data indicating the one of the plurality of cluster nodes that sent the node dead message to the particular node; the particular node updating local topology data after said receiving to reflect topology data included in the node dead message; the particular node determining between its previous node and its next node as to which one sent the node dead message; if its previous node sent the node dead message, the particular node sending a node dead message to the next node of the particular node; and if its next node sent the node dead message, the particular node sending a node dead message to the previous node of the particular node.
-
Specification