System and method for dynamic cluster adjustment to node failures in a distributed data system

US 20030204786A1
Filed: 04/29/2002
Published: 10/30/2003
Est. Priority Date: 04/29/2002
Status: Active Grant

Abstract

A distributed system provides for separate management of dynamic cluster membership and distributed data. Nodes of the distributed system may include a state manager and a topology manager. A state manager handles data access from the cluster. A topology manager handles changes to the dynamic cluster topology. The topology manager enables operation of the state manager by handling topology changes, such as new nodes joining the cluster and member nodes exiting the cluster. A topology manager may follow a static topology description when handling cluster topology changes. Data replication and recovery functions may be implemented, for example to provide high availability.
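
The separation described in the abstract can be illustrated with a minimal sketch. The class names, method names, and the choice of replicating to the next node in topology order are all assumptions made for illustration; the abstract does not specify them.

```python
class TopologyManager:
    """Tracks cluster membership only; knows nothing about stored data."""

    def __init__(self, node_id, members):
        self.node_id = node_id
        self.members = list(members)  # topology order, treated here as a ring

    def node_joined(self, new_id, after_id):
        # Insert new_id immediately after after_id in the topology order.
        self.members.insert(self.members.index(after_id) + 1, new_id)

    def node_exited(self, gone_id):
        self.members.remove(gone_id)

    def next_node(self):
        i = self.members.index(self.node_id)
        return self.members[(i + 1) % len(self.members)]


class StateManager:
    """Handles data access; asks the topology manager where its peer is."""

    def __init__(self, topology):
        self.topology = topology
        self.data = {}

    def put(self, key, value):
        self.data[key] = value
        # Assumption: a replica would be sent to the next node in the ring.
        return self.topology.next_node()
```

In this sketch the state manager never edits the member list itself, so membership changes and data access stay in separate components.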

23 Claims

1. A method, comprising:
- a node detecting a node failure in a plurality of cluster nodes connected together to form a distributed data cluster having a topology order;
  
  the node updating local topology data after said detecting to reflect the node failure;
  
  if the failed node is the node's previous node, the node sending a node dead message to its next node; and
  
  if the failed node is the node's next node, the node sending a node dead message to its previous node and transitioning to reconnecting state to begin reconnecting to a new next node.
- View Dependent Claims (2, 3, 4, 5, 6, 7)
- - 2. The method as recited in claim 1, further comprising:
    - the node in reconnecting state attempting to connect to one of the plurality of cluster nodes as its next node;
      
      upon said connecting, the node transitioning to a joining state and sending a connect request message to its next node; and
      
      the node waiting in the joining state to receive a connect complete message.
  - 3. The method as recited in claim 2, further comprising upon receiving the connect complete message from one of the plurality of cluster nodes connected to the node as its next node, the node transitioning to a joined state configured to operate as a member of the distributed data cluster, wherein in the joined state the node is a member of the distributed data cluster in the topology order between its previous node and its next node.
  - 4. The method as recited in claim 2, further comprising:
    - after sending the connect request message the node receiving a connect reject message from one of the plurality of cluster nodes, wherein the connect reject message includes data indicating a designated node;
      
      after receiving the connect reject message the node transitioning to reconnecting state, connecting to the designated node, and sending a connect request message to the designated node; and
      
      the node waiting in the joining state to receive a connect complete message.
  - 5. The method as recited in claim 1, further comprising the node sending a node ping to its previous node and failing to receive the ping message from its next node before said updating local topology data.
  - 6. The method as recited in claim 1, further comprising:
    - the node receiving a connect request message from a cluster node in the distributed data cluster, wherein the node's previous node is the failed node and the cluster node is the previous node of the failed node;
      
      after receiving the connect request message, the node transitioning to a transient state and sending a node joined message to its next node including topology data indicating the cluster node as its previous node;
      
      the node waiting in the transient state to receive a connect complete message from the cluster node; and
      
      upon receiving the connect complete message from the cluster node, the node transitioning to the joined state wherein the node is connected to the cluster node as its previous node in the cluster topology order.
  - 7. The method as recited in claim 1, wherein the connect complete message includes data indicating each node in the plurality of cluster nodes.
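
The detecting node's behaviour in claims 1 through 4 can be sketched as a small state machine. This is an illustrative reading, not the patented implementation: the `Node` class, the `outbox` list standing in for network sends, and the message type strings are hypothetical names, and a ring of at least three nodes is assumed.

```python
from enum import Enum


class State(Enum):
    JOINED = "joined"
    RECONNECTING = "reconnecting"
    JOINING = "joining"


class Node:
    def __init__(self, node_id, ring):
        self.id = node_id
        self.ring = list(ring)   # local topology data, in topology order
        self.state = State.JOINED
        self.outbox = []         # (destination, message type, payload)

    def _neighbor(self, step):
        i = self.ring.index(self.id)
        return self.ring[(i + step) % len(self.ring)]

    def on_failure_detected(self, failed):
        # Note the neighbours first, then update local topology data.
        prev_id, next_id = self._neighbor(-1), self._neighbor(+1)
        self.ring.remove(failed)
        if failed == prev_id:
            self.outbox.append((next_id, "NODE_DEAD", failed))
        if failed == next_id:
            self.outbox.append((prev_id, "NODE_DEAD", failed))
            self.state = State.RECONNECTING

    def reconnect(self, target=None):
        # Claim 2: connect to a new next node, then ask to join.
        target = target or self._neighbor(+1)
        self.state = State.JOINING
        self.outbox.append((target, "CONNECT_REQUEST", self.id))

    def on_connect_reject(self, designated):
        # Claim 4: fall back to reconnecting and retry with the designated node.
        self.state = State.RECONNECTING
        self.reconnect(designated)

    def on_connect_complete(self):
        # Claim 3: the node is a full member again.
        self.state = State.JOINED
```

For example, with ring `["A", "B", "C", "D"]`, node `B` detecting the failure of `C` queues a node dead message for `A` and enters the reconnecting state, while node `D` queues a node dead message for `A` and stays joined.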

8. A method, comprising:
- a node in a cluster of a plurality of nodes receiving a node dead message from one of the plurality of cluster nodes, wherein the plurality of nodes are connected together to form a distributed data cluster having a topology order;
  
  the node updating local topology data after said receiving to reflect topology data included in the node dead message;
  
  if its previous node sent the node dead message, the node sending a node dead message to its next node; and
  
  if its next node sent the node dead message, the node sending a node dead message to its previous node.
- View Dependent Claims (9, 10)
- - 9. The method as recited in claim 8, wherein the node dead message includes data indicating the one of the plurality of cluster nodes that sent the node dead message to the node.
  - 10. The method as recited in claim 8, wherein the node appends data identifying the node to the node dead message before said sending the node dead message.
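
The forwarding rule in claims 8 through 10 can be sketched as a single function. The message layout (a dict with `failed` and `path` keys, where the last `path` entry identifies the sender) is an assumption for illustration. How propagation stops is not stated in these claims and is left out.

```python
def handle_node_dead(node_id, ring, msg):
    """Return (updated_ring, destination, forwarded_message).

    ring is this node's local topology data in topology order.
    msg["path"] lists the nodes that have handled the message so far
    (hypothetical layout); its last entry is the sender.
    """
    i = ring.index(node_id)
    prev_id, next_id = ring[i - 1], ring[(i + 1) % len(ring)]
    sender = msg["path"][-1]

    # Update local topology data to reflect the message contents.
    new_ring = [n for n in ring if n != msg["failed"]]

    # Claim 10: append this node's identity before forwarding.
    forwarded = {"failed": msg["failed"], "path": msg["path"] + [node_id]}

    if sender == prev_id:
        dest = next_id   # keep the message moving forward
    elif sender == next_id:
        dest = prev_id   # keep the message moving backward
    else:
        dest = None      # sender is not a neighbour; nothing to forward
    return new_ring, dest, forwarded
```

The message keeps travelling in the direction it arrived from, so the two notifications started on either side of the failed node move away from it in opposite directions.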

11. A method, comprising:
- a node in a cluster of a plurality of nodes receiving a node dead message from one of the plurality of cluster nodes, wherein the plurality of nodes are connected together to form a distributed data cluster having a topology order, wherein the node dead message includes topology data indicating a failed node;
  
  if the failed node is the node's previous node, the node verifying that its previous node has failed and if its previous node is active sending a connect reject message to one of the plurality of cluster nodes; and
  
  if the failed node is the node's next node, the node verifying that its next node has failed and if its next node is active sending a connect reject message to one of the plurality of cluster nodes; otherwise, the node updating local topology data to reflect the node failure.
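
The verification step in claim 11 can be sketched as follows. The `is_alive` callable is a hypothetical liveness probe supplied by the caller. Sending the connect reject back to the reporting node, and using the still-active neighbour as its payload, are assumptions, since the claim only says the message goes to "one of the plurality of cluster nodes". Updating the topology when the reported node is not a neighbour is also an assumption.

```python
def on_node_dead(node_id, ring, failed, reporter, is_alive):
    """Return (ring_after, outgoing_message_or_None).

    is_alive(node) is a hypothetical probe returning True when the
    given node still responds.
    """
    i = ring.index(node_id)
    neighbors = {ring[i - 1], ring[(i + 1) % len(ring)]}

    # Only a direct neighbour can be verified by this node.
    if failed in neighbors and is_alive(failed):
        # The report was wrong: keep the topology and reject it.
        return ring, (reporter, "CONNECT_REJECT", failed)

    # Failure confirmed (or not verifiable here): update local topology.
    return [n for n in ring if n != failed], None
```

Passing the probe in as a parameter keeps the sketch free of any real networking and makes both outcomes easy to exercise.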

12. A computer system comprising a processor and memory including instructions executable by the processor for:
- a node detecting a node failure in a plurality of cluster nodes connected together to form a distributed data cluster having a topology order;
  
  the node updating local topology data after said detecting to reflect the node failure;
  
  if the failed node is the node's previous node, the node sending a node dead message to its next node; and
  
  if the failed node is the node's next node, the node sending a node dead message to its previous node and transitioning to reconnecting state to begin reconnecting to a new next node.
- View Dependent Claims (13, 14, 15, 16, 17, 18)
- - 13. The computer system as recited in claim 12, further comprising:
    - the node in reconnecting state attempting to connect to one of the plurality of cluster nodes as its next node;
      
      upon said connecting, the node transitioning to a joining state and sending a connect request message to its next node; and
      
      the node waiting in the joining state to receive a connect complete message.
  - 14. The computer system as recited in claim 13, further comprising upon receiving the connect complete message from one of the plurality of cluster nodes connected to the node as its next node, the node transitioning to a joined state configured to operate as a member of the distributed data cluster, wherein in the joined state the node is a member of the distributed data cluster in the topology order between its previous node and its next node.
  - 15. The computer system as recited in claim 13, further comprising:
    - after sending the connect request message the node receiving a connect reject message from one of the plurality of cluster nodes, wherein the connect reject message includes data indicating a designated node;
      
      after receiving the connect reject message the node transitioning to reconnecting state, connecting to the designated node, and sending a connect request message to the designated node; and
      
      the node waiting in the joining state to receive a connect complete message.
  - 16. The computer system as recited in claim 12, further comprising the node sending a node ping to its previous node and failing to receive the ping message from its next node before said updating local topology data.
  - 17. The computer system as recited in claim 12, further comprising:
    - the node receiving a connect request message from a cluster node in the distributed data cluster, wherein the node's previous node is the failed node and the cluster node is the previous node of the failed node;
      
      after receiving the connect request message, the node transitioning to a transient state and sending a node joined message to its next node including topology data indicating the cluster node as its previous node;
      
      the node waiting in the transient state to receive a connect complete message from the cluster node; and
      
      upon receiving the connect complete message from the cluster node, the node transitioning to the joined state wherein the node is connected to the cluster node as its previous node in the cluster topology order.
  - 18. The computer system as recited in claim 12, wherein the connect complete message includes data indicating each node in the plurality of cluster nodes.

19. A computer system comprising a processor and memory including instructions executable by the processor for:
- a node in a cluster of a plurality of nodes receiving a node dead message from one of the plurality of cluster nodes, wherein the plurality of nodes are connected together to form a distributed data cluster having a topology order;
  
  the node updating local topology data after said receiving to reflect topology data included in the node dead message;
  
  if its previous node sent the node dead message, the node sending a node dead message to its next node; and
  
  if its next node sent the node dead message, the node sending a node dead message to its previous node.
- View Dependent Claims (20, 21)
- - 20. The method as recited in claim 19, wherein the node dead message includes data indicating the one of the plurality of cluster nodes that sent the node dead message to the node.
  - 21. The method as recited in claim 19, wherein the node appends data identifying the node to the node dead message before said sending the node dead message.

22. A computer system comprising a processor and memory including instructions executable by the processor for:
- a node in a cluster of a plurality of nodes receiving a node dead message from one of the plurality of cluster nodes, wherein the plurality of nodes are connected together to form a distributed data cluster having a topology order, wherein the node dead message includes topology data indicating a failed node;
  
  if the failed node is the node's previous node, the node verifying that its previous node has failed and sending a connect reject message to one of the plurality of cluster nodes if its previous node is active; and
  
  if the failed node is the node's next node, the node verifying that its next node has failed and sending a connect reject message to one of the plurality of cluster nodes if its next node is active.

23. A method, comprising:
- a first and a second node detecting a node failure in a plurality of cluster nodes connected together to form a distributed data cluster having a topology order, wherein the first node is the failed node's previous node and the second node is the failed node's next node;
  
  the first node and the second node updating local topology data after said detecting to reflect the node failure;
  
  the first node sending a node dead message to its previous node and transitioning to reconnecting state to begin reconnecting to the second node;
  
  the second node sending a node dead message to its previous node;
  
  the first node in reconnecting state connecting to the second node;
  
  after said connecting the first node transitioning to a joining state and sending a connect request message to the second node;
  
  the first node waiting in the joining state to receive a connect complete message;
  
  the second node receiving the connect request message from the first node;
  
  after receiving the connect request message, the second node transitioning to a transient state and sending a node joined message to its next node including data indicating the first node as the second node's previous node;
  
  the second node waiting in the transient state to receive a connect complete message from the first node;
  
  the first node's previous node receiving the node joined message and sending the first node a connect complete message;
  
  upon receiving the connect complete message from its previous node, the first node sending a connect complete message to the second node and transitioning to a joined state as a member of the distributed data cluster; and
  
  upon receiving the connect complete message from the first node, the second node transitioning to the joined state wherein the second node is connected to the first node as its previous node in the cluster topology order;
  
  wherein in the joined state the first node is a member of the distributed data cluster in the topology order between its previous node and its next node.
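
The reconnection handshake in claim 23 can be traced with a short sketch that lists the messages in order. It covers only the connect request, node joined, and connect complete steps, not the node dead messages. The function name and message strings are hypothetical, and forwarding the node joined message hop by hop around the ring is an assumption: the claim only states that the second node sends it to its next node and that the first node's previous node receives it.

```python
def handshake_trace(ring, failed):
    """Return the ordered (source, destination, message) steps that
    reconnect the failed node's previous node (first) to its next
    node (second)."""
    i = ring.index(failed)
    first, second = ring[i - 1], ring[(i + 1) % len(ring)]
    survivors = [n for n in ring if n != failed]

    # First node connects to the second node and asks to join.
    trace = [(first, second, "CONNECT_REQUEST")]

    # Second node announces the join; the message travels forward
    # until it reaches the first node's previous node.
    hop = second
    while True:
        nxt = survivors[(survivors.index(hop) + 1) % len(survivors)]
        if nxt == first:
            break
        trace.append((hop, nxt, "NODE_JOINED"))
        hop = nxt

    # The first node's previous node closes the loop, and the first
    # node then completes the handshake with the second node.
    trace.append((hop, first, "CONNECT_COMPLETE"))
    trace.append((first, second, "CONNECT_COMPLETE"))
    return trace
```

For the ring `["A", "B", "C", "D", "E"]` with `C` failed, the trace starts with `B` sending a connect request to `D` and ends with `B` sending a connect complete to `D`, at which point both are back in the joined state.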

Current Assignee
Oracle America, Inc. (Oracle Corporation)
Original Assignee
Sun Microsystems Incorporated (Oracle Corporation)
Inventors
Kannan, Mahesh; Dinker, Darpan; Gopinath, Pramod

Granted Patent

US 7,139,925 B2
US Class Current

714/43
CPC Class Codes

H04L 41/12   Discovery or management of ...

H04L 45/02   Topology update or discovery

H04L 45/22   Alternate routing

H04L 45/28   using route fault recovery

H04L 45/46   Cluster building

H04L 67/1001   for accessing one among a p...

H04L 67/1034   Reaction to server failures...
