System and method for performing a cluster topology self-healing process in a distributed data system cluster
Abstract
A cluster topology self-healing process is performed in order to replicate a data set stored on a failed node from a first node storing another copy of the data set to a second non-failed node. The self-healing process is performed by: locking one of several domains included in the data set, where locking that domain does not lock any of the other domains in the data set; storing data sent from the first node to the second node in the domain; and releasing the domain. This process of locking, storing, and releasing is repeated for each other domain in the data set. Each domain may be locked for significantly less time than it takes to copy the entire data set. Accordingly, client access requests targeting a locked domain will be delayed for less time than if the entire data set is locked during the self-healing process.
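As a concrete illustration of the abstract above, the following Java sketch shows one way the per-domain loop could look. It is not code from the patent; every name in it (SelfHealingCopy, ReplicaSource, domainLocks, heal) is hypothetical, and it assumes one ReentrantLock per domain so that locking one domain never blocks access to the others.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the self-healing copy loop; all names are illustrative.
public class SelfHealingCopy {
    // One lock per domain: locking a domain leaves all other domains unlocked.
    private final Map<String, ReentrantLock> domainLocks = new ConcurrentHashMap<>();
    private final Map<String, byte[]> localStore = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String domainId) {
        return domainLocks.computeIfAbsent(domainId, id -> new ReentrantLock());
    }

    /** Runs on the second node: copies the data set from the first node,
     *  one domain at a time, so each domain is locked only briefly. */
    public void heal(List<String> domainIds, ReplicaSource firstNode) {
        for (String domainId : domainIds) {
            ReentrantLock lock = lockFor(domainId);
            lock.lock();                                // lock only this domain
            try {
                byte[] data = firstNode.read(domainId); // data sent from the first node
                localStore.put(domainId, data);         // store it locally
            } finally {
                lock.unlock();                          // release before the next domain
            }
        }
    }

    /** Stand-in for the surviving node that still holds a copy of the data set. */
    public interface ReplicaSource {
        byte[] read(String domainId);
    }
}
```

Because each iteration holds only one domain's lock, a client request that targets any other domain is never delayed by the copy.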
Claims
1. A method, comprising:
detecting a failed node within a cluster comprising a first node, the failed node, and a second node, wherein the failed node stores a data set;
in response to said detecting, performing a cluster topology self-healing process to copy the data set from the first node to the second node, wherein the data set is dividable into a plurality of domains, and wherein the cluster topology self-healing process includes:
locking one of the plurality of domains on the second node, wherein said locking does not lock any other one of the plurality of domains on the second node;
storing data included in the one of the plurality of domains sent from the first node to the second node;
releasing the one of the plurality of domains; and
repeating said locking, said storing, and said releasing for each other one of the plurality of domains.
(Dependent claims 2-9 omitted.)
10. A distributed data cluster comprising:
a plurality of nodes;
an interconnect coupling the plurality of nodes;
wherein a first node included in the plurality of nodes is configured to detect a failure of a second node included in the plurality of nodes, wherein the first node stores a copy of a data set stored by the second node;
wherein in response to detecting the failure of the second node, the first node is configured to perform a copy operation for the data set with a third node included in the plurality of nodes over the interconnect;
wherein the third node is configured to perform the copy operation by repeatedly:
locking a subset of the data set on the third node, storing a copy of the locked subset received from the first node, and releasing the locked subset, for each subset of a plurality of subsets included in the data set;
wherein the third node is configured to lock the subset of the data set without locking any other one of the plurality of subsets.
(Dependent claims 11-16 omitted.)
17. A distributed data system, comprising:
a distributed data system cluster comprising a plurality of nodes, wherein the distributed data system cluster stores a plurality of data sets, and wherein each data set is replicated on at least two of the plurality of nodes; and
a client node coupled to the plurality of nodes by a network, wherein the client node is configured to send a write access request targeting a first data set of the plurality of data sets to the cluster via the network;
wherein in response to detecting a failure of a first node of the plurality of nodes storing the first data set of the plurality of data sets, the distributed data system cluster is configured to perform a cluster topology self-healing process to copy the first data set from a second node to a third node;
wherein the distributed data system cluster is configured to perform the cluster topology self-healing process by:
locking a subset of a plurality of subsets included in the first data set on the third node, wherein the distributed data system cluster is configured to perform said locking without locking any other subset of the plurality of subsets;
copying data included in the subset of the first data set from the second node to the third node;
releasing the subset of the first data set; and
repeating said locking, said copying, and said releasing for each other subset included in the first data set;
wherein if the distributed data system cluster receives the client write access request during performance of the cluster topology self-healing process, the distributed data system cluster is configured to respond to the client write access request by modifying a first subset of the first data set targeted by the client write access request if the first subset of the first data set is not locked for performance of the cluster topology self-healing process.
(Dependent claims 18-23 omitted.)
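Claim 17 adds the client-facing behavior: a write arriving during the self-healing process is applied immediately unless it targets the one subset currently locked for copying. The Java sketch below is an illustrative reading of that limitation, not the patented implementation; HealingAwareStore, subsetLocks, and write are hypothetical names, and it reuses the per-subset ReentrantLock idea from the earlier sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of servicing client writes during the self-healing copy.
public class HealingAwareStore {
    private final Map<String, ReentrantLock> subsetLocks = new ConcurrentHashMap<>();
    private final Map<String, byte[]> subsets = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String subsetId) {
        return subsetLocks.computeIfAbsent(subsetId, id -> new ReentrantLock());
    }

    /** Client write: proceeds at once unless its own subset is being copied,
     *  in which case it waits only for that subset's lock, not the whole set. */
    public void write(String subsetId, byte[] value) {
        ReentrantLock lock = lockFor(subsetId);
        lock.lock();
        try {
            subsets.put(subsetId, value);
        } finally {
            lock.unlock();
        }
    }
}
```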
24. A device for use in a distributed data system cluster, the device comprising:
a communication interface configured to send and receive communications from one or more other nodes, wherein the communication interface is configured to detect a failed node within the distributed data system cluster;
a data store coupled to the communication interface and configured to store data; and
a replication topology manager coupled to the communication interface and configured to participate in a copy operation involving a data set with another node in response to the communication interface detecting the failed node, wherein a copy of the data set is stored on the failed node;
wherein the replication topology manager is configured to participate in the copy operation by:
locking a first subset of the data set in the data store, wherein the replication topology manager is configured to acquire a lock on the first subset of the data set without acquiring a lock on any other subset of the data set;
performing a copy operation for the first subset of the data set;
releasing the first subset of the data set; and
repeating said locking, said performing, and said releasing for each other subset of the data set.
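The device of claim 24 is recited as three cooperating parts. The Java interfaces below only mirror that decomposition for orientation; the method names are invented for this sketch and do not come from the patent.

```java
// Hypothetical decomposition of the claim 24 device; method names are invented.
public interface ClusterDevice {

    /** Sends and receives cluster traffic and detects a failed node. */
    interface CommunicationInterface {
        void send(String nodeId, byte[] message);
        boolean isNodeFailed(String nodeId);
    }

    /** Local storage for the subsets of the replicated data set. */
    interface DataStore {
        void put(String subsetId, byte[] data);
        byte[] get(String subsetId);
    }

    /** Drives the per-subset lock/copy/release loop after a failure is detected. */
    interface ReplicationTopologyManager {
        void participateInCopy(String dataSetId, String survivingNodeId);
    }
}
```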
25. A system, comprising:
means for detecting a failed node within a cluster comprising a first node, the failed node, and a second node, wherein the failed node stores a data set;
means for performing a cluster topology self-healing process to copy the data set from the first node to the second node in response to detecting the failed node, wherein the data set is dividable into a plurality of domains, and wherein the cluster topology self-healing process includes:
locking one of the plurality of domains, wherein said locking does not lock any other one of the plurality of domains;
storing data included in the one of the plurality of domains sent from the first node to the second node;
releasing the one of the plurality of domains; and
repeating said locking, said storing, and said releasing for each other one of the plurality of domains.