Cluster extension in distributed systems using tree method

US 8,103,772 B2
Filed: 12/24/2003
Issued: 01/24/2012
Est. Priority Date: 12/24/2003
Status: Active Grant

Abstract

Methods and apparatus, including computer program products, for managing a cluster of servers organized into nodes. A method of one aspect includes establishing a cluster; establishing a set of ultimate identifiers for nodes resulting from splitting in the cluster; and storing every new data object on a node that has a node identifier that identifies a subset of the set of ultimate identifiers, and providing for the object a universal identifier that combines (i) an object identifier that is unique on the node and (ii) a server identifier that is one of the ultimate identifiers in the subset. A method of another aspect includes generating for a new data object a universal identifier that has a node identifier part that uniquely identifies a node, a reserve part generated at least in part as a pseudo-random value, and an object identifier part that uniquely identifies the object in the node.
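
The three-part universal identifier described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the 32-bit widths follow dependent claims 16 and 27, while the function name `make_universal_id`, the bit-string representation of the node identifier, and the packing order are assumptions made for illustration.

```python
import random

SERVER_ID_BITS = 32  # assumed width of an ultimate identifier (cf. claims 16, 27)
OBJECT_ID_BITS = 32  # assumed width of the object identifier (cf. claims 16, 27)


def make_universal_id(node_id: str, object_id: int, rng: random.Random) -> int:
    """Pack node identifier part, reserve part, and object identifier part.

    node_id is the leaf node's identifier as a bit string such as "01";
    an empty string stands for a cluster with a single, unsplit node.
    """
    reserve_bits = SERVER_ID_BITS - len(node_id)
    # Reserve part: pseudo-random bits that later splits can consume.
    reserve = rng.getrandbits(reserve_bits) if reserve_bits else 0
    node_part = int(node_id, 2) if node_id else 0
    # Server identifier = node identifier part followed by the reserve part.
    server_id = (node_part << reserve_bits) | reserve
    # Universal identifier = server identifier followed by the object identifier.
    return (server_id << OBJECT_ID_BITS) | object_id


uid = make_universal_id("01", 7, random.Random())
print(f"{uid:064b}")  # the two leading bits are the node identifier part "01"
```

Because the reserve bits are fixed when the object is created, the identifier does not have to change when the node is later split.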

35 Claims

1. A computer-implemented method for managing data storage servers, the method steps performed by one or more processors and comprising:
- establishing, by the one or more processors, a cluster of back-end servers organized into at least one leaf node, every back-end server in any leaf node mirroring every other back-end server in the same leaf-node;
  
  establishing, by the one or more processors, a predefined set of ultimate identifiers for nodes that could be created in the cluster, each leaf node in the cluster having a node identifier that is unique in the cluster and one of the predefined set of ultimate identifiers;
  
  storing every new data object on the cluster on particular back-end servers assigned to a particular leaf node of the at least one leaf node, the particular leaf node having a particular node identifier that identifies a subset of the set of ultimate identifiers, and providing for each new data object a universal identifier that combines (i) an object identifier that is unique on the particular leaf node; and
  
  (ii) a server identifier, where the server identifier includes (a) a leaf node ID part having a value corresponding to the particular identifier of the particular leaf node to which the particular back-end server is assigned, and (b) an unused part having an at least partially arbitrary value, and where the node identifiers for nodes existing in the cluster at any one time identify non-overlapping subsets of the set of ultimate identifiers;
  
  splitting, by the one or more processors, a first leaf node of the at least one leaf node into at least two new leaf nodes that replace the first leaf node; and
  
  assigning, by the one or more processors, to each new leaf node, at least one of the back-end servers of the first leaf node such that replacing the first leaf node reassigns the back-end servers, originally organized in the first leaf node, to the new leaf nodes, where assigning back-end servers to one of the new leaf nodes includes:
  
  appending a portion of the unused part of the server identifier, of each of the back-end servers of the first leaf node, to the leaf node ID part of the server identifier to create an updated leaf node ID part, wherein each updated leaf node ID part corresponds to a node identifier of one of the new leaf nodes; and
  
  assigning each back-end server of the first leaf node to the new leaf node having a node identifier corresponding to the updated leaf node ID part of the back-end server.
- - 2. The method of claim 1, wherein:
    - the set of ultimate identifiers is a set of binary numbers of a fixed length; and
      
      each leaf node in the cluster has a node identifier that is a binary number of the fixed length or of a length less than the fixed length.
  - 3. The method of claim 1, wherein:
    - each node identifier identifies a subset of the set of ultimate identifiers by being a beginning part or an ending part of each identifier in the subset or by being the only identifier in the subset.
  - 4. The method of claim 1, wherein:
    - the unused part of the server identifier comprises a part generated as a pseudo random value.
  - 5. The method of claim 4, wherein:
    - the pseudo random value is generated by a back-end server.
  - 6. The method of claim 4, wherein:
    - the pseudo random value is generated by a front-end server.
  - 7. The method of claim 1, where the first leaf node had a first node identifier that identified a first subset of the set of ultimate identifiers, the new leaf nodes each have a distinct node identifier that identifies a new distinct subset of the first subset, and the union of the new distinct subsets is the first subset;
    - and the method further comprising removing from each back-end server of each new leaf node any data objects that have a server identifier that does not correspond to the node identifier for the new leaf node to which the back-end server is assigned.
  - 8. The method of claim 7, wherein:
    - the set of ultimate identifiers is a set of binary numbers of a fixed length;
      
      the first leaf node has a first node identifier that is a binary number of a length less than the fixed length; and
      
      the node identifier of each new leaf node includes within it the binary number of the first node identifier.
  - 9. The method of claim 7, further comprising:
    - using a load measured for each leaf node in the cluster in selecting the first leaf node as a leaf node to be split.
  - 10. The method of claim 9, wherein the measured load comprises aggregate data capacity of servers organized in each leaf node in the cluster.
  - 11. The method of claim 7, further comprising:
    - determining whether the first leaf node has fewer than four back-end servers and, if it does, adding back-end servers to the first leaf node so that the first leaf node has at least four back-end servers, and populating the added back-end servers with all of the data objects stored on the first leaf node, before splitting the first leaf node.
  - 12. The method of claim 1, wherein each server identifier has a fixed value.
  - 13. The method of claim 12, wherein the server identifier has a value comprising a concatenation of the leaf node ID part and unused part.
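
The split described in claims 1 and 7 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the patented implementation: identifiers are modeled as bit strings, `ID_BITS`, `Server`, and `split_leaf` are hypothetical names, exactly one bit of the unused part is appended (giving two new leaf nodes), and each stored object is keyed by the server identifier and object identifier of its universal identifier.

```python
from dataclasses import dataclass, field

ID_BITS = 8  # assumed identifier width, kept small for readability


@dataclass
class Server:
    server_id: str  # fixed bit string of ID_BITS bits (leaf node ID part + unused part)
    # Objects keyed by (server identifier, object identifier) of their universal ID.
    objects: dict = field(default_factory=dict)


def split_leaf(node_id: str, servers: list[Server]) -> dict[str, list[Server]]:
    """Split leaf `node_id` in two by appending one bit of each server's unused part."""
    new_nodes: dict[str, list[Server]] = {node_id + "0": [], node_id + "1": []}
    for server in servers:
        # Updated leaf node ID part: the old part plus the next bit of the server ID.
        updated_part = server.server_id[: len(node_id) + 1]
        new_nodes[updated_part].append(server)
    for new_id, members in new_nodes.items():
        if not members:
            raise ValueError(f"new leaf node {new_id} would have no back-end server")
        for server in members:
            # Claim 7: drop objects whose server identifier no longer matches the node.
            server.objects = {
                key: value
                for key, value in server.objects.items()
                if key[0].startswith(new_id)
            }
    return new_nodes


mirrored = {("00111111", 1): "a", ("01000000", 2): "b"}
servers = [
    Server(sid, dict(mirrored))
    for sid in ("00101010", "01110000", "00000001", "01000000")
]
for new_id, members in split_leaf("0", servers).items():
    print(new_id, [s.server_id for s in members])
```

No identifier is rewritten during the split. Only the boundary between the leaf node ID part and the unused part moves, which is why the server identifier can keep a fixed value as in claim 12.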

14. A computer program product, tangibly embodied on a non-transitory computer storage device, for managing data storage servers, the product being operable to cause data processing apparatus to perform operations comprising:
- establishing a cluster of back-end servers organized into at least one leaf node, every back-end server in any leaf node mirroring every other back-end server in the same leaf node;
  
  establishing a predefined set of ultimate identifiers for nodes that could be created in the cluster, each leaf node in the cluster having a node identifier that is unique in the cluster and one of the predefined set of ultimate identifiers;
  
  storing every new data object on the cluster on particular back-end servers assigned to a particular leaf node of the at least one leaf node, the particular leaf node having a particular node identifier, and providing for each new data object a universal identifier that combines (i) an object identifier that is unique on the particular node; and
  
  (ii) a server identifier, where the server identifier includes:
  
  a) a leaf node ID part having a value corresponding to the particular leaf node identifier of the particular leaf node to which the particular back-end server is assigned, and b) an unused part having an at least partially arbitrary value, and where the node identifiers for nodes existing in the cluster at any one time identify non-overlapping subsets of the set of ultimate identifiers;
  
  splitting a first leaf node of the at least one leaf node into at least two new leaf nodes that replace the first leaf node;
  
  assigning to each new leaf node at least one of the back-end servers of the first leaf node such that replacing the first leaf node reassigns the back-end servers, originally organized in the first leaf node, to the new leaf nodes, where assigning back-end servers to one of the new leaf nodes includes appending a portion of the unused part of the server identifier, of each of the back-end servers of the first leaf node, to the leaf node ID part of the server identifier to create an updated leaf node ID part, wherein each updated leaf node ID part corresponds to a node identifier of one of the new leaf nodes; and
  
  removing from each back-end server of each new leaf node any data objects that have a server identifier with an updated leaf node ID part that does not correspond to the node identifier for the new leaf node to which the back-end server is assigned.
- - 15. The product of claim 14, wherein:
    - the set of ultimate identifiers is a set of binary numbers of a fixed length; and
      
      each leaf node in the cluster has a node identifier that is a binary number of the fixed length or of a length less than the fixed length.
  - 16. The product of claim 15, wherein:
    - the set of ultimate identifiers is the set of 32 bit binary numbers; and
      
      the object identifier is a 32 bit binary number.
  - 17. The product of claim 14, wherein:
    - each node identifier identifies a subset of the set of ultimate identifiers by being a beginning part or an ending part of each identifier in the subset or by being the only identifier in the subset.
  - 18. The product of claim 14, wherein:
    - the unused part of the server identifier comprises a part generated as a pseudo random value.
  - 19. The product of claim 18, wherein:
    - the pseudo random value is generated by a back-end server.
  - 20. The product of claim 18, wherein:
    - the pseudo random value is generated by a front-end server.
  - 21. The product of claim 14, where the first leaf node had a first node identifier that identified a first subset of the set of ultimate identifiers, the new leaf nodes each have a distinct node identifier that identifies a new distinct subset of the first subset, and the union of the new distinct subsets is the first subset.
  - 22. The product of claim 21, wherein:
    - the set of ultimate identifiers is a set of binary numbers of a fixed length;
      
      the first leaf node has a first node identifier that is a binary number of a length less than the fixed length; and
      
      the node identifier of each new leaf node includes within it the binary number of the first node identifier.
  - 23. The product of claim 21, further operable to cause data processing apparatus to perform operations comprising:
    - using a load measured for each leaf node in the cluster in selecting the first leaf node as a leaf node to be split.
  - 24. The product of claim 21, further operable to cause data processing apparatus to perform operations comprising:
    - determining whether the first leaf node has fewer than four back-end servers and, if it does, adding back-end servers to the first leaf node so that the first leaf node has at least four back-end servers, and populating the added back-end servers with all of the data objects stored on the first leaf node, before splitting the first leaf node.
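
The preparation steps in claims 23 and 24 (choosing a leaf node by measured load, then topping it up to four mirrored servers before the split) might look like the following Python sketch. The names `pick_node_to_split` and `ensure_min_servers` are hypothetical, each server is modeled as a plain dictionary of objects, and picking the most heavily loaded node is an assumption: the claims only say that a measured load is used in the selection.

```python
import copy


def pick_node_to_split(load_by_node: dict[str, float]) -> str:
    """Return the node identifier with the highest measured load (assumed policy)."""
    return max(load_by_node, key=load_by_node.get)


def ensure_min_servers(
    servers: list[dict], spares: list[dict], minimum: int = 4
) -> list[dict]:
    """Add spare servers to a leaf node until it has `minimum` mirrored servers.

    `servers` must hold at least one server; since all servers in a leaf node
    mirror each other, the first one is used as the source for populating.
    """
    while len(servers) < minimum:
        if not spares:
            raise RuntimeError("not enough spare back-end servers")
        new_server = spares.pop()
        new_server.clear()
        # Populate the added server with all data objects stored on the leaf node.
        new_server.update(copy.deepcopy(servers[0]))
        servers.append(new_server)
    return servers


loads = {"00": 10.0, "01": 42.5, "1": 7.0}
print(pick_node_to_split(loads))
leaf = ensure_min_servers([{"k1": "a"}, {"k1": "a"}], spares=[{}, {}, {}])
print(len(leaf))
```

With four servers in place, a two-way split can leave each new leaf node with two mirrored servers, so redundancy is kept after the split.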

25. A system for managing data storage, comprising:
- a memory storing a cluster of back-end servers organized into at least one leaf node, every back-end server in any leaf node mirroring every other back-end server in the same leaf node; and
  
  at least one processor that, when executed:
  
  establish a predefined set of ultimate identifiers for nodes that could be created in the cluster, each leaf node in the cluster having a node identifier that is unique in the cluster and one of the predefined set of ultimate identifiers;
  
  where the set of ultimate identifiers is a set of binary numbers of a fixed length;
  
  cause every new data object on the cluster to be stored on a particular leaf node of the at least one leaf node, the particular leaf node having a particular node identifier that identifies a subset of the set of ultimate identifiers, and providing for the object a universal identifier that combines (i) an object identifier that is unique on the particular leaf node; and
  
  (ii) a server identifier, where the server identifier has a fixed value, is one of the ultimate identifiers in the subset, and includes an identification of the particular node identifier of the particular leaf node to which the particular back-end server is assigned, and where the node identifiers for nodes existing in the cluster at any one time identify non-overlapping subsets of the set of ultimate identifiers;
  
  split a first leaf node of the at least one leaf node into at least two new leaf nodes that replace the first leaf node; and
  
  assign to each new leaf node, at least one of the back-end servers of the first leaf node such that replacing the first leaf node reassigns the back-end servers, originally organized in the first leaf node, to the new leaf nodes, where the server identifiers of the reassigned back-end servers each include an identification of the node identifier of the new leaf node to which the reassigned back-end server has been reassigned.
- - 26. The system of claim 25, wherein:
    - the set of ultimate identifiers is a set of binary numbers of a fixed length; and
      
      each leaf node in the cluster has a node identifier that is a binary number of the fixed length or of a length less than the fixed length.
  - 27. The system of claim 26, wherein:
    - the set of ultimate identifiers is the set of 32 bit binary numbers; and
      
      the object identifier is a 32 bit binary number.
  - 28. The system of claim 25, wherein:
    - the unused part of the server identifier comprises a part generated as a pseudo random value.
  - 29. The system of claim 28, wherein:
    - the pseudo random value is generated by a back-end server.
  - 30. The system of claim 28, wherein:
    - the pseudo random value is generated by a front-end server.
  - 31. The system of claim 25, where the first leaf node had a first node identifier that identified a first subset of the set of ultimate identifiers, the new leaf nodes each have a distinct node identifier that identifies a new distinct subset of the first subset, and the union of the new distinct subsets is the first subset;
    - and the at least one processor, when executed, removes from each back-end server of each new leaf node any data objects that have a server identifier that does not correspond to the node identifier for the new leaf node to which the back-end server is assigned.
  - 32. The system of claim 31, wherein the at least one processor uses a load measured for each leaf node in the cluster in selecting the first leaf node as a leaf node to be split.
  - 33. The system of claim 25, wherein:
    - the node identifier of each new leaf node includes within it the binary number of the first node identifier.
  - 34. The system of claim 31, wherein the at least one processor, when executed, determines whether the first leaf node has fewer than four back-end servers and, if it does, adds back-end servers to the first leaf node so that the first leaf node has at least four back-end servers, and populates the added back-end servers with all of the data objects stored on the first leaf node, before splitting the first leaf node.

35. A computer-implemented method for managing data storage servers, the method steps performed by one or more processors and comprising:
- establishing, by the one or more processors, a cluster of back-end servers organized into at least one leaf node, every leaf node in the cluster having a node identifier that is unique in the cluster, every back-end server in any leaf node mirroring every other back-end server in the same leaf-node;
  
  establishing, by the one or more processors, a predefined set of ultimate identifiers for nodes that could be created in the cluster, storing every new data object on the cluster on a particular leaf node of the at least one leaf node, the particular leaf node having a particular node identifier that identifies a subset of the set of ultimate identifiers, and providing for the object a universal identifier that combines (i) an object identifier that is unique on the particular leaf node; and
  
  (ii) a server identifier, where the server identifier is one of the ultimate identifiers in the subset, and where the node identifiers for nodes existing in the cluster at any one time identify non-overlapping subsets of the set of ultimate identifiers;
  
  determining, by the one or more processors, whether the first leaf node has fewer than four back-end servers and, if it does, adding back-end servers to the first leaf node so that the first leaf node has at least four back-end servers, and populating the added back-end servers with all of the data objects stored on the first leaf node, before splitting the first leaf node;
  
  splitting, by the one or more processors, a first leaf node of the at least one leaf node into at least two new leaf nodes that replace the first leaf node, wherein:
  
  the first leaf node had a first node identifier that identified a first subset of the set of ultimate identifiers, the new leaf nodes each have a distinct node identifier that identifies a new distinct subset of the first subset, and the union of the new distinct subsets is the first subset; and
  
  splitting, by the one or more processors, the first leaf node comprising assigning back-end servers, originally organized in the first leaf node, to the new leaf nodes by replacing the first leaf node such that each new leaf node is assigned at least two of the back-end servers originally organized in the first leaf node, wherein a first of the back-end servers assigned to each new leaf node mirrors other back-end servers assigned to the particular new leaf node but does not mirror the back-end servers in the other new leaf nodes replacing the first leaf node; and
  
  removing, by the one or more processors, from each back-end server of each new leaf node any data objects that have a server identifier that does not correspond to the node identifier for the new leaf node to which the back-end server is assigned, node ID part of the back-end server.
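
The claims require that the node identifiers existing at any one time identify non-overlapping subsets of the ultimate identifiers. One consequence is that a server identifier always resolves to exactly one leaf node, before and after a split. The Python sketch below illustrates this under the prefix interpretation of claim 3. The bit-string representation and the name `find_leaf` are assumptions for illustration.

```python
def find_leaf(server_id: str, leaf_ids: list[str]) -> str:
    """Return the one leaf node identifier that is a beginning part of server_id."""
    matches = [leaf for leaf in leaf_ids if server_id.startswith(leaf)]
    if len(matches) != 1:
        # Zero matches: the leaves do not cover the identifier space.
        # Several matches: the leaves overlap, which the claims rule out.
        raise LookupError(f"{server_id} matches {len(matches)} leaf nodes")
    return matches[0]


server_id = "01101100"  # server identifier part of some object's universal ID

before_split = ["0", "1"]
after_split = ["00", "01", "1"]  # leaf "0" replaced by "00" and "01"

print(find_leaf(server_id, before_split))
print(find_leaf(server_id, after_split))
```

The same unchanged identifier resolves to leaf `0` before the split and to leaf `01` afterwards, so stored references to the object stay valid.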

Current Assignee: SAP SE
Original Assignee: SAP AG (SAP SE)
Inventors: Schreter, Ivan
Primary Examiner(s): Divecha, Kamal B.
Assistant Examiner(s): Patel, Dhairya A
Application Number: US10/746,977
Publication Number: US 20050160170A1
Time in Patent Office: 2,953 Days
Field of Search: 709/223, 709/224, 709/225, 709/226, 709/201, 709/202, 709/203, 709/249
US Class Current: 709/226
CPC Class Codes:
H04L 2101/604   Address structures or formats
H04L 61/5069   for group communication, mu...
H04L 67/1095   Replication or mirroring of...
