Ownership reassignment in a shared-nothing database system
Abstract
Various techniques are described for improving the performance of a shared-nothing database system in which at least two of the nodes that are running the shared-nothing database system have shared access to a disk. Specifically, techniques are provided for changing the ownership of data in a shared-nothing database without changing the location of the data on persistent storage. Because the persistent storage location for the data is not changed during a transfer of ownership of the data, ownership can be transferred more freely and with less of a performance penalty than would otherwise be incurred by a physical relocation of the data. Various techniques are also described for providing fast run-time reassignment of ownership. Because the reassignment can be performed during run-time, the shared-nothing system does not have to be taken offline to perform the reassignment. Further, the techniques describe how the reassignment can be performed with relatively fine granularity, avoiding the need to perform bulk reassignment of large amounts of data across all nodes merely to reassign ownership of a few data items on one of the nodes.
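The core idea in the abstract, changing which node owns a data item while the item stays at the same place on shared persistent storage, can be illustrated with a minimal sketch. Everything here is hypothetical and chosen for illustration only (the `SharedDisk` and `Cluster` classes, the `ship` and `reassign` names, and the in-memory dictionaries standing in for a disk); it is not the patented implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SharedDisk:
    """Stand-in for persistent storage that several nodes can reach."""
    blocks: dict = field(default_factory=dict)  # location -> stored value


@dataclass
class Cluster:
    disk: SharedDisk
    owner_of: dict = field(default_factory=dict)     # item -> owning node name
    location_of: dict = field(default_factory=dict)  # item -> disk location

    def ship(self, requester: str, item: str, new_value):
        """Ship a write to the current owner, which performs it in place."""
        owner = self.owner_of[item]
        self.disk.blocks[self.location_of[item]] = new_value
        return owner  # name of the node that performed the operation

    def reassign(self, item: str, new_owner: str):
        """Change ownership metadata only; the item's location is untouched."""
        self.owner_of[item] = new_owner


cluster = Cluster(SharedDisk({7: "v0"}), {"row42": "node_a"}, {"row42": 7})
first = cluster.ship("node_c", "row42", "v1")   # performed by node_a
cluster.reassign("row42", "node_b")             # no data is copied or moved
second = cluster.ship("node_c", "row42", "v2")  # performed by node_b
```

In this sketch the reassignment touches a single map entry, which is why it can be cheap and can happen while the system keeps running.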
66 Claims
1. A method for managing data, comprising:
maintaining a plurality of persistent data items on persistent storage accessible to a plurality of nodes, the persistent data items including a particular data item stored at a particular location on said persistent storage;
assigning exclusive ownership of each of the persistent data items to one of the nodes, wherein a particular node of said plurality of nodes is assigned exclusive ownership of said particular data item;
when any node wants an operation performed that involves said particular data item, the node that desires the operation to be performed ships the operation to the particular node for the particular node to perform the operation on the particular data item while said particular data item continues to reside at said particular location;
reassigning ownership of the particular data item from the particular node to another node without moving the particular data item from said particular location on said persistent storage;
after the reassignment, when any node wants an operation performed that involves said particular data item, the node that desires the operation to be performed ships the operation to said other node for the other node to perform the operation on the particular data item while said particular data item continues to reside at said particular location, wherein:
said persistent storage is a first persistent storage of a plurality of persistent storages used by a multi-node database system,
the method further comprises reassigning ownership of a second data item from a first node that has access to said first persistent storage to a second node that has access to a second persistent storage but does not have access to said first persistent storage, and
the method further comprises reassigning ownership of the second data item by moving the second data item from said first persistent storage to said second persistent storage.
Dependent claims: 2, 14, 22, 23, 24, 32, 33, 45, 53, 54, 55

3. A method for managing data, comprising:
maintaining a plurality of persistent data items on persistent storage accessible to a plurality of nodes, the persistent data items including a particular data item stored at a particular location on said persistent storage;
assigning exclusive ownership of each of the persistent data items to one of the nodes, wherein a particular node of said plurality of nodes is assigned exclusive ownership of said particular data item;
when any node wants an operation performed that involves said particular data item, the node that desires the operation to be performed ships the operation to the particular node for the particular node to perform the operation on the particular data item;
while the particular node continues to operate, reassigning ownership of the particular data item from the particular node to another node;
after the reassignment, when any node wants an operation performed that involves said particular data item, the node that desires the operation to be performed ships the operation to said other node for the other node to perform the operation on the particular data item, wherein:
said persistent storage is a first persistent storage of a plurality of persistent storages used by said multi-node database system; and
the method further comprises reassigning ownership of a second data item from a first node that has access to said first persistent storage to a second node that has access to a second persistent storage but does not have access to said first persistent storage; and
wherein the step of reassigning ownership of the second data item includes moving the second data item from said first persistent storage to said second persistent storage.
Dependent claims: 4, 5, 6, 7, 8, 9, 34, 35, 36, 37, 38, 39, 40

10. A method for managing data, comprising:
maintaining a plurality of persistent data items on persistent storage accessible to a plurality of nodes, the persistent data items including a particular data item stored at a particular location on said persistent storage, wherein the plurality of nodes are nodes of a multi-node database system;
assigning exclusive ownership of each of the persistent data items to one of the nodes, wherein a particular node of said plurality of nodes is assigned exclusive ownership of said particular data item;
when any node wants an operation performed that involves said particular data item, the node that desires the operation to be performed ships the operation to the particular node for the particular node to perform the operation on the particular data item;
while the particular node continues to operate, reassigning ownership of the particular data item from the particular node to another node;
wherein the step of reassigning ownership of the particular data item from the particular node to said other node is performed as part of a gradual transfer of ownership from said particular node to one or more other nodes in said multi-node database system, wherein the gradual transfer is initiated in response to detecting that said particular node is overworked relative to the one or more other nodes, wherein the gradual transfer is terminated in response to detecting that said particular node is no longer overworked relative to the one or more other nodes; and
after the reassignment, when any node wants an operation performed that involves said particular data item, the node that desires the operation to be performed ships the operation to said other node for the other node to perform the operation on the particular data item.
Dependent claims: 41

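The gradual transfer recited in claim 10 starts when a node is detected as overworked and stops once it no longer is. One way to picture this is the loop below. The load metric, the `ratio` threshold, the choice to move the lightest bucket first, and all function names are assumptions made for this sketch; the claim does not prescribe any of them.

```python
def node_load(owner_of, bucket_load, node):
    """Total work attributed to the buckets a node currently owns."""
    return sum(w for b, w in bucket_load.items() if owner_of[b] == node)


def is_overworked(owner_of, bucket_load, node, nodes, ratio):
    """Assumed test: load exceeds `ratio` times the average of the other nodes."""
    others = [n for n in nodes if n != node]
    avg_other = sum(node_load(owner_of, bucket_load, n) for n in others) / len(others)
    return node_load(owner_of, bucket_load, node) > ratio * avg_other


def gradual_transfer(owner_of, bucket_load, node, nodes, ratio=1.5):
    """Move one bucket at a time off `node` until it is no longer overworked."""
    moved = []
    while is_overworked(owner_of, bucket_load, node, nodes, ratio):
        owned = [b for b in owner_of if owner_of[b] == node]
        if len(owned) <= 1:
            break  # keep at least one bucket so the loop always terminates
        bucket = min(owned, key=lambda b: bucket_load[b])  # lightest first
        target = min((n for n in nodes if n != node),
                     key=lambda n: node_load(owner_of, bucket_load, n))
        owner_of[bucket] = target  # ownership changes; no data is relocated
        moved.append((bucket, target))
    return moved


nodes = ["a", "b"]
bucket_load = {1: 10, 2: 10, 3: 10, 4: 10}
owner_of = {1: "a", 2: "a", 3: "a", 4: "a"}
moved = gradual_transfer(owner_of, bucket_load, "a", nodes)
```

Because the overload check is re-evaluated after every single move, only as many buckets are reassigned as the imbalance requires.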
11. A method for managing data, comprising:
maintaining a plurality of persistent data items on persistent storage accessible to a plurality of nodes, the persistent data items including a particular data item stored at a particular location on said persistent storage, wherein the plurality of nodes are nodes of a multi-node database system;
assigning exclusive ownership of each of the persistent data items to one of the nodes, wherein a particular node of said plurality of nodes is assigned exclusive ownership of said particular data item;
when any node wants an operation performed that involves said particular data item, the node that desires the operation to be performed ships the operation to the particular node for the particular node to perform the operation on the particular data item;
while the particular node continues to operate, reassigning ownership of the particular data item from the particular node to another node;
wherein the step of reassigning ownership of the particular data item from the particular node to another node is performed as part of a gradual transfer of ownership to said other node by one or more other nodes, wherein said gradual transfer is initiated in response to detecting that said other node is underworked relative to the one or more other nodes in said multi-node database system;
after the reassignment, when any node wants an operation performed that involves said particular data item, the node that desires the operation to be performed ships the operation to said other node for the other node to perform the operation on the particular data item.
Dependent claims: 42

12. A method for managing data, comprising:
maintaining a plurality of persistent data items on persistent storage accessible to a plurality of nodes, the persistent data items including a particular data item stored at a particular location on said persistent storage;
assigning exclusive ownership of each of the persistent data items to one of the nodes, wherein a particular node of said plurality of nodes is assigned exclusive ownership of said particular data item;
when any node wants an operation performed that involves said particular data item, the node that desires the operation to be performed ships the operation to the particular node for the particular node to perform the operation on the particular data;
while the particular node continues to operate, reassigning ownership of the particular data item from the particular node to another node;
after the reassignment, when any node wants an operation performed that involves said particular data item, the node that desires the operation to be performed ships the operation to said other node for the other node to perform the operation on the particular data item; and
after a first node has been removed from the multi-node system, continuing to have a set of data items owned by the first node.
Dependent claims: 13, 43, 44, 65, 66

15. A method for managing data, comprising:
maintaining a plurality of persistent data items on persistent storage accessible to a plurality of nodes, the persistent data items including a particular data item stored at a particular location on said persistent storage;
assigning exclusive ownership of each of the persistent data items to one of the nodes, wherein a particular node of said plurality of nodes is assigned exclusive ownership of said particular data item;
when any node wants an operation performed that involves said particular data item, the node that desires the operation to be performed ships the operation to the particular node for the particular node to perform the operation on the particular data item;
while the particular node continues to operate, reassigning ownership of the particular data item from the particular node to another node, wherein:
at the time said particular data item is to be reassigned to said other node, the particular node stores a dirty version of said particular data item in volatile memory; and
the step of reassigning ownership of the particular data item from the particular node to another node includes forcing to persistent storage one or more redo records associated with said dirty version, and purging said dirty version from said volatile memory without writing said dirty version of said particular data item to said persistent storage; and
said other node reconstructs said dirty version by applying said one or more redo records to the version of the particular data item that resides on said persistent storage.
Dependent claims: 46

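The hand-off in claim 15 (force the redo records, purge the dirty copy without writing it, and let the new owner rebuild it from the on-disk version plus redo) can be sketched as follows. The `Node` class, its method names, and the numeric "delta" redo records are hypothetical simplifications; in particular, this sketch has no log sequence numbers or checkpoints, so it assumes every logged record postdates the on-disk version.

```python
class Node:
    def __init__(self, name, disk, redo_log):
        self.name = name
        self.disk = disk          # shared dict: item -> persistent value
        self.redo_log = redo_log  # shared list standing in for persistent redo
        self.cache = {}           # volatile memory: item -> dirty value
        self.unflushed = []       # redo records not yet forced to storage

    def update(self, item, delta):
        """Modify the cached copy and generate a redo record in memory."""
        base = self.cache.get(item, self.disk[item])
        self.cache[item] = base + delta
        self.unflushed.append((item, delta))

    def hand_off(self, item):
        """Old owner: force redo for the item, then purge the dirty copy."""
        self.redo_log.extend(r for r in self.unflushed if r[0] == item)
        self.unflushed = [r for r in self.unflushed if r[0] != item]
        self.cache.pop(item, None)  # the dirty version is never written to disk

    def take_over(self, item):
        """New owner: rebuild the dirty version from disk plus redo records."""
        value = self.disk[item]
        for logged_item, delta in self.redo_log:
            if logged_item == item:
                value += delta
        self.cache[item] = value
        return value


disk, log = {"x": 100}, []
old_owner, new_owner = Node("a", disk, log), Node("b", disk, log)
old_owner.update("x", 5)
old_owner.update("x", -2)
old_owner.hand_off("x")
rebuilt = new_owner.take_over("x")
```

Forcing only the redo records tends to be cheaper than writing the full dirty item, since log writes are sequential and usually smaller than the data block.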
16. A method for managing data, comprising:
maintaining a plurality of persistent data items on persistent storage accessible to a plurality of nodes, the persistent data items including a particular data item stored at a particular location on said persistent storage;
assigning exclusive ownership of each of the persistent data items to one of the nodes, wherein a particular node of said plurality of nodes is assigned exclusive ownership of said particular data item;
when any node wants an operation performed that involves said particular data item, the node that desires the operation to be performed ships the operation to the particular node for the particular node to perform the operation on the particular data item;
while the particular node continues to operate, reassigning ownership of the particular data item from the particular node to another node, wherein:
at the time said particular data item is to be reassigned to said other node, the particular node stores a dirty version of said particular data item in volatile memory; and
the method further includes the step of transferring the dirty version of said particular data item from volatile memory associated with said particular node to volatile memory associated with said other node.
Dependent claims: 17, 18, 47, 48, 49

19. A method for managing data, comprising:
maintaining a plurality of persistent data items on persistent storage accessible to a plurality of nodes, the persistent data items including a particular data item stored at a particular location on said persistent storage;
assigning exclusive ownership of each of the persistent data items to one of the nodes, wherein a particular node of said plurality of nodes is assigned exclusive ownership of said particular data item;
when any node wants an operation performed that involves said particular data item, the node that desires the operation to be performed ships the operation to the particular node for the particular node to perform the operation on the particular data item;
reassigning ownership of the particular data item from the particular node to another node, wherein:
the step of reassigning ownership of the particular data item from the particular node to another node is performed without waiting for a transaction that is modifying the data item to commit;
the transaction makes a first set of modifications while the particular data item is owned by the particular node; and
the transaction makes a second set of modifications while the particular data item is owned by said other node.
Dependent claims: 20, 50, 51

21. A method for managing data, comprising:
maintaining a plurality of persistent data items on persistent storage accessible to a plurality of nodes, the persistent data items including a particular data item stored at a particular location on said persistent storage;
assigning exclusive ownership of each of the persistent data items to one of the nodes, wherein a particular node of said plurality of nodes is assigned exclusive ownership of said particular data item;
when any node wants an operation performed that involves said particular data item, the node that desires the operation to be performed ships the operation to the particular node for the particular node to perform the operation on the particular data item;
while the particular node continues to operate, reassigning ownership of the particular data item from the particular node to another node;
the other node receiving a request to update said data item;
determining whether the particular node held exclusive-mode or shared-mode access to the data item; and
if the particular node did not hold exclusive-mode or shared-mode access to the data item, then the other node updating the particular data item without waiting for the particular node to flush any dirty version of the data item, or redo for the dirty version, to persistent storage.
Dependent claims: 52

25. A method of managing data, the method comprising the steps of:
maintaining a plurality of persistent data items on persistent storage accessible to a plurality of nodes;
assigning ownership of each of the persistent data items to one of the nodes by assigning each data item to one of a plurality of buckets by enumerating individual data-item-to-bucket relationships; and
assigning each bucket to one of the plurality of nodes by enumerating individual bucket-to-node relationships;
wherein the node to which a bucket is assigned is established to be owner of all data items assigned to the bucket;
when a first node wants an operation performed that involves a data item owned by a second node, the first node ships the operation to the second node for the second node to perform the operation.
Dependent claims: 26, 27, 28, 29, 30, 31, 56, 57, 58, 59, 60, 61, 62

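Claim 25 describes a two-level mapping in which both levels are enumerated explicitly: data item to bucket, then bucket to node. A minimal sketch, assuming plain dictionaries for both maps and hypothetical names (`owner_of`, `ship_operation`, the row and node labels), might look like this:

```python
def owner_of(item, item_to_bucket, bucket_to_node):
    """Resolve ownership through the two enumerated maps."""
    return bucket_to_node[item_to_bucket[item]]


def ship_operation(requester, item, operation, item_to_bucket, bucket_to_node):
    """Return (executing_node, result); the owning node runs the operation."""
    target = owner_of(item, item_to_bucket, bucket_to_node)
    return target, operation(item)


# Explicit, enumerated relationships (no hash function or range rule).
item_to_bucket = {"row1": 0, "row2": 0, "row3": 1}
bucket_to_node = {0: "node_a", 1: "node_b"}

before = owner_of("row1", item_to_bucket, bucket_to_node)

# Reassigning one bucket changes the owner of every item in it at once.
bucket_to_node[0] = "node_b"
after = owner_of("row1", item_to_bucket, bucket_to_node)

executed_by, result = ship_operation(
    "node_a", "row2", str.upper, item_to_bucket, bucket_to_node
)
```

The indirection keeps the bucket-to-node map small, so a reassignment updates one entry instead of one entry per data item.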
63. A method for use in a multi-node shared-nothing database system, the method comprising the steps of:
a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item, wherein said first data item and said second data item are persistently stored data items within a database managed by the multi-node shared-nothing database system;
without changing the location of a first data item on persistent storage or shutting down said first node, reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system; and
after reassigning ownership, the first node continuing to operate as the owner of the second data item, and to handle all requests for operations on said second data item.
Dependent claims: 64

Specification