Scheduling of splits and moves of database partitions
First Claim
1. A system, comprising:
- one or more processors;
a memory coupled to the one or more processors; and
a plurality of storage nodes, each of which comprises one or more storage devices or logical storage volumes;
wherein the memory stores program instructions that when executed by the one or more processors cause the one or more processors to implement a distributed database service;
wherein the distributed database service maintains data in two or more partitions, and wherein to maintain the data in the two or more partitions, the distributed database service is configured to store two or more replicas of each of the two or more partitions on respective storage devices or logical storage volumes of respective ones of the plurality of storage nodes;
wherein multiple storage nodes of the plurality of storage nodes are configured to;
identify one or more candidate partition management operations to be performed on the storage node, wherein each of the candidate partition management operations comprises a split operation targeting a partition replica that is stored on a storage device or logical volume of the storage node or a move operation targeting one or more partition replicas that are stored on a storage device or logical volume of the storage node; and
send information about each of the one or more candidate partition management operations to a central partition management scheduler of the distributed database service, wherein for each of the candidate partition management operations, the information comprises an indication of whether the candidate partition management operation comprises a split operation or a move operation, an indication of a number of replicas to be affected by the candidate partition management operation, or an indication of an amount of a resource that is provisioned for replicas affected by the candidate partition management operation; and
wherein the central partition management scheduler of the distributed database service is configured to;
receive the information about the one or more candidate partition management operations from the multiple storage nodes; and
apply a global prioritization across the candidate partition management operations to be performed on the multiple storage nodes, wherein to apply the global prioritization the central partition management scheduler is configured to determine an order in which at least some of the candidate partition management operations across the multiple nodes are to be performed based, at least in part, on the received information.
1 Assignment
0 Petitions
Accused Products
Abstract
A system that implements a data storage service may store data in multiple replicated partitions on respective computing nodes on behalf of clients. A storage node may, based on the amount of provisioned resources on a given storage device or logical volume, identify candidate partition management operations to be performed, and may send information about the operations to a central partition management scheduler. The scheduler may apply a global prioritization scheme to determine an order in which to perform the candidate operations. The order may be based on whether the operations include partition splits or partition moves, whether they aim to reduce provisioned storage capacity or reduce throughput capacity on a storage device or logical volume, whether they conflict with each other, whether the total number of partitions (or replicas thereof) involved in partition management at any given time exceeds a pre-determined limit, or whether they were requested by clients.
-
Citations
20 Claims
-
1. A system, comprising:
-
one or more processors; a memory coupled to the one or more processors; and a plurality of storage nodes, each of which comprises one or more storage devices or logical storage volumes; wherein the memory stores program instructions that when executed by the one or more processors cause the one or more processors to implement a distributed database service; wherein the distributed database service maintains data in two or more partitions, and wherein to maintain the data in the two or more partitions, the distributed database service is configured to store two or more replicas of each of the two or more partitions on respective storage devices or logical storage volumes of respective ones of the plurality of storage nodes; wherein multiple storage nodes of the plurality of storage nodes are configured to; identify one or more candidate partition management operations to be performed on the storage node, wherein each of the candidate partition management operations comprises a split operation targeting a partition replica that is stored on a storage device or logical volume of the storage node or a move operation targeting one or more partition replicas that are stored on a storage device or logical volume of the storage node; and send information about each of the one or more candidate partition management operations to a central partition management scheduler of the distributed database service, wherein for each of the candidate partition management operations, the information comprises an indication of whether the candidate partition management operation comprises a split operation or a move operation, an indication of a number of replicas to be affected by the candidate partition management operation, or an indication of an amount of a resource that is provisioned for replicas affected by the candidate partition management operation; and wherein the central partition management scheduler of the distributed database service is configured to; receive the information about the one or more candidate partition management operations from the multiple storage nodes; and apply a global prioritization across the candidate partition management operations to be performed on the multiple storage nodes, wherein to apply the global prioritization the central partition management scheduler is configured to determine an order in which at least some of the candidate partition management operations across the multiple nodes are to be performed based, at least in part, on the received information. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A method, comprising:
performing by one or more computers; obtaining, from a plurality of storage nodes of a distributed data storage system, information about each of a plurality of candidate partition management operations to be performed on the plurality of storage nodes of the distributed data storage system, wherein the distributed data storage system stores data in two or more partitions on respective storage devices or logical storage volumes of the plurality of storage nodes, and wherein for each of the candidate partition management operations, the information comprises an indication of a partition management operation type, wherein the partition management operation type specifies whether the candidate partition management operation comprises a split type operation or a move type operation; applying a global prioritization across the plurality of candidate partition management operations to be performed on the plurality of storage nodes, wherein said applying the global prioritization comprises determining an order in which to perform the plurality of candidate partition management operations across the plurality of storage nodes based, at least in part, on the partition management operation types of the plurality of candidate partition management operations. - View Dependent Claims (7, 8, 9, 10, 11, 12, 13, 14)
-
15. A non-transitory, computer-readable storage medium storing program instructions that when executed on one or more computers cause the one or more computers to perform:
-
receiving, from a plurality of storage nodes, information about each of a plurality of candidate partition management operations to be performed on the plurality of storage nodes on which database partitions are stored on one or more storage devices or logical storage volumes based, at least in part, on the received information; and applying a global prioritization across the plurality of candidate partition management operations to be performed on the plurality of storage nodes, wherein said applying the global prioritization comprises determining an order in which to perform at least some of the candidate partition management operations across the plurality of storage nodes based, at least on part, on a number of partitions affected by the at least some of the candidate partition management operations or on an amount of provisioned resources affected by at least some of the candidate partition management operations. - View Dependent Claims (16, 17, 18, 19, 20)
-
Specification