Compaction policy
First Claim
1. A server-implemented method of selecting rowsets to compact in a tablet storing data associated with a distributed file system (DFS), the method comprising:
- restricting, by a server, each rowset in a plurality of rowsets included in the tablet to have a size less than a predetermined size threshold, wherein the each rowset stores keys covering a range that is less than or equal to a keyspace of the tablet;
determining a height of the tablet across the keyspace, wherein the height of the tablet is based on a number of rowsets having key ranges that overlap;
determining a rowset width of the each rowset in the keyspace of the tablet, wherein the rowset width of the each rowset is proportional to a percentage of the keyspace that is covered by the rowset;
until a minimum operational cost is reached, iteratively calculating, by the server, an operational cost associated with compaction of two or more rowsets in the keyspace, wherein the cost is calculated by integrating the rowset width of the each rowset included in the tablet across the height of the tablet, wherein the operational cost depends on a size of the each rowset; and
selecting, by the server, the two or more rowsets for compaction based on the two or more rowsets resulting in the minimum operational cost over the each rowset included in a workspace.
5 Assignments
0 Petitions
Accused Products
Abstract
A compaction policy imposing soft limits to optimize system efficiency is used to select various rowsets on which to perform compaction, each rowset storing keys within an interval called a keyspace. For example, the disclosed compaction policy results in a decrease in a height of the tablet, removes overlapping rowsets, and creates smaller sized rowsets. The compaction policy is based on the linear relationship shared between the keyspace height and the cost associated with performing an operation (e.g., an insert operation) in that keyspace. Accordingly, various factors determining which rowsets are to be compacted, how large the compacted rowsets are to be made, and when to perform the compaction, are considered within the disclosed compaction policy. Furthermore, a system and method for performing compaction on the selected datasets in a log-structured database is also provided.
-
Citations
20 Claims
-
1. A server-implemented method of selecting rowsets to compact in a tablet storing data associated with a distributed file system (DFS), the method comprising:
-
restricting, by a server, each rowset in a plurality of rowsets included in the tablet to have a size less than a predetermined size threshold, wherein the each rowset stores keys covering a range that is less than or equal to a keyspace of the tablet; determining a height of the tablet across the keyspace, wherein the height of the tablet is based on a number of rowsets having key ranges that overlap; determining a rowset width of the each rowset in the keyspace of the tablet, wherein the rowset width of the each rowset is proportional to a percentage of the keyspace that is covered by the rowset; until a minimum operational cost is reached, iteratively calculating, by the server, an operational cost associated with compaction of two or more rowsets in the keyspace, wherein the cost is calculated by integrating the rowset width of the each rowset included in the tablet across the height of the tablet, wherein the operational cost depends on a size of the each rowset; and selecting, by the server, the two or more rowsets for compaction based on the two or more rowsets resulting in the minimum operational cost over the each rowset included in a workspace. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
-
-
13. An apparatus executing a method of selecting rowsets to compact in a tablet storing data associated with a distributed file system (DFS), the method comprising:
-
restricting, by a server, each rowset in a plurality of rowsets included in the tablet to have a size less than a predetermined size threshold, wherein the each rowset stores keys covering a range that is less than or equal to a keyspace of the tablet; determining a height of the tablet across the keyspace, wherein the height of the tablet is based on a number of rowsets having key ranges that overlap; determining a rowset width of the each rowset in the keyspace of the tablet, wherein the rowset width of the each rowset is proportional to a percentage of the keyspace that is covered by the rowset; until a minimum operational cost is reached, iteratively calculating, by the server, an operational cost associated with compaction of two or more rowsets in the keyspace, wherein the cost is calculated by integrating the rowset width of the each rowset included in the tablet across the height of the tablet, wherein the operational cost depends on the size of the each rowset; and selecting, by the server, the two or more rowsets for compaction based on the two or more rowsets resulting in the minimum operational cost over the each rowset included in a workspace. - View Dependent Claims (14, 15)
-
-
16. A non-transitory computer-readable storage medium storing a set of instructions that, when executed by a computer system implements a method of selecting rowsets to compact in a tablet storing data associated with a distributed file system (DFS), the method comprising:
-
restricting, by a server, each rowset in a plurality of rowsets included in the tablet to have a size less than a predetermined size threshold, wherein the each rowset stores keys covering a range that is less than or equal to a keyspace of the tablet; determining a height of the tablet across the keyspace, wherein the height of the tablet is based on a number of rowsets having key ranges that overlap; determining a rowset width of the each rowset in the keyspace of the tablet, wherein the rowset width of the each rowset is proportional to a percentage of the keyspace that is covered by the rowset; until a minimum operational cost is reached, iteratively calculating, by the server, an operational cost associated with compaction of two or more rowsets in the keyspace, wherein the cost is calculated by integrating the rowset width of the each rowset included in the tablet across the height of the tablet, wherein the operational cost depends on a size of the each rowset; and selecting, by the server, the two or more rowsets for compaction based on the two or more rowsets resulting in the minimum operational cost over the each rowset included in a workspace. - View Dependent Claims (17, 18, 19, 20)
-
Specification