Database systems and applications for assigning records to chunks of a partition in a non-relational database system with auto-balancing
First Claim
1. A method for performing an auto-balancing operation in a partition of hardware-based network storage of a non-relational database system that comprises a plurality of partitions the method comprising:
- assigning a plurality of records having a common attribute for grouping and stored to a same partition of the plurality of partitions to a plurality of chunks within the same partition without affecting a location of records in the hardware-based network storage, wherein each chunk of the plurality of chunks comprises a respective unique group of records of the plurality of records and a unique chunk identifier that identifies that chunk within that same partition, wherein assigning the plurality of records comprises, for a particular record of the plurality of records;
mapping a respective record key uniquely identifying the particular record to a natural chunk identifier, wherein the respective record key comprises data from one or more fields of the particular record and the natural chunk identifier comprises a numerical value corresponding to the data from the one or more fields of the particular record;
assigning the particular record to a particular candidate chunk of the same partition having a particular chunk identifier (chunk_identifier(k)) that is a closest chunk available for insertion of the particular record at a particular time that satisfies an assignment formula;
chunk_identifier(k)≤
f(record key)<
chunk_identifier(k+1), where f(record key) corresponds to the natural chunk identifier determined by the mapping of the data from the one or more fields of the particular record, where k and k+1 are indices of two consecutive chunks and chunk identifiers of the plurality of chunks are in sorted order, wherein the particular candidate chunk of the same partition comprises the respective unique group of records sorted by their corresponding record keys;
determining a first chunk of the plurality of chunks should be split when a number of records in the first chunk of the same partition exceeds a particular threshold number when a new record is inserted into the partition and assigned to the first chunk according to the assignment formula; and
after determining the first chunk should be split;
determining a new chunk identifier for splitting the first chunk based at least in part on the respective record key uniquely identifying the new record and the respective unique group of records sorted by their corresponding record keys assigned to the first chunk; and
updating the respective chunk identifier associated with a first subset of the number of records that were originally part of the first chunk to the new chunk identifier to assign the first subset of the number of records to a new chunk without affecting the location of the records in the hardware-based network storage, wherein a second subset of the number of records originally assigned to the first chunk of the same partition remain assigned to the first chunk such that the number of records originally assigned to the first chunk are divided among the first chunk of the same partition and the new chunk of the same partition such that the assignment formula is satisfied after the updating is complete, wherein the respective unique group of records in each chunk are processible as a unit without iterating through all records of the same partition.
2 Assignments
0 Petitions
Accused Products
Abstract
A method and system are provided for assigning a particular record into a chunk of a partition within a non-relational database system. When a number of records in a particular candidate chunk is greater than a particular threshold number, an application performs an auto-balancing operation to split the particular candidate chunk such that records originally assigned to the particular candidate chunk are divided among the particular candidate chunk and a new chunk. Some of the number of records that were originally part of the particular candidate chunk are assigned to a new chunk and the other remaining ones of the number of records that were originally part of the particular candidate chunk remain assigned to the particular candidate chunk.
156 Citations
17 Claims
-
1. A method for performing an auto-balancing operation in a partition of hardware-based network storage of a non-relational database system that comprises a plurality of partitions the method comprising:
-
assigning a plurality of records having a common attribute for grouping and stored to a same partition of the plurality of partitions to a plurality of chunks within the same partition without affecting a location of records in the hardware-based network storage, wherein each chunk of the plurality of chunks comprises a respective unique group of records of the plurality of records and a unique chunk identifier that identifies that chunk within that same partition, wherein assigning the plurality of records comprises, for a particular record of the plurality of records; mapping a respective record key uniquely identifying the particular record to a natural chunk identifier, wherein the respective record key comprises data from one or more fields of the particular record and the natural chunk identifier comprises a numerical value corresponding to the data from the one or more fields of the particular record; assigning the particular record to a particular candidate chunk of the same partition having a particular chunk identifier (chunk_identifier(k)) that is a closest chunk available for insertion of the particular record at a particular time that satisfies an assignment formula;
chunk_identifier(k)≤
f(record key)<
chunk_identifier(k+1), where f(record key) corresponds to the natural chunk identifier determined by the mapping of the data from the one or more fields of the particular record, where k and k+1 are indices of two consecutive chunks and chunk identifiers of the plurality of chunks are in sorted order, wherein the particular candidate chunk of the same partition comprises the respective unique group of records sorted by their corresponding record keys;determining a first chunk of the plurality of chunks should be split when a number of records in the first chunk of the same partition exceeds a particular threshold number when a new record is inserted into the partition and assigned to the first chunk according to the assignment formula; and after determining the first chunk should be split; determining a new chunk identifier for splitting the first chunk based at least in part on the respective record key uniquely identifying the new record and the respective unique group of records sorted by their corresponding record keys assigned to the first chunk; and updating the respective chunk identifier associated with a first subset of the number of records that were originally part of the first chunk to the new chunk identifier to assign the first subset of the number of records to a new chunk without affecting the location of the records in the hardware-based network storage, wherein a second subset of the number of records originally assigned to the first chunk of the same partition remain assigned to the first chunk such that the number of records originally assigned to the first chunk are divided among the first chunk of the same partition and the new chunk of the same partition such that the assignment formula is satisfied after the updating is complete, wherein the respective unique group of records in each chunk are processible as a unit without iterating through all records of the same partition. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
-
-
13. A non-transitory, computer-readable medium containing instructions thereon, which, when executed by a processor, are configurable to cause the process to perform operations comprising:
-
assigning a plurality of records having a common attribute for grouping and stored to a same partition of hardware-based network storage of a non-relational database system that comprises a plurality of partitions to a plurality of chunks without affecting a location of records in the hardware-based network storage, wherein the partition comprises the plurality of chunks each having a unique chunk identifier that identifies that chunk within that same partition and each chunk of the plurality of chunks comprises a respective unique group of records of the plurality of records, wherein assigning the plurality of records comprises, for a particular record of the plurality of records; mapping a respective record key uniquely identifying the particular record to a natural chunk identifier, wherein the respective record key comprises data from one or more fields of the particular record and the natural chunk identifier comprises a numerical value corresponding to the data from the one or more fields of the particular record; assigning the particular record to a particular candidate chunk of the same partition having a particular chunk identifier (chunkidentifier(k)) that is a closest chunk available for insertion of the particular record at a particular time that satisfies an assignment formula;
chunk_identifier(k)≤
f(record key)<
chunk_identifier(k+1), where f(record key) corresponds to the natural chunk identifier determined by the mapping of the data from the one or more fields of the particular record, where k and k+1 are indices of two consecutive chunks and chunk identifiers of the plurality of chunks are in sorted order, wherein the particular candidate chunk of the same partition comprises the respective unique group of records sorted by their corresponding record keys;determining a first chunk of the plurality of chunks should be split when a number of records in the first chunk of the same partition exceeds a particular threshold number when a new record is inserted into the partition and assigned to the first chunk according to the assignment formula; and after determining the first chunk should be split; determining a new chunk identifier for splitting the first chunk based at least in part on the respective record key uniquely identifying the new record and the respective unique group of records sorted by their corresponding record keys assigned to the first chunk; and updating the respective chunk identifier associated with a first subset of the number of records that were originally part of the first chunk to the new chunk identifier to assign the first subset of the number of records to a new chunk without affecting the location of the records in the hardware-based network storage, wherein a second subset of the number of records originally assigned to the first chunk of the same partition remain assigned to the first chunk such that the number of records originally assigned to the first chunk are divided among the first chunk of the same partition and the new chunk of the same partition such that the assignment formula is satisfied after the updating is complete, wherein the respective unique group of records in each chunk are processible as a unit without iterating through all records of the same partition. - View Dependent Claims (14, 15, 16)
-
-
17. A system, comprising:
-
a non-relational database system comprising; hardware-based network storage that comprises a plurality of partitions, wherein each partition comprises one or more chunks each having a unique chunk identifier that identifies that chunk within a same partition; and a database management system (DBMS) having a query interface and application programming interface for an application, and a database storage engine used to create, read, update and delete (CRUD) records at the hardware-based network storage; and
an application server, comprising;a hardware-based processing system configured to execute the application as a server process to generate a plurality of records having a respective record key that is an identifier that uniquely identifies a particular record, wherein the plurality of records have a common attribute for grouping and are stored to the same partition of the plurality of partitions, wherein the particular record is to be inserted into the non-relational database system, when the particular record is ready to be inserted into the same partition, wherein the application is configured to;
access the non-relational database system through the query interface and application programming interface for the application when the particular record is ready to be inserted into the partition; andwherein the application is configured to; assign the plurality of records to a plurality of chunks without affecting a location of records in the hardware-based network storage, wherein each chunk of the plurality of chunks comprises a respective unique group of records of the plurality of records and the unique chunk identifier that identifies that chunk within that same partition, wherein assigning the plurality of records comprises, for the particular record of the plurality of records; mapping a respective record key uniquely identifying the particular record to a natural chunk identifier, wherein the respective record key comprises data from one or more fields of the particular record and the natural chunk identifier comprises a numerical value corresponding to the data from the one or more fields of the particular record; assigning the particular record to a particular candidate chunk of the same partition having a particular chunk identifier (chunkidentifier(k)) that is a closest chunk available for insertion of the particular record at a particular time that satisfies an assignment formula;
chunk_identifier(k)≤
f(record key)<
chunk_identifier(k+1), where f(record key) corresponds to the natural chunk identifier determined by the mapping of the data from the one or more fields of the particular record, where k and k+1 are indices of two consecutive chunks and chunk identifiers of the plurality of chunks are in sorted order, wherein the particular candidate chunk of the same partition comprises the respective unique group of records sorted by their corresponding record keys;determine a first chunk of the plurality of chunks should be split when a number of records in the first chunk of the same partition exceeds a particular threshold number when a new record is inserted into the partition and assigned to the first chunk according to the assignment formula; and after determining the first chunk should be split; determining a new chunk identifier for splitting the first chunk based at least in part on the respective record key uniquely identifying the new record and the respective unique group of records sorted by their corresponding record keys assigned to the first chunk; and updating the respective chunk identifier associated with a first subset of the number of records that were originally part of the first chunk to the new chunk identifier to assign the first subset of the number of records to a new chunk without affecting the location of the records in the hardware-based network storage, wherein a second subset of the number of records originally assigned to the first chunk of the same partition remain assigned to the first chunk such that the number of records originally assigned to the first chunk are divided among the first chunk of the same partition and the new chunk of the same partition such that the assignment formula is satisfied after the updating is complete, wherein the respective unique group of records in each chunk are processible as a unit without iterating through all records of the same partition.
-
Specification