Database systems and applications for assigning records to chunks of a partition in a non-relational database system with auto-balancing

US 11,921,750 B2
Filed: 10/29/2018
Issued: 03/05/2024
Est. Priority Date: 10/29/2018
Status: Active Grant

2 Assignments

0 Petitions

Abstract

A method and system are provided for assigning a particular record to a chunk of a partition within a non-relational database system. When the number of records in a particular candidate chunk exceeds a particular threshold number, an application performs an auto-balancing operation that splits the particular candidate chunk so that the records originally assigned to it are divided between the particular candidate chunk and a new chunk: some of those records are reassigned to the new chunk, and the remaining records stay assigned to the particular candidate chunk.
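The claims below define the chunk lookup by the assignment formula chunk_identifier(k) ≤ f(record key) < chunk_identifier(k+1). A minimal Python sketch of that lookup, assuming chunk identifiers are integers held in an ascending list whose first entry lower-bounds every natural chunk identifier and whose last chunk is unbounded above (both hypothetical conventions, as is the helper name `find_candidate_chunk`):

```python
from bisect import bisect_right
from typing import List


def find_candidate_chunk(chunk_ids: List[int], natural_id: int) -> int:
    """Return chunk_ids[k] with chunk_ids[k] <= natural_id < chunk_ids[k + 1].

    Hypothetical helper: assumes chunk_ids is sorted ascending and that the
    last chunk has no upper bound.
    """
    if not chunk_ids or natural_id < chunk_ids[0]:
        raise ValueError("no chunk covers this natural chunk identifier")
    # bisect_right returns the index of the first identifier > natural_id,
    # so the entry just before it is the closest identifier <= natural_id.
    k = bisect_right(chunk_ids, natural_id) - 1
    return chunk_ids[k]


chunk_ids = [0, 100, 250]
print(find_candidate_chunk(chunk_ids, 99))   # 0
print(find_candidate_chunk(chunk_ids, 100))  # 100
print(find_candidate_chunk(chunk_ids, 300))  # 250
```

Keeping the identifiers sorted lets the lookup run in O(log c) for c chunks, and the half-open interval means each natural chunk identifier falls into exactly one chunk.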

156 Citations

17 Claims

1. A method for performing an auto-balancing operation in a partition of hardware-based network storage of a non-relational database system that comprises a plurality of partitions, the method comprising:
- assigning a plurality of records having a common attribute for grouping and stored to a same partition of the plurality of partitions to a plurality of chunks within the same partition without affecting a location of records in the hardware-based network storage, wherein each chunk of the plurality of chunks comprises a respective unique group of records of the plurality of records and a unique chunk identifier that identifies that chunk within that same partition, wherein assigning the plurality of records comprises, for a particular record of the plurality of records:
  
  mapping a respective record key uniquely identifying the particular record to a natural chunk identifier, wherein the respective record key comprises data from one or more fields of the particular record and the natural chunk identifier comprises a numerical value corresponding to the data from the one or more fields of the particular record;
  
  assigning the particular record to a particular candidate chunk of the same partition having a particular chunk identifier (chunk_identifier(k)) that is a closest chunk available for insertion of the particular record at a particular time that satisfies an assignment formula:
  
  $$\mathrm{chunk\_identifier}(k) \le f(\text{record key}) < \mathrm{chunk\_identifier}(k+1),$$
  
  where f(record key) corresponds to the natural chunk identifier determined by the mapping of the data from the one or more fields of the particular record, where k and k+1 are indices of two consecutive chunks and chunk identifiers of the plurality of chunks are in sorted order, wherein the particular candidate chunk of the same partition comprises the respective unique group of records sorted by their corresponding record keys;
  
  determining a first chunk of the plurality of chunks should be split when a number of records in the first chunk of the same partition exceeds a particular threshold number when a new record is inserted into the partition and assigned to the first chunk according to the assignment formula; and
  
  after determining the first chunk should be split:
  
  determining a new chunk identifier for splitting the first chunk based at least in part on the respective record key uniquely identifying the new record and the respective unique group of records sorted by their corresponding record keys assigned to the first chunk; and
  
  updating the respective chunk identifier associated with a first subset of the number of records that were originally part of the first chunk to the new chunk identifier to assign the first subset of the number of records to a new chunk without affecting the location of the records in the hardware-based network storage, wherein a second subset of the number of records originally assigned to the first chunk of the same partition remain assigned to the first chunk such that the number of records originally assigned to the first chunk are divided among the first chunk of the same partition and the new chunk of the same partition such that the assignment formula is satisfied after the updating is complete, wherein the respective unique group of records in each chunk are processible as a unit without iterating through all records of the same partition.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
  - 2. The method according to claim 1, wherein the natural chunk identifier is greater than or equal to a chunk identifier of a chunk at index k and less than a chunk identifier of another chunk at index k+1, and wherein a mapping function maps the record key to the particular chunk identifier, and wherein k and k+1 are indices of two consecutive chunks that are available at the time of insertion.
  - 3. The method according to claim 1, wherein determining the new chunk identifier for splitting the first chunk comprises:
    - performing a binary search by record key within the first chunk of the same partition to find the new record to serve as the split point for splitting the first chunk of the same partition into the first chunk of the same partition and the new chunk of the same partition.
  - 4. The method according to claim 3, wherein performing a binary search by record key within the first chunk of the same partition, comprises:
    - retrieving all records in the first chunk of the same partition having the natural chunk identifier;
      
      sorting all the records by record key; and
      
      determining records in the first chunk of the same partition where the first chunk of the same partition is to be split into a first half and a second half.
  - 5. The method according to claim 4, wherein determining the records in the first chunk of the same partition comprises:
    - determining, when the size (nk) of the chunk is even, that a middle two records where the first chunk of the same partition is to be split into the first half and the second half are a first record n and a second record n+1.
  - 6. The method according to claim 4, wherein determining the records in the first chunk of the same partition comprises:
    - determining, when the size (2n+1) of the chunk is odd, that the records where the first chunk of the same partition is to be split into the first half and the second half are a middle record n+1 and a record n+1 before the middle record.
  - 7. The method according to claim 1, wherein the non-relational database system is capable of supporting chunking of the records in the same partition so that auto-balancing functionality can be implemented at a query level within an application and is transparent to the hardware-based network storage, wherein the same partition is a collection of records that have a common query attribute for grouping within that same partition.
  - 8. The method according to claim 7, wherein the application is written to follow a particular database schema supported by a database management system (DBMS), wherein information needed to store a record in accordance with the particular database schema that the application is written to follow, comprises:
    - (1) a partition key, (2) a chunk identifier, (3) a record key, and (4) data associated with the record.
  - 9. The method of claim 1, wherein the respective record key comprises a string and the natural chunk identifier comprises a hash code the string is mapped to.
  - 10. The method of claim 1, wherein the respective record key comprises a date-time and the natural chunk identifier comprises a time-stamp the date-time is mapped to.
  - 11. The method of claim 1, wherein the data from the particular record includes at least one of a name, an ID, and a creation date-time.
  - 12. The method of claim 1, wherein mapping the respective record key uniquely identifying the particular record to the natural chunk identifier comprises hashing the data from the particular record into an integer.
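Claims 3 through 6 describe choosing the split point by sorting the chunk's records and locating the middle. A minimal Python sketch with hypothetical names follows. It assumes records are modelled as (natural_chunk_id, record_key) pairs and sorts by natural identifier first; that equals sorting by record key only when the mapping is order-preserving (for example, date-time to time-stamp), and it is what keeps the assignment formula valid after the split. Using the natural identifier of the first second-half record as the new chunk identifier, and placing the middle record of an odd-sized chunk in the second half, are likewise assumptions rather than details stated in the claims.

```python
from typing import List, Optional, Tuple

# Hypothetical record shape: (natural_chunk_id, record_key).
Record = Tuple[int, str]


def choose_split_id(records: List[Record]) -> Optional[int]:
    """Return a new chunk identifier that divides `records` near the middle.

    0-based index len // 2 is record n+1 (1-based) for both an even size 2n
    and an odd size 2n+1, so it opens the second half. If the midpoint record
    shares the lowest natural id, the scan moves forward to the next larger
    id so the lower half is never left empty. Returns None when every record
    has the same natural id (no identifier can separate them).
    """
    if len(records) < 2:
        return None
    ordered = sorted(records)  # by natural id, then record key
    lowest = ordered[0][0]
    for natural_id, _key in ordered[len(ordered) // 2:]:
        if natural_id > lowest:
            return natural_id
    return None


def split_chunk(records: List[Record]) -> Optional[Tuple[int, List[Record], List[Record]]]:
    """Return (new_chunk_id, kept, moved), or None if the chunk cannot split.

    Records with natural id >= new_chunk_id move to the new chunk, so
    chunk_id(k) <= f(key) < chunk_id(k + 1) still holds for every record.
    """
    split_id = choose_split_id(records)
    if split_id is None:
        return None
    kept = [r for r in records if r[0] < split_id]
    moved = [r for r in records if r[0] >= split_id]
    return split_id, kept, moved


print(split_chunk([(5, "a"), (5, "b"), (9, "c"), (12, "d")]))
# (9, [(5, 'a'), (5, 'b')], [(9, 'c'), (12, 'd')])
```

When several records share one natural identifier, the split can land off-centre; the sketch accepts that imbalance in exchange for never violating the assignment formula.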

13. A non-transitory, computer-readable medium containing instructions thereon, which, when executed by a processor, are configurable to cause the processor to perform operations comprising:
- assigning a plurality of records having a common attribute for grouping and stored to a same partition of hardware-based network storage of a non-relational database system that comprises a plurality of partitions to a plurality of chunks without affecting a location of records in the hardware-based network storage, wherein the partition comprises the plurality of chunks each having a unique chunk identifier that identifies that chunk within that same partition and each chunk of the plurality of chunks comprises a respective unique group of records of the plurality of records, wherein assigning the plurality of records comprises, for a particular record of the plurality of records:
  
  mapping a respective record key uniquely identifying the particular record to a natural chunk identifier, wherein the respective record key comprises data from one or more fields of the particular record and the natural chunk identifier comprises a numerical value corresponding to the data from the one or more fields of the particular record;
  
  assigning the particular record to a particular candidate chunk of the same partition having a particular chunk identifier (chunk_identifier(k)) that is a closest chunk available for insertion of the particular record at a particular time that satisfies an assignment formula:
  
  $$\mathrm{chunk\_identifier}(k) \le f(\text{record key}) < \mathrm{chunk\_identifier}(k+1),$$
  
  where f(record key) corresponds to the natural chunk identifier determined by the mapping of the data from the one or more fields of the particular record, where k and k+1 are indices of two consecutive chunks and chunk identifiers of the plurality of chunks are in sorted order, wherein the particular candidate chunk of the same partition comprises the respective unique group of records sorted by their corresponding record keys;
  
  determining a first chunk of the plurality of chunks should be split when a number of records in the first chunk of the same partition exceeds a particular threshold number when a new record is inserted into the partition and assigned to the first chunk according to the assignment formula; and
  
  after determining the first chunk should be split:
  
  determining a new chunk identifier for splitting the first chunk based at least in part on the respective record key uniquely identifying the new record and the respective unique group of records sorted by their corresponding record keys assigned to the first chunk; and
  
  updating the respective chunk identifier associated with a first subset of the number of records that were originally part of the first chunk to the new chunk identifier to assign the first subset of the number of records to a new chunk without affecting the location of the records in the hardware-based network storage, wherein a second subset of the number of records originally assigned to the first chunk of the same partition remain assigned to the first chunk such that the number of records originally assigned to the first chunk are divided among the first chunk of the same partition and the new chunk of the same partition such that the assignment formula is satisfied after the updating is complete, wherein the respective unique group of records in each chunk are processible as a unit without iterating through all records of the same partition.
- Dependent Claims (14, 15, 16)
  - 14. The computer-readable medium of claim 13, wherein the respective record key comprises a string and the natural chunk identifier comprises a hash code the string is mapped to.
  - 15. The computer-readable medium of claim 13, wherein the respective record key comprises a date-time and the natural chunk identifier comprises a time-stamp the date-time is mapped to.
  - 16. The computer-readable medium of claim 13, wherein:
    - mapping the respective record key uniquely identifying the particular record to the natural chunk identifier comprises hashing the data from the particular record into an integer.
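Claims 9, 10, 12 and 14 through 16 name two mappings from a record key to a natural chunk identifier: a string to a hash code, and a date-time to a time-stamp. A minimal Python sketch of both, under stated assumptions: the hash is the first four bytes of SHA-256 (any deterministic integer hash would serve; Python's built-in `hash()` is avoided because string hashing is salted per process), the time-stamp is whole milliseconds since the Unix epoch, and both function names are hypothetical.

```python
import hashlib
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def natural_id_from_string(record_key: str) -> int:
    """Map a string record key to a stable unsigned 32-bit integer (hash code)."""
    digest = hashlib.sha256(record_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def natural_id_from_datetime(created: datetime) -> int:
    """Map a timezone-aware date-time to whole milliseconds since the Unix epoch."""
    if created.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Floor division of two timedeltas yields an int and avoids float rounding.
    return (created - EPOCH) // timedelta(milliseconds=1)


print(natural_id_from_datetime(datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)))  # 1000
```

The time-stamp mapping preserves key order, so chunks over date-time keys cover contiguous time ranges; a hash does not preserve order, so chunks over hashed string keys partition the hash space rather than ranges of the original keys.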

17. A system, comprising:
- a non-relational database system comprising:
  
  hardware-based network storage that comprises a plurality of partitions, wherein each partition comprises one or more chunks each having a unique chunk identifier that identifies that chunk within a same partition; and
  
  a database management system (DBMS) having a query interface and application programming interface for an application, and a database storage engine used to create, read, update and delete (CRUD) records at the hardware-based network storage; and
  
  an application server, comprising:
  
  a hardware-based processing system configured to execute the application as a server process to generate a plurality of records having a respective record key that is an identifier that uniquely identifies a particular record, wherein the plurality of records have a common attribute for grouping and are stored to the same partition of the plurality of partitions, wherein the particular record is to be inserted into the non-relational database system, when the particular record is ready to be inserted into the same partition, wherein the application is configured to:
  
  access the non-relational database system through the query interface and application programming interface for the application when the particular record is ready to be inserted into the partition; and
  
  wherein the application is configured to:
  
  assign the plurality of records to a plurality of chunks without affecting a location of records in the hardware-based network storage, wherein each chunk of the plurality of chunks comprises a respective unique group of records of the plurality of records and the unique chunk identifier that identifies that chunk within that same partition, wherein assigning the plurality of records comprises, for the particular record of the plurality of records:
  
  mapping a respective record key uniquely identifying the particular record to a natural chunk identifier, wherein the respective record key comprises data from one or more fields of the particular record and the natural chunk identifier comprises a numerical value corresponding to the data from the one or more fields of the particular record;
  
  assigning the particular record to a particular candidate chunk of the same partition having a particular chunk identifier (chunk_identifier(k)) that is a closest chunk available for insertion of the particular record at a particular time that satisfies an assignment formula:
  
  $$\mathrm{chunk\_identifier}(k) \le f(\text{record key}) < \mathrm{chunk\_identifier}(k+1),$$
  
  where f(record key) corresponds to the natural chunk identifier determined by the mapping of the data from the one or more fields of the particular record, where k and k+1 are indices of two consecutive chunks and chunk identifiers of the plurality of chunks are in sorted order, wherein the particular candidate chunk of the same partition comprises the respective unique group of records sorted by their corresponding record keys;
  
  determine a first chunk of the plurality of chunks should be split when a number of records in the first chunk of the same partition exceeds a particular threshold number when a new record is inserted into the partition and assigned to the first chunk according to the assignment formula; and
  
  after determining the first chunk should be split:
  
  determining a new chunk identifier for splitting the first chunk based at least in part on the respective record key uniquely identifying the new record and the respective unique group of records sorted by their corresponding record keys assigned to the first chunk; and
  
  updating the respective chunk identifier associated with a first subset of the number of records that were originally part of the first chunk to the new chunk identifier to assign the first subset of the number of records to a new chunk without affecting the location of the records in the hardware-based network storage, wherein a second subset of the number of records originally assigned to the first chunk of the same partition remain assigned to the first chunk such that the number of records originally assigned to the first chunk are divided among the first chunk of the same partition and the new chunk of the same partition such that the assignment formula is satisfied after the updating is complete, wherein the respective unique group of records in each chunk are processible as a unit without iterating through all records of the same partition.
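Claims 7, 8 and 17 place the auto-balancing logic in the application, above a store that holds a partition key, chunk identifier, record key and data for each record. The Python sketch below ties the pieces together under loud assumptions: `Row`, `Partition` and every method name are hypothetical, storage is an in-memory list whose row order stands in for storage location, record keys are integers that serve directly as their own natural chunk identifiers, and the split rule (first key at or after the midpoint that exceeds the chunk's lowest key) is one plausible reading of the claims, not the patented implementation.

```python
from bisect import bisect_right, insort
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Row:
    # The four stored fields listed in claim 8.
    partition_key: str
    chunk_id: int
    record_key: int  # hypothetical: numeric key, so f(record key) = record key
    data: Any


class Partition:
    """Hypothetical in-memory model of one partition.

    Rows are never moved within self.rows; a split only rewrites the
    chunk_id field, mirroring "without affecting the location of records".
    """

    def __init__(self, partition_key: str, threshold: int, first_chunk_id: int = 0) -> None:
        self.partition_key = partition_key
        self.threshold = threshold
        self.chunk_ids: List[int] = [first_chunk_id]  # kept sorted
        self.rows: List[Row] = []

    def candidate_chunk(self, natural_id: int) -> int:
        # Assignment formula: chunk_ids[k] <= natural_id < chunk_ids[k + 1].
        if natural_id < self.chunk_ids[0]:
            raise ValueError("natural id is below the first chunk identifier")
        return self.chunk_ids[bisect_right(self.chunk_ids, natural_id) - 1]

    def chunk(self, chunk_id: int) -> List[Row]:
        # A real store would query by (partition_key, chunk_id); scanning the
        # list is a simplification of this sketch.
        return [r for r in self.rows if r.chunk_id == chunk_id]

    def insert(self, record_key: int, data: Any) -> None:
        chunk_id = self.candidate_chunk(record_key)
        self.rows.append(Row(self.partition_key, chunk_id, record_key, data))
        members = self.chunk(chunk_id)
        if len(members) > self.threshold:
            self._split(members)

    def _split(self, members: List[Row]) -> None:
        ordered = sorted(members, key=lambda r: r.record_key)
        lowest = ordered[0].record_key
        # First key from the midpoint onward that exceeds the lowest key.
        split_id: Optional[int] = next(
            (r.record_key for r in ordered[len(ordered) // 2:] if r.record_key > lowest),
            None,
        )
        if split_id is None:
            return  # every key is identical; no identifier can separate them
        insort(self.chunk_ids, split_id)
        for row in members:
            if row.record_key >= split_id:
                row.chunk_id = split_id  # reassign in place; the row is not moved


p = Partition("tenant-1", threshold=3)
for key in [10, 20, 30, 40, 35, 50]:
    p.insert(key, f"value-{key}")
print(p.chunk_ids)  # [0, 30, 40]
```

Because a split only rewrites chunk identifiers, each row keeps its position in `rows`, and every chunk can afterwards be read as a unit by its identifier instead of walking the whole partition.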

Current Assignee: Salesforce.com, Inc.
Original Assignee: Salesforce.com, Inc.
Inventors: Ho, Shan-Cheng
Primary Examiner(s): Mackes, Kris E
Assistant Examiner(s): Vuong, Cao D
Application Number: US16/173,057
Publication Number: US 20200134081A1
Time in Patent Office: 1,954 Days
Field of Search: None
CPC Class Codes:
G06F 16/211   Schema design and management
G06F 16/2272   Management thereof
G06F 16/252   between a Database Manageme...
G06F 16/278   Data partitioning, e.g. hor...
