Database system with methods providing high-concurrency access in B-Tree structures
First Claim
1. In a database system, said database system storing a plurality of data records as a data table having an index, said index comprising a B-Tree having a root page, a plurality of internal pages, and a plurality of leaf-level pages, each page storing one or more key values from said data records, a method for providing highly-concurrent access to the B-Tree for multiple clients, the method comprising:
- (a) receiving a request from a client which requires insertion of a new key value into the B-Tree in order to satisfy the request;
(b) traversing the B-Tree for locating an existing page in the B-Tree appropriate for storing the new key value; and
(c) if said existing page does not contain sufficient room for storing the new key value, splitting the existing page into two pages by;
(i) allocating a new page at a level in the B-Tree which is the same as the existing page and marking both pages as undergoing a split, (ii) moving some of the key values from the existing page to the new page, and (iii) creating a reference in the existing page which points to the new page, so that any client which is traversing the B-Tree will not be blocked by the split which is currently occurring.
1 Assignment
0 Petitions
Accused Products
Abstract
A Client/Server Database System with improved methods for providing access to highly-concurrent data, such as of B-Tree data structures, is described. When the system receives a request to insert a key value into a B-Tree at page that does not have sufficient room, the system must split at the tree at the leaf level. This is done by allocating a new page, and moving some of the key values from the old page to the new page. The split itself propagates upward. To do the split itself, the system obtains address locks for the two pages, and marks both as undergoing “split” (i.e., a split operation)—the system sets a Boolean flag or “split bit” to “true.” When the split is propagated up, a “side entry” is added to the old page to point to the newly allocated page. The old page, however, may not have sufficient room for storing this new entry (e.g., when it is already full). Accordingly in such a case, the parent page must split also. This is done by allocating a new page, at that level. Both pages, old and new parents, are marked as undergoing split. As before, the system obtains address locks for these pages as well. At this point, a side entry is created in the old parent page. This information serves to let any client which is searching for key value (or greater) know that, instead of going directly down the tree from the old parent page, it should proceed to the parent'"'"'s new neighboring page. In this manner, a client'"'"'s traversal down the tree is not blocked by the split which is currently active. In effect, the client knows how to take a detour to arrive ultimately at the proper leaf-level page. After split propagation is complete, the system clears the split flags and releases address locks. Also at this point, the side entry is removed. Now, sufficient room now exists in the tree for inserting the key value.
-
Citations
30 Claims
-
1. In a database system, said database system storing a plurality of data records as a data table having an index, said index comprising a B-Tree having a root page, a plurality of internal pages, and a plurality of leaf-level pages, each page storing one or more key values from said data records, a method for providing highly-concurrent access to the B-Tree for multiple clients, the method comprising:
-
(a) receiving a request from a client which requires insertion of a new key value into the B-Tree in order to satisfy the request;
(b) traversing the B-Tree for locating an existing page in the B-Tree appropriate for storing the new key value; and
(c) if said existing page does not contain sufficient room for storing the new key value, splitting the existing page into two pages by;
(i) allocating a new page at a level in the B-Tree which is the same as the existing page and marking both pages as undergoing a split, (ii) moving some of the key values from the existing page to the new page, and (iii) creating a reference in the existing page which points to the new page, so that any client which is traversing the B-Tree will not be blocked by the split which is currently occurring. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
(d) storing the new key value in the B-Tree and, thereafter, marking both pages as no longer undergoing a split and clearing the reference in the existing page which points to the new page.
-
-
4. The method of claim 1, wherein any client which is traversing the B-Tree while the split is occurring, in search of a key value which is less than that stored by the existing page, will traverse directly to an appropriate lower-level page in the B-Tree, for arriving at a proper leaf-level page.
-
5. The method of claim 1, wherein said split occurs at a point which is at mid point for the page.
-
6. The method of claim 1, wherein each page includes a corresponding split flag for indicating whether the page is currently undergoing a split.
-
7. The method of claim 1, wherein each split flag comprises a bit flag which is set when the corresponding page is undergoing a split and reset upon completion of the split.
-
8. The method of claim 1, wherein the existing page has a parent page and wherein the method further comprises:
propagating the split upward to a next-higher level of the B-Tree, so that the parent of the existing page adds appropriate new key value information for allowing clients to traverse to the new page when appropriate.
-
9. The method of claim 8, further comprising:
if the parent of the existing page does not contain sufficient room for storing new key value information, further propagating the split upward through the B-Tree.
-
10. The method of claim 1, wherein said reference created in the existing page comprises a side-entry bit which is set in the existing page for indicating whether a side-entry exists in the page, said side-entry allowing a client traversing the B-Tree to check whether to traverse to the new page.
-
11. The method of claim 1, wherein said split is performed within a database transaction.
-
12. The method of claim 1, wherein said split is performed as a nested top action within a particular database transaction, such that the action can commit or rollback independent of the particular database transaction.
-
13. The method of claim 1, wherein said step of splitting the page into two pages includes:
asserting an address lock on a page while the page is undergoing a split.
-
14. The method of claim 1, wherein said other clients which are traversing the B-Tree at a page which is currently undergoing a split are restricted to read-only access.
-
15. The method of claim 1, wherein said index is a clustered index, and wherein said leaf-level pages of said B-Tree actually store said plurality of data records.
-
16. The method of claim 1, wherein said leaf-level pages of said B-Tree store identifier information allowing the system to uniquely identify a particular one of said data records.
-
17. The method of claim 1, further comprising:
inserting the new key value into the B-Tree by storing the new key value in the existing page once sufficient room for storing the new key value has been made available.
-
18. The method of claim 1, wherein said step of splitting includes:
-
determining a mid point in the existing page;
creating the new page by splitting the existing page into two pages; and
transferring to the new page key values which were stored in the existing page beyond the determined mid point.
-
-
19. The method of claim 1, wherein each page stores one or more key values together with pointers to children pages, said pointers indicating which path a client should traverse through the B-Tree to identify a record whose particular key value is being sought.
-
20. The method of claim 1, wherein said request from the client comprises a request to insert a new data record into the data table.
-
21. A data processing system comprising:
-
a database server connected to a plurality of clients, said database server storing a B-Tree data structure comprising a plurality of pages storing key values derived from a database table;
means for receiving a request from a first client to insert a new data record into said database table;
means for inserting a new key value from the new data record into an existing page of said B-Tree, said means including means for splitting the existing page into the existing page and a new page, said new page being at a level in the B-Tree which is the same as the existing page, so that insertion of the new key value may be accommodated; and
means for preserving concurrent access by creating an entry in the existing page which points to the new page, so that any other client which is traversing the B-Tree at the point of the split will not be blocked by the split while it is occurring, said means including a side-entry storing a key value allowing a client traversing the B-Tree to determine whether to traverse to the new page. - View Dependent Claims (22, 23, 24, 25, 26, 27, 28, 29, 30)
means for marking both pages as undergoing a split.
-
-
25. The system of claim 21, wherein any client which is traversing the B-Tree while the split is occurring, in search of a key value which is less than that stored by the existing page, will traverse directly to an appropriate lower-level page in the B-Tree for arriving at a proper leaf-level page.
-
26. The system of claim 21, wherein said split occurs at a point which is at mid point for the page.
-
27. The system of claim 21, wherein each page includes a corresponding split flag for indicating whether the page is undergoing a split.
-
28. The system of claim 21, wherein each split flag comprises a bit flag which is set when the corresponding page is undergoing a split and reset upon completion of the split.
-
29. The system of claim 21, wherein the existing page has a parent page and wherein said means for preserving concurrent access includes:
means for propagating the split upward to a next-higher level of the B-Tree, so that the parent of the existing page adds appropriate new key value information for allowing clients to traverse to the new page when appropriate.
-
30. The system of claim 21, wherein said entry created in the existing page comprises a side-entry and a corresponding side-entry bit, said side-entry bit being set in the existing page for indicating whether a side-entry exists in the page, said side-entry storing a key value allowing a client traversing the B-Tree to determine whether to traverse to the new page.
Specification