Method and apparatus for loading data into a cube forest data structure
First Claim
1. A device for updating a cube forest F, which is a collection of indices I1, . . . , In having a plurality of templates T1, . . . , Tn, each of which template is a tree having a plurality of spines with a plurality of nodes, and the plurality of nodes of the tree represent aggregate values to be updated with a single tuple, comprising:
- a) means for forming a catenated key for an index, as determined by a sequence of template nodes on a spine of the index;
b) means for descending the index using a B-tree search algorithm and for searching for the catenated key, including;
(i) means for searching, at every node that the descent touches, for an effective leaf that is tagged by a subkey of the catenated key;
(ii) means for updating, if such an effective leaf is found, any aggregates at the effective leaf;
(iii) means for updating recursively any subindices at the effective leaf; and
(iv) means for marking the subkey as processed;
c) means for inserting, after the descent, if there is an unprocessed subkey, the catenated key into the index;
d) means for creating a plurality of effective leaves for all unprocessed subkeys and inserting them into the node;
e) means for restructuring, if the node becomes too full after inserting, the index using a B-tree restructuring algorithms; and
f) means for moving, after each restructuring step, effective leaves as necessary to ensure that the effective leaf location invariant is preserved.
1 Assignment
0 Petitions
Accused Products
Abstract
A device and method is disclosed for loading data into and updating a data structure known as a cube forest for use in a batch-load-then-read-intensively system. The device and method perform loading and updating functions efficiently. Hierarchically split cube forests provide a method for efficiently duplicating information, and can be optimized to reduce update and storage costs. Cube forests including hierarchically split cube forests are most appropriate for read-intensive, update-rarely-and-in-large-batches multidimensional applications in an off-the-shelf (low cost) hardware environment. A method and an apparatus for loading data into and updating a cube forest are disclosed herein.
85 Citations
19 Claims
-
1. A device for updating a cube forest F, which is a collection of indices I1, . . . , In having a plurality of templates T1, . . . , Tn, each of which template is a tree having a plurality of spines with a plurality of nodes, and the plurality of nodes of the tree represent aggregate values to be updated with a single tuple, comprising:
-
a) means for forming a catenated key for an index, as determined by a sequence of template nodes on a spine of the index;
b) means for descending the index using a B-tree search algorithm and for searching for the catenated key, including;
(i) means for searching, at every node that the descent touches, for an effective leaf that is tagged by a subkey of the catenated key;
(ii) means for updating, if such an effective leaf is found, any aggregates at the effective leaf;
(iii) means for updating recursively any subindices at the effective leaf; and
(iv) means for marking the subkey as processed;
c) means for inserting, after the descent, if there is an unprocessed subkey, the catenated key into the index;
d) means for creating a plurality of effective leaves for all unprocessed subkeys and inserting them into the node;
e) means for restructuring, if the node becomes too full after inserting, the index using a B-tree restructuring algorithms; and
f) means for moving, after each restructuring step, effective leaves as necessary to ensure that the effective leaf location invariant is preserved.
-
-
2. A device for loading a batch of tuples into a cube forest F, which cube forest F is a collection of indices I1, . . . , In having a plurality of templates T1, . . . , Tn, each of which template is a tree having a plurality of nodes, and the plurality of nodes of the tree represent aggregate values to be updated with the batch of tuples, comprising a processor programmed to perform the steps of:
-
a) forming a catenated key for an index, as determined by a sequence of template nodes on a spine of the index;
b) sorting a batch on the catenated key;
c) descending the index using a B-tree search algorithm, and searching for the catenated key by the substeps of;
(i) searching, at every node that the descent touches, for an effective leaf that is not marked processed and that is tagged by a subkey of the catenated key;
(ii) marking, if such an effective leaf is found, the subkey as processed;
(iii) marking the effective leaf as found; and
(iv) recording a location of the effective leaf;
d) inserting, after the descent, if there is an unprocessed subkey, the catenated key into the index;
e) creating a plurality of effective leaves for all unprocessed subkeys and inserting them into the node;
f) performing steps b)(i)-(iv) on the newly created effective leaves;
g) restructuring, if the node becomes too full after any insertions, the index using a B-tree restructuring algorithm;
h) moving, after each restructuring step, effective leaves as necessary to ensure that an effective leaf location invariant is preserved;
i) recording, if one of the marked effective leaves moves, its new location; and
j) performing the following substeps, if this is not the last tuple in the batch;
(i) forming the subkeys for the next tuple; and
(ii) performing the following substeps for each subkey from the next batch that is not identical with the current subkey;
(1) updating any aggregates with a value attribute of all tuples with the same subkey as the subkey that tags the effective leaf; and
(2) updating any subindices, and passing as the batch of tuples to insert all the tuples with same subkey as the subkey that tags the effective leaf.
-
-
3. A method for loading a single tuple into a cube forest F, which cube forest F is a collection of indices I1, . . . , In having a plurality of templates T1, . . . , Tn, each of which template is a tree having a plurality of nodes, and the plurality of nodes of the tree represent aggregate values to be updated with a single tuple, comprising the steps of:
-
a) inserting the single tuple into each one of the indices according to step b) repeated n times; and
b) inserting the single tuple into each index, Ii, by the following substeps;
(i) partitioning the tree T into a plurality of spines, wherein each spine defines a conventional index on a catenated key defined by a subset of nodes of the plurality of nodes of the tree, which subset of nodes are located on the spine;
(ii) recording an aggregate value and/or a sub-index at every node on the spine, wherein each node of the subset of nodes on the spine is represented by an effective leaf, which is tagged by a subkey; and
(iii) upon reaching an effective leaf for the single tuple, updating the aggregate value and if the effective leafhas a plurality of subindices, recursively inserting the single tuple into the plurality of subindices.
-
-
4. A method for loading a batch of tuples into a cube forest F, which cube forest F is a collection of indices I1, . . . , In having a plurality of templates T1, . . . , Tn, each of which is a tree having a plurality of nodes, and the plurality of nodes of the tree represent aggregate values to be updated with the batch of tuples, comprising the steps of:
-
a) forming a catenated key for an index, as determined by a sequence of template nodes on a spine of the index;
b) sorting a batch on the catenated key;
c) descending the index using a B-tree search algorithm, and searching for the catenated key by the substeps of;
(i) searching, at every node that the descent touches, for an effective leaf that is not marked processed and that is tagged by a subkey of the catenated key;
(ii) marking, if such an effective leaf is found, the subkey as processed;
(iii) marking the effective leaf as found; and
(iv) recording a location of the effective leaf;
d) inserting, after the descent, if there is an unprocessed subkey, the catenated key into the index;
e) creating a plurality of effective leaves for all unprocessed subkeys and inserting them into the node;
f) performing steps b)(i)-(iv) on the newly created effective leaves;
g) restructuring, if the node becomes too full after any insertions, the index using a B-tree restructuring algorithm;
h) moving, after each restructuring step, effective leaves as necessary to ensure that an effective leaf location invariant is preserved;
i) recording, if one of the marked effective leaves moves, its new location; and
j) performing the following substeps, if this is not the last tuple in the batch;
(i) forming the subkeys for the next tuple; and
(ii) performing the following substeps for each subkey from the next batch that is not identical with the current subkey;
(1) updating any aggregates with a value attribute of all tuples with the same subkey as the subkey that tags the effective leaf; and
(2) updating any subindices, and passing as the batch of tuples to insert all the tuples with same subkey as the subkey that tags the effective leaf. - View Dependent Claims (5, 6, 7, 8, 9, 10)
-
-
11. A method for updating a cube forest F, which is a collection of indices I1, . . . , In having a plurality of templates T1, . . . , Tn, each of which template is a tree having a plurality of spines with a plurality of nodes, and the plurality of nodes of the tree represent aggregate values to be updated with a single tuple, comprising the steps of:
-
a) forming a catenated key for an index, as determined by a sequence of template nodes on a spine of the index;
b) descending the index using a B-tree search algorithm and searching for the catenated key by the substeps of;
(i) searching, at every node that the descent touches, for an effective leaf that is tagged by a subkey of the catenated key;
(ii) updating, if such an effective leaf is found, any aggregates at the effective leaf;
(iii) updating recursively any subindices at the effective leaf; and
(iv) marking the subkey as processed;
c) inserting, after the descent, if there is an unprocessed subkey, the catenated key into the index;
d) creating a plurality of effective leaves for all unprocessed subkeys and inserting them into the node;
e) performing steps b)(i)-(iv) on the newly created effective leaves;
f) restructuring, if the node becomes too full after inserting in step c), the index using a B-tree restructuring algorithm; and
g) moving, after each restructuring step, effective leaves as necessary to ensure that the effective leaf location invariant is preserved. - View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19)
-
Specification