Dwarf cube architecture for reducing storage sizes of multidimensional data
Abstract
The invention relates to data warehouses and the ability to create and maintain data cubes of multi-dimensional data. More specifically, the invention pertains to data cube architectures that permit significant reduction of storage, exhibit very efficient retrieval and provide a very efficient incremental update of the data cubes.
49 Claims
1. A computer-implemented method for increasing the storage efficiency of either (1) a computer main memory or (2) a computer-readable disk memory, wherein said method comprises:
analysing an initial datacube of said data;
identifying suffix redundancies in said datacube;
producing a coalesced and reduced-in-size datacube, based on the identified suffix redundancies, that is capable of answering a query with the full precision of the initial datacube; and
storing said data in either said memory (1) or (2) as said reduced-in-size datacube to decrease the size of the memory required to store said data and increase the storage efficiency of such memory. - View Dependent Claims (2, 48, 49)
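The suffix redundancy that claim 1 coalesces can be illustrated with a toy fact table. Everything below (the dimension names, the `group_by` helper, the numbers) is a hypothetical sketch, not material from the patent:

```python
# Toy fact table (hypothetical): (store, customer, product) -> sales.
facts = [
    ("S1", "C2", "P2", 70),
    ("S1", "C3", "P1", 40),
    ("S2", "C1", "P1", 90),
    ("S2", "C1", "P2", 50),
]

def group_by(mask):
    """Aggregate sales for one group-by; mask[i] True keeps dimension i,
    False rolls it up to ALL."""
    out = {}
    for *dims, measure in facts:
        key = tuple(d if keep else "ALL" for d, keep in zip(dims, mask))
        out[key] = out.get(key, 0) + measure
    return out

full = group_by((True, True, True))       # <store, customer, product>
rolled = group_by((True, True, False))    # <store, customer, ALL>

# Every S1 customer buys exactly one product, so below the prefix <S1, *>
# the two group-bys carry identical aggregates: a suffix redundancy that
# a coalesced cube stores once, yet it still answers both group-bys with
# the full precision of the initial datacube.
s1_full = {k[:2]: v for k, v in full.items() if k[0] == "S1"}
s1_rolled = {k[:2]: v for k, v in rolled.items() if k[0] == "S1"}
assert s1_full == s1_rolled
```

Under store S2, by contrast, customer C1 buys two products, so the two group-bys differ there and no coalescing applies; the redundancy is suffix-specific.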
3. A computer-implemented method for increasing the storage efficiency of a computer-readable storage medium, wherein said method comprises the steps of:
(A) sampling data to be stored to estimate cardinalities or correlations between dimensions of said data and ordering the dimensions accordingly; (B) sorting said data according to the dimension ordering acquired by step (A); (C) assigning one level of a Dwarf structure, moving top-down; (1) for a full data cube, to each hierarchy level of the dimensions, according to the dimension ordering, and the hierarchy-level ordering within each dimension; and (2) for a concatenated rollup cube, to each dimension, according to the dimension ordering;
wherein each level consists of multiple rollup representations whose number is equal to the number of hierarchy levels of the dimension;
wherein exactly one rollup representation is assigned to each hierarchy level of the dimension;
wherein the rollup representations are ordered from the one corresponding to the most detailed hierarchy level to the one corresponding to the least detailed hierarchy level; (D) inserting data in nodes one tuple at a time; and (E) storing said data in said computer-readable storage medium as a reduced-in-size datacube to decrease the amount of computer memory needed to store said data and increase the storage efficiency of such memory. - View Dependent Claims (4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
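Steps (A) and (B) of claim 3 can be sketched in a few lines. The rows, dimension names, and tie-breaking rule below are illustrative assumptions, not the patent's sampling procedure:

```python
# Hypothetical fact rows: (product, store, customer, sales).
rows = [
    ("P1", "S1", "C3", 40),
    ("P2", "S1", "C2", 70),
    ("P1", "S2", "C1", 90),
    ("P2", "S2", "C1", 50),
]

def order_dimensions(sample, n_dims):
    """Step (A): estimate cardinalities from a sample and order the
    dimensions by decreasing cardinality (ties broken by position)."""
    cards = [len({r[i] for r in sample}) for i in range(n_dims)]
    return sorted(range(n_dims), key=lambda i: (-cards[i], i))

order = order_dimensions(rows, 3)   # customer (3 distinct values) first
# Step (B): sort the tuples under that ordering, so tuples sharing a
# prefix arrive consecutively for the top-down insertion of step (D).
sorted_rows = sorted(rows, key=lambda r: tuple(r[i] for i in order))
```

Placing high-cardinality dimensions first tends to maximise prefix sharing, which is why the claimed method orders dimensions before sorting and inserting.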
18. A data storage device comprising a computer-readable storage medium in which data has been stored in a Dwarf data structure to increase the storage efficiency of said medium, said structure comprising:
(A) a full data cube, wherein the structure of said full data cube contains as many levels as the sum of all the hierarchy levels of all the dimensions of the stored data;
wherein: (1) exactly one level of the structure is assigned to each hierarchy level of the dimensions; and (2) nodes at each level contain cells;
wherein each cell of a non-leaf node consists of a key value and a pointer to a node of the next lower level; and
wherein each cell of a leaf-node comprises a key value and the desired aggregate values; and
wherein each node also contains a special ALL cell, the ALL cell corresponding to all the keys of the node;
wherein ALL cells of non-leaf nodes contain pointers to the next lower level, and ALL cells of leaf nodes contain aggregate values; or (B) a concatenated rollup datacube, wherein the structure of said concatenated rollup datacube contains as many levels as the number of the dimensions of the stored data; and
wherein: (1) exactly one level of the structure is assigned to each dimension; (2) each level consists of multiple "rollup representations", whose number is equal to the number of hierarchy levels of the dimension;
wherein exactly one rollup representation is assigned to each hierarchy level of the dimension;
wherein the rollup representations are ordered from the one corresponding to the most detailed hierarchy level to the one corresponding to the least detailed hierarchy level; and (3) nodes at each level contain cells;
wherein each cell of a non-leaf node consists of a key value and a pointer to a node of the next lower level; and
wherein each cell of a leaf-node comprises a key value and the desired aggregate values; and
wherein each node also contains a special ALL cell, the ALL cell corresponding to all the keys of the node;
wherein ALL cells of nodes in the last rollup representation contain aggregate values of the stored data when they belong to the last level, or a pointer to a node at the next level otherwise; and
wherein ALL cells of nodes not in the last rollup representation contain pointers to a node in the next rollup representation of the current level; wherein said Dwarf data structure reduces the amount of computer-readable storage medium required to store said data, relative to the amount of computer-readable storage medium that would otherwise be required to store said data, thereby increasing the storage efficiency of said computer-readable storage medium. - View Dependent Claims (19, 20, 21, 22, 23, 24, 25)
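The node layout of claim 18(A) can be modelled minimally as follows. The class name, field names, and two-dimensional example are hypothetical illustrations of the claimed cell/ALL-cell arrangement:

```python
from dataclasses import dataclass, field

# Minimal sketch of the node layout of claim 18(A): non-leaf cells pair a
# key with a pointer to a node of the next lower level, leaf cells pair a
# key with aggregate values, and every node carries one special ALL cell.
@dataclass
class DwarfNode:
    leaf: bool
    cells: dict = field(default_factory=dict)  # key -> child node or aggregate
    all_cell: object = None                    # child node, or aggregate at a leaf

# Two-dimensional example, a store level above a customer level:
leaf_s1 = DwarfNode(leaf=True, cells={"C2": 70, "C3": 40}, all_cell=110)
leaf_s2 = DwarfNode(leaf=True, cells={"C1": 140}, all_cell=140)
leaf_all = DwarfNode(leaf=True, cells={"C1": 140, "C2": 70, "C3": 40}, all_cell=250)
root = DwarfNode(leaf=False, cells={"S1": leaf_s1, "S2": leaf_s2}, all_cell=leaf_all)

# The group-by <S1, ALL> descends via the S1 key cell, then reads the ALL cell:
assert root.cells["S1"].all_cell == 110
```

The ALL cell of the root points to a node summarising all stores, so the grand total is reached by following ALL cells from the root to a leaf.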
26. A computer comprising:
(A) a computer-readable storage medium in which data has been stored in a Dwarf data structure to increase the storage efficiency of said medium; and (B) means for analysing the stored data; (1) to estimate the cardinality of each dimension of said data and order the estimated dimensions according to decreasing cardinalities; (2) to calculate group-bys that aggregate values across at least one hierarchy level by merging previously calculated aggregate values; (3) to organize key values of cells within a node in a sorted list, which becomes a B+-tree if the number of keys exceeds 2 disk pages; (4) to organize different sets of group-bys that can be calculated from the same input data and which, therefore, contain the same aggregate values; and (5) to coalesce the store of different sets of group-bys that are identified in (4), thus eliminating suffix redundancy of said different sets of group-bys that are identified in (4); wherein said Dwarf data structure and said means reduce the amount of computer-readable storage medium required to store said data, relative to the amount of computer-readable storage medium that would otherwise be required to store said data, thereby increasing the storage efficiency of said computer. - View Dependent Claims (27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43)
SuffixCoalesceWithSort, HybridSuffixCoalesce or HierarchiesCoalesce algorithm by estimating whether the sub-dwarfs to be merged are currently stored in the system's buffers.
39. The computer of claim 26, wherein for each non-leaf node N at level i of a D-level Dwarf, the aggregate value V obtained by following ALL cells from node N until V is reached can be copied to the ALL cell of node N.
40. The computer of claim 39, wherein the Dwarf structure is queried, and the aggregate values for any specified group-by are retrieved.
41. The computer of claim 26, wherein the Dwarf structure is bulk-updated by using a merge-pack algorithm.
42. The computer of claim 26, wherein the Dwarf structure is updated by using an incremental update algorithm.
43. The computer of claim 26, wherein the Dwarf structure is updated by using a Dwarf reconstruction algorithm.
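Steps (4) and (5) of claim 26 rest on one observation: sub-dwarfs built from the same input tuples necessarily hold the same aggregates, so they can share one stored copy. The memoised builder below is a hedged sketch of that idea (the function, the frozenset memo key, and the data are illustrative; they are not the patent's SuffixCoalesce algorithm):

```python
# Memoising on the input tuple set coalesces every group-by family that
# would be computed from the same data, storing it exactly once.
def build(tuples_, depth, memo):
    key = (depth, frozenset(tuples_))
    if key in memo:                    # same input data -> reuse the store
        return memo[key]
    if depth == 0:                     # leaf: sum the measure column
        node = sum(m for *_, m in tuples_)
    else:
        groups = {}
        for t in tuples_:
            groups.setdefault(t[0], []).append(t[1:])
        node = {v: build(ts, depth - 1, memo) for v, ts in groups.items()}
        node["ALL"] = build([t[1:] for t in tuples_], depth - 1, memo)
    memo[key] = node
    return node

memo = {}
facts = [("S1", "C3", "P1", 40), ("S2", "C1", "P1", 90), ("S2", "C1", "P2", 50)]
cube = build(facts, 3, memo)
# Store S1 has a single customer, so the sub-dwarfs below <S1, C3> and
# <S1, ALL> are computed from the same tuples and coalesce to one object:
assert cube["S1"]["C3"] is cube["S1"]["ALL"]
```

The frozenset key assumes no duplicate input tuples, a simplification that holds for this toy data.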
44. A computer-implemented method for increasing the efficiency of retrieving the aggregate values that correspond to any specified group-by of data stored in a computer-readable medium, wherein said method comprises the steps of:
(A) querying data that has been stored in said computer-readable medium in a Dwarf data structure formed by; (1) analysing data to estimate the cardinality of each dimension of said data and ordering the estimated dimensions according to decreasing cardinalities; (2) calculating group-bys that aggregate values across at least one hierarchy level by merging previously calculated aggregate values; (3) organizing key values of cells within a node in a sorted list, which becomes a B+-tree if the number of keys exceeds 2 disk pages; (4) organizing different sets of group-bys that can be calculated from the same input data and which, therefore, contain the same aggregate values; (5) coalescing the store of different sets of group-bys that are identified in (4), thus eliminating suffix redundancy of said different sets of group-bys that are identified in (4); and (6) employing said coalesced store of different sets of group-bys to form said Dwarf data structure; and (B) retrieving the aggregate values that correspond to any specified group-by of stored data; wherein said Dwarf data structure reduces the amount of computer-readable storage medium required to store said data relative to the amount of computer-readable storage medium that would otherwise be required to store said data, thereby increasing the efficiency of retrieving said aggregate values that correspond to any specified group-by of stored data.
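The retrieval of step (B) walks a single root-to-leaf path: follow the key cell where a dimension value is specified, the ALL cell where it is not. A hedged sketch, with a hypothetical tuple-based node layout of `({key: child_or_aggregate}, all_child_or_aggregate)`:

```python
# Answer any group-by with one descent through the structure.
def query(node, coords):
    """coords holds one value per dimension, or None meaning ALL."""
    for c in coords[:-1]:
        cells, all_cell = node
        node = all_cell if c is None else cells[c]
    cells, all_cell = node
    last = coords[-1]
    return all_cell if last is None else cells[last]

# Tiny two-level structure over (store, customer):
leaf_s1 = ({"C2": 70, "C3": 40}, 110)
leaf_s2 = ({"C1": 140}, 140)
leaf_all = ({"C1": 140, "C2": 70, "C3": 40}, 250)
root = ({"S1": leaf_s1, "S2": leaf_s2}, leaf_all)

assert query(root, ["S1", None]) == 110   # <S1, ALL>
assert query(root, [None, "C1"]) == 140   # <ALL, C1>
assert query(root, [None, None]) == 250   # grand total
```

Because every group-by is answered by one path, retrieval cost is proportional to the number of dimensions, not to the size of the cube.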
45. A computer-implemented method for increasing the efficiency of updating old data stored in a computer-readable storage medium, wherein said data is stored in an old Dwarf datacube structure, wherein said method comprises the steps of:
(A) creating a Delta-Dwarf datacube structure for the update data, by; (1) sampling input data to estimate the cardinality of each dimension and ordering the dimensions according to decreasing cardinalities; (2) calculating group-bys that aggregate values across at least one hierarchy level by merging previously calculated aggregate values; (3) organizing key values of cells within a node in a sorted list, which becomes a B+-tree if the number of keys exceeds 2 disk pages, and coalescing the store of organized different sets of group-bys, thereby eliminating their suffix redundancy; and (4) identifying different sets of group-bys that can be calculated from the same input data and which contain the same aggregate values; and (B) merging the Delta-Dwarf datacube structure with the old Dwarf datacube structure using a merge-packing algorithm; wherein by creating said Delta-Dwarf datacube structure and merging its content with that of said old Dwarf datacube structure, said method increases the efficiency of data updating, relative to the time that would otherwise be required to update said data.
46. A computer-implemented method for increasing the efficiency of performing an incremental update to data stored in a computer-readable medium, wherein said data is stored in an existing Dwarf datacube structure, and wherein said method comprises the steps of:
(1) Ordering the dimensions of the update data according to the dimension ordering of said existing Dwarf datacube; (2) Traversing the old Dwarf structure top-down to identify nodes that need to be updated due to the existence of update tuples;
wherein a node at a level L of the Dwarf structure needs to be updated if and only if at least one update tuple contains a prefix of length L-1 that is identical to the path followed from the root of the structure to the current node, and wherein each node also contains a special ALL cell, the ALL cell corresponding to all the keys of the node; (3) Updating any node N at the lowest level of the structure by; (a) Identifying whether the update tuples that influence the aggregate values stored in N will require the insertion of new key values in N; (b) Creating a new node to store the results if the process in step (a) shows that new keys need to be inserted into N, otherwise storing the results in N; (c) For each key in the resulting node, merging the aggregate values existing in N with those of the update tuples; and (d) Calculating the aggregate values for the ALL cell; and (4) Updating any node N at higher levels by a method comprised of; (a) Recursively propagating the update procedure to the nodes pointed to by the cells of N; and (b) Merging the resulting updated dwarfs to calculate the ALL cell of the node; to thereby accomplish the incremental updating of said existing Dwarf datacube structure; wherein by updating said data in said Dwarf data structure, the amount of computer-readable storage medium required to update said data is reduced, relative to the amount of computer-readable storage medium that would otherwise be required to update said data, thereby increasing the efficiency of performing said incremental update of said data.
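The recursion of steps (3) and (4) can be sketched for a two-dimensional cube with plain-dict nodes (`{'cells': {...}, 'all': ...}`). This is a toy model under stated assumptions: the claimed method updates nodes in place and merges sub-dwarfs incrementally, whereas the sketch rebuilds the ALL sub-dwarf by a full merge of the children:

```python
def new_node():
    return {'cells': {}, 'all': None}

def update(node, tup, depth, n_dims):
    key, delta = tup[depth], tup[-1]
    if depth == n_dims - 1:                        # step (3): lowest level
        node['cells'][key] = node['cells'].get(key, 0) + delta
        node['all'] = sum(node['cells'].values())  # step (3)(d): ALL cell
        return
    child = node['cells'].setdefault(key, new_node())
    update(child, tup, depth + 1, n_dims)          # step (4)(a): recurse
    merged = new_node()                            # step (4)(b): merge subs
    for sub in node['cells'].values():
        for k, v in sub['cells'].items():
            merged['cells'][k] = merged['cells'].get(k, 0) + v
    merged['all'] = sum(merged['cells'].values())
    node['all'] = merged

root = new_node()
for t in [("S1", "C2", 70), ("S1", "C3", 40), ("S2", "C1", 90), ("S2", "C1", 50)]:
    update(root, t, 0, 2)
assert root['cells']["S1"]['all'] == 110
assert root['all']['all'] == 250
```

Only nodes on the path of an update tuple's prefix are touched, which is the property the claim states as the update condition for a node at level L.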
47. A computer-implemented method for increasing data updating efficiency by updating an existing Dwarf datacube structure stored in a computer-readable storage medium into a new Dwarf datacube structure, wherein said method comprises causing a computer to implement a reconstruct algorithm comprised of the steps of:
(A) Extracting the tuples of the old Dwarf by performing a query on the most detailed hierarchy level of each dimension, requesting all possible values; (B) Ordering the data in the update tuples according to the dimension ordering in the existing Dwarf; (C) Merging the tuples acquired from steps (A) and (B); and (D) Employing said merged tuples to construct a new Dwarf datacube structure; and (E) Storing said new Dwarf datacube structure in a computer-readable storage medium, to thereby accomplish the incremental updating of said existing Dwarf datacube structure; wherein said method increases the efficiency of performing said update of said data, relative to the efficiency of updating said data if not stored in a Dwarf data cube structure.
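The merge at the heart of the reconstruct algorithm can be sketched as a single pass over two sorted tuple streams. The tuple values are hypothetical; step (A) would produce `old_tuples` by querying the most detailed hierarchy level of the old Dwarf, and the result feeds the normal construction of step (D):

```python
from heapq import merge

old_tuples = [("S1", "C2", 70), ("S1", "C3", 40), ("S2", "C1", 140)]  # step (A)
updates = [("S1", "C2", 5), ("S2", "C4", 30)]  # step (B): same dim ordering

# Step (C): merge the two sorted streams, summing measures for equal keys.
combined = {}
for *key, measure in merge(old_tuples, updates):
    k = tuple(key)
    combined[k] = combined.get(k, 0) + measure
merged_tuples = sorted(k + (m,) for k, m in combined.items())
assert merged_tuples[0] == ("S1", "C2", 75)   # old 70 + update 5
```

Because both streams share one dimension ordering, the merge is a linear scan, and the merged tuples are already in the order the Dwarf construction expects.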
Specification