Dwarf cube architecture for reducing storage sizes of multidimensional data
First Claim
Patent Images
1. A method for structuring data for storage in either computer main memory or disk memory, which comprises analysing a datacube of said data to identify suffix redundancies in said data, and exploiting any identified suffix redundancies to reduce the size of a datacube needed to store said data.
1 Assignment
0 Petitions
Accused Products
Abstract
The invention relates to data warehouses and the ability to create and maintain data cubes of multi-dimensional data. More specifically, the invention pertains to data cube architectures that permit significant reduction of storage, exhibit very efficient retrieval and provide a very efficient incremental update of the data cubes.
-
Citations
47 Claims
- 1. A method for structuring data for storage in either computer main memory or disk memory, which comprises analysing a datacube of said data to identify suffix redundancies in said data, and exploiting any identified suffix redundancies to reduce the size of a datacube needed to store said data.
-
3. A method for structuring data for storage in a computer or in a computer-readable storage medium, which comprises the steps of:
-
(A) sampling the data to estimate the cardinalities and or correlation between dimensions and ordering the dimensions accordingly;
(B) sorting the data according to the dimension ordering acquired by step (A);
(C) assigning one level of a Dwarf structure, moving top-down;
(1) for a full data cube, to each hierarchy level of the dimensions, according to the dimension ordering, and the hierarchy-level ordering within each dimension; and
(2) for a concatenated rollup cube, to each dimension, according to the dimension ordering;
wherein each level consists of multiple rollup representations whose number is equal to the number of hierarchy levels of the dimension;
wherein exactly one rollup representation is assigned to each hierarchy level of the dimension;
wherein the rollup representations are ordered from the one corresponding to the most detailed hierarchy level to the one corresponding to the least detailed hierarchy level; and
(D) inserting data in nodes one tuple at a time. - View Dependent Claims (4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
-
-
18. A data storage device comprising a Dwarf data structure, said structure comprising:
-
(A) A full data cube, wherein the structure of said full data cube contains as many levels as the sum of all the hierarchy levels of all the dimensions of the stored data;
wherein;
(1) exactly one level of the structure is assigned to each hierarchy level of the dimensions;
(2) nodes at each level contain cells;
wherein each cell of a non-leaf node consists of a key value and a pointer to a node of the next lower level; and
wherein each cell of a leaf-node comprises a key value and the desired aggregate values; and
wherein each node also contains a special ALL cell, the ALL cell corresponding to all the keys of the node;
wherein ALL cells of non-leaf nodes contain pointers to the next lower level, and ALL cells of leaf nodes contain aggregate values;
or (B) a concatenated rollup datacube, wherein the structure of said concatenated rollup datacube contains as many levels as the number of the dimensions of the stored data; and
wherein;
(1) exactly one level of the structure is assigned to each dimension;
(2) each level consists of multiple “
rollup representations”
, whose number is equal to the number of hierarchy levels of the dimension;
wherein exactly one rollup representation is assigned to each hierarchy level of the dimension;
wherein the rollup representations are ordered from the one corresponding to the most detailed hierarchy level to the one corresponding to the least detailed hierarchy level;
(3) nodes at each level contain cells;
wherein each cell of a non-leaf node consists of a key value and a pointer to a node of the next lower level; and
wherein each cell of a leaf-node comprises a key value and the desired aggregate values; and
wherein each node also contains a special ALL cell, the ALL cell corresponding to all the keys of the node;
wherein ALL cells of nodes in the last rollup representation contain aggregate values of the stored data when they belong to the last level, or a pointer to a node at the next level otherwise; and
wherein ALL cells of nodes not in the last rollup representation contain pointers to a node in the next rollup representation of the current level. - View Dependent Claims (19, 20, 21, 22, 23, 24, 25)
-
-
26. A data storage device comprising:
-
(A) a Dwarf data structure;
(B) means for sampling the input data to estimate the cardinality of each dimension and ordering the dimensions according to decreasing cardinalities;
(C) means for calculating group-bys that aggregate values across at least one hierarchy level by merging previously calculated aggregate values;
(D) means for organizing key values of cells within a node in a sorted list, which becomes a B+-tree if the number of keys exceeds 2 disk pages;
(E) means for identifying different sets of group-bys that can be calculated from the same input data and which, therefore, contain the same aggregate values;
(F) Means for coalescing the store of different sets of group-bys that are identified in (E), thus eliminating their suffix redundancy. - View Dependent Claims (27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43)
-
-
44. A method for retrieving the aggregate values that correspond to any specified group-by of stored data, which comprises querying a data storage device comprising said data, said data storage device comprising:
-
(A) a Dwarf data structure;
(B) means for sampling the input data to estimate the cardinality of each dimension and ordering the dimensions according to decreasing cardinalities;
(C) means for calculating group-bys that aggregate values across at least one hierarchy level by merging previously calculated aggregate values;
(D) means for organizing key values of cells within a node in a sorted list, which becomes a B+-tree if the number of keys exceeds 2 disk pages;
(E) means for identifying different sets of group-bys that can be calculated from the same input data and which, therefore, contain the same aggregate values;
(F) means for coalescing the store of different sets of group-bys that are identified in (E), thus eliminating their suffix redundancy.
-
-
45. A method for updating a Dwarf datacube structure comprised of the steps:
-
(A) creating a Delta-Dwarf for the update data, by;
(1) sampling input data to estimate the cardinality of each dimension and orders the dimensions according to decreasing cardinalities;
(2) calculating group-bys that aggregate values across at least one hierarchy level by merging previously calculated aggregate values;
(3) organizing key values of cells within a node in a sorted list, which becomes a B+-tree if the number of keys exceeds 2 disk pages and coalescing the store of organized different sets of group-bys thereby eliminating their suffix redundancy; and
(4) identifying different sets of group-bys that can be calculated from the same input data and which contain the same aggregate values;
and (B) merging the Delta-Dwarf with the old Dwarf using a merge-packing algorithm.
-
-
46. A method for incrementally updating an existing Dwarf datacube structure comprised of the steps:
-
(1) Ordering the dimensions of the update data according to the dimension ordering of the existing Dwarf datacube;
(2) Traversing the old Dwarf structure top-down to identify nodes that need to be updated due to the existence of update tuples;
wherein a node at a level L of the Dwarf structure needs to be updated if and only if at least one update tuple contains a prefix of length L-1 that is identical to the path followed from the root of the structure to the current node, and wherein each node also contains a special ALL cell, the ALL cell corresponding to all the keys of the node;
(3) Updating any node N at the lowest level of the structure by;
(a) Identifying whether the update tuples that influence the aggregate values stored in N will require the insertion of new key values in N;
(b) Creating a new node to store the results if the process in step (a) shows that new keys need to be inserted to N, otherwise storing the results in N. (c) For each key in the resulting node, merging the aggregate values existing in N with those of the update tuples. (d) Calculating the aggregate values for the ALL cell. (5) Updating any node N at higher levels by a method comprised of;
(a) Recursively propagating the update procedure to the nodes pointed by the cells of N. (b) Merging the resulting updated dwarfs to calculate the ALL cell of the node.
-
-
47. A method for updating an existing Dwarf datacube structure into a new Dwarf datacube structure using a reconstruct algorithm comprised of the steps:
-
(A) Extracting the tuples of the old Dwarf by performing a query on the most detailed hierarchy level of each dimension, requesting all possible values;
(B) Ordering the data in the update tuples according to the dimension ordering in the existing Dwarf;
(C) Merging the tuples acquired from steps (A) and (B);
(D) Employing said merged tuples to construct a new Dwarf datacube.
-
Specification