Method and apparatus for optimizing and structuring data by designing a cube forest data structure for hierarchically split cube forest template
First Claim
1. A method for structuring data having at least one key attribute for storage in a memory, comprising the steps of:
- a) defining a first forest F1 as a single node;
b) constructing a subsequent forest Fj according to the substeps of;
(i) creating a new node;
(ii) copying a previous forest Fj-1, the previous forest Fj-1 having at least one tree;
(iii) making each tree in the previous forest Fj-1 a subtree of the new node;
(iv) creating another copy of the previous forest Fj-1 ; and
(v) defining the subsequent forest Fj as a union of the previous forest Fj-1 and the tree rooted at the new node; and
c) repeating step b) i-1 times, until Fi is constructed, wherein the data structure is Fi.
1 Assignment
0 Petitions
Accused Products
Abstract
The paradigmatic view of data in typical decision support applications divides the attributes (or fields) in the data records into two groups: dimensional attributes and value attributes. The dimensional attributes classify the record, while the value attributes indicate a measured quantity. The dimensional attributes can be partitioned into a set of dimensions, which are orthogonal descriptions of the record. The attributes within a dimension form hierarchies of descriptions of the record, ranging from a coarse to a description. For example, the database might consist of records of retail sales collected from individual stores and brought together into a central data warehouse. This database might have three dimensions: store location, product, and time of sale. The value attribute might be the dollar value of the sale. A dimension might contain several attributes. For example, the store location dimension might consist of country, region, state, county, and zip code. These attributes form a hierarchy because knowing the value of a fine attribute (e.g., zip code) tells you the value of a coarse attribute (e.g., country). The attributes in the time dimension might be year, month, week, day, and hour. This dimension has multiple hierarchies because months do not contain an integral number of weeks. A large class of decision support queries ask for the aggregate value of one or more value attribute, where the aggregation ranges over all records whose dimensional attributes satisfy a selection predicate. For example, a query might be to find the sum of all sales of blue polo shirts in Palm Beach during the last quarter. A data table that can be described in terms of dimensions and value attributes is often called a "data cube." The records in our retail sales example can be imagined to exist in a three dimensional cube, the dimensions being the dimensional attributes. Queries, such as the example query, can be thought of as corresponding to sums over regions of the data cube. We describe herein a file structure (i.e., the Cube Forest) for storing a data cube that ensures fast response to the queries. The algorithms included herein are: (1) algorithms to load data into a cube forest; (2) algorithms to obtain an aggregate from the cube forest in response to a query; and (3) algorithms that compute an optimal cube forest structure.
260 Citations
29 Claims
-
1. A method for structuring data having at least one key attribute for storage in a memory, comprising the steps of:
-
a) defining a first forest F1 as a single node; b) constructing a subsequent forest Fj according to the substeps of; (i) creating a new node; (ii) copying a previous forest Fj-1, the previous forest Fj-1 having at least one tree; (iii) making each tree in the previous forest Fj-1 a subtree of the new node; (iv) creating another copy of the previous forest Fj-1 ; and (v) defining the subsequent forest Fj as a union of the previous forest Fj-1 and the tree rooted at the new node; and c) repeating step b) i-1 times, until Fi is constructed, wherein the data structure is Fi. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
- 8. An index structure for storing and indexing aggregates of value attributes over at least i key attributes (A1, . . . , Ai) comprising a plurality of i well-ordered trees built according to the rules of a full cube forest, wherein a first tree includes one template node, ant a next tree in the order includes a root template node having branches to duplicates of each of the previous trees, a total number of the template nodes is equal to 2n -1, 2n-1 of which are leaf nodes, and the collection of trees represents a template for a set of search structure on a data table, and an index subkey is a concatenation of attributes from a template tree root to a node.
-
15. A data storage device comprising:
-
a) a data structure that conforms to a template built according to rules of a full cube forest over key attributes (A1, . . . , Ai), the rules including; I) defining a first forest F1 as a single node; II) constructing a subsequent forest Fj according to the substeps of; (A) creating a new node; (B) copying a previous forest Fj-1, the previous forest Fj-1 having at least one tree; (C) making each tree in the previous forest Fj-1 a subtree of the new node; (D) creating another copy of the previous forest Fj-1 ; and (E) defining the subsequent forest Fj as a union of the previous forest Fj-1 and the tree rooted at the new node; and III) repeating step ii) i-1 times, until Fi is constructed wherein the data structure is Fi ; b) means for storing an aggregation of values at each node of the full cube forest, one aggregate value for each subkey represented by the node and which appears in the data. - View Dependent Claims (16, 17, 18, 19, 20, 21)
-
-
22. A method for structuring data comprising the steps of:
-
a) creating a hierarchically split cube forest template for the data, the hierarchically split cube forest template having a plurality of trees, each tree containing at least one node; b) creating an index on each tree within the hierarchically split cube forest template, the creating step including performing the following substeps for each tree in the hierarchically split cube forest template; i) choosing a path from a root of the tree to be a spine of the tree, wherein the spine defines a composite index, and the index has a plurality of keys which are attributes of nodes in the spine concatenated together, whereby the spine partitions the tree, creating at least one subtree; ii) repeating step i) for each subtree until all nodes of the tree are in at least one spine. - View Dependent Claims (23, 24)
-
-
25. A method for designing a cube forest data structure for a hierarchically split cube forest template, the hierarchically split cube forest template having a plurality of trees, each tree having at least one node, each node having at least one attribute, said method comprising the steps of:
a) designing an index on each tree of the plurality of trees within the hierarchically split cube forest template, the designing step including performing the following substeps for each tree; i) choosing a longest root-to-leaf path in the tree to be a spine of the tree, the spine defining a composite index, the composite index having a plurality of keys which are the attributes of the nodes in the spine concatenated together; ii) partitioning the tree using the spine to create at least one subtree; and iii) repeating the steps i) through ii) for each subtree until all nodes of the tree are in at least one spine. - View Dependent Claims (26, 27, 28, 29)
Specification