Method and apparatus for optimizing and structuring data by designing a cube forest data structure for hierarchically split cube forest template

US 6,141,655 A
Filed: 09/23/1997
Issued: 10/31/2000
Est. Priority Date: 09/23/1997
Status: Expired due to Term

1 Assignment

0 Petitions

Abstract

The paradigmatic view of data in typical decision support applications divides the attributes (or fields) in the data records into two groups: dimensional attributes and value attributes. The dimensional attributes classify the record, while the value attributes indicate a measured quantity. The dimensional attributes can be partitioned into a set of dimensions, which are orthogonal descriptions of the record. The attributes within a dimension form hierarchies of descriptions of the record, ranging from a coarse to a fine description. For example, the database might consist of records of retail sales collected from individual stores and brought together into a central data warehouse. This database might have three dimensions: store location, product, and time of sale. The value attribute might be the dollar value of the sale. A dimension might contain several attributes. For example, the store location dimension might consist of country, region, state, county, and zip code. These attributes form a hierarchy because knowing the value of a fine attribute (e.g., zip code) tells you the value of a coarse attribute (e.g., country). The attributes in the time dimension might be year, month, week, day, and hour. This dimension has multiple hierarchies because months do not contain an integral number of weeks. A large class of decision support queries asks for the aggregate value of one or more value attributes, where the aggregation ranges over all records whose dimensional attributes satisfy a selection predicate. For example, a query might be to find the sum of all sales of blue polo shirts in Palm Beach during the last quarter. A data table that can be described in terms of dimensions and value attributes is often called a "data cube." The records in our retail sales example can be imagined to exist in a three-dimensional cube, the dimensions being the dimensional attributes.
Queries, such as the example query, can be thought of as corresponding to sums over regions of the data cube. We describe herein a file structure (i.e., the Cube Forest) for storing a data cube that ensures fast response to the queries. The algorithms included herein are: (1) algorithms to load data into a cube forest; (2) algorithms to obtain an aggregate from the cube forest in response to a query; and (3) algorithms that compute an optimal cube forest structure.
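To make the query model concrete, here is a brute-force sketch of the example query. It scans every record and sums the value attribute over the records whose dimensional attributes satisfy the selection predicate. That per-query scan is the work a cube forest is meant to avoid. The `Sale` record, its field names, and the sample rows are hypothetical illustrations that do not come from the patent. A calendar-quarter label stands in for "the last quarter."

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Sale:
    # Dimensional attributes (hypothetical field names). city -> state is a
    # location hierarchy: knowing the city determines the state.
    state: str
    city: str
    product: str
    quarter: str
    # Value attribute: the measured quantity being aggregated.
    dollars: float


def aggregate(records: Iterable[Sale], predicate: Callable[[Sale], bool]) -> float:
    """Sum the value attribute over all records whose dimensional attributes
    satisfy the selection predicate (a full scan; no index is used)."""
    return sum(r.dollars for r in records if predicate(r))


sales = [  # made-up sample records
    Sale("FL", "Palm Beach", "blue polo shirt", "Q3", 40.0),
    Sale("FL", "Palm Beach", "blue polo shirt", "Q3", 25.0),
    Sale("FL", "Miami", "blue polo shirt", "Q3", 30.0),
    Sale("FL", "Palm Beach", "red polo shirt", "Q3", 20.0),
    Sale("FL", "Palm Beach", "blue polo shirt", "Q2", 15.0),
]

# "Sum of all sales of blue polo shirts in Palm Beach during Q3."
print(aggregate(sales, lambda r: r.city == "Palm Beach"
                and r.product == "blue polo shirt" and r.quarter == "Q3"))  # 65.0
```

Each predicate selects a region of the data cube, and the answer is the sum over that region. A coarser predicate such as `r.state == "FL"` covers the union of the finer city regions.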

260 Citations

29 Claims

1. A method for structuring data having at least one key attribute for storage in a memory, comprising the steps of:
- a) defining a first forest F₁ as a single node;
  
  b) constructing a subsequent forest F_j according to the substeps of:
  
  (i) creating a new node;
  
  (ii) copying a previous forest F_{j-1}, the previous forest F_{j-1} having at least one tree;
  
  (iii) making each tree in the previous forest F_{j-1} a subtree of the new node;
  
  (iv) creating another copy of the previous forest F_{j-1}; and
  
  (v) defining the subsequent forest F_j as a union of the previous forest F_{j-1} and the tree rooted at the new node; and
  
  c) repeating step b) i-1 times, until F_i is constructed, wherein the data structure is F_i.
- Dependent Claims (2, 3, 4, 5, 6, 7)
- - 2. The method according to claim 1, wherein paths in F_i represent keys for identifying data records.
  - 3. The method according to claim 1, wherein the data identified by keys represented by a path in F_i is a summary, or aggregate of all of the data records with keys that match on attributes in the path, said summary can be of fixed or variable size, can include a listing of record locations in a separate file, and can be a value attribute of the records.
  - 4. The method according to claim 1, wherein each key attribute represents an orthogonal dimension.
  - 5. The method according to claim 1, wherein each key attribute represents an orthogonal dimension, and each dimension is represented by a hierarchy of attribute values, and the data is structured according to the rules for a hierarchically split cube forest.
  - 6. The method according to claim 1, wherein each key attribute represents an orthogonal dimension, and each dimension is represented by a lattice of attribute values, and the data is structured according to the rules for a hierarchically split cube forest with extensions for lattice-structured dimensions.
  - 7. The method according to claim 1, wherein any summary of the data records that are selected by specifying a subset of their key attributes can be found by searching for the summary whose key is represented by a single node in F_i.
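
The construction in claim 1 can be sketched in Python. This is a minimal illustration under stated assumptions. The claim does not label nodes, so the sketch assumes (hypothetically) that the node created in the j-th step stands for key attribute A_(i-j+1). The names `Node`, `copy_tree`, `full_cube_forest`, and `paths` are illustrative. With that labeling, each nonempty subset of the attributes appears as exactly one root-to-node path, which is the property claim 7 relies on.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    attr: str  # hypothetical label: the key attribute this template node stands for
    children: List["Node"] = field(default_factory=list)


def copy_tree(t: Node) -> Node:
    """Deep-copy a template tree."""
    return Node(t.attr, [copy_tree(c) for c in t.children])


def full_cube_forest(attrs: List[str]) -> List[Node]:
    """Build F_i for attrs = [A_1, ..., A_i] following steps a) to c).

    Assumption not stated in claim 1: the node created in the j-th
    construction step is labeled A_(i-j+1), so F_1 = {A_i}.
    """
    forest = [Node(attrs[-1])]                         # a) F_1 is a single node
    for attr in reversed(attrs[:-1]):                  # c) repeat b) i-1 times
        new = Node(attr)                               # b)(i) create a new node
        new.children = [copy_tree(t) for t in forest]  # b)(ii)-(iii) hang a copy of F_{j-1} under it
        forest = [copy_tree(t) for t in forest] + [new]  # b)(iv)-(v) another copy plus the new tree
    return forest


def paths(forest: List[Node], prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield the attribute path from a tree root to every node (claim 2)."""
    for t in forest:
        p = prefix + (t.attr,)
        yield p
        yield from paths(t.children, p)


forest = full_cube_forest(["A1", "A2", "A3"])
all_paths = list(paths(forest))
print(len(forest), len(all_paths))  # 3 7
```

For three attributes the forest has three trees and seven paths: (A3), (A2), (A2, A3), (A1), (A1, A3), (A1, A2), and (A1, A2, A3).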

8. An index structure for storing and indexing aggregates of value attributes over at least i key attributes (A₁, . . . , A_i) comprising a plurality of i well-ordered trees built according to the rules of a full cube forest, wherein a first tree includes one template node, and a next tree in the order includes a root template node having branches to duplicates of each of the previous trees, a total number of the template nodes is equal to 2ⁿ - 1, 2ⁿ⁻¹ of which are leaf nodes, and the collection of trees represents a template for a set of search structures on a data table, and an index subkey is a concatenation of attributes from a template tree root to a node.
- Dependent Claims (9, 10, 11, 12, 13, 14)
- - 9. The index structure according to claim 8, wherein paths in the cube forest represent subkeys of index keys for identifying data records.
  - 10. The index structure according to claim 8, wherein the data identified by keys represented by a path in the forest and stored in an index is a summary, or aggregate, of all of the data records with keys that match on attributes in the path, said summary capable of being of fixed size or variable size, capable of being a listing of record locations in a separate file, and capable of being a value attribute of the records.
  - 11. The index structure according to claim 8, wherein each key attribute represents an orthogonal dimension.
  - 12. The index structure according to claim 8, wherein each key attribute represents an orthogonal dimension, and each dimension is represented by a hierarchy of attribute values, and the template forest is structured according to the rules for a hierarchically split cube forest.
  - 13. The index structure according to claim 8, wherein each key attribute represents an orthogonal dimension, and each dimension is represented by a lattice of attribute values, and the template forest is structured according to the rules for a hierarchically split cube forest with extensions for lattice-structured dimensions.
  - 14. The index structure according to claim 8, wherein any summary of the data records that are selected by specifying a subset of their key attributes can be found by searching an index for the subkey which is the catenation of the key attributes that specify the query, and the index to search is represented by a single node in the template forest.
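
The node and leaf counts in claim 8 follow from a short recurrence. This assumes the i trees of claim 8 are the forest F_n produced by the construction of claim 1, with n standing for the number of key attributes (the claim's i). Write N_j, L_j, and T_j for the numbers of nodes, leaves, and trees in F_j. Each construction step keeps one copy of F_{j-1} and adds a new root above a second copy. So the node count doubles plus one and the tree count grows by one. The leaf count doubles, because the new root is not itself a leaf when F_{j-1} is nonempty:

```latex
\begin{aligned}
N_1 &= 1, &\quad N_j &= 2N_{j-1} + 1 &&\Longrightarrow\; N_n = 2^n - 1,\\
L_1 &= 1, &\quad L_j &= 2L_{j-1}     &&\Longrightarrow\; L_n = 2^{n-1},\\
T_1 &= 1, &\quad T_j &= T_{j-1} + 1  &&\Longrightarrow\; T_n = n.
\end{aligned}
```

For n = 3 this gives 7 template nodes, 4 of them leaves, in 3 trees.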

15. A data storage device comprising:
- a) a data structure that conforms to a template built according to rules of a full cube forest over key attributes (A₁, . . . , A_i), the rules including:
  
  I) defining a first forest F₁ as a single node;
  
  II) constructing a subsequent forest F_j according to the substeps of:
  
  (A) creating a new node;
  
  (B) copying a previous forest F_{j-1}, the previous forest F_{j-1} having at least one tree;
  
  (C) making each tree in the previous forest F_{j-1} a subtree of the new node;
  
  (D) creating another copy of the previous forest F_{j-1}; and
  
  (E) defining the subsequent forest F_j as a union of the previous forest F_{j-1} and the tree rooted at the new node; and
  
  III) repeating step II) i-1 times, until F_i is constructed, wherein the data structure is F_i;
  
  b) means for storing an aggregation of values at each node of the full cube forest, one aggregate value for each subkey represented by the node and which appears in the data.
- Dependent Claims (16, 17, 18, 19, 20, 21)
- - 16. The device according to claim 15, wherein paths in the cube forest represent keys for identifying data records.
  - 17. The device according to claim 15, wherein the data identified by keys represented by a path in the forest is a summary, or aggregate of all of the data records with keys that match on attributes in the path, said summary capable of being of fixed size or variable size, capable of being a listing of record locations in a separate file, and capable of being the value attributes of the records.
  - 18. The device according to claim 15, wherein each key attribute represents an orthogonal dimension.
  - 19. The device according to claim 15, wherein each key attribute represents an orthogonal dimension, and each dimension is represented by a hierarchy of attribute values, and the data is structured according to the rules for a hierarchically split cube forest.
  - 20. The device according to claim 15, wherein each key attribute represents an orthogonal dimension, and each dimension is represented by a lattice of attribute values, and the data is structured according to the rules for a hierarchically split cube forest with extensions for lattice-structured dimensions.
  - 21. The device according to claim 15, wherein any summary of the data records that are selected by specifying a subset of their key attributes can be found by searching an index for the summary whose key is represented by a single node in the template forest.
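
Here is a minimal sketch of the storage in claim 15 b) and of the single-lookup point query in claims 21 and 23. It replaces the patent's index structures with plain Python dictionaries. Each template node is keyed by its root-to-node attribute path and maps every subkey that occurs in the data to a running sum. Three things are assumptions of the sketch rather than details from the patent: that template nodes correspond to nonempty, order-preserving attribute subsets, the class name `CubeForestSketch`, and the sample records.

```python
from collections import defaultdict
from itertools import combinations
from typing import DefaultDict, Dict, List, Mapping, Tuple

Path = Tuple[str, ...]    # a template node, named by its root-to-node attributes
SubKey = Tuple[str, ...]  # the attribute values along that path


class CubeForestSketch:
    """Hypothetical in-memory stand-in for the storage of claim 15 b):
    one dict per template node, mapping each subkey seen in the data to
    the running sum of the value attribute."""

    def __init__(self, attrs: List[str]) -> None:
        self.attrs = attrs
        # Assumption: full-cube-forest nodes correspond one-to-one to the
        # nonempty subsets of attrs, each listed in attribute order.
        self.nodes: Dict[Path, DefaultDict[SubKey, float]] = {
            path: defaultdict(float)
            for r in range(1, len(attrs) + 1)
            for path in combinations(attrs, r)
        }

    def insert(self, record: Mapping[str, str], value: float) -> None:
        # Loading: add the value to one aggregate in every template node.
        for path, aggregates in self.nodes.items():
            aggregates[tuple(record[a] for a in path)] += value

    def query(self, **fixed: str) -> float:
        # A point query fixes a subset of key attributes. That subset names
        # exactly one template node, so a single lookup answers it.
        unknown = set(fixed) - set(self.attrs)
        if unknown or not fixed:
            raise ValueError(f"bad query attributes: {sorted(unknown) or 'none given'}")
        path = tuple(a for a in self.attrs if a in fixed)
        return self.nodes[path].get(tuple(fixed[a] for a in path), 0.0)


cf = CubeForestSketch(["store", "product", "quarter"])
cf.insert({"store": "Palm Beach", "product": "polo", "quarter": "Q3"}, 40.0)
cf.insert({"store": "Palm Beach", "product": "polo", "quarter": "Q4"}, 25.0)
cf.insert({"store": "Miami", "product": "polo", "quarter": "Q3"}, 30.0)
print(cf.query(store="Palm Beach", product="polo"))  # 65.0
print(cf.query(quarter="Q3"))                        # 70.0
```

In this sketch, each inserted record updates all 2ⁿ - 1 node maps, where n is the number of key attributes. That extra load-time work and space buys a query cost of one lookup.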

22. A method for structuring data comprising the steps of:
- a) creating a hierarchically split cube forest template for the data, the hierarchically split cube forest template having a plurality of trees, each tree containing at least one node;
  
  b) creating an index on each tree within the hierarchically split cube forest template, the creating step including performing the following substeps for each tree in the hierarchically split cube forest template:
  
  i) choosing a path from a root of the tree to be a spine of the tree, wherein the spine defines a composite index, and the index has a plurality of keys which are attributes of nodes in the spine concatenated together, whereby the spine partitions the tree, creating at least one subtree;
  
  ii) repeating step i) for each subtree until all nodes of the tree are in at least one spine.
- Dependent Claims (23, 24)
- - 23. The method according to claim 22, further comprising the step of answering any point query by searching for a single node within the cube forest.
  - 24. The method according to claim 22, further comprising the step of retrieving a response to a query with only a single descent through the cube forest, wherein the query comprises a request for an aggregate value for a specified set of attributes.
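
The spine-based partition of claim 22 can be sketched as a short recursion. Claim 22 allows any path from the root to serve as the spine. This sketch follows the narrower rule of claim 25: it always extends the spine along a child of maximal height, so each spine is a longest root-to-leaf path of the subtree where it starts. Each returned list names the attributes whose concatenation would key one composite index. The sketch does not model the subtree pointers that link a leftover subtree to its parent spine. The names `TNode`, `height`, and `spines` are illustrative, as is the example tree, which is shaped like one tree of a full cube forest over hypothetical attributes A through D.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TNode:
    attr: str  # hypothetical attribute label of a template node
    children: List["TNode"] = field(default_factory=list)


def height(t: TNode) -> int:
    """Number of nodes on a longest root-to-leaf path of t."""
    return 1 + max((height(c) for c in t.children), default=0)


def spines(t: TNode) -> List[List[str]]:
    """Partition the tree rooted at t into spines (claim 22 i)-ii)),
    choosing a longest root-to-leaf path each time (claim 25 i))."""
    spine: List[str] = []
    hanging: List[TNode] = []  # subtrees left over once the spine is removed
    node = t
    while True:
        spine.append(node.attr)
        if not node.children:
            break
        nxt = max(node.children, key=height)  # child on a longest path
        hanging.extend(c for c in node.children if c is not nxt)
        node = nxt
    result = [spine]
    for sub in hanging:  # repeat for each subtree until every node is in a spine
        result.extend(spines(sub))
    return result


tree = TNode("A", [
    TNode("D"),
    TNode("C", [TNode("D")]),
    TNode("B", [TNode("D"), TNode("C", [TNode("D")])]),
])
print(spines(tree))  # [['A', 'B', 'C', 'D'], ['D'], ['C', 'D'], ['D']]
```

Every node of the example tree lands in exactly one spine (4 + 1 + 2 + 1 = 8 nodes), which satisfies the stopping condition of step ii).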

25. A method for designing a cube forest data structure for a hierarchically split cube forest template, the hierarchically split cube forest template having a plurality of trees, each tree having at least one node, each node having at least one attribute, said method comprising the steps of:
- a) designing an index on each tree of the plurality of trees within the hierarchically split cube forest template, the designing step including performing the following substeps for each tree:
  
  i) choosing a longest root-to-leaf path in the tree to be a spine of the tree, the spine defining a composite index, the composite index having a plurality of keys which are the attributes of the nodes in the spine concatenated together;
  
  ii) partitioning the tree using the spine to create at least one subtree; and
  
  iii) repeating the steps i) through ii) for each subtree until all nodes of the tree are in at least one spine.
- Dependent Claims (26, 27, 28, 29)
- - 26. The method according to claim 25, further comprising the steps of:
    - h) defining an i-th subkey, denoted sk_i, to be the prefix (a₁, a₂, . . . , a_i) of every key (a₁, a₂, . . . , a_n) that is inserted into a tree, given an index that instantiates a spine on attributes (A₁, A₂, . . . , A_n);
      
      i) associating a set of subtree pointers with subkey sk_i if a template node corresponding to A_i has children other than A_{i+1};
      
      j) associating an aggregate value with a particular subkey sk_i if a node corresponding to A_i is not aggregate pruned;
      
      k) defining an effective leaf for each subkey sk=(a₁, . . . , a_i) to be a place in the index where information associated with said each subkey is stored, wherein said information includes at least a subtree pointer and an aggregate value;
      
      l) building a spine index from a B-tree; and
      
      m) placing an effective leaf for a subkey sk at the highest level in the B-tree where the subkey sk is a subkey of a separator in a node, wherein an i-th separator in a B-tree node is a key that indicates which keys can be found in the (i-1)-th subtree as opposed to the i-th subtree.
  - 27. The method according to claim 26, further comprising the step of:
    - n) placing the effective leaf at a predetermined separator position whose prefix is sk if there is more than one such separator.
  - 28. The method according to claim 27, wherein the predetermined separator position comprises a rightmost separator.
  - 29. The method according to claim 27, wherein the predetermined separator position comprises a leftmost separator.
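
The subkey and effective-leaf rules of claims 26 to 29 can be illustrated on a toy two-level B-tree. This sketch is a simplification, not the patent's implementation. Keys are tuples of attribute values, and `subkeys` lists the prefixes sk_i of step h). `place_effective_leaf` scans the tree level by level from the root and returns the first separator whose prefix is sk. Within that node it chooses the rightmost match (claim 28) or the leftmost match (claim 29). The sketch treats the keys held in leaves as the lowest level of separators. `BNode`, both function names, and the sample keys are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Key = Tuple[str, ...]


@dataclass
class BNode:
    keys: List[Key]  # separators in an internal node, stored keys in a leaf
    children: List["BNode"] = field(default_factory=list)


def subkeys(key: Key) -> List[Key]:
    """Claim 26 h): sk_i is the length-i prefix of an inserted key."""
    return [key[:i] for i in range(1, len(key) + 1)]


def place_effective_leaf(root: BNode, sk: Key, rightmost: bool = True
                         ) -> Optional[Tuple[int, BNode, int]]:
    """Return (level, node, position) of the separator that hosts sk's
    effective leaf: the highest level (root = 0) holding a key with prefix
    sk, taking the rightmost (claim 28) or leftmost (claim 29) such key in
    that node. Return None if sk is a prefix of no key in the tree."""
    level, frontier = 0, [root]
    while frontier:
        for node in frontier:
            hits = [i for i, k in enumerate(node.keys) if k[:len(sk)] == sk]
            if hits:
                return level, node, hits[-1] if rightmost else hits[0]
        frontier = [c for n in frontier for c in n.children]
        level += 1
    return None


# Hypothetical (state, city, store) keys in a two-level tree.
leaf1 = BNode([("CA", "LA", "s1"), ("CA", "SF", "s2")])
leaf2 = BNode([("NY", "NYC", "s3"), ("NY", "NYC", "s4")])
leaf3 = BNode([("TX", "Austin", "s5"), ("TX", "Dallas", "s6")])
root = BNode([("NY", "NYC", "s3"), ("TX", "Austin", "s5")], [leaf1, leaf2, leaf3])

print(subkeys(("NY", "NYC", "s3")))  # [('NY',), ('NY', 'NYC'), ('NY', 'NYC', 's3')]
level, node, pos = place_effective_leaf(root, ("NY",))
print(level, pos)                    # 0 0
level, node, pos = place_effective_leaf(root, ("CA",))
print(level, node is leaf1, pos)     # 1 True 1
```

Subkey ("NY",) is a prefix of the root's first separator, so its effective leaf sits at the root. No root separator starts with ("CA",), so that subkey's information lives one level down, beside the rightmost matching key in `leaf1`.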

Current Assignee
AT&T Corporation (AT&T, Inc.)
Original Assignee
AT&T Corporation (AT&T, Inc.)
Inventors
Johnson, Theodore; Shasha, Dennis
Primary Examiner(s)
Fetting, Anton W.
Assistant Examiner(s)
Robinson, Greta Lee

Application Number

US08/936,000
Time in Patent Office

1,134 Days
Field of Search

707/3, 707/1, 707/2, 707/100, 707/104, 707/509, 707/500, 707/503
US Class Current

1/1
CPC Class Codes

G06F 16/2246   Trees, e.g. B+trees

G06F 16/2272   Management thereof

G06F 16/283   Multi-dimensional databases...

Y10S 707/99932   Access augmentation or opti...

Y10S 707/99933   Query processing, i.e. sear...
