Heirarchical indexing of multi-attribute data by sorting, dividing and storing subsets

US 6,381,605 B1
Filed: 05/29/1999
Issued: 04/30/2002
Est. Priority Date: 05/29/1999
Status: Expired due to Term

First Claim

Patent Images

1. A computer readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a method of constructing a hierarchical index from a set of multi-dimensional data, the method comprising:

(a) calculating a number of members of a set of multi-dimensional data items;

(b) determining whether said number of members exceeds a node capacity of a hierarchical index configured to index said data items;

(c) determining a variance of values in one or more dimensions of said data items;

(d) identifying a first dimension of said one or more multi dimensions in which to divide said set of multi-dimensional data items;

(e) sorting said data items in said first dimension;

(f) dividing said sorted data items in said first dimension into two or more subsets by;

(f′

) determining whether said number of members is less than double said node capacity; and

(f″

) calculating the floor of the quotient of said node capacity divided by two;

(g) repeating (a)-(f) for each said subset in order to divide said set of data items into a plurality of data item clusters, wherein each cluster comprises a number of data items no greater than said node capacity; and

(h) configuring a leaf node of said hierarchical index for data items in a first cluster.

View all claims

2 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

A system and method for indexing and storing multi-dimensional or multi-attribute data. Data items are recursively sorted in a selected dimension (e.g., the dimension having the greatest variance) and divided until each subdivision fits into a leaf node having a specified fanout. Intermediate nodes and a root node are constructed to complete the index. Each node of the index is stored in a database as a separate object or record and may include a node identifier of the unique, an identifier of a parent and/or a sibling node and an entry for each child of the node, which may be data items or other nodes. Each record entry for a child includes an associated bounding area encompassing descendant data items. Another database table or module may store information about the index, such as the dimensionality of the data, the index fanout and an identifier of a root of the index.

Citations

27 Claims

1. A computer readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a method of constructing a hierarchical index from a set of multi-dimensional data, the method comprising:
- (a) calculating a number of members of a set of multi-dimensional data items;
  
  (b) determining whether said number of members exceeds a node capacity of a hierarchical index configured to index said data items;
  
  (c) determining a variance of values in one or more dimensions of said data items;
  
  (d) identifying a first dimension of said one or more multi dimensions in which to divide said set of multi-dimensional data items;
  
  (e) sorting said data items in said first dimension;
  
  (f) dividing said sorted data items in said first dimension into two or more subsets by;
  
  (f′
  
  ) determining whether said number of members is less than double said node capacity; and
  
  (f″
  
  ) calculating the floor of the quotient of said node capacity divided by two;
  
  (g) repeating (a)-(f) for each said subset in order to divide said set of data items into a plurality of data item clusters, wherein each cluster comprises a number of data items no greater than said node capacity; and
  
  (h) configuring a leaf node of said hierarchical index for data items in a first cluster.

2. A computer-implemented method of constructing a hierarchical index from a set of multi-dimensional data, comprising:
- (a) calculating a number of members of a set of multi-dimensional data items;
  
  (b) determining whether said number of members exceeds a node capacity of a hierarchical index configured to index said data items;
  
  (c) determining a variance of values in one or more dimensions of said data items;
  
  (d) identifying a first dimension of said one or more multi dimensions in which to divide said set of multi-dimensional data items;
  
  (e) sorting said data items in said first dimension;
  
  (f) dividing said sorted data items in said first dimension into two or more subsets by;
  
  (f′
  
  ) determining whether said number of members is less than double said node capacity; and
  
  (f″
  
  ) calculating the floor of the quotient of said node capacity divided by two;
  
  (g) repeating (a)-(f) for each said subset in order to divide said set of data items into a plurality of data item clusters, wherein each cluster comprises a number of data items no greater than said node capacity; and
  
  (h) configuring a leaf node of said hierarchical index for data items in a first cluster.
- View Dependent Claims (3, 4, 5, 6, 7, 8)
- - 3. The computer-implemented method of claim 2, wherein said identifying comprises selecting a dimension, from said one or more dimensions, having the greatest variance of values.
  - 4. The computer-implemented method of claim 2, in which said configuring comprises:
5. The computer-implemented method of claim 2, wherein the dimensions of said multi-dimensional data are inherently related.
6. The computer-implemented method of claim 2, wherein the dimensions of the multi-dimensional data are independent attributes.
7. The computer-implemented method of claim 2, further comprising identifying a query pattern for retrieving one or more data items from said set of data items, wherein said query pattern comprises a hierarchy of two or more dimensions of said multi-dimensional data items.
8. The computer-implemented method of claim 7, which said identifying comprises selecting a dimension in said hierarchy of two or more dimensions.

9. A computer readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a method of constructing a hierarchical index from a set of multi-dimensional data, the method comprising:
- (a) calculating a number of members of a set of multi-dimensional data items;
  
  (b) determining whether said number of members exceeds a node capacity of a hierarchical index configured to index said data items;
  
  (c) determining a variance of values in one or more dimensions of said data items;
  
  (d) identifying a first dimension of said one or more multi dimensions in which to divide said set of multi-dimensional data items;
  
  (e) sorting said data items in said first dimension;
  
  (f) dividing said sorted data items in said first dimension into two or more subsets by;
  
  (f′
  
  ) determining whether said number of members is greater than or equal to double said node capacity; and
  
  (f″
  
  ) calculating the value of said node capacity times (the floor of (the sum of 0.5 plus (the quotient of said number of members divided by twice said node capacity)));
  
  (g) repeating (a)-(f) for each said subset in order to divide said set of data items into a plurality of data item clusters, wherein each cluster comprises a number of data items no greater than said node capacity; and
  
  (h) configuring a leaf node of said hierarchical index for data items in a first cluster.

10. A computer-implemented method of constructing a hierarchical index from a set of multi-dimensional data, comprising:
- (a) calculating a number of members of a set of multi-dimensional data items;
  
  (b) determining whether said number of members exceeds a node capacity of a hierarchical index configured to index said data items;
  
  (c) determining a variance of values in one or more dimensions of said data items;
  
  (d) identifying a first dimension of said one or more multi dimensions in which to divide said set of multi-dimensional data items;
  
  (e) sorting said data items in said first dimension;
  
  (f) dividing said sorted data items in said first dimension into two or more subsets by;
  
  (f′
  
  ) determining whether said number of members is greater than or equal to double said node capacity; and
  
  (f″
  
  ) calculating the value of said node capacity times (the floor of (the sum of 0.5 plus (the quotient of said number of members divided by twice said node capacity)));
  
  (g) repeating (a)-(f) for each said subset in order to divide said set of data items into a plurality of data item clusters, wherein each cluster comprises a number of data items no greater than said node capacity; and
  
  (h) configuring a leaf node of said hierarchical index for data items in a first cluster.
- View Dependent Claims (11, 12, 13, 14, 15, 16)
- - 11. The computer-implemented method of claim 10, wherein said identifying comprises selecting a dimension, from said one or more dimensions, having the greatest variance of values.
  - 12. The computer-implemented method of claim 10, in which said configuring comprises:
13. The computer-implemented method of claim 10, wherein the dimensions of said multi-dimensional data are inherently related.
14. The computer-implemented method of claim 10, wherein the dimensions of the multi-dimensional data are independent attributes.
15. The computer-implemented method of claim 10, further comprising identifying a query pattern for retrieving one or more data items from said set of data items, wherein said query pattern comprises a hierarchy of two or more dimensions of said multi-dimensional data items.
16. The computer-implemented method of claim 15, in which said identifying comprises selecting a dimension in said hierarchy of two or more dimensions.

17. A computer readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a method of indexing a set of multi-dimensional data items, the method comprising:
- sorting a set of multi-dimensional data items in a first dimension, said first dimension corresponding to a primary dimension of a first query pattern;
  
  dividing said sorted set of multi-dimensional data items in said first dimension into two or more first-order subsets of multi-dimensional data items;
  
  sorting a first first-order subset of said two or more first-order subsets of multi-dimensional data items in a second dimension, said second dimension corresponding to a secondary dimension of said first query pattern;
  
  dividing said first first-order subset of multi-dimensional data items in said second dimension into two or more second-order subsets of multi-dimensional data items; and
  
  storing the members of an N-order subset of multi-dimensional data items in a leaf node of a hierarchical index, wherein the number of data items in said N-order subset is less than or equal to a node capacity of said hierarchical index and N is greater than or equal to two.

18. A computer-implemented method of indexing a set of multi-dimensional data items, comprising:
- sorting a set of multi-dimensional data items in a first dimension, said first dimension corresponding to a primary dimension of a first query pattern;
  
  dividing said sorted set of multi-dimensional data items in said first dimension into two or more first-order subsets of multi-dimensional data items;
  
  sorting a first first-order subset of said two or more first-order subsets of multi-dimensional data items in a second dimension, said second dimension corresponding to a secondary dimension of said first query pattern;
  
  dividing said first first-order subset of multi-dimensional data items in said second dimension into two or more second-order subsets of multi-dimensional data items; and
  
  storing the members of an N-order subset of multi-dimensional data items in a leaf node of a hierarchical index, wherein the number of data items in said N-order subset is less than or equal to a node capacity of said hierarchical index and N is greater than or equal to two.
- View Dependent Claims (19, 20, 21, 22, 23, 24, 25, 26, 27)
- - 19. The method of claim 18, wherein said first dimension is a hierarchical attribute of said data items.
  - 20. The method of claim 18, further comprising storing said leaf node in a relational database.
  - 21. The method of claim 18, wherein said sorting a set of multi-dimensional data items comprises:
22. The method of claim 18, in which said dividing said sorted set of multi-dimensional data items in said first dimension comprises:
- determining the number of data items in said set of multi-dimensional data items;
  
  calculating an intermediate value in said first dimension; and
  
  selecting a first data item corresponding to said intermediate value.
23. The method of claim 22, wherein said intermediate value is an approximate median value.
24. The method of claim 22, in which said calculating an intermediate value comprises:
- determining whether said number of data items is less than twice the capacity of a node of said hierarchical index; and
  
  calculating the floor of the quotient of said node capacity divided by two.
25. The method of claim 22, in which said calculating an intermediate value comprises:
- determining whether said number of data items is greater than or equal to twice the capacity of a node of said hierarchical index; and
  
  calculating the value of said node capacity multiplied by (the floor of (the sum of 0.5 plus (the quotient of said number of data items divided by twice said node capacity))).
26. The computer-implemented method of claim 18, further comprising:
- storing an identifier of a root node of said hierarchical index;
  
  storing a node capacity of said hierarchical index; and
  
  storing a measure of the dimensionality of said multi-dimensional data items.
27. The computer-implemented method of claim 18, wherein said storing the members of an N-order subset of multi-dimensional data items comprises:
- creating a first leaf node of said hierarchical index; and
  
  inserting an entry in said first leaf node for each said member of said N-order subset of multi-dimensional data items, wherein each said entry comprises;
  
  an identifier of said corresponding member; and
  
  a bounding area encompassing said corresponding member.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Oracle International Corporation (Oracle Corporation)
Original Assignee
Oracle Corporation
Inventors
Banerjee, Jayanta, Ravada, Siva, Kothuri, Ravi, Sharma, Jayant
Primary Examiner(s)
Breene, John
Assistant Examiner(s)
CHANNAVAJJALA, SRIRAMA T

Application Number

US09/322,901
Time in Patent Office

1,067 Days
Field of Search

707/1-8, 707/10, 707/100-104, 707/200-205, 709/201, 709/204, 709/205, 709/222-223
US Class Current

1/1
CPC Class Codes

G06F 16/2246   Trees, e.g. B+trees

G06F 16/2264   Multidimensional index stru...

Y10S 707/99931   Database or file accessing

Y10S 707/99932   Access augmentation or opti...

Y10S 707/99934   Query formulation, input pr...

Y10S 707/99937   Sorting

Y10S 707/99942   Manipulating data structure...

Y10S 707/99943   Generating database or data...

Y10S 707/99945   Object-oriented database st...

Heirarchical indexing of multi-attribute data by sorting, dividing and storing subsets

First Claim

2 Assignments

0 Petitions

Accused Products

Abstract

Citations

27 Claims

Specification

Solutions

Use Cases

Quick Links

Heirarchical indexing of multi-attribute data by sorting, dividing and storing subsets

First Claim

2 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

27 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links