Heirarchical indexing of multi-attribute data by sorting, dividing and storing subsets
First Claim
1. A computer readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a method of constructing a hierarchical index from a set of multi-dimensional data, the method comprising:
- (a) calculating a number of members of a set of multi-dimensional data items;
(b) determining whether said number of members exceeds a node capacity of a hierarchical index configured to index said data items;
(c) determining a variance of values in one or more dimensions of said data items;
(d) identifying a first dimension of said one or more multi dimensions in which to divide said set of multi-dimensional data items;
(e) sorting said data items in said first dimension;
(f) dividing said sorted data items in said first dimension into two or more subsets by;
(f′
) determining whether said number of members is less than double said node capacity; and
(f″
) calculating the floor of the quotient of said node capacity divided by two;
(g) repeating (a)-(f) for each said subset in order to divide said set of data items into a plurality of data item clusters, wherein each cluster comprises a number of data items no greater than said node capacity; and
(h) configuring a leaf node of said hierarchical index for data items in a first cluster.
2 Assignments
0 Petitions
Accused Products
Abstract
A system and method for indexing and storing multi-dimensional or multi-attribute data. Data items are recursively sorted in a selected dimension (e.g., the dimension having the greatest variance) and divided until each subdivision fits into a leaf node having a specified fanout. Intermediate nodes and a root node are constructed to complete the index. Each node of the index is stored in a database as a separate object or record and may include a node identifier of the unique, an identifier of a parent and/or a sibling node and an entry for each child of the node, which may be data items or other nodes. Each record entry for a child includes an associated bounding area encompassing descendant data items. Another database table or module may store information about the index, such as the dimensionality of the data, the index fanout and an identifier of a root of the index.
-
Citations
27 Claims
-
1. A computer readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a method of constructing a hierarchical index from a set of multi-dimensional data, the method comprising:
-
(a) calculating a number of members of a set of multi-dimensional data items;
(b) determining whether said number of members exceeds a node capacity of a hierarchical index configured to index said data items;
(c) determining a variance of values in one or more dimensions of said data items;
(d) identifying a first dimension of said one or more multi dimensions in which to divide said set of multi-dimensional data items;
(e) sorting said data items in said first dimension;
(f) dividing said sorted data items in said first dimension into two or more subsets by;
(f′
) determining whether said number of members is less than double said node capacity; and
(f″
) calculating the floor of the quotient of said node capacity divided by two;
(g) repeating (a)-(f) for each said subset in order to divide said set of data items into a plurality of data item clusters, wherein each cluster comprises a number of data items no greater than said node capacity; and
(h) configuring a leaf node of said hierarchical index for data items in a first cluster.
-
-
2. A computer-implemented method of constructing a hierarchical index from a set of multi-dimensional data, comprising:
-
(a) calculating a number of members of a set of multi-dimensional data items;
(b) determining whether said number of members exceeds a node capacity of a hierarchical index configured to index said data items;
(c) determining a variance of values in one or more dimensions of said data items;
(d) identifying a first dimension of said one or more multi dimensions in which to divide said set of multi-dimensional data items;
(e) sorting said data items in said first dimension;
(f) dividing said sorted data items in said first dimension into two or more subsets by;
(f′
) determining whether said number of members is less than double said node capacity; and
(f″
) calculating the floor of the quotient of said node capacity divided by two;
(g) repeating (a)-(f) for each said subset in order to divide said set of data items into a plurality of data item clusters, wherein each cluster comprises a number of data items no greater than said node capacity; and
(h) configuring a leaf node of said hierarchical index for data items in a first cluster. - View Dependent Claims (3, 4, 5, 6, 7, 8)
creating a first leaf node of the index; and
inserting an entry in said first leaf node for each data item in a first subset of said set of multi-dimensional data items, wherein a first entry for a first data item comprises;
an identifier of said first data item; and
a bounding area encompassing said first data item.
-
-
5. The computer-implemented method of claim 2, wherein the dimensions of said multi-dimensional data are inherently related.
-
6. The computer-implemented method of claim 2, wherein the dimensions of the multi-dimensional data are independent attributes.
-
7. The computer-implemented method of claim 2, further comprising identifying a query pattern for retrieving one or more data items from said set of data items, wherein said query pattern comprises a hierarchy of two or more dimensions of said multi-dimensional data items.
-
8. The computer-implemented method of claim 7, which said identifying comprises selecting a dimension in said hierarchy of two or more dimensions.
-
9. A computer readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a method of constructing a hierarchical index from a set of multi-dimensional data, the method comprising:
-
(a) calculating a number of members of a set of multi-dimensional data items;
(b) determining whether said number of members exceeds a node capacity of a hierarchical index configured to index said data items;
(c) determining a variance of values in one or more dimensions of said data items;
(d) identifying a first dimension of said one or more multi dimensions in which to divide said set of multi-dimensional data items;
(e) sorting said data items in said first dimension;
(f) dividing said sorted data items in said first dimension into two or more subsets by;
(f′
) determining whether said number of members is greater than or equal to double said node capacity; and
(f″
) calculating the value of said node capacity times (the floor of (the sum of 0.5 plus (the quotient of said number of members divided by twice said node capacity)));
(g) repeating (a)-(f) for each said subset in order to divide said set of data items into a plurality of data item clusters, wherein each cluster comprises a number of data items no greater than said node capacity; and
(h) configuring a leaf node of said hierarchical index for data items in a first cluster.
-
-
10. A computer-implemented method of constructing a hierarchical index from a set of multi-dimensional data, comprising:
-
(a) calculating a number of members of a set of multi-dimensional data items;
(b) determining whether said number of members exceeds a node capacity of a hierarchical index configured to index said data items;
(c) determining a variance of values in one or more dimensions of said data items;
(d) identifying a first dimension of said one or more multi dimensions in which to divide said set of multi-dimensional data items;
(e) sorting said data items in said first dimension;
(f) dividing said sorted data items in said first dimension into two or more subsets by;
(f′
) determining whether said number of members is greater than or equal to double said node capacity; and
(f″
) calculating the value of said node capacity times (the floor of (the sum of 0.5 plus (the quotient of said number of members divided by twice said node capacity)));
(g) repeating (a)-(f) for each said subset in order to divide said set of data items into a plurality of data item clusters, wherein each cluster comprises a number of data items no greater than said node capacity; and
(h) configuring a leaf node of said hierarchical index for data items in a first cluster. - View Dependent Claims (11, 12, 13, 14, 15, 16)
creating a first leaf node of the index; and
inserting an entry in said first leaf node for each data item in a first subset of said set of multi-dimensional data items, wherein a first entry for a first data item comprises;
an identifier of said first data item; and
a bounding area encompassing said first data item.
-
-
13. The computer-implemented method of claim 10, wherein the dimensions of said multi-dimensional data are inherently related.
-
14. The computer-implemented method of claim 10, wherein the dimensions of the multi-dimensional data are independent attributes.
-
15. The computer-implemented method of claim 10, further comprising identifying a query pattern for retrieving one or more data items from said set of data items, wherein said query pattern comprises a hierarchy of two or more dimensions of said multi-dimensional data items.
-
16. The computer-implemented method of claim 15, in which said identifying comprises selecting a dimension in said hierarchy of two or more dimensions.
-
17. A computer readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a method of indexing a set of multi-dimensional data items, the method comprising:
-
sorting a set of multi-dimensional data items in a first dimension, said first dimension corresponding to a primary dimension of a first query pattern;
dividing said sorted set of multi-dimensional data items in said first dimension into two or more first-order subsets of multi-dimensional data items;
sorting a first first-order subset of said two or more first-order subsets of multi-dimensional data items in a second dimension, said second dimension corresponding to a secondary dimension of said first query pattern;
dividing said first first-order subset of multi-dimensional data items in said second dimension into two or more second-order subsets of multi-dimensional data items; and
storing the members of an N-order subset of multi-dimensional data items in a leaf node of a hierarchical index, wherein the number of data items in said N-order subset is less than or equal to a node capacity of said hierarchical index and N is greater than or equal to two.
-
-
18. A computer-implemented method of indexing a set of multi-dimensional data items, comprising:
-
sorting a set of multi-dimensional data items in a first dimension, said first dimension corresponding to a primary dimension of a first query pattern;
dividing said sorted set of multi-dimensional data items in said first dimension into two or more first-order subsets of multi-dimensional data items;
sorting a first first-order subset of said two or more first-order subsets of multi-dimensional data items in a second dimension, said second dimension corresponding to a secondary dimension of said first query pattern;
dividing said first first-order subset of multi-dimensional data items in said second dimension into two or more second-order subsets of multi-dimensional data items; and
storing the members of an N-order subset of multi-dimensional data items in a leaf node of a hierarchical index, wherein the number of data items in said N-order subset is less than or equal to a node capacity of said hierarchical index and N is greater than or equal to two. - View Dependent Claims (19, 20, 21, 22, 23, 24, 25, 26, 27)
determining a variance of said multi-dimensional data items in one or more dimensions of said multi-dimensional data items; and
selecting a first dimension of said multi-dimensional data items having a greater variance than a second dimension of said multi-dimensional data items.
-
-
22. The method of claim 18, in which said dividing said sorted set of multi-dimensional data items in said first dimension comprises:
-
determining the number of data items in said set of multi-dimensional data items;
calculating an intermediate value in said first dimension; and
selecting a first data item corresponding to said intermediate value.
-
-
23. The method of claim 22, wherein said intermediate value is an approximate median value.
-
24. The method of claim 22, in which said calculating an intermediate value comprises:
-
determining whether said number of data items is less than twice the capacity of a node of said hierarchical index; and
calculating the floor of the quotient of said node capacity divided by two.
-
-
25. The method of claim 22, in which said calculating an intermediate value comprises:
-
determining whether said number of data items is greater than or equal to twice the capacity of a node of said hierarchical index; and
calculating the value of said node capacity multiplied by (the floor of (the sum of 0.5 plus (the quotient of said number of data items divided by twice said node capacity))).
-
-
26. The computer-implemented method of claim 18, further comprising:
-
storing an identifier of a root node of said hierarchical index;
storing a node capacity of said hierarchical index; and
storing a measure of the dimensionality of said multi-dimensional data items.
-
-
27. The computer-implemented method of claim 18, wherein said storing the members of an N-order subset of multi-dimensional data items comprises:
-
creating a first leaf node of said hierarchical index; and
inserting an entry in said first leaf node for each said member of said N-order subset of multi-dimensional data items, wherein each said entry comprises;
an identifier of said corresponding member; and
a bounding area encompassing said corresponding member.
-
Specification