Relational database system for storing nodes of a hierarchical index of multi-dimensional data in a first module and metadata regarding the index in a second module
First Claim
1. A method of storing a hierarchical index of multi-dimensional data in a database on a computer system, comprising:
- constructing a first module in a first relational database;
inserting a row in said first module for a hierarchical index of multi-dimensional data, wherein said row comprises;
an identifier of a root node of said index;
a node capacity of said index; and
a measure of the dimensionality of said multi-dimensional data;
constructing a second module in a second relational database;
inserting a row in said second module for a first node of said index, wherein said row comprises one or more of;
a first identifier of said first node;
a location identifier configured to identify a storage location of said first node;
a parent_node identifier of a parent node of said first node;
a parent_location identifier configured to identify a storage location of said parent node;
a sibling identifier of a sibling node of said first node;
one or more entries, wherein each entry comprises;
an identifier of a child of said first node, wherein said child is either a child data item or a child node; and
a bounding area encompassing either said child data item or a set of data items accessible through said child node; and
a count of the number of said one or more entries; and
storing the multi-dimensional data in a third module in a third relational database.
1 Assignment
0 Petitions
Accused Products
Abstract
A system and method for indexing and storing multi-dimensional or multi-attribute data. Data items are recursively sorted in a selected dimension (e.g., the dimension having the greatest variance) and divided until each subdivision fits into a leaf node having a specified fanout. Intermediate nodes and a root node are constructed to complete the index. Each node of the index is stored in a database as a separate object or record and may include a node identifier of the unique, an identifier of a parent and/or a sibling node and an entry for each child of the node, which may be data items or other nodes. Each record entry for a child includes an associated bounding area encompassing descendant data items. Another database table or module may store information about the index, such as the dimensionality of the data, the index fanout and an identifier of a root of the index.
456 Citations
28 Claims
-
1. A method of storing a hierarchical index of multi-dimensional data in a database on a computer system, comprising:
-
constructing a first module in a first relational database;
inserting a row in said first module for a hierarchical index of multi-dimensional data, wherein said row comprises;
an identifier of a root node of said index;
a node capacity of said index; and
a measure of the dimensionality of said multi-dimensional data;
constructing a second module in a second relational database;
inserting a row in said second module for a first node of said index, wherein said row comprises one or more of;
a first identifier of said first node;
a location identifier configured to identify a storage location of said first node;
a parent_node identifier of a parent node of said first node;
a parent_location identifier configured to identify a storage location of said parent node;
a sibling identifier of a sibling node of said first node;
one or more entries, wherein each entry comprises;
an identifier of a child of said first node, wherein said child is either a child data item or a child node; and
a bounding area encompassing either said child data item or a set of data items accessible through said child node; and
a count of the number of said one or more entries; and
storing the multi-dimensional data in a third module in a third relational database.- View Dependent Claims (2, 3, 4)
-
-
5. A computer-implemented method of constructing a hierarchical index from a set of multi-dimensional data, comprising:
-
(a) calculating a number of members of a set of multi-dimensional data items;
(b) determining whether said number of members exceeds a node capacity of a hierarchical index configured to index said data items;
(c) determining a variance of values in one or more dimensions of said data items;
(d) identifying a first dimension of said one or more multi dimensions in which to divide said set of multi-dimensional data items;
(e) sorting said data items in said first dimension;
(f) dividing said sorted data items in said first dimension into two or more subsets;
(g) repeating (a)-(f) for each said subset in order to divide said set of data items into a plurality of data item clusters, wherein each cluster comprises a number of data items no greater than said node capacity; and
(h) configuring a leaf node of said hierarchical index for data items in a first luster. - View Dependent Claims (6, 7, 8, 9, 10, 11, 12, 13)
creating a first leaf node of the index; and
inserting an entry in said first leaf node for each data item in a first subset of said set of multi-dimensional data items, wherein a first entry for a first data item comprises;
an identifier of said first data item; and
a bounding area encompassing said first data item.
-
-
10. The method of claim 5, wherein the dimensions of said multi-dimensional data are inherently related.
-
11. The method of claim 5, wherein the dimensions of the multi-dimensional data are independent attributes.
-
12. The method of claim 5, further comprising identifying a query pattern for retrieving one or more data items from said set of data items, wherein said query pattern comprises a hierachy of two or more dimensions of said multi-dimensional data items.
-
13. The method of claim 12, in which said identifying comprises selecting a dimension in said hierarchy of two or more dimensions.
-
14. A method of updating an electronic index of multi-dimensional data items, comprising:
-
receiving a new data item to be added to a set of hierarchically indexed multi-dimensional data items;
inserting said new data item in a leaf node of said hierarchical index, wherein each node of said index comprises one or more of;
a node identifier of said node;
a row identifier of a storage location of said node in a database;
a parent_node identifier of a parent of said node;
a parent_row identifier of a storage location of said parent of said node in said database;
a sibling identifier of a sibling of said node; and
a set of child entries corresponding to children of said node;
determining whether a first node in said index splits due to said inserting;
if said first node splits due to said inserting;
creating a second node;
assigning said second node said node identifier of said first node;
assigning a new node identifier to said first node; and
transferring one or more child entries of said set of child entries of said first node to said second node;
if said first node is a root node of said index;
creating a new root node of said index;
setting said parent_node identifier of one of said first node and said second node to said node identifier of said new root node; and
setting said parent_row identifier of said one of said first node and said second node to said row identifier of said new root node; and
updating metadata regarding said index, said metadata comprising;
a dimensionality of said multi-dimensional data items;
an identifier of said root node of said index; and
a node capacity of said index. - View Dependent Claims (15, 16)
traversing one or more nodes of said index to locate a leaf node in which to insert said new data item;
storing a node identifier of each of said one or more nodes during said traversing as a first node identifier in a data structure; and
adding a child entry to said leaf node for said new data item.
-
-
16. The method of claim 15, in which said determining comprises, after said adding:
-
removing said first node identifiers of said one or more nodes from said data structure in reverse order of said storing; and
determining whether said first node identifier of a node removed from said data structure is different from a node identifier of said node at said removing.
-
-
17. A computer readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a method of storing a hierarchical index of multi-dimensional data in a database on a computer system, the method comprising:
-
constructing a first module in a first relational database;
inserting a row in said first module for a hierarchical index of multi-dimensional data, wherein said row comprises;
an identifier of a root node of said index;
a node capacity of said index; and
a measure of the dimensionality of said multi-dimensional data;
constructing a second module in a second relational database;
inserting a row in said second module for a first node of said index, wherein said row comprises one or more of;
a first identifier of said first node;
a location identifier configured to identify a storage location of said first node;
a parent_node identifier of a parent node of said first node;
a parent_location identifier configured to identify a storage location of said parent node;
a sibling identifier of a sibling node of said first node;
one or more entries, wherein each entry comprises;
an identifier of a child of said first node, wherein said child is either a child data item or a child node; and
a bounding area encompassing either said child data item or a set of data items accessible through said child node; and
a count of the number of said one or more entries; and
storing the multi-dimensional data in a third module in a third relational database.
-
-
18. A computer readable storage medium containing a data structure configured for hierarchically indexing multi-dimensional data items, said data structure comprising:
-
a first module configured to store a set of records, wherein each record consists of one or more of;
an identifier of a node of a hierarchical index of multi-dimensional data items;
an identifier of said record;
an identifier of a parent node of said node;
an identifier of a sibling node of said node; and
one or more child entries, each said child entry comprising;
an identifier of a child item of said node; and
a bounding area encompassing one or more of said multi-dimensional data items; and
a second module configured to store one or more of;
a dimensionality of said multi-dimensional data items;
an identifier of a root node of said index; and
a measure of a node capacity of said index. - View Dependent Claims (19, 20, 21, 22, 23)
a child item of a first child entry comprises one of: a data item; and
a child node; and
said bounding area of said first child entry encompasses one of;
if said child item is a data item, said data item; and
if said child item is a child node, a subset of said multi-dimensional data items accessible through said child node.
-
-
21. The computer readable storage medium of claim 18, wherein each of said multi-dimensional data items comprises a geographical position.
-
22. The computer readable storage medium of claim 21, wherein said bounding area comprises a geographical region surrounding said geographic positions corresponding to said one or more multi-dimensional data items.
-
23. The computer readable storage medium of claim 18, wherein said node capacity comprises a fanout of said index.
-
24. An apparatus for indexing a set of multi-dimensional data items, comprising:
-
a storage device configured to store a set of multi-dimensional data items;
a processor configured to manipulate said set of multi-dimensional data items; and
a database configured to store a hierarchical index of said set of multi-dimensional data items, wherein said index comprises one or more nodes, the database comprising;
a first module configured to store one or more of;
an identifier of a root node of said index;
a node capacity of said index; and
a dimensionality of said multi-dimensional data items; and
a second module comprising a record for each of said one or more nodes, a first record for a first node comprising;
an identifier of said first node; and
an entry for each child of said first node, wherein a child is one of a data item and a child node and wherein a first entry comprises;
an identifier of a first child of said first node; and
a bounding region encompassing said first child if said first child is a data item or all data items of descendants of said first child if said first child is a child node. - View Dependent Claims (25, 26)
a third module configured to store one or more of said multi-dimensional data items.
-
-
27. A computer readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a method of constructing a hierarchical index from a set of multi-dimensional data, the method comprising:
-
(a) calculating a number of members of a set of multi-dimensional data items;
(b) determining whether said number of members exceeds a node capacity of a hierarchical index configured to index said data items;
(c) determining a variance of values in one or more dimensions of said data items;
(d) identifying a first dimension of said one or more multi dimensions in which to divide said set of multi-dimensional data items;
(e) sorting said data items in said first dimension;
(f) dividing said sorted data items in said first dimension into two or more subsets;
(g) repeating (a)-(f) for each said subset in order to divide said set of data items into a plurality of data item clusters, wherein each cluster comprises a number of data items no greater than said node capacity; and
(h) configuring a leaf node of said hierarchical index for data items in a first cluster.
-
-
28. A computer readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a method of updating an electronic index of multi-dimensional data items, the method comprising:
-
receiving a new data item to be added to a set of hierarchically indexed multi-dimensional data items;
inserting said new data item in a leaf node of said hierarchical index, wherein each node of said index comprises one or more of;
a node identifier of said node;
a row identifier of a storage location of said node in a database;
a parent_node identifier of a parent of said node;
a parent_row identifier of a storage location of said parent of said node in said database;
a sibling identifier of a sibling of said node; and
a set of child entries corresponding to children of said node;
determining whether a first node in said index splits due to said inserting;
if said first node splits due to said inserting;
creating a second node;
assigning said second node said node identifier of said first node;
assigning a new node identifier to said first node; and
transferring one or more child entries of said set of child entries of said first node to said second node; and
if said first node is a root node of said index;
creating a new root node of said index;
setting said parent_node identifier of one of said first node and said second node to said node identifier of said new root node; and
setting said parent_row identifier of said one of said first node and said second node to said row identifier of said new root node.
-
Specification