Buffering a hierarchical index of multi-dimensional data
First Claim
1. A computer-implemented method of buffering a hierarchical index of multi-dimensional data, wherein said index comprises a set of nodes having associated multi-dimensional data regions, the method comprising:
- storing a set of multi-dimensional data items on a computer system;
maintaining a hierarchical index of said multi-dimensional data items;
maintaining a buffer configured to store one or more nodes of the set of nodes comprising the index;
storing in said buffer a first node in the set of nodes, wherein said first node has the largest associated data region; and
repeating said storing for a next node in the set of nodes having a next largest associated data region than the node just stored in said buffer until one of;
every node in the set of nodes has been stored in said buffer; and
said node having a next largest associated data region is too large to fit in said buffer.
2 Assignments
0 Petitions
Accused Products
Abstract
Methods are provided for buffering nodes of a hierarchical index (e.g., R-tree, bang file, hB-tree) during operations on multi-dimensional data represented by the index. The methods are particularly suited for query operations, and a different method may be more suitable for one pattern of queries than another. Where queries are distributed in a relatively uniform manner across the domain or dataspace of an index, a node-area buffering method is provided. In this method nodes are cached or buffered in order of their respective areas (e.g., their minimum bounding areas), and a node having a smaller area will be replaced in cache before a node having a larger area. When, however, queries are not uniformly distributed, then a least frequently accessed buffering technique may be applied. According to this method statistics are maintained concerning the frequency with which individual index nodes are accessed. Those accessed less frequently are replaced in cache before those accessed more frequently. Yet another, generic, buffering strategy is provided that is suitable for all patterns of query distribution. In accordance with this method, whenever a node must be removed from cache in order to make room for a newly accessed node, cached nodes are compared to each other to determine which provides the least caching benefit and may therefore be ejected. A comparison may involve three factors—the difference in the nodes'"'"' areas, the difference in the frequency with which they have been accessed and the difference between their latest access times. These factors may be weighted to give them more or less effect in relation to each other.
441 Citations
28 Claims
-
1. A computer-implemented method of buffering a hierarchical index of multi-dimensional data, wherein said index comprises a set of nodes having associated multi-dimensional data regions, the method comprising:
-
storing a set of multi-dimensional data items on a computer system;
maintaining a hierarchical index of said multi-dimensional data items;
maintaining a buffer configured to store one or more nodes of the set of nodes comprising the index;
storing in said buffer a first node in the set of nodes, wherein said first node has the largest associated data region; and
repeating said storing for a next node in the set of nodes having a next largest associated data region than the node just stored in said buffer until one of;
every node in the set of nodes has been stored in said buffer; and
said node having a next largest associated data region is too large to fit in said buffer. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
a first object configured to store a record for each node in the set of nodes; and
a second object configured to store information concerning said hierarchical index.
-
-
3. The method of claim 2, wherein said operation is a query on said database.
-
4. The method of claim 2, wherein a first record in said first object corresponds to said first node of said index and comprises one or more of:
-
a unique identifier of said first node;
an identifier of a parent node of said first node;
an identifier of a sibling node of said first node;
a measure of a number of child items of said first node, wherein each of said child items is either a data item or a child node; and
for each of said child items of said first node, a child entry comprising;
an identifier of a child item; and
a bounding area encompassing said child item if said child item is a data item or a bounding area encompassing all data items descending from said child item if said child item is a child node;
wherein said multi-dimensional data region associated with said first node is said bounding area.
-
-
5. The method of claim 4, in which said storing in said buffer comprises copying one or more of said components of said first record into a buffer area in a computer memory.
-
6. The method of claim 2, in which said second object comprises one or more of:
-
an identifier of a root node of said hierarchical index;
a fanout of said index; and
a measure of the dimensionality of said multi-dimensional data items.
-
-
7. The method of claim 1, wherein said buffer is a memory buffer for buffering portions of said hierarchical index stored on a storage device having a slower access speed than said memory buffer.
-
8. The method of claim 1, wherein said storing in said buffer comprises:
-
storing in a buffer an identifier of a child item of said first node, wherein said child item is one of a data item and a child node; and
storing in said buffer a bounding area for said child item;
wherein said bounding area is said associated multi-dimensional data region of said child item if said child item is a child node; and
wherein said bounding area is a boundary of a region encompassing said child item if said child item is a data item.
-
-
9. The method of claim 1, further comprising:
replacing a second node in said buffer with a third node, wherein said data region associated with said third node is larger than said data region associated with said second node.
-
10. The method of claim 1, further comprising:
-
determining a frequency of access of a first subset of said index nodes during a series of operations involving said set of data items; and
replacing nodes stored in said buffer with said first subset of said index nodes in order of said frequency of access.
-
-
11. A computer-implemented method of adaptively buffering a hierarchical index of multi-dimensional data, the method comprising:
-
maintaining a hierarchical index of multi-dimensional data items, wherein said index comprises a set of nodes;
maintaining a buffer configured to store a subset of said set of nodes;
determining a frequency with which each node in said set of nodes is accessed during a first set of index operations; and
storing in said buffer a first subset of said set of nodes, said first subset comprising the nodes accessed most frequently during said first set of index operations. - View Dependent Claims (12, 13, 14, 15)
replacing said first subset of nodes in said buffer with a second subset of said set of nodes comprising the nodes most frequently accessed during a second set of index operations.
-
-
13. The method of claim 12, wherein said first set of index operations comprises said second set of index operations.
-
14. The method of claim 11, in which said maintaining a hierarchical index comprises operating a computer database that includes:
-
a collection of records, wherein each of said records corresponds to a node of said hierarchical index; and
a description of said index.
-
-
15. The method of claim 14, wherein a first record in said collection of records corresponds to a first node in said first subset of nodes and comprises one or more of:
-
a unique identifier of said first node;
an identifier of a parent node of said first node;
an identifier of a sibling node of said first node;
a measure of a number of child items of said first node, wherein each of said child items is either a data item or a child node; and
for each of said child items of said first node, a child entry comprising;
an identifier of a child item; and
a bounding area encompassing said child item if said child item is a data item or a bounding area encompassing all data items descending from said child item if said child item is a child node.
-
-
16. A method of caching nodes of a hierarchical index during operations performed on a plurality of multi-dimensional data items represented by the index, wherein the index comprises a set of nodes having associated bounding regions, the method comprising:
-
maintaining in a cache a first subset of the set of nodes;
receiving a request for an operation involving the plurality of data items;
accessing an uncached node not included in said first subset of nodes;
determining a difference in size between the bounding region of a first node in said first subset of nodes and the bounding region of a second node in said first subset of nodes;
determining which of said first node and said second node provides less caching benefit, wherein greater caching benefit is provided by a node having a larger bounding region than a node having a smaller bounding region; and
storing said uncached node in said cache in place of said node providing less caching benefit. - View Dependent Claims (17, 18, 19, 20, 21, 22, 23)
subtracting said bounding region of said second node from said bounding region of said first node to produce a size difference; and
dividing said size difference by said bounding region of said second node.
-
-
18. The method of claim 16, further comprising:
-
determining a difference in recency of access of said first node and said second node; and
determining a difference in frequency of access of said first node and said second node.
-
-
19. The method of claim 18, wherein said determining a difference in recency of access comprises:
-
subtracting a time at which said second node was most recently accessed from a time at which said first node was most recently accessed to produce a time difference; and
dividing said time difference by a size of said cache.
-
-
20. The method of claim 18, wherein one of said difference in size, said difference in recency of access and said difference in frequency of access is assigned a weighting factor.
-
21. The method of claim 18, wherein said determining which of said first node and said second node provides less caching benefit comprises:
-
aggregating said difference in size, said difference in recency of access and said difference in frequency of access to generate a comparison value; and
determining which of said first node and said second node provides less caching benefit by comparing said comparison value to a threshold value.
-
-
22. The method of claim 21, wherein if said comparison value is greater than said threshold value, then said second node is determined to provide less caching benefit than said first node.
-
23. The method of claim 22, wherein said threshold value is zero.
-
24. A computer readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a method of buffering a hierarchical index of multi-dimensional data, wherein said index comprises a set of nodes having associated multi-dimensional data regions, the method comprising:
-
maintaining a set of multi-dimensional data items on a computer system;
maintaining a hierarchical index of said multi-dimensional data items;
maintaining a buffer configured to store one or more nodes of said set of nodes;
storing in said buffer a first node of said set of nodes having a largest associated data region;
repeating said storing for a node in said set of nodes having a next largest associated data region than the node just stored in said buffer until one of;
every node in said set of nodes has been stored in said buffer; and
said node having a next largest associated data region is too large to fit in said buffer;
receiving a request for an operation on said data items; and
accessing one or more of said buffered nodes during execution of said operation.
-
-
25. A computer readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a method of caching nodes of a hierarchical index during operations performed on a plurality of multi-dimensional data items represented by the index, wherein the index comprises a set of nodes having associated bounding regions, the method comprising:
-
maintaining in a cache a first subset of the set of nodes;
receiving a request for an operation involving the plurality of data items;
accessing an uncached node not included in said first subset of nodes;
determining a difference in size between the bounding region of a first node in said first subset of nodes and the bounding region of a second node in said first subset of nodes;
determining which of said first node and said second node provides less caching benefit, wherein greater caching benefit is provided by a node having a larger bounding region than a node having a smaller bounding region; and
storing said uncached node in said cache in place of said node providing less caching benefit.
-
-
26. An apparatus for managing a set of multi-dimensional data items, comprising:
-
a storage device configured to store a set of multi-dimensional data items;
a processor configured to manipulate said set of multi-dimensional data items;
a hierarchical index of said set of multi-dimensional data items, wherein said index comprises a set of nodes; and
a buffer configured to store a subset of said set of nodes to facilitate access to said nodes during an operation on said set of multi-dimensional data items;
wherein said subset of nodes is stored in said buffer in accordance with predetermined criteria including one or more of;
the size of each node'"'"'s bounding region, the frequency with which each node is accessed during a first series of operations on the set of data items, and the recency with which each node has been accessed.- View Dependent Claims (27)
a first module configured to store one or more of;
an identifier of a root node of said index;
a fanout of said index; and
a dimensionality of said multi-dimensional data items; and
a second module comprising a record for each node in said set of nodes, wherein a first record for a first node comprises;
an identifier of said first node; and
an entry for each child of said first node, wherein a child is one of a data item and a child node and wherein a first entry comprises;
an identifier of a first child of said first node; and
a bounding region encompassing either said first child if said first child is a data item or a bounding region encompassing all data items of descendants of said first child if said first child is a child node.
-
-
28. A computer readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a method of adaptively buffering a hierarchical index of multi-dimensional data, the method comprising:
-
maintaining a hierarchical index of multi-dimensional data items, wherein said index comprises a set of nodes;
maintaining a buffer configured to store a subset of said set of nodes;
determining a frequency with which each node in said set of nodes is accessed during a first set of index operations; and
storing in said buffer a first subset of said set of nodes, said first subset comprising the nodes accessed most frequently during said first set of index operations.
-
Specification