Method of storing and retrieving multi-dimensional data using the hilbert curve
First Claim
1. A method of storing multi-dimensional data or multi-dimensional data points, in a multi-dimensional database and retrieving subsets of said multi-dimensional data according to the specification of a query comprising:
- 1 providing a computer memory which is able to store said multi-dimensional data at a series of addresses in said memory,2 providing a means of transforming by calculation of multi-dimensional points including said multi-dimensional data into one-dimensional values by mapping said multi-dimensional data to one-dimensional distance values along an approximation of a Hilbert space-filling curve that passes through all multi-dimensional points corresponding to possible data values in the domain of said multi-dimensional database,3 providing a means of partitioning said multi-dimensional data into an ordered sequence of subsets of said multi-dimensional data so that each said subset of said multi-dimensional data comprises multi-dimensional data that maps to lower said one-dimensional distance values than all succeeding said subsets of said multi-dimensional data,4 providing a means of identifying each said subset of said multi-dimensional data together with the location in said computer memory in which each said subset of multi-dimensional data is stored,5 providing a means of indexing locations in said computer memory of said subsets of said multi-dimensional data,6 providing a means of specifying a query region or query as a multi-dimensional rectangular region within said domain of said database,7 providing a calculation method utilizing bit-wise manipulation of said one-dimensional distance values along an approximation of a Hilbert space-filling curve for identifying for retrieval and searching which said subsets of said multi-dimensional data intersect with any said query, the calculation method comprising the steps of;
(a) calculating the lowest said one-dimensional value corresponding to a multi-dimensional point lying within said query region and identifying from said index which subset of said multi-dimensional data will contain said multi-dimensional point, if it exists as a multi-dimensional data point in said database, and identifying from said index the location in said computer memory of said subset of said multi-dimensional data,(b) calculating the lowest one-dimensional value, if it exists, corresponding to a multi-dimensional point lying within said query region which is greater than or equal to the lowest one-dimensional value corresponding to any multi-dimensional point that may exist as a multi-dimensional data point and be contained within the succeeding subset of said multi-dimensional data identified in the previous step,(c) repeating the previous step zero or more times to identify further subsets of said multi-dimensional data that intersect with said query region,whereby said database and its index adapt efficiently to insertion and deletion of said multi-dimensional data, andwhereby subsets of multi-dimensional data that intersect with a query region can be identified efficiently for searching and retrieval of multi-dimensional data that lies within the query region.
0 Assignments
0 Petitions
Accused Products
Abstract
A method of partitioning and indexing multi-dimensional data that maps data to one-dimensional values according to the sequence in which an approximation of a Hilbert space-filling curve passes through all of the points corresponding to potential multi-dimensional data in a data space. Data is partitioned into ordered pages, each corresponding to a length of Hilbert curve and identified by the sequence of the first point on it. Practical application of the indexing method is made viable and useful by the provision of an efficient querying algorithm enabling data lying within any given hyper-rectangular query region to be retrieved. This is achieved by successively calculating ever higher sequence numbers of potential data points lying within a query region and each being the lowest such value for a point lying on a different curve section. Thus a single calculation is performed to identify each page potentially containing data of interest.
57 Citations
4 Claims
-
1. A method of storing multi-dimensional data or multi-dimensional data points, in a multi-dimensional database and retrieving subsets of said multi-dimensional data according to the specification of a query comprising:
-
1 providing a computer memory which is able to store said multi-dimensional data at a series of addresses in said memory, 2 providing a means of transforming by calculation of multi-dimensional points including said multi-dimensional data into one-dimensional values by mapping said multi-dimensional data to one-dimensional distance values along an approximation of a Hilbert space-filling curve that passes through all multi-dimensional points corresponding to possible data values in the domain of said multi-dimensional database, 3 providing a means of partitioning said multi-dimensional data into an ordered sequence of subsets of said multi-dimensional data so that each said subset of said multi-dimensional data comprises multi-dimensional data that maps to lower said one-dimensional distance values than all succeeding said subsets of said multi-dimensional data, 4 providing a means of identifying each said subset of said multi-dimensional data together with the location in said computer memory in which each said subset of multi-dimensional data is stored, 5 providing a means of indexing locations in said computer memory of said subsets of said multi-dimensional data, 6 providing a means of specifying a query region or query as a multi-dimensional rectangular region within said domain of said database, 7 providing a calculation method utilizing bit-wise manipulation of said one-dimensional distance values along an approximation of a Hilbert space-filling curve for identifying for retrieval and searching which said subsets of said multi-dimensional data intersect with any said query, the calculation method comprising the steps of; (a) calculating the lowest said one-dimensional value corresponding to a multi-dimensional point lying within said query region and identifying from said index which subset of said multi-dimensional data will contain said multi-dimensional point, if it exists as a multi-dimensional data point in said database, and identifying from said index the location in said computer memory of said subset of said multi-dimensional data, (b) calculating the lowest one-dimensional value, if it exists, corresponding to a multi-dimensional point lying within said query region which is greater than or equal to the lowest one-dimensional value corresponding to any multi-dimensional point that may exist as a multi-dimensional data point and be contained within the succeeding subset of said multi-dimensional data identified in the previous step, (c) repeating the previous step zero or more times to identify further subsets of said multi-dimensional data that intersect with said query region, whereby said database and its index adapt efficiently to insertion and deletion of said multi-dimensional data, and whereby subsets of multi-dimensional data that intersect with a query region can be identified efficiently for searching and retrieval of multi-dimensional data that lies within the query region. - View Dependent Claims (2)
-
-
3. A method of storing and querying multi-dimensional data comprising the steps of,
1 transforming multi-dimensional data points into one-dimensional values, using a bijective mapping between multi-dimensional points and one-dimensional values determined by the sequence in which an approximation of a Hilbert space-filling curve passes through the multi-dimensional points in the domain of a multi-dimensional database, which are stored in a one-dimensional index structure, comprising the steps of, (a) splitting data into a plurality of subsets, each corresponding to a length of the Hubert space-filling curve, (b) computing an identifier for each subset selected from the group consisting of the one-dimensional distance value of the start of a section of Hilbert space-filling curve corresponding to the subset of data from the beginning of the Hilbert space-filling curve and the one-dimensional distance value of the end of a section of Hilbert space-filling curve corresponding to the subset of data from the beginning of the Hubert space-filling curve and a pair of distance values on the section of Hilbert space-filling curve corresponding to the subset of data, wherein one is the lowest and the other is the highest one-dimensional value corresponding to actual data points stored in the database and lying on the section of Hilbert space-filling curve, (c) storing each said subset in said index using the subset identifier as an index key, 2 using the index in calculating which subsets of said multi-dimensional intersect with the query utilizing bit-wise manipulation of said one-dimensional values, 3 using the index to retrieve candidate subsets of data, 4 searching candidate subsets of data and outputting data points which lie within the query region.
-
4. A method of storing multi-dimensional data on computer memory or computer storage media and retrieving subsets of said multi-dimensional data according to the specification of a query, comprising the steps of,
1 transforming multi-dimensional data points into one-dimensional values, using a bijective mapping between multi-dimensional points and one-dimensional values determined by the sequence in which an approximation of a Hilbert space-filling curve passes through the multi-dimensional points in the domain of a multi-dimensional database, 2 splitting data into a plurality of subsets, each corresponding to a length of the Hilbert space-filling curve, 3 computing an identifier for each said subset selected from the group consisting of the one-dimensional distance value of the start of a section of Hilbert space-filling curve corresponding to the subset of data from the beginning of the Hilbert space-filling curve and the one-dimensional distance value of the end of a section of Hilbert space-filling curve corresponding to the subset of data from the beginning of the Hilbert space-filling curve - and a pair of distance values on the section of Hilbert space-filling curve corresponding to the subset of data wherein one of said pair of distance values is the lowest and the other is the highest one-dimensional value corresponding to actual data points stored in the database and lying on the section of Hilbert space-filling curve,
4 storing each said subset on a page in said computer memory or computer storage media, 5 storing a reference to the location of each said subset in said computer memory or computer storage media in an index using the subset identifier as an index key, 6 determining at least one of the pages that intersect with a query by using identifiers in said index to calculate in numerical order, utilizing a process of bit-wise manipulation, one or more members of a set of one-dimensional distance values of points lying within the query region where each value, if it corresponds to a data point, lies on a different page, wherein the first value calculated is the distance value of a point within the query region that is closest to one end of the Hilbert space-filling curve and subsequently calculated values differ minimally from the previously calculated value, 7 using the index to locate and retrieve pages identified in the previous step, 8 searching retrieved pages and outputting data points which lie within the query region.
- and a pair of distance values on the section of Hilbert space-filling curve corresponding to the subset of data wherein one of said pair of distance values is the lowest and the other is the highest one-dimensional value corresponding to actual data points stored in the database and lying on the section of Hilbert space-filling curve,
Specification