Transformation-based method for indexing high-dimensional data for nearest neighbour queries
First Claim
1. A computerized method for indexing in a database of stored objects, the method comprising:
- applying a clustering algorithm to organize high-dimensional points into partitions;
selecting a reference point for each partition;
applying a transformation function to map a high-dimensional point to a one-dimensional space;
indexing the transformed point using a single dimensional index structure; and
wherein the transforming includes mapping a high dimensional point p to a single dimensional value y under the transformation function, y=i*c+dist(p,o), where point o is the closest reference point to p, and dist(p,o) represent the distance between p and o, c is an arbitrary constant greater than 1, i is an integer that uniquely identifies o.
1 Assignment
0 Petitions
Accused Products
Abstract
We disclose a transformation-based method for indexing high-dimensional data to support similarity search. The method, iDistance, partitions the data into clusters either based on some clustering strategies or simple data space partitioning strategies. The data in each cluster can be described based on their similarity with respect to a reference point, and hence they can be transformed into a single dimensional space based on such relative similarity. This allows us to index the data points using a B+-tree structure and perform similarity search using range search strategy. As such, the method is well suited for integration into existing DBMSs. We also study two data partitioning strategies, and several methods on selection of reference points. We conducted extensive experiments to evaluate iDistance, and our results demonstrate its effectiveness.
35 Citations
11 Claims
-
1. A computerized method for indexing in a database of stored objects, the method comprising:
-
applying a clustering algorithm to organize high-dimensional points into partitions;
selecting a reference point for each partition;
applying a transformation function to map a high-dimensional point to a one-dimensional space;
indexing the transformed point using a single dimensional index structure; and
wherein the transforming includes mapping a high dimensional point p to a single dimensional value y under the transformation function, y=i*c+dist(p,o), where point o is the closest reference point to p, and dist(p,o) represent the distance between p and o, c is an arbitrary constant greater than 1, i is an integer that uniquely identifies o. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
for equi-space partitioning, the reference point of a partition can be any point along the line formed by the centroid of a hyperplane and the centroid of the corresponding data space; and
for cluster-based partitioning, either the centroid of the clusters or any points on the edge of the hyperplanes can be used as reference points.
-
-
8. A method for indexing as defined in claim 1 wherein the transforming includes mapping a high-dimensional point to its distance from the closest reference point.
-
9. A method for KNN search as defined in claim 2 wherein the method further comprises:
-
transforming the KNN search into a sequence of increasingly larger range queries; and
evaluating each range query on the single dimensional index structure storing the transformed points.
-
-
10. A method for KNN search as defined in claim 9 wherein approximate answers can be returned to the users as soon as they are found;
- and the answers are progressively refined until all answers are obtained unless the users terminate prematurely.
-
11. A method for indexing as defined in claim 2 wherein the desired objects are nearest with respect to a distance metric to the query object.
Specification