System and method for similarity indexing and searching in high dimensional space
Abstract
A system and method for providing similarity indexing and searching in multi-dimensional databases. In one aspect, given a set of data points in a multi-dimensional space, the values of the data points on each dimension are partitioned into a plurality of grids, wherein each grid is assigned a grid value. Given a target data point, similarity candidates (i.e., data points that are similar to the target data point) are identified based on matching grid values. An inverted grid index comprising an index on the data points falling into each grid of each dimension is utilized to identify similarity candidates. A similarity selection process, which utilizes a similarity function to measure the closeness of each identified similarity candidate to the target data point, is employed to select the closest identified similarity candidates for output. A preferred similarity function is one that considers a subset of the dimensions in which a point falls within a similar grid of the target point. In addition, a correlation effect among the grids in different dimensions may be a factor captured in the similarity function.
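The flow summarized in the abstract (partition each dimension into grids, build an inverted grid index, collect candidates that share grid values with the target, then rank them) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the patented implementation: the function names (`build_boundaries`, `grid_values`, `build_inverted_index`, `most_similar`) are hypothetical, the grids are equal-width, every record is assumed to have a value in every dimension, and the similarity score is simply the number of dimensions in which a candidate shares a grid with the target. The correlation effect among grids mentioned in the abstract is not modeled.

```python
from bisect import bisect_right
from collections import defaultdict


def build_boundaries(points, grids_per_dim):
    """Return interior cut points for each dimension.

    Assumption: equal-width grids between the observed min and max.
    """
    boundaries = []
    for d in range(len(points[0])):
        values = [p[d] for p in points]
        lo, hi = min(values), max(values)
        # Fall back to a width of 1.0 when all values in a dimension are equal.
        width = (hi - lo) / grids_per_dim or 1.0
        boundaries.append([lo + width * k for k in range(1, grids_per_dim)])
    return boundaries


def grid_values(point, boundaries):
    """Map each coordinate to the integer grid value of the grid it falls in."""
    return [bisect_right(cuts, v) for v, cuts in zip(point, boundaries)]


def build_inverted_index(points, boundaries):
    """Inverted grid index: (dimension, grid value) -> ids of points in that grid."""
    index = defaultdict(set)
    for pid, point in enumerate(points):
        for d, g in enumerate(grid_values(point, boundaries)):
            index[(d, g)].add(pid)
    return index


def most_similar(target, boundaries, index, k=1):
    """Rank candidates by the number of dimensions whose grid matches the target.

    Assumption: a plain match count stands in for the similarity function.
    Returns up to k (point id, match count) pairs, best first.
    """
    matches = defaultdict(int)
    for d, g in enumerate(grid_values(target, boundaries)):
        for pid in index.get((d, g), ()):
            matches[pid] += 1
    ranked = sorted(matches.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


if __name__ == "__main__":
    data = [(1.0, 10.0, 5.0), (1.2, 11.0, 50.0), (9.0, 90.0, 55.0), (5.0, 50.0, 30.0)]
    cuts = build_boundaries(data, grids_per_dim=3)
    inverted = build_inverted_index(data, cuts)
    print(most_similar((1.1, 12.0, 52.0), cuts, inverted, k=2))
```

Only points that share at least one grid with the target are ever scored, which is the role the inverted grid index plays here: the lookup touches one index entry per dimension instead of scanning every record.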
50 Claims
1. A method for managing a plurality of data points in a multi-dimensional space, the method comprising the steps of:
receiving a plurality of data points, wherein each data point comprises a multi-dimensional record comprising a value in at least one of the dimensions;
partitioning values of the data points in each dimension into a plurality of grids, wherein each grid is assigned a grid value or a range of grid values; and
identifying at least one data point in the plurality of data points that is similar to a target data point based on matching grid values.
25. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for managing a plurality of data points in a multi-dimensional space, the method comprising the steps of:
receiving a plurality of data points, wherein each data point comprises a multi-dimensional record comprising a value in at least one of the dimensions;
partitioning values of the data points in each dimension into a plurality of grids, wherein each grid is assigned a grid value or a range of grid values; and
identifying at least one data point in the plurality of data points that is similar to a target data point based on matching grid values.
42. A system for managing a database having a plurality of dimensions, the system comprising:
a plurality of multi-dimensional objects stored in the database, wherein each multi-dimensional object comprises a value for at least one of the dimensions, and wherein each dimension of the database is partitioned into a plurality of grids based on object values in corresponding dimensions, wherein each grid is assigned a grid value or a range of grid values; and
a similarity search routine for identifying an object in the database that is similar to a target object based on matching grids.
Specification