Searching multidimensional indexes using associated clustering and dimension reduction information
First Claim
1. In a system including one or more reduced dimensionality indexes to multidimensional data, a method for performing an exact search for specified data using the one or more indexes, the method comprising the following steps in the sequence set forth:
- associating specified data to a data cluster based on clustering information, said data cluster being a partition of an original data input set;
reducing a dimensionality of the specified data, based on dimensionality reduction information for a reduced dimensionality version of the cluster;
recursively applying said associating and reducing steps unitl a corresponding lowest level of a hierarchy of reduced dimensionality clusters has been reached; and
searching, using low dimensional indexes to said lowest level and a reduced dimensionality specified data, for cluster elements of the reduced dimensionality version of the cluster matching the specified data.
1 Assignment
0 Petitions
Accused Products
Abstract
An improved multidimensional data indexing technique that generates compact indexes such that most or all of the index can reside in main memory at any time. During the clustering and dimensionality reduction, clustering information and dimensionality reduction information are generated for use in a subsequent search phase. The indexing technique can be effective even in the presence of variables which are not highly correlated. Other features provide for efficiently performing exact and nearest neighbor searches using the clustering information and dimensionality reduction information. One example of the dimensionality reduction uses a singular value decomposition technique. The method can also be recursively applied to each of the reduced-dimensionality clusters. The dimensionality reduction also can be applied to the entire database as a first step of the index generation.
-
Citations
51 Claims
-
1. In a system including one or more reduced dimensionality indexes to multidimensional data, a method for performing an exact search for specified data using the one or more indexes, the method comprising the following steps in the sequence set forth:
-
associating specified data to a data cluster based on clustering information, said data cluster being a partition of an original data input set; reducing a dimensionality of the specified data, based on dimensionality reduction information for a reduced dimensionality version of the cluster; recursively applying said associating and reducing steps unitl a corresponding lowest level of a hierarchy of reduced dimensionality clusters has been reached; and searching, using low dimensional indexes to said lowest level and a reduced dimensionality specified data, for cluster elements of the reduced dimensionality version of the cluster matching the specified data. - View Dependent Claims (3, 4, 5, 6, 8)
-
-
2. A method for performing an exact search for specified data, comprising the following steps in the sequence set forth:
-
associating specified data to a data cluster based on clustering information, said data cluster being a partition from an original data input set; reducing a dimensionality of the specified data, based on dimensionality reduction information for a reduced dimensionality version of the cluster; recursively applying said associating and reducing steps until a corresponding lowest level of a hierarchy of reduced dimensionality clusters has been reached; and linearly scanning, based on a reduced dimensionality specified data, for the reduced dimensionality version of the cluster matching the specified data.
-
-
7. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for performing an exact search for specified data, said method comprising:
-
associating specified data to a data cluster based on clustering information, said data cluster being a partition of an original data input set; after said associating, reducing a dimensionality of the specified data, based on dimensionality reduction information for a reduced dimensionality version of the cluster; recursively applying said associating and reducing steps until a corresponding lowest level of a hierarchy of reduced dimensionality clusters has been reached; and linearly scanning, based on a reduced dimensionality specified data, for the reduced dimensionality version of the cluster matching the specified data.
-
-
9. In a system including one or more reduced dimensionality indexes to multidimensional data, a method for searching for k records most similar to specified data, using the one or more indexes, the method comprising the steps of:
-
identifying the specified data with a cluster based on clustering information, said cluster being a partition from an original data input set; after said identifying step, reducing a dimensionality of the specified data, based on dimensionality reduction information for an identified cluster; recursively applying said identifying and reducing steps until a corresponding lowest level of a hierarchy of reduced dimensionality clusters has been reached; generating dimensionality reduction information for reduced dimensionality specified data, in response to said reducing; and retrieving the ≦
k records most similar to the specified data from the identified cluster using the one or more reduced dimensionality indexes, the dimensionality reduction information and the reduced dimensionality specified data. - View Dependent Claims (10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
-
-
21. A program storage device readable by a machine which includes one or more reduced dimensionality indexes to multidimensional data, the program storage device tangibly embodying a program of instructions executable by the machine to perform method steps for performing an exact search for specified data using the one or more indexes, said method steps comprising:
-
associating specified data to a data cluster based on clustering information, said data cluster being a partition of an original data input set; reducing a dimensionality of the specified data, based on dimensionality reduction information for a reduced dimensionality version of the cluster; recursively applying said associating and reducing steps until a corresponding lowest level of a hierarchy of reduced dimensionality clusters has been reached; and searching, using low dimensional indexes to said lowest level and a reduced dimensionality specified data, for cluster elements of the reduced dimensionality version of the cluster matching the specified data. - View Dependent Claims (22, 23, 24, 25, 26)
-
-
27. A program storage device readable by a machine which includes one or more reduced dimensionality indexes to multidimensional data, the program storage device tangibly embodying a program of instructions executable by the machine to perform method steps for searching for k records most similar to specified data, using the one or more indexes, said method steps comprising:
-
identifying the specified data with a data cluster based on clustering information, said data cluster being a partition of an original data input set; after said identifying step, reducing a dimensionality of the specified data, based on dimensionality reduction information for an identified cluster; recursively applying said identifying and reducing steps until a corresponding lowest level of a hierarchy of reduced dimensionality clusters has been reached; generating dimensionality reduction information for reduced dimensionality specified data, in response to said reducing; and retrieving the ≦
k records most similar to the specified data, from the identified cluster using the one or more reduced dimensionality indexes, the dimensionality reduction information and the reduced dimensionality specified data. - View Dependent Claims (30, 31, 32, 33, 34, 35, 38, 39, 40)
-
-
28. A computer program product comprising:
a computer usable medium having computer readable program code means embodied therein for searching for k records most similar to a search template, using one or more multidimensional indexes, the computer readable program code means in said computer program product comprising; computer readable program code cluster search means for identifying the search template with a cluster based on clustering information, said cluster being a partition of an original data input set; computer readable program code template projection means for reducing a dimensionality of the search template, after said identifying, based on dimensionality reduction information for an identified cluster; computer readable program code means for causing the computer to recursively apply said cluster search means and said template projection means until a corresponding lowest level of a hierarchy of reduced dimensionality clusters has been reached; computer readable program code means for generating dimensionality reduction information for a reduced dimensionality search template, in response to said reducing; and computer readable program code intra-cluster search means for retrieving the ≦
k records most similar to the specified data from the identified cluster using the one or more reduced dimensionality indexes, the dimensionality reduction information and the reduced dimensionality specified data.- View Dependent Claims (29, 36, 37, 45, 46, 47, 48, 49, 50, 51)
-
41. A program storage device readable by a machine which includes one or more reduced dimensionality indexes to multidimensional data, the program storage device tangibly embodying a program of instructions executable by the machine to perform method steps for searching for k records most similar to specified data, using the one or more indexes, said method steps comprising:
-
identifying the specified data with a cluster based on clustering information, said cluster being a partition of an original data input set; after the identifying step, reducing a dimensionality of the specified data, based on dimensionality reduction information for a reduced dimensionality version of an identified cluster; recursively applying said identifying and reducing steps until a corresponding lowest level of a hierarchy of reduced dimensionality clusters has been reached; generating dimensionality reduction information for reduced dimensionality specified data, in response to said reducing; and linearly scanning, based on the reduced dimensionality specified data, for the reduced dimensionality version of the cluster matching the specified data.
-
-
42. A computer program product comprising:
a computer usable medium having computer readable program code means embodied therein for performing an exact search for a search template using one or more multidimensional indexes, the computer readable program code means in said computer program product comprising; computer readable program code cluster search means for causing a computer to effect associating specified data to a data cluster based on clustering information, said data cluster being a partition of an original data input set; computer readable program code template projection means for causing a computer to effect reducing a dimensionality of the search template, after said associating, based on dimensionality reduction information for a reduced dimensionality version of the cluster; computer readable program code means for causing a computer to effect recursively applying said cluster search means and template projection means until a corresponding lowest level of a hierarchy of reduced dimensionality clusters has been reached; and computer readable program code intra-cluster search means for causing a computer to effect searching, using low dimensional indexes to said lowest level and a reduced dimensionality search template, for cluster elements of the reduced dimensionality version of the cluster matching the search template. - View Dependent Claims (43, 44)
Specification