Multidimensional data clustering and dimension reduction for indexing and searching
First Claim
1. A computerized method of representing multidimensional data, comprising the steps of:
- a) partitioning the multidimensional data into one or more clusters;
b) generating and storing clustering information for said one or more clusters;
c) generating one or more reduced dimensionality clusters and dimensionality reduction information for said one or more clusters;
d) storing the dimensionality reduction information;
e) creating a hierarchy of reduced dimensionality clusters by recursively applying said steps a) through d); and
f) generating and storing one or more low-dimensional idexes for clusters at a lowest level of said hierarchy.
1 Assignment
0 Petitions
Accused Products
Abstract
An improved multidimensional data indexing technique that generates compact indexes such that most or all of the index can reside in main memory at any time. During the clustering and dimensionality reduction, clustering information and dimensionality reduction information are generated for use in a subsequent search phase. The indexing technique can be effective even in the presence of variables which are not highly correlated. Other features provide for efficiently performing exact and nearest neighbor searches using the clustering information and dimensionality reduction information. One example of the dimensionality reduction uses a singular value decomposition technique. The method can also be recursively applied to each of the reduced-dimensionality clusters. The dimensionality reduction can also be applied to the entire database as a first step of the index generation.
-
Citations
73 Claims
-
1. A computerized method of representing multidimensional data, comprising the steps of:
-
a) partitioning the multidimensional data into one or more clusters; b) generating and storing clustering information for said one or more clusters; c) generating one or more reduced dimensionality clusters and dimensionality reduction information for said one or more clusters; d) storing the dimensionality reduction information; e) creating a hierarchy of reduced dimensionality clusters by recursively applying said steps a) through d); and f) generating and storing one or more low-dimensional idexes for clusters at a lowest level of said hierarchy. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25)
-
-
26. A program storage device readable by a machine which includes one or more reduced dimensionality indexes to multidimensional data, the program storage device tangibly embodying a program of instructions executable by the machine to perform method steps for representing multidimensional data, said method steps comprising:
-
a) partitioning the multidimensional data into one or more clusters; b) generating and storing clustering information for said one or more clusters; c) generating one or more reduced dimensionality clusters and dimensionality reduction information for said one or more clusters; and d) storing the dimensionality reduction information; e) creating a hierarchy of reduced dimensionality clusters by recursively applying said steps a) through d); and f) generating and storing one or more low-dimensional indexes for clusters at a lowest level of said hierarchy. - View Dependent Claims (27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50)
-
-
51. A computer program product comprising:
a computer usable medium having computer readable program code means embodied therein for representing multidimensional data, the computer readable program code means in said computer program product comprising; computer readable program code clustering means for causing a computer to effect (a) partitioning the multidimensional data into one or more clusters; computer readable program code means, coupled to said clustering means, for causing a computer to effect (b) generating and storing clustering information for said one or more clusters; computer readable program code dimensionality reduction means, coupled to said clustering means, for causing a computer to effect, (c) generating one or more reduced dimensionality clusters and dimensionality reduction information for said one or more clusters; computer readable program code means, coupled to said dimensionality reduction means, for causing a computer to (d) effect storing the dimensionality reduction information; computer readable program code means for causing a computer to effect creating a hierarchy of reduced dimensionality clusters by recursively applying said steps a) through d); and computer readable program code means for causing a computer to effect generating and storing one or more low-dimensional indexes for clusters at a lowest level of said hierarchy. - View Dependent Claims (54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 66, 67, 68, 69, 70, 71, 72, 73)
-
52. A computerized method of representing multidimensional data which includes spatial geographic information, said method comprising the steps of:
-
a) partitioning the spatial geographic information into one or more clusters; b) generating and storing clustering information for said one or more clusters; c) generating one or more reduced dimensionality clusters and dimensionality reduction information for said one or more clusters; d) storing the dimensionality reduction information; e) creating a hierarchy of reduced dimensionality clusters by recursively applying said steps a) through d); and f) generating and storing one or more low-dimensional indexes for clusters at a lowest level of said hierarchy. - View Dependent Claims (53)
-
-
65. A computerized method of representing multidimensional data which includes point-of-sale transactions with associated geographic coordinates, said method comprising the steps of:
-
a) partitioning the point-of-sales transactions into one or more clusters; b) generating and storing clustering information for said one or more clusters; c) generating one or more reduced dimensionality clusters and dimensionality reduction information for said one or more clusters; and d) storing the dimensionality reduction information.
-
Specification