Method for high-dimensionality indexing in a multi-media database
First Claim
1. A method for indexing in a database of stored objects, the method comprising the steps of:
- a) applying a set of feature extraction functions to extract a set of feature vectors from the stored objects in the database, the set of feature extraction functions having a similarity measure applicable to the stored objects;
b) transforming the set of extracted feature vectors using an orthonormal transform such that the similarity measure is preserved, and such that the information of the set of extracted feature vectors is segregated into (i) a subset of the transformed feature vectors in which information of the set of extracted feature vectors is concentrated, and (ii) entries which contribute little to the information of the transformed vectors;
c) truncating the transformed feature vectors such that the entries which contribute little to the information of the transformed vectors are removed; and
d) indexing the truncated feature vectors using a non-sequential point-access-method (PAM).
0 Assignments
0 Petitions
Accused Products
Abstract
A high dimensional indexing method is disclosed which takes a set of objects that can be viewed as N-dimensional data vectors and builds an index which treats the objects like k-dimensional points. The method first defines and applies a set of feature extraction functions that admit some similarity measure for each of the stored objects in the database. The feature vector is then transformed in a manner such that the similarity measure is preserved and that the information of the feature vector v is concentrated in only a few coefficients. The entries of the feature vectors are truncated such that the entries which contribute little on the average to the information of the transformed vectors are removed. An index based on the truncated feature vectors is subsequently built using a point access method (PAM). A preliminary similarity search can then be conducted on the set of truncated transformed vectors using the previously created index to retrieve the qualifying records. A second search on the previously retrieved set of vectors is used to eliminate the false positives and to get the results of the desired similarity search.
168 Citations
12 Claims
-
1. A method for indexing in a database of stored objects, the method comprising the steps of:
-
a) applying a set of feature extraction functions to extract a set of feature vectors from the stored objects in the database, the set of feature extraction functions having a similarity measure applicable to the stored objects; b) transforming the set of extracted feature vectors using an orthonormal transform such that the similarity measure is preserved, and such that the information of the set of extracted feature vectors is segregated into (i) a subset of the transformed feature vectors in which information of the set of extracted feature vectors is concentrated, and (ii) entries which contribute little to the information of the transformed vectors; c) truncating the transformed feature vectors such that the entries which contribute little to the information of the transformed vectors are removed; and d) indexing the truncated feature vectors using a non-sequential point-access-method (PAM). - View Dependent Claims (2, 3, 4, 5)
-
-
6. A method for identifying similar ones of stored objects in a database, the method comprising the steps of:
-
a) identifying a subset of objects in the database, the subset including the similar objects to be identified; b) indexing said subset of similar objects, the step of indexing comprising the steps of; applying a set of feature extraction functions to extract a set of feature vectors, having respective sets of entries, from the stored objects in the database, the set of feature extraction functions having a similarity measure applicable to the stored objects; transforming the extracted feature vectors using an orthonormal transform such that the similarity measure is preserved, and such that the information of the extracted feature vectors is segregated into (i) a subset of the transformed feature vectors in which information of the set of extracted feature vectors is concentrated, and (ii) entries which contribute little to the information of the transformed vectors; truncating the transformed feature vectors to obtain a subset of the entries of the feature vectors such that the entries which contribute little to the information of the transformed vectors are removed; and indexing the truncated feature vectors using a non-sequential point-access-method (PAM); c) conducting a preliminary similarity search on the indexed set of truncated transformed feature vectors to retrieve a set of vectors which represent a superset of objects including desired objects and false positives; and d) performing a secondary search on the retrieved set of vectors to eliminate the false positives. - View Dependent Claims (7, 8)
-
-
9. A database computer system containing multi-dimensional objects, the system comprising:
-
a) at least one storage device for storing said objects; b) a computer processor for controlling access to said stored objects; c) memory coupled to said computer processor for access thereto; and d) program means, executable by the computer processor, for performing the steps of; applying a set of feature extraction functions to extract a set of feature vectors, having respective sets of entries, from the stored objects in the database, the set of feature extraction functions having a similarity measure applicable to the stored objects; transforming the extracted feature vectors using an orthonormal transform such that the similarity measure is preserved, and such that the information of the extracted feature vectors is segregated into (i) a subset of the transformed feature vectors in which information of the set of extracted feature vectors is concentrated, and (ii) entries which contribute little to the information of the transformed vectors; truncating the transformed feature vectors to obtain a subset of the entries of the feature vectors such that the entries which contribute little to the information of the transformed vectors are removed; and indexing the truncated feature vectors using a non-sequential point-access-method (PAM). - View Dependent Claims (10)
-
-
11. A computer program product, for use in a database computer system containing stored objects, the computer program product comprising:
-
a) a recording medium; and b) means, recorded on said medium, for instructing said computer system to perform the steps of; applying a set of feature extraction functions to extract a set of feature vectors, having respective sets of entries, from the stored objects in the database, the set of feature extraction functions having a similarity measure applicable to the stored objects; transforming the extracted feature vectors using an orthonormal transform such that the similarity measure is preserved, and such that the information of the extracted feature vectors is segregated into (i) a subset of the transformed feature vectors in which information of the set of extracted feature vectors is concentrated, and (ii) entries which contribute little to the information of the transformed vectors; truncating the transformed feature vectors such that the entries which contribute little to the information of the transformed vectors are removed; indexing the truncated feature vectors using a non-sequential point-access-method (PAM); conducting a preliminary similarity search on the indexed set of truncated transformed feature vectors to retrieve a set of vectors which represent a superset of objects including desired objects and false positives; and performing a secondary search on the retrieved set of vectors to eliminate the false positives. - View Dependent Claims (12)
-
Specification