Similarity search system with compact data structures
First Claim
1. A method of searching a plurality of stored objects comprising the steps of:
generating with a segmentation and feature extraction unit a collection of real-valued multi-dimensional vectors representing each said object, each of said multi-dimensional vectors having an associated weight;
converting when executed by a disk each of said multi-dimensional vectors into a sketch using a thresholding and transformation algorithm and storing said sketch in a database to produce a collection of sketches corresponding to each object, wherein each said sketch comprises a bit vector that is more compact than a real-valued multi-dimensional vector from which said sketch is converted; and
finding objects closest to a query object with a similarity search engine, wherein said similarity search engine finds said objects closest to said query object based upon said sketches stored in said database.
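The converting and finding steps can be illustrated with a minimal sketch. The claim names only "a thresholding and transformation algorithm" and does not fix its details, so the scheme below is an assumption chosen for illustration: each raw bit compares one randomly chosen coordinate against a random threshold, groups of raw bits are XOR-folded into one output bit, and objects are compared by Hamming distance between sketches. The names `make_sketcher`, `hamming`, `nearest`, and the parameters `n_bits` and `xor_group` are hypothetical and do not come from the patent.

```python
import random


def make_sketcher(mins, maxs, n_bits, xor_group=1, seed=0):
    """Build a function that maps a real-valued vector to an n_bits-bit sketch.

    mins/maxs give the assumed value range of each dimension. This is an
    illustrative scheme, not necessarily the patented algorithm.
    """
    rng = random.Random(seed)
    dims = len(mins)
    # One (dimension, threshold) pair per raw bit, fixed once for all vectors.
    params = []
    for _ in range(n_bits * xor_group):
        i = rng.randrange(dims)
        params.append((i, rng.uniform(mins[i], maxs[i])))

    def sketch(vec):
        bits = 0
        for b in range(n_bits):
            group = params[b * xor_group:(b + 1) * xor_group]
            bit = 0
            for i, t in group:
                # Thresholding, then XOR-folding the group into one output bit.
                bit ^= 1 if vec[i] > t else 0
            bits = (bits << 1) | bit
        return bits  # sketch stored as a Python int used as a bit vector

    return sketch


def hamming(a, b):
    """Number of differing bits between two sketches."""
    return bin(a ^ b).count("1")


def nearest(query_sketch, database, k=1):
    """Return the ids of the k stored sketches closest to the query.

    database maps object id -> sketch; a plain dict stands in for the database.
    """
    return sorted(database, key=lambda oid: hamming(query_sketch, database[oid]))[:k]
```

With 64-bit sketches, each two-dimensional vector of floats is reduced to a single 64-bit integer, and the search touches only those integers rather than the original vectors.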
Abstract
A content-addressable and -searchable storage system for managing and exploring massive amounts of feature-rich data, such as images, audio or scientific data, is shown. A segmentation and feature extraction unit segments data corresponding to an object into a plurality of data segments and generates a feature vector for each data segment. A sketch construction component converts the feature vector into a compact bit-vector corresponding to the object. The system also has a similarity index having a plurality of compact bit-vectors corresponding to a plurality of objects and an index insertion component for inserting a compact bit-vector corresponding to an object into the similarity index. The system may further have an indexing unit for identifying a candidate set of objects from said similarity index based upon a compact bit-vector corresponding to a query object. Still further, the system may additionally have a similarity ranking component for ranking objects in said candidate set by estimating their distances to the query object.
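The index insertion, candidate identification and ranking components described above might fit together as in the following sketch. It is an illustration under stated assumptions, not the system's actual design: the class `SimilarityIndex` and its methods are hypothetical names, candidate filtering is a linear scan with a Hamming radius rather than a real index structure, and the ranking score (a weighted sum of each query segment's distance to its closest stored segment) is a simple stand-in for whatever distance estimate the system uses.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of differing bits between two bit-vectors stored as ints."""
    return bin(a ^ b).count("1")


class SimilarityIndex:
    """Toy similarity index: object id -> list of (bit-vector, weight) segments."""

    def __init__(self):
        self.segments = defaultdict(list)

    def insert(self, obj_id, sketch, weight=1.0):
        # Index insertion: add one segment's compact bit-vector for an object.
        self.segments[obj_id].append((sketch, weight))

    def candidates(self, query_segments, radius):
        # Filtering: keep objects having any segment within `radius` bits
        # of any query segment (assumed criterion, linear scan).
        found = set()
        for obj_id, segs in self.segments.items():
            if any(hamming(q, s) <= radius
                   for q, _ in query_segments for s, _ in segs):
                found.add(obj_id)
        return found

    def rank(self, query_segments, candidate_ids):
        # Ranking: order candidates by an estimated distance to the query
        # (assumed score: weighted sum of closest-segment Hamming distances).
        def score(obj_id):
            segs = self.segments[obj_id]
            return sum(w * min(hamming(q, s) for s, _ in segs)
                       for q, w in query_segments)
        return sorted(candidate_ids, key=score)


index = SimilarityIndex()
index.insert("a", 0b00000000, 0.5)
index.insert("a", 0b11111111, 0.5)
index.insert("b", 0b00000111)
index.insert("c", 0b11110000)

query = [(0b00000001, 1.0)]
shortlist = index.candidates(query, radius=2)
ranked = index.rank(query, shortlist)
```

Splitting the work into a cheap filtering pass and a ranking pass over the shortlist means the more expensive distance estimate runs only on the candidate set, not on every stored object.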
10 Claims
1. A method of searching a plurality of stored objects comprising the steps of:
generating with a segmentation and feature extraction unit a collection of real-valued multi-dimensional vectors representing each said object, each of said multi-dimensional vectors having an associated weight;
converting when executed by a disk each of said multi-dimensional vectors into a sketch using a thresholding and transformation algorithm and storing said sketch in a database to produce a collection of sketches corresponding to each object, wherein each said sketch comprises a bit vector that is more compact than a real-valued multi-dimensional vector from which said sketch is converted; and
finding objects closest to a query object with a similarity search engine, wherein said similarity search engine finds said objects closest to said query object based upon said sketches stored in said database.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
Specification