Content-based information retrieval
First Claim
1. A computer-implemented method of similar item retrieval comprising:
receiving a query item;
analyzing content of the query item, the analyzing comprising identifying tokens in that query item using a library of tokens, wherein each token comprises a symbol representing a cluster of features;
dynamically forming a classifier, using a processor, at query time on the basis of the query item's content and a training set of items, wherein the training set comprises a plurality of pairs of items and a plurality of background items such that for each pair, the items in that pair are specified as similar to one another, the forming the classifier comprising choosing a subset of the identified tokens such that, on the training set as many as possible of the similar pairs have the chosen subset of tokens while the number of background items containing the subset of tokens is below a specified bound; and
using the classifier to select a plurality of items from a database of items.
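The classifier-forming step of claim 1 can be illustrated with a minimal Python sketch. It rests on assumptions the claim does not state: an item is represented only by its set of token symbols, a pair or item "has" a token subset when it contains every token in it, and the subset is grown greedily. The names `form_classifier`, `pairs_covered`, `background_hits` and `retrieve` are hypothetical.

```python
from typing import FrozenSet, List, Optional, Sequence, Set, Tuple

# Assumption: an item is represented only by the set of token symbols found in it.
Item = FrozenSet[str]


def pairs_covered(pairs: Sequence[Tuple[Item, Item]], subset: Set[str]) -> int:
    """Count similar pairs in which BOTH items contain every token in `subset`."""
    return sum(1 for a, b in pairs if subset <= a and subset <= b)


def background_hits(background: Sequence[Item], subset: Set[str]) -> int:
    """Count background items that contain every token in `subset`."""
    return sum(1 for item in background if subset <= item)


def form_classifier(
    query_tokens: Set[str],
    pairs: Sequence[Tuple[Item, Item]],
    background: Sequence[Item],
    bound: int,
) -> Optional[Item]:
    """Greedily choose a subset of the query's tokens (hypothetical heuristic).

    Tokens are added one at a time until fewer than `bound` background items
    contain the whole subset. At each step the token that keeps the most
    similar pairs covered is chosen (ties: fewer background hits, then
    alphabetical order). Returns None if no subset meets the bound.
    """
    chosen: Set[str] = set()
    remaining = set(query_tokens)
    while background_hits(background, chosen) >= bound:
        if not remaining:
            return None  # even all query tokens together match too many background items
        best = max(
            sorted(remaining),  # max keeps the first maximum, so ties go alphabetically
            key=lambda t: (
                pairs_covered(pairs, chosen | {t}),
                -background_hits(background, chosen | {t}),
            ),
        )
        chosen.add(best)
        remaining.remove(best)
    return frozenset(chosen)


def retrieve(classifier: Item, database: Sequence[Item]) -> List[int]:
    """Return indices of database items containing every token of the classifier."""
    return [i for i, item in enumerate(database) if classifier <= item]
```

The greedy search only keeps the sketch short; it does not guarantee that "as many as possible" of the similar pairs are covered, which an exhaustive search over small token subsets would.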
Abstract
Content-based information retrieval is described. In an example, a query item such as an image, document, email or other item is presented and items with similar content are retrieved from a database of items. In an example, each time a query is presented, a classifier is formed based on that query and using a training set of items. For example, the classifier is formed in real time, in such a way that it sets a limit on the proportion of the items in the database that will be retrieved. In an embodiment, the query item is analyzed to identify tokens in that item, and subsets of those tokens are selected to form the classifier. For example, the subsets of tokens are combined using Boolean operators in a manner that is efficient for searching particular types of database.
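One possible reading of "combined using Boolean operators" is a disjunction (OR) of conjunctions (AND) of tokens, evaluated over an inverted index. The abstract specifies neither that form nor that kind of database, so both are assumptions here, and `build_inverted_index` and `evaluate_dnf` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_inverted_index(database: List[Set[str]]) -> Dict[str, Set[int]]:
    """Map each token to the set of item ids (list positions) that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for item_id, tokens in enumerate(database):
        for tok in tokens:
            index[tok].add(item_id)
    return dict(index)


def evaluate_dnf(clauses: Iterable[Set[str]], index: Dict[str, Set[int]]) -> Set[int]:
    """Evaluate an OR of AND-clauses of tokens against an inverted index.

    An item matches a clause if it contains every token of that clause, and
    matches the query if it matches at least one clause.
    """
    results: Set[int] = set()
    for clause in clauses:
        if not clause:
            continue  # an empty AND would match every item; skipped in this sketch
        # Intersect the shortest posting lists first to keep intermediate sets small.
        postings = sorted((index.get(tok, set()) for tok in clause), key=len)
        hits = set(postings[0])
        for posting in postings[1:]:
            hits &= posting
            if not hits:
                break
        results |= hits
    return results
```

This form only touches the posting lists of the query's own tokens, which is one reason it suits inverted-index stores.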
17 Claims
1. A computer-implemented method of similar item retrieval comprising:
receiving a query item; analyzing content of the query item, the analyzing comprising identifying tokens in that query item using a library of tokens, wherein each token comprises a symbol representing a cluster of features; dynamically forming a classifier, using a processor, at query time on the basis of the query item's content and a training set of items, wherein the training set comprises a plurality of pairs of items and a plurality of background items such that for each pair, the items in that pair are specified as similar to one another, the forming the classifier comprising choosing a subset of the identified tokens such that, on the training set as many as possible of the similar pairs have the chosen subset of tokens while the number of background items containing the subset of tokens is below a specified bound; and using the classifier to select a plurality of items from a database of items.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
11. A content-based image search apparatus comprising:
an input arranged to receive a query image; a content analysis engine arranged to identify tokens in the query image; a classifier construction engine arranged to form a classifier on the basis of the identified tokens in the query image and on the basis of a training set of images by selecting a subset of the identified tokens such that the proportion of images in the training set which are classified as similar to the query image using the selected subset of the identified tokens is below a specified bound; a processor arranged to use the formed classifier to classify images from a database of images according to their similarity to the query image; and an output arranged to provide images from the database on the basis of their classification.
Dependent claims: 12, 13, 14, 15.
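For the apparatus of claim 11, the content analysis engine maps image content to tokens. A minimal sketch, assuming each token in the library is represented by a cluster centre and each extracted feature vector is assigned to the nearest centre by Euclidean distance (feature extraction itself is not shown), together with the training-set proportion that claim 11 keeps below a bound. All names here are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Tuple[float, ...]


def nearest_token(feature: Sequence[float], library: Dict[str, Vector]) -> str:
    """Return the symbol of the library cluster centre closest to `feature` (Euclidean)."""
    return min(library, key=lambda sym: math.dist(feature, library[sym]))


def tokenize(features: Sequence[Sequence[float]], library: Dict[str, Vector]) -> Set[str]:
    """Map every extracted feature vector to a token symbol; return the set of tokens."""
    return {nearest_token(f, library) for f in features}


def proportion_classified_similar(subset: Set[str], training_images: List[Set[str]]) -> float:
    """Fraction of training images that contain every token in `subset`."""
    if not training_images:
        return 0.0
    matches = sum(1 for tokens in training_images if subset <= tokens)
    return matches / len(training_images)
```

How the token library is built is not covered by the claim; learning it offline, for example by clustering features from many images, is an assumption and is not shown.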
16. One or more storage devices with device-executable instructions for performing operations comprising:
receiving a query item; analyzing content of the query item, the analyzing the content of the query item comprising identifying tokens in that query item using a library of tokens, wherein each token comprises a symbol representing a cluster of features; dynamically forming a classifier at query time on the basis of the query item's content and the basis of properties of that content with respect to a training set of items, the training set comprising a plurality of pairs of items and a plurality of background items such that for each pair, the items in that pair are specified as similar to one another, the forming the classifier comprising choosing a subset of the identified tokens such that, on the training set as many as possible of the similar pairs have the chosen subset of tokens while the number of background items containing the subset of tokens is below a specified bound; and using the classifier to select a plurality of items from a database of items.
Dependent claims: 17.
Specification