INFORMATION RETRIEVAL SYSTEM AND METHOD USING A BAYESIAN ALGORITHM BASED ON PROBABILISTIC SIMILARITY SCORES
First Claim
1. A computer-implemented method of scoring similarity between one or more query items and one or more other items, each of the items being represented by a feature vector $x_i$ comprising a plurality of digitally represented features $x_{ij}$, the method including:
a) receiving an input identifying the query items;
b) for each of the other items, computing a score which is a function of a conditional probability of the feature vectors $x_i$ of the query items being generated from a generating distribution $p(x_i \mid \theta)$ defined by parameters $\theta$, given that the feature vector $x_i$ of the respective other item is generated from the generating distribution $p(x_i \mid \theta)$; and
c) returning a score for each of the other items, a list of some or all of the other items sorted by their respective score, or a list of n other items which have the highest score.
Abstract
An algorithm is provided which uses a model-based concept of a cluster and scores items using a score representative of the probability that a given item has been generated from the same distribution as one or more query items. Each item is represented by a feature vector $x_i$ comprising a plurality of digitally represented features $x_{ij}$. The method includes: receiving an input identifying the query items; for each of the other items, computing a score which is a function of a conditional probability of the feature vectors $x_i$ of the query items being generated from the generating distribution $p(x_i \mid \theta)$, given that the respective other item is generated from the generating distribution $p(x_i \mid \theta)$; and returning a score for each of the other items, a list of some or all of the other items sorted by their respective score, or a list of n other items which have the highest score.
27 Claims
1. A computer-implemented method of scoring similarity between one or more query items and one or more other items, each of the items being represented by a feature vector $x_i$ comprising a plurality of digitally represented features $x_{ij}$, the method including:
a) receiving an input identifying the query items;
b) for each of the other items, computing a score which is a function of a conditional probability of the feature vectors $x_i$ of the query items being generated from a generating distribution $p(x_i \mid \theta)$ defined by parameters $\theta$, given that the feature vector $x_i$ of the respective other item is generated from the generating distribution $p(x_i \mid \theta)$; and
c) returning a score for each of the other items, a list of some or all of the other items sorted by their respective score, or a list of n other items which have the highest score.
Dependent claims: 2 to 4 and 6 to 24.
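One concrete reading of step b) takes binary features, independent Bernoulli generating distributions and Beta priors on their parameters. That model choice, the function names (`log_marginal`, `log_conditional`, `rank`) and the toy data are assumptions made for illustration; claim 1 itself does not prescribe them. Under those assumptions, the log of the conditional probability of the query items given the candidate item is the difference of two closed-form log marginal likelihoods:

```python
from math import lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal(items, alpha, beta):
    """log p(items) for binary feature vectors, assuming independent
    Bernoulli features with Beta(alpha[j], beta[j]) priors (an assumption)."""
    n = len(items)
    total = 0.0
    for j, (a, b) in enumerate(zip(alpha, beta)):
        s = sum(x[j] for x in items)  # number of items with feature j set
        total += log_beta(a + s, b + n - s) - log_beta(a, b)
    return total


def log_conditional(query, candidate, alpha, beta):
    """log p(query | candidate), both assumed drawn from the same distribution."""
    joint = log_marginal(query + [candidate], alpha, beta)
    return joint - log_marginal([candidate], alpha, beta)


def rank(query, others, alpha, beta, n=None):
    """Return (score, index) pairs sorted by descending score.
    With n set, only the n highest-scoring items are returned."""
    scored = [(log_conditional(query, x, alpha, beta), i)
              for i, x in enumerate(others)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


if __name__ == "__main__":
    query = [[1, 1, 0], [1, 1, 0]]
    others = [[1, 1, 0], [0, 0, 1], [1, 0, 0]]
    prior = [1.0, 1.0, 1.0]  # hypothetical uniform Beta(1, 1) priors
    for score, index in rank(query, others, prior, prior):
        print(index, round(score, 4))
```

In this toy run the item identical to the query items ranks first and the item sharing no set features ranks last. Dividing by the query's own marginal likelihood would only subtract the same constant from every log score, so the ordering would not change.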
5. A computer implemented method of scoring the similarity between N query items and one or more other items, each of the items being represented by a feature vector $x_i$ comprising a plurality of binary features $x_{ij}$, the method including:
a) receiving an input identifying the query items;
b) defining a vector q for the query, the elements of q being defined by
$$q_j = \log \tilde{\alpha}_j - \log \alpha_j - \log \tilde{\beta}_j + \log \beta_j$$
whereby $\alpha_j$ and $\beta_j$ are parameters,
$$\tilde{\alpha}_j = \alpha_j + \sum_{k=1}^{N} x_{kj}, \qquad \tilde{\beta}_j = \beta_j + N - \sum_{k=1}^{N} x_{kj},$$
and the sum is over the query items;
c) calculating a score as a function of a product of a matrix X and q, whereby X is a matrix containing all feature vectors $x_i$ of the other items;
d) returning a score for each of the other items, a list of some or all of the other items sorted by their respective score, or a list of n other items which have the highest score.
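Steps b) to d) can be sketched in a few lines. The function names (`query_vector`, `score_items`, `top_n`) and the toy data are hypothetical, and the sketch takes the matrix-vector product itself as the score, which is only one possible "function of a product" in the sense of step c):

```python
from math import log


def query_vector(query, alpha, beta):
    """Build q with q_j = log(alpha~_j) - log(alpha_j) - log(beta~_j) + log(beta_j),
    where alpha~_j = alpha_j + sum_k x_kj and beta~_j = beta_j + N - sum_k x_kj."""
    n = len(query)
    q = []
    for j, (a, b) in enumerate(zip(alpha, beta)):
        s = sum(x[j] for x in query)  # sum over the N query items
        a_tilde = a + s
        b_tilde = b + n - s
        q.append(log(a_tilde) - log(a) - log(b_tilde) + log(b))
    return q


def score_items(X, q):
    """Matrix-vector product X q, one score per row (item) of X."""
    return [sum(x_j * q_j for x_j, q_j in zip(row, q)) for row in X]


def top_n(scores, n):
    """Indices of the n highest-scoring items, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n]


if __name__ == "__main__":
    query = [[1, 1, 0], [1, 1, 0]]          # N = 2 query items
    X = [[1, 1, 0], [0, 0, 1], [1, 0, 0]]   # feature vectors of the other items
    alpha = [1.0, 1.0, 1.0]                 # hypothetical parameter values
    beta = [1.0, 1.0, 1.0]
    q = query_vector(query, alpha, beta)
    scores = score_items(X, q)
    print(scores, top_n(scores, 2))
```

Because the score is linear in the binary features, a sparse X makes the product cheap: only the set features of each item contribute a term.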
25. A computer implemented method of searching a data base of images including:
responsive to a user input of search criteria, searching a data base of labelled images to return one or more images having at least one label matching the query;
receiving a user selection of images among the returned images;
calculating a similarity score between the selected images and unlabelled images in the data base; and
returning a set of unlabelled images based on their respective scores.
26. A computer implemented method of cleaning up a data set of items labelled with a particular label including:
for each item of the data set, calculating a clean up score which is a measure of the similarity between all the items in the data set, leaving out the item to be scored, and the item to be scored; and
removing items based on the respective clean up scores, thereby cleaning up the data set.
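The leave-one-out structure of this claim can be sketched as follows. The claim does not fix the similarity measure or the removal rule, so both are assumptions here: `log_score` is a stand-in based on Beta-Bernoulli predictive probabilities for binary features, and `threshold` is a hypothetical cut-off below which an item is removed.

```python
from math import log


def log_score(query, x, alpha=1.0, beta=1.0):
    """Stand-in similarity (an assumption): log p(x | query) - log p(x)
    for binary features with a shared Beta(alpha, beta) prior."""
    n = len(query)
    total = 0.0
    for j, x_j in enumerate(x):
        s = sum(item[j] for item in query)
        post = (alpha + s) / (alpha + beta + n)  # p(x_j = 1 | query)
        prior = alpha / (alpha + beta)           # p(x_j = 1)
        if x_j:
            total += log(post) - log(prior)
        else:
            total += log(1.0 - post) - log(1.0 - prior)
    return total


def clean_up(items, threshold, similarity=log_score):
    """Score each item against all the others, then drop low scorers.
    Returns the kept items and the per-item clean up scores."""
    scores = []
    for i, item in enumerate(items):
        rest = items[:i] + items[i + 1:]  # data set without the scored item
        scores.append(similarity(rest, item))
    kept = [item for item, s in zip(items, scores) if s >= threshold]
    return kept, scores


if __name__ == "__main__":
    labelled = [[1, 1, 0], [1, 1, 0], [1, 1, 0], [0, 0, 1]]
    kept, scores = clean_up(labelled, threshold=0.0)
    print(kept, scores)
```

In this toy data set the last item disagrees with the other three on every feature, receives a negative score and is the only one removed.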
27. A computer implemented method of annotating an item including:
calculating an annotation score for each of a set of labels as a measure of similarity between items labelled with the label to be scored and the item to be annotated; and
selecting one or more labels to be applied to the item to be annotated based on the respective annotation scores.
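A minimal sketch of this annotation step, treating the items carrying each label as a query set. The similarity function `log_score` (Beta-Bernoulli predictive probabilities over binary features), the `top_k` selection rule and the example labels are all assumptions for illustration; the claim leaves the measure and the selection rule open.

```python
from math import log


def log_score(query, x, alpha=1.0, beta=1.0):
    """Stand-in similarity (an assumption): log p(x | query) - log p(x)
    for binary features with a shared Beta(alpha, beta) prior."""
    n = len(query)
    total = 0.0
    for j, x_j in enumerate(x):
        s = sum(item[j] for item in query)
        post = (alpha + s) / (alpha + beta + n)  # p(x_j = 1 | query)
        prior = alpha / (alpha + beta)           # p(x_j = 1)
        if x_j:
            total += log(post) - log(prior)
        else:
            total += log(1.0 - post) - log(1.0 - prior)
    return total


def annotate(item, labelled, top_k=1, similarity=log_score):
    """labelled maps each label to the feature vectors that carry it.
    Returns the top_k labels by annotation score and all the scores."""
    scores = {label: similarity(members, item)
              for label, members in labelled.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores


if __name__ == "__main__":
    labelled = {
        "cat": [[1, 1, 0], [1, 1, 0]],  # hypothetical labelled items
        "dog": [[0, 0, 1], [0, 1, 1]],
    }
    labels, scores = annotate([1, 1, 0], labelled)
    print(labels, scores)
```

Selecting every label whose score exceeds a cut-off, rather than a fixed `top_k`, would be an equally plausible reading of "one or more labels".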
Specification