Dual cross-media relevance model for image annotation
Abstract
A dual cross-media relevance model (DCMRM) is used for automatic image annotation. In contrast to traditional relevance models, which calculate the joint probability of words and images over a training image database, the DCMRM estimates the joint probability by calculating the expectation over words in a predefined lexicon. The DCMRM may be advantageous because a predefined lexicon potentially has better behavior than a training image database. The DCMRM also takes advantage of content-based techniques and image search techniques to define the word-to-image and word-to-word relations involved in image annotation. Both relations can be estimated by applying image search techniques to web data as well as to available training data.
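In the notation used in the claims below, the expectation over lexicon words that the abstract describes can be written as follows. This is a reading consistent with the abstract's description; the factorization into a word prior P(v), a word-to-image term P(Iu|v), and a word-to-word term P(w|v) follows the claim language.

```latex
% Joint probability of a candidate word w and a target image I_u,
% estimated as an expectation over the words v of a lexicon V:
P(w, I_u) \;=\; \sum_{v \in V} P(v)\, P(I_u \mid v)\, P(w \mid v)
```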
78 Citations
20 Claims
1. A method comprising:
for each word v of a lexicon V:

obtaining a word-to-word correlation between a candidate word w and each word v in the lexicon V, measured by P(w|v), which comprises a conditional probability of the candidate word w given the word v; and

obtaining a word-to-image correlation between the word v and a target image Iu independently of a training set of images, based at least on an image representation of the word v that is developed through an analysis of one or more top-ranked images obtained by an image search engine searching on the word v, the word-to-image correlation between each word v in the lexicon V and the target image Iu measured by P(Iu|v), which comprises a conditional probability of the target image Iu given the word v;

determining a value of a collective word-to-image correlation between the candidate word w and the target image Iu based on the word-to-word correlations between the candidate word w and each word v in the lexicon V and the word-to-image correlations between each word v in the lexicon V and the target image Iu; and

annotating the target image Iu using the candidate word w if the collective word-to-image correlation between the candidate word w and the target image Iu satisfies a preset condition, the value of the collective word-to-image correlation between the candidate word w and the target image Iu calculated using:

Dependent claims: 2-11.
12. A method performed by a computing device for image annotation, the method comprising:
obtaining a plurality of word-to-word correlations, each word-to-word correlation being defined between a pair of words selected from a lexicon V, each word-to-word correlation between a candidate word w and each word v in the lexicon V measured by P(w|v), which comprises a conditional probability of the word w given the word v;

obtaining a plurality of word-to-image correlations independently of a training set of images, each word-to-image correlation being defined between the word v in the lexicon V and a target image Iu being annotated, each word-to-image correlation between each word v and the target image Iu measured by P(Iu|v), which comprises a conditional probability of the target image Iu given the word v;

for a candidate word w selected from the lexicon V, determining a value of a collective word-to-image correlation between the candidate word w and the target image Iu based on the word-to-word correlations between the candidate word w and each word v in the lexicon V, and the word-to-image correlations between each word v in the lexicon V and the target image Iu, wherein the collective word-to-image correlation is a probability of the target image Iu given the candidate word w; and

annotating the target image Iu using the candidate word w if the value of the collective word-to-image correlation between the candidate word w and the target image Iu satisfies a preset condition.

Dependent claims: 13-17.
18. One or more computer readable memory devices having stored thereupon a plurality of instructions that, when executed by a processor, cause the processor to:
obtain a plurality of word-to-word correlations, each word-to-word correlation being defined between a pair of words selected from a lexicon V, each word-to-word correlation between a candidate word w and each word v in the lexicon V measured by P(w|v), which comprises a conditional probability of the word w given the word v;

obtain a plurality of word-to-image correlations, each word-to-image correlation being defined between a word v in the lexicon V and the target image Iu being annotated, based at least on an image representation of the word v, each word-to-image correlation between each word v and the target image Iu measured by P(Iu|v), which comprises a conditional probability of the target image Iu given the word v;

for the candidate word w selected from the lexicon V, determine a value of a collective word-to-image correlation between the candidate word w and the target image Iu based on the word-to-word correlations between each word v in the lexicon V and the candidate word w, and the word-to-image correlations between each word v in the lexicon V and the target image Iu, wherein the collective word-to-image correlation is a probability of the target image Iu given the candidate word w; and

annotate the target image Iu using the candidate word w if the collective word-to-image correlation between the candidate word w and the target image Iu satisfies a preset condition.

Dependent claims: 19-20.
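The annotation procedure recited in the independent claims (sum the word-to-word and word-to-image correlations over the lexicon, then keep candidate words whose collective score satisfies a preset condition) can be sketched as follows. This is a minimal illustration, not the patented implementation: all names and the probability tables are hypothetical toy values, whereas a real system would estimate P(w|v) and P(Iu|v) from image-search results and training data as the specification describes.

```python
# Hypothetical sketch of the claimed annotation method with toy numbers.
# In the DCMRM model, P(w|v) would come from word co-occurrence statistics
# and P(Iu|v) from comparing the target image Iu with top-ranked search
# results for the word v; here they are fixed tables for illustration.

LEXICON = ["sky", "sea", "car"]

# P(v): prior over lexicon words (uniform here).
P_V = {v: 1.0 / len(LEXICON) for v in LEXICON}

# P(w|v): word-to-word correlation, candidate word w given lexicon word v.
P_W_GIVEN_V = {
    ("sky", "sky"): 0.90, ("sky", "sea"): 0.50, ("sky", "car"): 0.05,
    ("sea", "sky"): 0.50, ("sea", "sea"): 0.90, ("sea", "car"): 0.05,
    ("car", "sky"): 0.05, ("car", "sea"): 0.05, ("car", "car"): 0.90,
}

# P(Iu|v): word-to-image correlation for one target image Iu.
P_IU_GIVEN_V = {"sky": 0.8, "sea": 0.6, "car": 0.1}


def collective_correlation(w):
    """Collective word-to-image score: sum over v of P(v) * P(Iu|v) * P(w|v)."""
    return sum(P_V[v] * P_IU_GIVEN_V[v] * P_W_GIVEN_V[(w, v)] for v in LEXICON)


def annotate(threshold):
    """Annotate Iu with every candidate word whose score meets the threshold."""
    return [w for w in LEXICON if collective_correlation(w) >= threshold]
```

With these toy numbers, words strongly correlated with the image ("sky", "sea") pass a threshold of 0.2 while "car" is rejected; the threshold plays the role of the claims' "preset condition".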
Specification