Dual Cross-Media Relevance Model for Image Annotation
First Claim
1. A method for image annotation, the method comprising:
(a) for each word of a lexicon, obtaining a word-to-word correlation between the word and a candidate word, and obtaining a word-to-image correlation between the word and a target image;
(b) determining a value of a collective word-to-image correlation between the candidate word and the target image based on the word-to-word correlations between the candidate word and each word in the lexicon and the word-to-image correlations between each word in the lexicon and the target image; and
(c) annotating the target image using the candidate word if the collective word-to-image correlation between the candidate word and the target image satisfies a preset condition.
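The three steps above can be sketched as follows. The claim does not fix a particular aggregation formula, so the sum-of-products form (an expectation over the lexicon, consistent with the abstract) and all correlation values here are illustrative assumptions:

```python
# Toy lexicon with hand-set correlations; in the claimed method these
# would come from web image-search statistics or training data.
lexicon = ["sky", "cloud", "ocean", "car"]

# Step (a): word-to-word correlations between each lexicon word and the
# candidate word, and word-to-image correlations between each lexicon
# word and the target image (all values assumed for illustration).
word_to_word = {"sky": 0.9, "cloud": 0.8, "ocean": 0.4, "car": 0.1}
word_to_image = {"sky": 0.7, "cloud": 0.6, "ocean": 0.2, "car": 0.05}

def collective_correlation(word_to_word, word_to_image, lexicon):
    """Step (b): combine the two correlation sets over every lexicon
    word; a sum of products is one natural reading of the claim."""
    return sum(word_to_word[w] * word_to_image[w] for w in lexicon)

score = collective_correlation(word_to_word, word_to_image, lexicon)

# Step (c): annotate only if the collective correlation satisfies a
# preset condition, here an illustrative fixed threshold.
THRESHOLD = 0.5
annotate = score >= THRESHOLD
```

With these toy numbers the candidate word scores 1.195 and would be kept as an annotation; a real system would rank many candidate words this way and keep the top-scoring ones.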
2 Assignments
0 Petitions
Abstract
A dual cross-media relevance model (DCMRM) is used for automatic image annotation. In contrast to traditional relevance models, which calculate the joint probability of words and images over a training image database, the DCMRM estimates the joint probability by taking an expectation over the words in a predefined lexicon. The DCMRM may be advantageous because a predefined lexicon potentially has better behavior than a training image database. The DCMRM also takes advantage of content-based techniques and image search techniques to define the word-to-image and word-to-word relations involved in image annotation. Both relations can be estimated by applying image search techniques to web data as well as to available training data.
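One way to write the expectation-over-lexicon formulation described in the abstract is sketched below; the exact notation is an assumption, not quoted from this document:

```latex
P(w, I_u) \approx \sum_{v \in V} P(v)\, P(I_u \mid v)\, P(w \mid v)
```

where $V$ is the predefined lexicon, $P(v)$ is a prior over lexicon words, $P(I_u \mid v)$ is the word-to-image relation for the untagged image $I_u$, and $P(w \mid v)$ is the word-to-word relation for the candidate word $w$.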
96 Citations
20 Claims
1. A method for image annotation, the method comprising:
(a) for each word of a lexicon, obtaining a word-to-word correlation between the word and a candidate word, and obtaining a word-to-image correlation between the word and a target image;
(b) determining a value of a collective word-to-image correlation between the candidate word and the target image based on the word-to-word correlations between the candidate word and each word in the lexicon and the word-to-image correlations between each word in the lexicon and the target image; and
(c) annotating the target image using the candidate word if the collective word-to-image correlation between the candidate word and the target image satisfies a preset condition.
Dependent claims: 2-12
13. A method for image annotation, the method comprising:
(a) obtaining a plurality of word-to-word correlations, each word-to-word correlation being defined between a pair of words selected from a lexicon;
(b) obtaining a plurality of word-to-image correlations, each word-to-image correlation being defined between a word in the lexicon and a target image being annotated;
(c) for a candidate word selected from the lexicon, determining a value of a collective word-to-image correlation between the candidate word and the target image based on the word-to-word correlations between the candidate word and each word in the lexicon, and the word-to-image correlations between each word in the lexicon and the target image; and
(d) annotating the target image using the candidate word if the value of the collective word-to-image correlation between the candidate word and the target image satisfies a preset condition.
Dependent claims: 14-19
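Claim 13 obtains all pairwise correlations up front, which suggests a matrix formulation: score every candidate word against the target image in one pass. A minimal NumPy sketch, with randomly generated correlation values standing in for web-search or training-data estimates (the matrix form and the mean-score threshold are illustrative assumptions):

```python
import numpy as np

lexicon = ["sky", "cloud", "ocean", "car"]

# Step (a): pairwise word-to-word correlations W2W[i, j] between lexicon
# words i and j, symmetrized by construction (values assumed).
rng = np.random.default_rng(0)
base = rng.random((4, 4))
W2W = (base + base.T) / 2.0

# Step (b): word-to-image correlations for one target image (assumed).
w2i = rng.random(4)

# Step (c): collective correlation for every candidate word at once,
# scores[c] = sum_j W2W[c, j] * w2i[j].
scores = W2W @ w2i

# Step (d): keep candidates whose score satisfies a preset condition,
# here an illustrative threshold of exceeding the mean score.
annotations = [w for w, s in zip(lexicon, scores) if s > scores.mean()]
```

Each lexicon word doubles as a candidate here, so the matrix-vector product yields one collective score per candidate in a single operation.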
20. One or more computer-readable media having stored thereupon a plurality of instructions that, when executed by a processor, cause the processor to:
(a) obtain a plurality of word-to-word correlations, each word-to-word correlation being defined between a pair of words selected from a lexicon;
(b) obtain a plurality of word-to-image correlations, each word-to-image correlation being defined between a word in the lexicon and a target image being annotated;
(c) for a candidate word selected from the lexicon, determine a value of a collective word-to-image correlation between the candidate word and the target image based on the word-to-word correlations between each word in the lexicon and the candidate word, and the word-to-image correlations between each word in the lexicon and the target image; and
(d) annotate the target image using the candidate word if the collective word-to-image correlation between the candidate word and the target image satisfies a preset condition.
Specification