Dual Cross-Media Relevance Model for Image Annotation
First Claim
1. A method for image annotation, the method comprising:
(a) for each word of a lexicon, obtaining a word-to-word correlation between the word and a candidate word, and obtaining a word-to-image correlation between the word and a target image;
(b) determining a value of a collective word-to-image correlation between the candidate word and the target image based on the word-to-word correlations between the candidate word and each word in the lexicon and the word-to-image correlations between each word in the lexicon and the target image; and
(c) annotating the target image using the candidate word if the collective word-to-image correlation between the candidate word and the target image satisfies a preset condition.
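The three steps above can be sketched as follows. The claim does not fix a particular aggregation formula, so the sum-of-products form (an expectation over the lexicon, consistent with the abstract) and all correlation values here are illustrative assumptions:

```python
# Toy lexicon with hand-set correlations; in the claimed method these
# would come from web image-search statistics or training data.
lexicon = ["sky", "cloud", "ocean", "car"]

# Step (a): word-to-word correlations between each lexicon word and the
# candidate word, and word-to-image correlations between each lexicon
# word and the target image (all values assumed for illustration).
word_to_word = {"sky": 0.9, "cloud": 0.8, "ocean": 0.4, "car": 0.1}
word_to_image = {"sky": 0.7, "cloud": 0.6, "ocean": 0.2, "car": 0.05}

def collective_correlation(word_to_word, word_to_image, lexicon):
    """Step (b): combine the two correlation sets over every lexicon
    word; a sum of products is one natural reading of the claim."""
    return sum(word_to_word[w] * word_to_image[w] for w in lexicon)

score = collective_correlation(word_to_word, word_to_image, lexicon)

# Step (c): annotate only if the collective correlation satisfies a
# preset condition, here an illustrative fixed threshold.
THRESHOLD = 0.5
annotate = score >= THRESHOLD
```

With these toy numbers the candidate word scores 1.195 and would be kept as an annotation; a real system would rank many candidate words this way and keep the top-scoring ones.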
2 Assignments
0 Petitions
Abstract
A dual cross-media relevance model (DCMRM) is used for automatic image annotation. In contrast to traditional relevance models, which calculate the joint probability of words and images over a training image database, the DCMRM estimates the joint probability by taking an expectation over the words in a predefined lexicon. The DCMRM may be advantageous because a predefined lexicon potentially has better behavior than a training image database. The DCMRM also takes advantage of content-based techniques and image search techniques to define the word-to-image and word-to-word relations involved in image annotation. Both relations can be estimated by applying image search techniques to web data as well as to available training data.
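One way to write the expectation-over-lexicon formulation described in the abstract is sketched below; the exact notation is an assumption, not quoted from this document:

```latex
P(w, I_u) \approx \sum_{v \in V} P(v)\, P(I_u \mid v)\, P(w \mid v)
```

where $V$ is the predefined lexicon, $P(v)$ is a prior over lexicon words, $P(I_u \mid v)$ is the word-to-image relation for the untagged image $I_u$, and $P(w \mid v)$ is the word-to-word relation for the candidate word $w$.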
96 Citations
20 Claims
1. A method for image annotation, the method comprising:
(a) for each word of a lexicon, obtaining a word-to-word correlation between the word and a candidate word, and obtaining a word-to-image correlation between the word and a target image;
(b) determining a value of a collective word-to-image correlation between the candidate word and the target image based on the word-to-word correlations between the candidate word and each word in the lexicon and the word-to-image correlations between each word in the lexicon and the target image; and
(c) annotating the target image using the candidate word if the collective word-to-image correlation between the candidate word and the target image satisfies a preset condition.
Dependent claims: 2-12
13. A method for image annotation, the method comprising:
(a) obtaining a plurality of word-to-word correlations, each word-to-word correlation being defined between a pair of words selected from a lexicon;
(b) obtaining a plurality of word-to-image correlations, each word-to-image correlation being defined between a word in the lexicon and a target image being annotated;
(c) for a candidate word selected from the lexicon, determining a value of a collective word-to-image correlation between the candidate word and the target image based on the word-to-word correlations between the candidate word and each word in the lexicon, and the word-to-image correlations between each word in the lexicon and the target image; and
(d) annotating the target image using the candidate word if the value of the collective word-to-image correlation between the candidate word and the target image satisfies a preset condition.
Dependent claims: 14-19
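Claim 13 obtains all pairwise correlations up front, which suggests a matrix formulation: score every candidate word against the target image in one pass. A minimal NumPy sketch, with randomly generated correlation values standing in for web-search or training-data estimates (the matrix form and the mean-score threshold are illustrative assumptions):

```python
import numpy as np

lexicon = ["sky", "cloud", "ocean", "car"]

# Step (a): pairwise word-to-word correlations W2W[i, j] between lexicon
# words i and j, symmetrized by construction (values assumed).
rng = np.random.default_rng(0)
base = rng.random((4, 4))
W2W = (base + base.T) / 2.0

# Step (b): word-to-image correlations for one target image (assumed).
w2i = rng.random(4)

# Step (c): collective correlation for every candidate word at once,
# scores[c] = sum_j W2W[c, j] * w2i[j].
scores = W2W @ w2i

# Step (d): keep candidates whose score satisfies a preset condition,
# here an illustrative threshold of exceeding the mean score.
annotations = [w for w, s in zip(lexicon, scores) if s > scores.mean()]
```

Each lexicon word doubles as a candidate here, so the matrix-vector product yields one collective score per candidate in a single operation.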
20. One or more computer-readable media having stored thereupon a plurality of instructions that, when executed by a processor, cause the processor to:
(a) obtain a plurality of word-to-word correlations, each word-to-word correlation being defined between a pair of words selected from a lexicon;
(b) obtain a plurality of word-to-image correlations, each word-to-image correlation being defined between a word in the lexicon and a target image being annotated;
(c) for a candidate word selected from the lexicon, determine a value of a collective word-to-image correlation between the candidate word and the target image based on the word-to-word correlations between each word in the lexicon and the candidate word, and the word-to-image correlations between each word in the lexicon and the target image; and
(d) annotate the target image using the candidate word if the collective word-to-image correlation between the candidate word and the target image satisfies a preset condition.
Specification