Dual cross-media relevance model for image annotation
Abstract
A dual cross-media relevance model (DCMRM) is used for automatic image annotation. In contrast to traditional relevance models, which calculate the joint probability of words and images over a training image database, the DCMRM estimates the joint probability by calculating the expectation over words in a predefined lexicon. The DCMRM may be advantageous because a predefined lexicon potentially has better behavior than a training image database. The DCMRM also takes advantage of content-based techniques and image search techniques to define the word-to-image and word-to-word relations involved in image annotation. Both relations can be estimated by applying image search techniques to web data as well as to available training data.
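In the notation used in the claims below, the expectation over lexicon words that the abstract describes can be written as follows. This is a reading consistent with the abstract's description; the factorization into a word prior P(v), a word-to-image term P(Iu|v), and a word-to-word term P(w|v) follows the claim language.

```latex
% Joint probability of a candidate word w and a target image I_u,
% estimated as an expectation over the words v of a lexicon V:
P(w, I_u) \;=\; \sum_{v \in V} P(v)\, P(I_u \mid v)\, P(w \mid v)
```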
78 Citations
20 Claims
1. A method comprising:
for each word v of a lexicon V:

obtaining a word-to-word correlation between a candidate word w and each word v in the lexicon V, measured by P(w|v), which comprises a conditional probability of the candidate word w given the word v; and

obtaining a word-to-image correlation between the word v and a target image Iu independently of a training set of images, based at least on an image representation of the word v that is developed through an analysis of one or more top-ranked images obtained by an image search engine searching on the word v, the word-to-image correlation between each word v in the lexicon V and the target image Iu measured by P(Iu|v), which comprises a conditional probability of the target image Iu given the word v;

determining a value of a collective word-to-image correlation between the candidate word w and the target image Iu based on the word-to-word correlations between the candidate word w and each word v in the lexicon V and the word-to-image correlations between each word v in the lexicon V and the target image Iu; and

annotating the target image Iu using the candidate word w if the collective word-to-image correlation between the candidate word w and the target image Iu satisfies a preset condition, the value of the collective word-to-image correlation between the candidate word w and the target image Iu calculated using:

Dependent claims: 2-11.
12. A method performed by a computing device for image annotation, the method comprising:
obtaining a plurality of word-to-word correlations, each word-to-word correlation being defined between a pair of words selected from a lexicon V, each word-to-word correlation between a candidate word w and each word v in the lexicon V measured by P(w|v), which comprises a conditional probability of the word w given the word v;

obtaining a plurality of word-to-image correlations independently of a training set of images, each word-to-image correlation being defined between the word v in the lexicon V and a target image Iu being annotated, each word-to-image correlation between each word v and the target image Iu measured by P(Iu|v), which comprises a conditional probability of the target image Iu given the word v;

for a candidate word w selected from the lexicon V, determining a value of a collective word-to-image correlation between the candidate word w and the target image Iu based on the word-to-word correlations between the candidate word w and each word v in the lexicon V, and the word-to-image correlations between each word v in the lexicon V and the target image Iu, wherein the collective word-to-image correlation is a probability of the target image Iu given the candidate word w; and

annotating the target image Iu using the candidate word w if the value of the collective word-to-image correlation between the candidate word w and the target image Iu satisfies a preset condition.

Dependent claims: 13-17.
18. One or more computer readable memory devices having stored thereupon a plurality of instructions that, when executed by a processor, cause the processor to:
obtain a plurality of word-to-word correlations, each word-to-word correlation being defined between a pair of words selected from a lexicon V, each word-to-word correlation between a candidate word w and each word v in the lexicon V measured by P(w|v), which comprises a conditional probability of the word w given the word v;

obtain a plurality of word-to-image correlations, each word-to-image correlation being defined between a word v in the lexicon V and the target image Iu being annotated, based at least on an image representation of the word v, each word-to-image correlation between each word v and the target image Iu measured by P(Iu|v), which comprises a conditional probability of the target image Iu given the word v;

for the candidate word w selected from the lexicon V, determine a value of a collective word-to-image correlation between the candidate word w and the target image Iu based on the word-to-word correlations between each word v in the lexicon V and the candidate word w, and the word-to-image correlations between each word v in the lexicon V and the target image Iu, wherein the collective word-to-image correlation is a probability of the target image Iu given the candidate word w; and

annotate the target image Iu using the candidate word w if the collective word-to-image correlation between the candidate word w and the target image Iu satisfies a preset condition.

Dependent claims: 19-20.
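The annotation procedure recited in the independent claims (sum the word-to-word and word-to-image correlations over the lexicon, then keep candidate words whose collective score satisfies a preset condition) can be sketched as follows. This is a minimal illustration, not the patented implementation: all names and the probability tables are hypothetical toy values, whereas a real system would estimate P(w|v) and P(Iu|v) from image-search results and training data as the specification describes.

```python
# Hypothetical sketch of the claimed annotation method with toy numbers.
# In the DCMRM model, P(w|v) would come from word co-occurrence statistics
# and P(Iu|v) from comparing the target image Iu with top-ranked search
# results for the word v; here they are fixed tables for illustration.

LEXICON = ["sky", "sea", "car"]

# P(v): prior over lexicon words (uniform here).
P_V = {v: 1.0 / len(LEXICON) for v in LEXICON}

# P(w|v): word-to-word correlation, candidate word w given lexicon word v.
P_W_GIVEN_V = {
    ("sky", "sky"): 0.90, ("sky", "sea"): 0.50, ("sky", "car"): 0.05,
    ("sea", "sky"): 0.50, ("sea", "sea"): 0.90, ("sea", "car"): 0.05,
    ("car", "sky"): 0.05, ("car", "sea"): 0.05, ("car", "car"): 0.90,
}

# P(Iu|v): word-to-image correlation for one target image Iu.
P_IU_GIVEN_V = {"sky": 0.8, "sea": 0.6, "car": 0.1}


def collective_correlation(w):
    """Collective word-to-image score: sum over v of P(v) * P(Iu|v) * P(w|v)."""
    return sum(P_V[v] * P_IU_GIVEN_V[v] * P_W_GIVEN_V[(w, v)] for v in LEXICON)


def annotate(threshold):
    """Annotate Iu with every candidate word whose score meets the threshold."""
    return [w for w in LEXICON if collective_correlation(w) >= threshold]
```

With these toy numbers, words strongly correlated with the image ("sky", "sea") pass a threshold of 0.2 while "car" is rejected; the threshold plays the role of the claims' "preset condition".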
Specification