System and method for image annotation and multi-modal image retrieval using probabilistic semantic models

US 7,814,040 B1
Filed: 01/24/2007
Issued: 10/12/2010
Est. Priority Date: 01/31/2006
Status: Active Grant

Abstract

Systems and methods for multi-modal or multimedia image retrieval are provided. Automatic image annotation is achieved based on a probabilistic semantic model in which visual features and textual words are connected via a hidden layer comprising the semantic concepts to be discovered, to explicitly exploit the synergy between the two modalities. The association of visual features and textual words is determined in a Bayesian framework to provide confidence of the association. A hidden concept layer which connects the visual feature(s) and the words is discovered by fitting a generative model to the training image and annotation words. An Expectation-Maximization (EM) based iterative learning procedure determines the conditional probabilities of the visual features and the textual words given a hidden concept class. Based on the discovered hidden concept layer and the corresponding conditional probabilities, the image annotation and the text-to-image retrieval are performed using the Bayesian framework.
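
The two tasks named in the abstract, image annotation and text-to-image retrieval, can be illustrated with a minimal sketch once the hidden concept layer and its conditional probabilities are available. Everything concrete below is an assumption made for illustration: the one-dimensional visual feature, the Gaussian form of the feature density, the toy parameter values, and the function names `annotate` and `retrieve`. Annotation scores a word as the sum over concepts of P_V(w | z_k) P(z_k | f); retrieval scores an image as the sum over concepts of p(f_i | z_k) P(z_k | w), which follows from the conditional independence of features and words given the concept.

```python
import math

# Hypothetical toy model: K = 2 hidden concepts, 1-D visual feature, 3-word vocabulary.
P_Z = [0.5, 0.5]            # concept priors P_Z(z_k)
MEANS = [0.0, 5.0]          # assumed Gaussian mean of the feature under each concept
STDS = [1.0, 1.0]           # assumed Gaussian standard deviation under each concept
VOCAB = ["sky", "grass", "water"]
P_V = [
    [0.7, 0.1, 0.2],        # P_V(w | z_1)
    [0.1, 0.8, 0.1],        # P_V(w | z_2)
]


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian (an assumed form for p(f | z_k))."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def concept_posterior(f):
    """P(z_k | f) by Bayes' rule from the priors and feature densities."""
    joint = [P_Z[k] * gaussian_pdf(f, MEANS[k], STDS[k]) for k in range(len(P_Z))]
    total = sum(joint)
    return [j / total for j in joint]


def annotate(f):
    """Return (word, P(w | f)) pairs, most probable first."""
    post = concept_posterior(f)
    scores = [
        sum(P_V[k][j] * post[k] for k in range(len(post)))
        for j in range(len(VOCAB))
    ]
    return sorted(zip(VOCAB, scores), key=lambda pair: pair[1], reverse=True)


def retrieve(word, features):
    """Return image indices ranked by sum_k p(f_i | z_k) P(z_k | w), best first."""
    j = VOCAB.index(word)
    joint = [P_V[k][j] * P_Z[k] for k in range(len(P_Z))]
    total = sum(joint)
    concept_given_word = [x / total for x in joint]
    scores = [
        sum(gaussian_pdf(f, MEANS[k], STDS[k]) * concept_given_word[k]
            for k in range(len(P_Z)))
        for f in features
    ]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)


print(annotate(0.0))
print(retrieve("grass", [0.1, 4.9, 2.0]))
```

Because both directions reuse the same concept-conditional probabilities, a single fitted model serves annotation and word-query retrieval in this sketch.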

15 Claims

1. A computer-implemented method of retrieving media, comprising:
- (a) creating a probabilistic framework relating media types within a mixed media work to implicit concepts;
  
  (b) creating an index of a set of media based on implicit concepts within the probabilistic framework;
  
  (c) receiving a query expressed in the form of a media exemplar;
  
  (d) determining a set of concepts expressed in the media exemplar;
  
  (e) searching the index of the set of media for elements representing similar implicit concepts to those expressed in the media exemplar; and
  
  (f) outputting at least one representation or identifier of the elements representing similar implicit concepts to those expressed in the media exemplar;
  
  wherein said outputting comprises outputting a representation or identifiers of a plurality of elements, further comprising;
  
  ranking the plurality of elements based on at least a similarity of the respective element to implicit concepts expressed in the media exemplar;
  
  wherein said determining comprises probabilistically determining a set of semantic concepts inherent in the media exemplar, based on correlations of features in respective multimedia works having predetermined semantic concepts associated therewith;
  
  further comprising determining a concept vector for the media exemplar;
  
  wherein the probabilistic framework comprises a Bayesian model for associating words with an image having visual features, comprising a hidden concept layer which connects a visual feature layer and a word layer which is discovered by fitting a generative model to a training set comprising images having the visual features and annotation words, wherein the conditional probabilities of the visual features and the annotation words given a hidden concept class are determined based on an Expectation-Maximization (EM) based iterative learning procedure.
  - 2. The method according to claim 1, wherein $f_i$, $i \in [1,N]$ denotes a visual feature vector of images in a training database, where $N$ is the size of the database, $w^j$, $j \in [1,M]$ denotes the distinct textual words in a training annotation word set, where $M$ is the size of an annotation vocabulary in the training database, the visual features of images in the database, $f_i = [f_i^1, f_i^2, \ldots, f_i^L]$, $i \in [1,N]$ are known i.i.d. samples from an unknown distribution, having a visual feature dimension $L$, the specific visual feature annotation word pairs $(f_i, w^j)$, $i \in [1,N]$, $j \in [1,M]$ are known i.i.d. samples from an unknown distribution, associated with an unobserved semantic concept variable $z \in Z = \{z_1, \ldots, z_K\}$, in which each observation of one visual feature $f \in F = \{f_1, f_2, \ldots, f_N\}$ belongs to one or more concept classes $z_k$ and each observation of one word $w \in V = \{w^1, w^2, \ldots, w^M\}$ in one image $f_i$ belongs to one concept class, in which the observation pairs $(f_i, w^j)$ are assumed to be generated independently, and the pairs of random variables $(f_i, w^j)$ are assumed to be conditionally independent given the respective hidden concept $z_k$, such that
    $$P(f_i, w^j \mid z_k) = p_{\Im}(f_i \mid z_k)\, P_V(w^j \mid z_k).$$
  - 3. The method according to claim 2, in which the visual feature and word distribution is treated as a randomized data generation process, wherein a probability of a concept is represented as $P_z(z_k)$;
    - a visual feature is selected $f_i \in F$ with probability $P_{\Im}(f_i \mid z_k)$; and

      a textual word is selected $w^j \in V$ with probability $P_V(w^j \mid z_k)$, from which an observed pair $(f_i, w^j)$ is obtained, such that a joint probability model is expressed as follows;
  - 4. The method according to claim 3, in which word concept conditional probabilities $P_V(\cdot \mid Z)$, i.e., $P_V(w^j \mid z_k)$ for $k \in [1,K]$, are estimated through fitting the probabilistic model to the training set.
  - 5. The method according to claim 4, in which $P_{\Im}(f_i \mid z_k)$ is determined by maximization of the log-likelihood function;
  - 6. The method according to claim 5, wherein the model is resolved by applying the expectation-maximization (EM) technique, comprising:
    - (i) an expectation (E) step where the posterior probabilities are computed for the hidden variable $z_k$ based on the current estimates of the parameters; and

      (ii) a maximization (M) step, where parameters are updated to maximize the expectation of the complete-data likelihood $\log P(F, V, Z)$ given the posterior probabilities computed in the preceding E-step, whereby the probabilities can be iteratively determined by fitting the model to the training image database and the associated annotations.
  - 7. The method according to claim 6, wherein Bayes' rule is applied to determine the posterior probability for $z_k$ under $f_i$ and $(f_i, w^j)$:
  - 8. The method according to claim 6, wherein the expectation of the likelihood log P(F,Z) for the estimated P(Z|F) may be expressed as:
  - 9. The method according to claim 8, in which the number of concepts, K, is chosen to maximize
  - 10. The method according to claim 9, in which
    $$m_K = (K-1) + K(M-1) + K(N-1) + L^2 = K(M+N-1) + L^2 - 1.$$
  - 11. The method according to claim 10, wherein a joint distribution $P(w^j, z_k, f_i)$ is used to model the probability of an event that a word $w^j$ belonging to semantic concept $z_k$ is an annotation word of image $f_i$:
    - $P(w^j, z_k, f_i) = P_z(z_k)\, p_{\Im}(f_i \mid z_k)\, P_V(w^j \mid z_k)$ or
  - 12. The method according to claim 11, wherein a Monte Carlo integration is used to derive
  - 13. The method according to claim 10, wherein images are retrieved for word queries by determining the conditional probability $P(f_i \mid w^j)$
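
The EM procedure recited in claims 5 to 8 and the parameter count of claim 10 can be sketched as follows. This is a simplified illustration, not the patent's exact update equations: it assumes a one-dimensional visual feature with a Gaussian density per concept, treats every observed (feature, word) pair as one training sample, and adds a small smoothing constant and a variance floor for numerical safety. The names `fit_em`, `param_count`, `eps`, and `min_var`, and the toy data, are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian (assumed form of the feature model)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def fit_em(pairs, vocab, K=2, iters=50, min_var=1e-3, eps=1e-6):
    """Fit priors, Gaussian feature models and word tables to (feature, word) pairs."""
    feats = [f for f, _ in pairs]
    lo, hi = min(feats), max(feats)
    # Deterministic initialisation (an assumption): means spread over the feature range.
    means = [lo + (hi - lo) * k / max(K - 1, 1) for k in range(K)]
    mean_all = sum(feats) / len(feats)
    var_all = max(sum((f - mean_all) ** 2 for f in feats) / len(feats), min_var)
    variances = [var_all] * K
    prior = [1.0 / K] * K
    p_word = [{w: 1.0 / len(vocab) for w in vocab} for _ in range(K)]

    for _ in range(iters):
        # E-step: posterior P(z_k | f_i, w^j) under the current parameters.
        resp = []
        for f, w in pairs:
            joint = [prior[k] * gaussian_pdf(f, means[k], variances[k]) * p_word[k][w]
                     for k in range(K)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate every parameter from the posterior-weighted samples.
        for k in range(K):
            nk = sum(r[k] for r in resp)
            prior[k] = nk / len(pairs)
            means[k] = sum(r[k] * f for r, (f, _) in zip(resp, pairs)) / nk
            spread = sum(r[k] * (f - means[k]) ** 2 for r, (f, _) in zip(resp, pairs))
            variances[k] = max(spread / nk, min_var)
            for w in vocab:
                count = sum(r[k] for r, (_, pw) in zip(resp, pairs) if pw == w)
                p_word[k][w] = (count + eps) / (nk + eps * len(vocab))
    return prior, means, variances, p_word


def param_count(K, M, N, L):
    """m_K written term by term as in claim 10."""
    return (K - 1) + K * (M - 1) + K * (N - 1) + L ** 2


# Hypothetical training pairs: low feature values co-occur with "sky"/"water",
# high feature values with "grass"/"tree".
PAIRS = [(0.1, "sky"), (-0.2, "sky"), (0.3, "water"), (0.0, "sky"),
         (5.1, "grass"), (4.8, "grass"), (5.3, "tree"), (5.0, "grass")]
VOCAB = ["sky", "water", "grass", "tree"]

prior, means, variances, p_word = fit_em(PAIRS, VOCAB)
print(means, prior)
print(param_count(K=2, M=len(VOCAB), N=len(PAIRS), L=1))
```

The closed form on the right-hand side of claim 10, K(M+N-1)+L²-1, gives the same value as the term-by-term sum in `param_count`, which can be confirmed by expanding the products.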

14. An apparatus adapted to retrieve media, comprising:
- (a) at least one memory, adapted for storing therein a probabilistic framework relating media types within a mixed media work to implicit concepts;
  
  (b) at least one memory, adapted for storing an index of a set of media based on implicit concepts within the probabilistic framework;

  (c) an input, adapted to receive a query expressed in the form of a media exemplar;

  (d) at least one processor, adapted to;

  access the at least one memory adapted for storing a probabilistic framework,

  access the at least one memory adapted for storing an index of a set of media,

  receive the query from the input,

  determine a set of concepts expressed in the media exemplar, and

  search the index of the set of media for elements representing similar implicit concepts to those expressed in the media exemplar; and
  
  (e) an output adapted to present at least one of the elements representing similar implicit concepts to those expressed in the media exemplar;
  
  wherein the probabilistic framework comprises a Bayesian model for associating words with an image having visual features, comprising a hidden concept layer which connects a visual feature layer and a word layer which is discovered by fitting a generative model to a training set comprising images having the visual features and annotation words, wherein the conditional probabilities of the visual features and the annotation words given a hidden concept class are determined based on an Expectation-Maximization (EM) based iterative learning procedure.

15. A non-transitory computer readable medium storing therein instructions for controlling a programmable processor to perform the steps of:
- (a) creating a probabilistic framework relating media types within a mixed media work to implicit concepts;
  
  (b) indexing a set of media based on implicit concepts within the probabilistic framework, and storing the index in a memory;
  
  (c) receiving a query expressed as a media exemplar;
  
  (d) determining a set of concepts expressed in the media exemplar;
  
  (e) searching the stored index of the set of media for elements representing similar implicit concepts to the media exemplar; and
  
  (f) outputting a representation or identifier of at least one of the elements representing similar implicit concepts to the media exemplar;
  
  wherein the probabilistic framework comprises a Bayesian model stored in a memory for associating words with an image having visual features, comprising a hidden concept layer which connects a visual feature layer and a word layer which is discovered by fitting a generative model to a training set comprising images having the visual features and annotation words, wherein the conditional probabilities of the visual features and the annotation words given a hidden concept class are determined based on an Expectation-Maximization (EM) based iterative learning procedure.
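
Steps (b) through (f) of the independent claims (index by implicit concepts, receive an exemplar query, determine its concepts, search, and output ranked results) can be sketched as below. The claims do not fix a similarity measure, so cosine similarity between concept vectors is an assumption here, as are the Gaussian feature model, the toy parameters in `MODEL`, the media identifiers, and all function names.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian (assumed feature model for each concept)."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def concept_vector(feature, priors, means, stds):
    """Posterior P(z_k | f) for every concept, used as the concept vector."""
    joint = [p * gaussian_pdf(feature, m, s) for p, m, s in zip(priors, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]


def cosine(a, b):
    """Cosine similarity; an assumed choice of similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(media, model):
    """Step (b): map each media identifier to its concept vector."""
    return {media_id: concept_vector(f, *model) for media_id, f in media.items()}


def search(query_feature, index, model, top_n=3):
    """Steps (c) to (f): embed the exemplar, rank indexed media, return the best."""
    query_vec = concept_vector(query_feature, *model)
    ranked = sorted(index.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [(media_id, cosine(query_vec, vec)) for media_id, vec in ranked[:top_n]]


# Hypothetical fitted model: (priors, means, standard deviations) for K = 2 concepts.
MODEL = ([0.5, 0.5], [0.0, 5.0], [1.0, 1.0])
# Hypothetical media collection: identifier -> 1-D visual feature.
MEDIA = {"img_a": 0.2, "img_b": 4.7, "img_c": 2.4}

INDEX = build_index(MEDIA, MODEL)
RESULTS = search(5.2, INDEX, MODEL)
print(RESULTS)
```

Storing concept vectors rather than raw features keeps the index dimension equal to the number of concepts, so query cost in this sketch scales with the collection size times K.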

Current Assignee: The Research Foundation for The State University of New York (State University of New York)
Original Assignee: The Research Foundation for The State University of New York (State University of New York)
Inventors: Zhang, Zhongfei; Zhang, Ruofei
Primary Examiner(s): Vincent, David R

Application Number: US11/626,835
Time in Patent Office: 1,357 Days
Field of Search: 706/45-47, 706/55, 382/155, 382/181, 382/190
US Class Current: 706/45
CPC Class Codes:
G06F 16/5838   using colour
G06F 16/5846   using extracted text
G06F 18/24155   Bayesian classification
G06N 5/02   Knowledge representation; S...
G06N 7/01   Probabilistic graphical mod...
