System and method for image annotation and multi-modal image retrieval using probabilistic semantic models comprising at least one joint probability distribution
First Claim
1. A method of retrieving media, comprising:
(a) defining a probabilistic framework organizing media with respect to concepts represented in the respective media comprising at least one hidden layer;
(b) automatically mapping elements of a set of media to the probabilistic framework, using at least one processor, based on implicit concepts represented within each element of the set of media, the probabilistic framework comprising at least one joint probability distribution which models a probability that a symbol belonging to a respective concept is an annotation symbol of respective media;
(c) automatically determining, using at least one processor, at least one implicit concept represented in a received query;
(d) automatically determining, using at least one processor, a probabilistic correspondence of elements of the set of media with the determined at least one implicit concept; and
(e) outputting at least one representation or identifier of at least one element of the set of media selectively in dependence on at least the determined probabilistic correspondence.
Abstract
Systems and methods for multi-modal or multimedia image retrieval are provided. Automatic image annotation is achieved based on a probabilistic semantic model in which visual features and textual words are connected via a hidden layer comprising the semantic concepts to be discovered, to explicitly exploit the synergy between the two modalities. The association of visual features and textual words is determined in a Bayesian framework, which provides a confidence measure for the association. A hidden concept layer connecting the visual features and the words is discovered by fitting a generative model to the training images and their annotation words. An Expectation-Maximization (EM) based iterative learning procedure determines the conditional probabilities of the visual features and the textual words given a hidden concept class. Based on the discovered hidden concept layer and the corresponding conditional probabilities, image annotation and text-to-image retrieval are performed within the Bayesian framework.
63 Citations
20 Claims
1. A method of retrieving media, comprising:
(a) defining a probabilistic framework organizing media with respect to concepts represented in the respective media comprising at least one hidden layer;
(b) automatically mapping elements of a set of media to the probabilistic framework, using at least one processor, based on implicit concepts represented within each element of the set of media, the probabilistic framework comprising at least one joint probability distribution which models a probability that a symbol belonging to a respective concept is an annotation symbol of respective media;
(c) automatically determining, using at least one processor, at least one implicit concept represented in a received query;
(d) automatically determining, using at least one processor, a probabilistic correspondence of elements of the set of media with the determined at least one implicit concept; and
(e) outputting at least one representation or identifier of at least one element of the set of media selectively in dependence on at least the determined probabilistic correspondence.
- View Dependent Claims (2, 3, 4, 5, 6, 19)
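Steps (c)-(e) of claim 1 can be illustrated with a small sketch, under the assumption that both the query and each media element have already been mapped (step (b)) to posterior distributions over the hidden concepts. The scoring rule used here, the probability that the query and the element share a hidden concept, is an illustrative choice; the claim does not fix a specific correspondence measure.

```python
import numpy as np

def retrieve_by_concepts(query_concepts, media_concepts, top_k=3):
    """Rank media elements by probabilistic correspondence to a query.

    query_concepts: (K,) posterior P(z_k | query) over hidden concepts.
    media_concepts: (num_media, K) posterior P(z_k | element) per element.
    Correspondence = sum_k P(z_k | query) * P(z_k | element), i.e. the
    probability the two were generated from the same hidden concept.
    Returns the best-first element indices and their scores.
    """
    scores = media_concepts @ query_concepts
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]
```

An element concentrated on the same concept as the query is output first, satisfying the selective-output condition of step (e).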
7. A method of ranking multimedia works with respect to a query, comprising:
(a) defining a semantic Bayesian framework representing an association of a multimedia work content with a plurality of semantic concepts, comprising at least one hidden layer formulated based on at least one joint probability distribution which models a probability that a word belonging to a respective semantic concept is an annotation word of a respective multimedia work;
(b) automatically mapping a set of multimedia works to the semantic Bayesian framework dependent on semantic concepts represented in respective multimedia works, using at least one processor which automatically determines a set of annotation words associated with the respective multimedia works;
(c) automatically extracting at least one implicit semantic concept from a received query seeking elements of the set of multimedia works corresponding to at least one implicit semantic concept, using at least one processor;
(d) automatically determining elements of the mapped set of multimedia works corresponding to the at least one extracted implicit semantic concept, using at least one processor; and
(e) outputting the corresponding multimedia works ranked in accordance with at least a correspondence to the at least one extracted implicit semantic concept.
- View Dependent Claims (8, 9, 20)
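A text-to-image ranking in the spirit of claim 7 can be sketched as follows, assuming the per-image concept mixtures P(z|image) and the word emissions P(w|z) have already been learned (for instance by EM). Scoring images by the sum of log word likelihoods is an illustrative choice, not mandated by the claim.

```python
import numpy as np

def rank_images(query_word_ids, Pz_given_image, Pw_given_z):
    """Rank images for a text query under a hidden-concept model.

    query_word_ids: indices of the query words in the vocabulary.
    Pz_given_image: (num_images, K) concept mixture per image.
    Pw_given_z:     (K, vocab) word emission per hidden concept.
    Score = sum over query words of log P(w | image), where
    P(w | image) = sum_k P(w | z_k) P(z_k | image).
    """
    # Marginalize the hidden concepts to get per-image word likelihoods.
    Pw_given_image = Pz_given_image @ Pw_given_z
    log_scores = np.log(Pw_given_image[:, query_word_ids] + 1e-12).sum(axis=1)
    return np.argsort(-log_scores)  # best-first image indices
```

An image whose dominant concept emits the query word ranks ahead of one whose concepts rarely emit it.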
10. A method of extracting probable implicit semantic concepts of a multimedia work, comprising:
(a) defining a probabilistic framework relating a correspondence of automatically derived non-semantic content features of each of a plurality of multimedia works with semantic content features of each respective multimedia work, comprising at least one hidden concept layer comprising at least one joint probability distribution which models a probability that a symbol belonging to a respective semantic concept is an appropriate annotation of a respective multimedia work;
(b) receiving an input comprising at least non-semantic content of a multimedia work; and
(c) automatically producing as an output by a processor, a semantic concept vector associated with the input, representing probable implicit semantic concepts represented by the input, based on at least a correspondence of the non-semantic content features of the input with the semantic concepts of the probabilistic framework.
- View Dependent Claims (11)
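The semantic concept vector of step (c) follows from Bayes' rule once the framework's priors and feature likelihoods are known. The sketch below assumes a single discretized visual feature for simplicity; the model described later in claim 12 works with continuous feature vectors.

```python
import numpy as np

def concept_vector(feature_id, Pz, Pf_given_z):
    """Posterior concept vector P(z | f) for one discretized visual feature.

    Pz:         (K,) prior P_Z(z_k) over hidden concepts.
    Pf_given_z: (K, num_features) likelihood P_F(f | z_k).
    By Bayes' rule: P(z_k | f) proportional to P_Z(z_k) * P_F(f | z_k).
    """
    unnorm = Pz * Pf_given_z[:, feature_id]
    return unnorm / unnorm.sum()
```

Each entry of the returned vector is the probability that the corresponding hidden concept is implicitly represented by the input.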
12. A method of extracting probable implicit semantic concepts of a multimedia work, comprising:
(a) defining a probabilistic framework relating a correspondence of non-semantic content features of each of a plurality of multimedia works with semantic content features of each respective multimedia work, comprising at least one hidden concept layer;
(b) receiving an input comprising at least non-semantic content; and
(c) automatically presenting as an output a semantic concept vector associated with the input, representing probable implicit semantic concepts represented by the input, based on at least a correspondence of the non-semantic content features of the input with the probabilistic framework,
wherein the probabilistic framework comprises a Bayesian model for associating words with an image, wherein the hidden concept layer, which connects a visual feature layer and a word layer, is discovered by fitting a generative model to a training set comprising images and annotation words, wherein the conditional probabilities of the visual features and the annotation words given a hidden concept class are determined based on an Expectation-Maximization (EM) based iterative learning procedure, and wherein:
ƒ_i, i∈[1,N], denotes a visual feature vector of images in a training database, where N is the size of the database;
w_j, j∈[1,M], denotes the distinct textual words in a training annotation word set, where M is the size of the annotation vocabulary in the training database;
the visual features of images in the database, ƒ_i = [ƒ_i1, ƒ_i2, . . . , ƒ_iL], i∈[1,N], are known i.i.d. samples from an unknown distribution, having a visual feature dimension L;
the specific visual feature annotation word pairs (ƒ_i, w_j), i∈[1,N], j∈[1,M], are known i.i.d. samples from an unknown distribution, associated with an unobserved semantic concept variable z∈Z={z_1, . . . , z_K}, in which each observation of one visual feature ƒ∈F={ƒ_1, ƒ_2, . . . , ƒ_N} belongs to one or more concept classes z_k, and each observation of one word w∈V={w_1, w_2, . . . , w_M} in one image ƒ_i belongs to one concept class;
the observation pairs (ƒ_i, w_j) are assumed to be generated independently, and the pairs of random variables (ƒ_i, w_j) are assumed to be conditionally independent given the respective hidden concept z_k, such that
P(ƒ_i, w_j | z_k) = P_ℑ(ƒ_i | z_k) P_V(w_j | z_k);
the visual feature and word distribution is treated as a randomized data generation process, wherein a probability of a concept is represented as P_Z(z_k);
a visual feature ƒ_i∈F is selected with probability P_ℑ(ƒ_i | z_k); and
a textual word w_j∈V is selected with probability P_V(w_j | z_k), from which an observed pair (ƒ_i, w_j) is obtained, such that a joint probability model is expressed as
P(ƒ_i, w_j) = Σ_k P_Z(z_k) P_ℑ(ƒ_i | z_k) P_V(w_j | z_k).
- View Dependent Claims (13, 14, 15, 16, 17, 18)
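The joint probability model at the end of claim 12 marginalizes the hidden concept out of the factored generative process. The sketch below evaluates it for all feature/word pairs at once, again assuming discretized features so that P_ℑ can be tabulated.

```python
import numpy as np

def joint_probability(Pz, Pf_given_z, Pw_given_z):
    """Joint P(f_i, w_j) = sum_k P_Z(z_k) P_F(f_i|z_k) P_V(w_j|z_k).

    Pz:         (K,) concept prior.
    Pf_given_z: (K, N) feature likelihood per concept.
    Pw_given_z: (K, M) word likelihood per concept.
    Returns an (N, M) matrix over all feature/word pairs.
    """
    # einsum sums over the hidden concept index k for every (n, m) pair.
    return np.einsum('k,kn,km->nm', Pz, Pf_given_z, Pw_given_z)
```

Because each factor is a proper distribution, the resulting matrix sums to 1, which makes the conditional-independence factorization easy to sanity-check.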
Specification