System and method for multimedia ranking and multi-modal image retrieval using probabilistic semantic models and expectation-maximization (EM) learning

US 10,614,366 B1
Filed: 03/04/2016
Issued: 04/07/2020
Est. Priority Date: 01/31/2006
Status: Active Grant

Abstract

Systems and methods for multi-modal or multimedia image retrieval are provided. Automatic image annotation is achieved with a probabilistic semantic model in which visual features and textual words are connected through a hidden layer of semantic concepts that are to be discovered, so that the synergy between the two modalities is explicitly exploited. The association between visual features and textual words is determined in a Bayesian framework, which provides a confidence for each association. The hidden concept layer that connects the visual features and the words is discovered by fitting a generative model to the training images and their annotation words. An Expectation-Maximization (EM) based iterative learning procedure determines the conditional probabilities of the visual features and the textual words given a hidden concept class. Using the discovered hidden concept layer and the corresponding conditional probabilities, image annotation and text-to-image retrieval are then performed within the Bayesian framework.
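The training procedure in the abstract can be illustrated with a minimal pure-Python sketch. It makes several assumptions that do not come from the patent: a diagonal covariance per concept (a simplification of the covariance matrix the claims recite), a small additive smoothing constant `alpha` on P(w | z_k), a variance floor `min_var`, and deterministic farthest-point initialisation of the concept means. The names `fit_psm`, `concept_posterior` and `annotate` are hypothetical. In the E-step the sketch computes P(z_k | f_i, w^j) for every observed image/word pair. In the M-step it re-estimates P_z(z_k), the per-concept Gaussians and P_V(w^j | z_k) from those responsibilities.

```python
import math

TINY = 1e-300  # floor that keeps log() finite when a probability underflows to zero


def diag_gaussian_logpdf(x, mean, var):
    """log p(x | z_k) for a Gaussian with diagonal covariance (simplification of a full matrix)."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def concept_posterior(model, feature):
    """P(z_k | f) by Bayes' rule, normalised with log-sum-exp for numerical stability."""
    logs = [math.log(max(p, TINY)) + diag_gaussian_logpdf(feature, m, v)
            for p, m, v in zip(model["prior"], model["means"], model["variances"])]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def fit_psm(features, annotations, vocab_size, k, iters=50, alpha=1e-3, min_var=1e-3):
    """EM for P(f, w) = sum_k P(z_k) p(f | z_k) P(w | z_k) over observed (image, word) pairs.

    features: N feature vectors of length L; annotations: N lists of word ids in [0, vocab_size).
    """
    dim = len(features[0])
    pairs = [(i, j) for i, words in enumerate(annotations) for j in words]
    # Farthest-point initialisation of the k concept means (an assumption, not from the patent)
    means = [list(features[0])]
    while len(means) < k:
        far = max(features, key=lambda x: min(sum((a - b) ** 2 for a, b in zip(x, m)) for m in means))
        means.append(list(far))
    model = {"prior": [1.0 / k] * k, "means": means,
             "variances": [[1.0] * dim for _ in range(k)],
             "word_prob": [[1.0 / vocab_size] * vocab_size for _ in range(k)]}
    for _ in range(iters):
        # E-step: P(z_k | f_i, w^j) is proportional to P(z_k | f_i) * P(w^j | z_k)
        resp = []
        for i, j in pairs:
            post = concept_posterior(model, features[i])
            joint = [p * model["word_prob"][c][j] for c, p in enumerate(post)]
            s = sum(joint)
            resp.append([v / s for v in joint])
        # M-step: re-estimate P(z_k), the Gaussian of each concept, and P(w | z_k)
        for c in range(k):
            wts = [r[c] for r in resp]
            total = sum(wts)
            model["prior"][c] = total / len(pairs)
            if total < 1e-12:
                continue  # concept received no mass: keep its previous Gaussian and word distribution
            mean = [sum(w * features[i][d] for w, (i, _) in zip(wts, pairs)) / total
                    for d in range(dim)]
            model["means"][c] = mean
            model["variances"][c] = [max(min_var, sum(w * (features[i][d] - mean[d]) ** 2
                                                      for w, (i, _) in zip(wts, pairs)) / total)
                                     for d in range(dim)]
            counts = [alpha] * vocab_size  # additive smoothing keeps every P(w | z_k) > 0
            for w, (_, j) in zip(wts, pairs):
                counts[j] += w
            norm = sum(counts)
            model["word_prob"][c] = [n / norm for n in counts]
    return model


def annotate(model, feature, top_n=3):
    """Rank word ids by P(w^j | f) = sum_k P(w^j | z_k) P(z_k | f)."""
    post = concept_posterior(model, feature)
    vocab = len(model["word_prob"][0])
    scores = [sum(p * model["word_prob"][c][j] for c, p in enumerate(post)) for j in range(vocab)]
    return sorted(range(vocab), key=lambda j: -scores[j])[:top_n]


if __name__ == "__main__":
    # Toy data: two well-separated visual clusters; hypothetical word ids 0="sky", 1="grass", 2="outdoor"
    feats = [[0.0, 0.0], [0.2, -0.1], [-0.1, 0.3], [10.0, 10.0], [9.8, 10.2], [10.3, 9.9]]
    words = [[0], [0], [0, 2], [1], [1], [1, 2]]
    psm = fit_psm(feats, words, vocab_size=3, k=2)
    print(annotate(psm, [0.1, -0.2]))  # → [0, 2, 1]
```

The annotation step marginalises over the hidden concepts rather than committing to a single concept. As a result, a feature that falls between two concepts receives words from both, weighted by its posterior.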


20 Claims

1. A method of extracting implicit concepts within a set of multimedia works, comprising:
- (a) receiving a plurality of portions of the set of multimedia works, each portion comprising semantic features and non-semantic features;
  
  (b) probabilistically determining, with at least one automated data processor, a set of semantic concepts inherent in the respective non-semantic features of the received portions, based on at least a Bayesian model, comprising a hidden concept layer formulated based on at least one joint probability distribution which models a probability that a respective semantic concept annotates a respective non-semantic feature that connects a semantic feature layer and a non-semantic feature layer, wherein the hidden concept layer is discovered by fitting a generative model to a training set comprising non-semantic features and annotation semantic features, the conditional probabilities of the non-semantic features and the annotation semantic features given a hidden concept class being determined based on an Expectation-Maximization (EM) based iterative learning procedure, the non-semantic features being generated from a plurality of respective Gaussian distributions, respectively corresponding to a semantic concept, each non-semantic feature having a conditional probability density function selectively dependent on a covariance matrix of non-semantic features belonging to the respective semantic concept;
  
  (c) determining, with the at least one automated data processor, a semantic concept vector for a respective multimedia work, dependent on at least the determined semantic concepts inherent in the respective non-semantic features of the received portions; and
  
  (d) at least one of storing and communicating information representing the determined semantic concept vector.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15):
  - 2. The method according to claim 1, further comprising receiving a word as an input, and outputting at least one image or an identifier of at least one image corresponding to the word.
  - 3. The method according to claim 1, further comprising receiving an image as an input, and outputting at least one word or an identifier of at least one word corresponding to the image.
  - 4. The method according to claim 1, wherein said probabilistically determining comprises employing at least one conditional probability represented in the Bayesian model for associating words with an image, comprising a set of parameters stored in a memory representing the hidden concept layer which connects a non-semantic feature layer comprising a visual feature layer and a semantic feature layer comprising a word layer.
  - 5. The method according to claim 4, further comprising discovering the hidden concept layer by fitting the generative model to a training set comprising image and annotation words, wherein the conditional probabilities of the visual features and the annotation words given the hidden concept class are determined based on the Expectation-Maximization (EM) based iterative learning procedure.
  - 6. The method according to claim 5, wherein the Bayesian model comprises a semantic Bayesian framework representing an association of visual content with a plurality of semantic concepts, comprising at least one hidden layer formulated based on at least one joint probability distribution which models a probability that a word belonging to a respective semantic concept is an annotation word of respective visual content;
    - wherein a set of visual content is mapped to the semantic Bayesian framework dependent on semantic concepts represented in respective visual content, using at least one automated processor which automatically determines a set of annotation words associated with the respective visual content;
      
      at least one implicit semantic concept is automatically extracted from a received query seeking elements of the set of visual content corresponding to at least one implicit semantic concept, using at least one automated processor;
      
      elements of the mapped set of visual content corresponding to the at least one extracted implicit semantic concept are automatically determined, using at least one automated processor; and
      
      the corresponding visual content is ranked in accordance with at least a correspondence to the at least one extracted implicit semantic concept.
  - 7. The method according to claim 5, wherein the hidden concept layer which connects a visual feature layer and a word layer which is discovered by fitting a generative model to a training set comprising images and annotation words.
  - 8. The method according to claim 7, wherein $f_i$, $i \in [1, N]$, denotes a visual feature vector of images in a training database, where $N$ is the size of the database; $w^j$, $j \in [1, M]$, denotes the distinct textual words in a training annotation word set, where $M$ is the size of the annotation vocabulary in the training database; the visual features of images in the database, $f_i = [f_i^1, f_i^2, \ldots, f_i^L]$, $i \in [1, N]$, are known i.i.d. samples from an unknown distribution, having a visual feature dimension $L$; the specific visual feature annotation word pairs $(f_i, w^j)$, $i \in [1, N]$, $j \in [1, M]$, are known i.i.d. samples from an unknown distribution, associated with an unobserved semantic concept variable $z \in Z = \{z_1, \ldots, z_K\}$, in which each observation of one visual feature $f \in F = \{f_1, f_2, \ldots, f_N\}$ belongs to one or more concept classes $z_k$ and each observation of one word $w \in V = \{w^1, w^2, \ldots, w^M\}$ in one image $f_i$ belongs to one concept class, in which the observation pairs or random variables $(f_i, w^j)$ are assumed to be generated independently and to be conditionally independent given the respective hidden concept $z_k$, such that $P(f_i, w^j \mid z_k) = p_\Im(f_i \mid z_k)\, P_V(w^j \mid z_k)$;

    the visual feature and word distribution is treated as a randomized data generation process, wherein a probability of a concept is represented as $P_z(z_k)$;

    a visual feature $f_i \in F$ is selected with probability $p_\Im(f_i \mid z_k)$; and

    a textual word $w^j \in V$ is selected with probability $P_V(w^j \mid z_k)$, from which an observed pair $(f_i, w^j)$ is obtained, such that a joint probability model is expressed as follows;
  - 9. The method according to claim 8, wherein Bayes' rule is applied to determine the posterior probability for $z_k$ under $f_i$ and $(f_i, w^j)$:
  - 10. The method according to claim 8, further comprising determining $p_\Im(f_i \mid z_k)$ by maximizing the log-likelihood function;
  - 11. The method according to claim 10, wherein the expectation of the likelihood $\log P(F, Z)$ for the estimated $P(Z \mid F)$ is expressed as;
  - 12. The method according to claim 11, wherein a joint distribution $P(w^j, z_k, f_i) = P_z(z_k)\, p_\Im(f_i \mid z_k)\, P_V(w^j \mid z_k)$ is used to model a probability of an event that a word $w^j$ belonging to semantic concept $z_k$ is an annotation word of an image $f_i$, and applying Bayes law and the integrating over $P_z(z_k)$, to obtain
  - 13. The method according to claim 12, wherein an approximation of the expectation is derived by utilizing a Monte Carlo sampling technique to derive
  - 14. The method according to claim 8, in which a number of concepts, K, is chosen to maximize
  - 15. The method according to claim 8, further comprising retrieving images for word queries by determining the conditional probability $P(f_i \mid w^j)$
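Claims 8, 9, 12 and 15 end where the patent presents equations that are not reproduced in this text. Under the generative process defined in claim 8, $f_i$ and $w^j$ are conditionally independent given $z_k$. From that assumption, the sum rule and Bayes' rule give the joint model, the two posteriors named in claim 9, the annotation probability and the word-to-image probability. The forms below are a reconstruction from those definitions, not the patent's own figures. The symbols $\mu_k$ and $\Sigma_k$, the mean and covariance matrix of concept $k$'s Gaussian, are notation introduced here.

$$
P(f_i, w^j) = \sum_{k=1}^{K} P_z(z_k)\, p_\Im(f_i \mid z_k)\, P_V(w^j \mid z_k)
$$

$$
P(z_k \mid f_i) = \frac{P_z(z_k)\, p_\Im(f_i \mid z_k)}{\sum_{t=1}^{K} P_z(z_t)\, p_\Im(f_i \mid z_t)},
\qquad
P(z_k \mid f_i, w^j) = \frac{P_z(z_k)\, p_\Im(f_i \mid z_k)\, P_V(w^j \mid z_k)}{\sum_{t=1}^{K} P_z(z_t)\, p_\Im(f_i \mid z_t)\, P_V(w^j \mid z_t)}
$$

$$
P(w^j \mid f_i) = \sum_{k=1}^{K} P_V(w^j \mid z_k)\, P(z_k \mid f_i),
\qquad
P(f_i \mid w^j) = \frac{\sum_{k=1}^{K} P_z(z_k)\, p_\Im(f_i \mid z_k)\, P_V(w^j \mid z_k)}{\sum_{k=1}^{K} P_z(z_k)\, P_V(w^j \mid z_k)}
$$

$$
p_\Im(f_i \mid z_k) = \frac{1}{(2\pi)^{L/2}\, \lvert \Sigma_k \rvert^{1/2}} \exp\!\Big(-\tfrac{1}{2}\,(f_i - \mu_k)^{\top} \Sigma_k^{-1} (f_i - \mu_k)\Big)
$$

The annotation formula holds for the following reason. Summing the joint model over all words gives $P(f_i) = \sum_k P_z(z_k)\, p_\Im(f_i \mid z_k)$, which is exactly the denominator of $P(z_k \mid f_i)$. Likewise, integrating the joint model over $f$ gives the denominator $P(w^j) = \sum_k P_z(z_k)\, P_V(w^j \mid z_k)$ used for retrieval.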

16. A Bayesian model for associating words with an image, comprising a non-transitory medium storing parameters representing a hidden concept layer formulated based on at least one joint probability distribution which models a probability that a respective word annotates a respective visual feature that connects a semantic feature layer and a visual feature layer which connects a visual feature layer and a word layer which is discovered by fitting a generative model to a training set comprising image and annotation words, wherein the conditional probabilities of the visual features and the annotation words given a hidden concept class are determined based on an Expectation-Maximization (EM) based iterative learning procedure, the visual features being generated from a plurality of respective Gaussian distributions, respectively corresponding to a semantic concept, each visual feature having a conditional probability density function selectively dependent on a covariance matrix of visual features belonging to the respective semantic concept.
- Dependent claim (17):
  - 17. The Bayesian model according to claim 16, further comprising at least one automated data processor configured to:
    - (a) receive a plurality of portions of a set of multimedia works, each portion comprising at least a semantic component comprising words and a non-semantic component comprising an image;
      
      (b) probabilistically determining a set of semantic concepts inherent in the respective non-semantic components of the received portions, based on correlations of features in respective multimedia works;
      
      (c) storing the set of semantic concepts within a memory as parameters of the Bayesian model; and
      
      (d) determining a semantic concept vector for a respective multimedia work, dependent on at least the determined semantic concepts inherent in the respective non-semantic components of the received portions.
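Claims 1, 16 and 18 make each concept's feature density depend on the covariance matrix of the features that belong to that concept. The following minimal pure-Python sketch evaluates log p(f | z_k) with a full covariance matrix. It uses a Cholesky factorisation, which is a standard numerical choice rather than anything the claims specify. The names `cholesky` and `gaussian_logpdf` are hypothetical.

```python
import math


def cholesky(cov):
    """Return lower-triangular L with L L^T = cov; raise ValueError if cov is not positive definite."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def gaussian_logpdf(x, mean, cov):
    """log p(x | z_k) for a multivariate Gaussian with the given mean and full covariance matrix."""
    low = cholesky(cov)
    n = len(x)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    # Forward substitution solves L y = diff, so y . y = diff^T cov^{-1} diff (Mahalanobis term)
    y = []
    for i in range(n):
        y.append((diff[i] - sum(low[i][p] * y[p] for p in range(i))) / low[i][i])
    mahalanobis = sum(v * v for v in y)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))  # log |cov| from the diagonal of L
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + mahalanobis)
```

For example, `gaussian_logpdf([1.0, 0.0], [0.0, 0.0], [[2.0, 1.0], [1.0, 2.0]])` evaluates to -(1/2)(2 log 2π + log 3 + 2/3), because that covariance has determinant 3 and Mahalanobis term 2/3. A concept's estimated covariance can be singular, for example when the concept has fewer samples than feature dimensions. In that case the factorisation raises `ValueError`, and adding a small constant to the diagonal is one common remedy.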

18. A system for representing a correspondence of first records comprising information having a first information type with second records comprising information having a second information type, each of the first records and second records comprising semantic information, and at least one of the first and second types comprising non-semantic information, comprising:
- an input configured to receive first records and second records;
  
  at least one automated data processor configured to determine the correspondence;
  
  and a memory, configured to store information representing a Bayesian model comprising a hidden concept layer which connects a non-semantic information feature layer and a semantic information feature layer formulated based on at least one joint probability distribution which models a probability that respective semantic information annotates respective non-semantic information, wherein the hidden concept layer is discovered by fitting a generative model to a training set comprising non-textual information and annotation semantic information, wherein conditional probabilities of the non-textual information features and the annotation semantic labels given a hidden concept class are determined based on an Expectation-Maximization (EM) based iterative learning procedure, and the non-semantic features are generated from a plurality of respective Gaussian distributions, each respectively corresponding to a semantic concept variable, and having a conditional probability density function selectively dependent on a covariance matrix of non-semantic features belonging to the respective semantic concept variable.
- Dependent claims (19, 20):
  - 19. The system according to claim 18, wherein the Bayesian model comprises a semantic Bayesian framework representing an association of non-semantic content with a plurality of semantic concepts, comprising at least one hidden layer formulated based on at least one joint probability distribution which models a probability that a semantic label belonging to a respective semantic concept is an annotation word of respective non-semantic content;
    - wherein a set of non-semantic content is mapped to the semantic Bayesian framework dependent on semantic concepts represented in respective non-semantic content, using the at least one automated data processor which automatically determines a set of annotation words associated with the respective non-semantic content;
      
      at least one implicit semantic concept is automatically extracted from a received query seeking elements of the set of non-semantic content corresponding to the at least one implicit semantic concept, using the at least one automated data processor;
      
      elements of the mapped set of non-semantic content corresponding to the at least one extracted implicit semantic concept are automatically determined, using the at least one automated data processor; and
      
      the corresponding non-semantic content is ranked in accordance with at least a correspondence to the at least one extracted implicit semantic concept.
  - 20. The system according to claim 18, wherein the at least one automated data processor is configured to quantitatively rank multimedia works with respect to a relevance to a query.
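Claims 1(c) and 17(d) derive a semantic concept vector for each work, and claims 19 and 20 rank content by its relevance to a query. The sketch below rests on three assumptions that the claims do not specify. First, a work's concept vector is taken to be the average of the per-portion posteriors P(z_k | f). Second, a multi-word query scores a work by summing log P(w | work) over the query words. Third, all works have equal prior probability, so ordering by P(w | f_i) matches ordering by the claim-15 quantity P(f_i | w), since P(f_i | w) ∝ P(w | f_i) P(f_i). The names `concept_vector` and `rank_works` are hypothetical. The per-concept log-likelihoods log p(f | z_k) are taken as inputs, so any density model can supply them.

```python
import math


def concept_vector(prior, portion_log_liks):
    """Average of P(z_k | portion) over a work's portions (the averaging rule is an assumption).

    prior: P_z(z_k) for k = 1..K, all strictly positive.
    portion_log_liks: one list of log p(f | z_k) values per portion of the work.
    """
    k = len(prior)
    vec = [0.0] * k
    for log_lik in portion_log_liks:
        logs = [math.log(p) + ll for p, ll in zip(prior, log_lik)]
        top = max(logs)  # log-sum-exp keeps the Bayes normalisation numerically stable
        weights = [math.exp(v - top) for v in logs]
        total = sum(weights)
        for c in range(k):
            vec[c] += weights[c] / total / len(portion_log_liks)
    return vec


def rank_works(query_words, concept_vectors, word_given_concept):
    """Order works by sum over query words of log P(w | work), P(w | work) = sum_k P(z_k | work) P(w | z_k)."""
    scored = []
    for idx, vec in enumerate(concept_vectors):
        score = 0.0
        for w in query_words:
            p = sum(pz * word_given_concept[c][w] for c, pz in enumerate(vec))
            score += math.log(max(p, 1e-300))
        scored.append((score, idx))
    scored.sort(key=lambda t: -t[0])  # sort is stable, so tied works keep database order
    return [idx for _, idx in scored]


if __name__ == "__main__":
    # Hypothetical P(w | z_k) for K = 2 concepts and a 3-word vocabulary
    word_given_concept = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
    vectors = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    print(rank_works([0], vectors, word_given_concept))  # → [0, 2, 1]
```

Working in log space means that long queries do not underflow. The `1e-300` floor only matters when a query word has zero probability under every concept.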

Current Assignee: Research Foundation For The State University O
Original Assignee: Research Foundation For The State University O
Inventors: Zhang, Ruofei; Zhang, Zhongfei
Primary Examiner(s): Vincent, David R
Application Number: US15/061,641
Time in Patent Office: 1,495 Days
Field of Search: 706 15, 706 45
CPC Class Codes:
- G06F 16/5838: using colour
- G06F 16/5846: using extracted text
- G06F 18/24155: Bayesian classification
- G06N 5/02: Knowledge representation; S...
- G06N 7/01: Probabilistic graphical mod...
