System and method for multimedia ranking and multi-modal image retrieval using probabilistic semantic models and expectation-maximization (EM) learning
First Claim
1. A method of extracting implicit concepts within a set of multimedia works, comprising:
(a) receiving a plurality of portions of the set of multimedia works, each portion comprising semantic features and non-semantic features;
(b) probabilistically determining, with at least one automated data processor, a set of semantic concepts inherent in the respective non-semantic features of the received portions, based on at least a Bayesian model, comprising a hidden concept layer formulated based on at least one joint probability distribution which models a probability that a respective semantic concept annotates a respective non-semantic feature that connects a semantic feature layer and a non-semantic feature layer, wherein the hidden concept layer is discovered by fitting a generative model to a training set comprising non-semantic features and annotation semantic features, the conditional probabilities of the non-semantic features and the annotation semantic features given a hidden concept class being determined based on an Expectation-Maximization (EM) based iterative learning procedure, the non-semantic features being generated from a plurality of respective Gaussian distributions, respectively corresponding to a semantic concept, each non-semantic feature having a conditional probability density function selectively dependent on a covariance matrix of non-semantic features belonging to the respective semantic concept;
(c) determining, with the at least one automated data processor, a semantic concept vector for a respective multimedia work, dependent on at least the determined semantic concepts inherent in the respective non-semantic features of the received portions; and
(d) at least one of storing and communicating information representing the determined semantic concept vector.
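The learning procedure recited in steps (b) and (c) can be illustrated with a minimal sketch. It is not the patented implementation. It assumes each work is summarized by one feature vector (a row of `X`) and a row of annotation word counts (`C`), that there are at least two feature dimensions, and that a small ridge term `reg` keeps each covariance matrix invertible. The names `fit_hidden_concepts`, `concept_vector`, and `log_gaussian` are hypothetical.

```python
import numpy as np


def log_gaussian(X, mean, cov):
    """Log density of each row of X under a Gaussian N(mean, cov)."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def fit_hidden_concepts(X, C, K, n_iter=50, reg=1e-3, seed=0):
    """EM for a hidden-concept model (illustrative assumption, not the patent's code).

    X: (N, D) non-semantic feature vectors, C: (N, V) annotation word counts.
    Returns P(z), per-concept means and covariances, and P(w | z).
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    V = C.shape[1]
    means = X[rng.choice(N, size=K, replace=False)].copy()
    covs = np.stack([np.cov(X.T) + reg * np.eye(D)] * K)
    word_given_z = rng.dirichlet(np.ones(V), size=K)  # (K, V)
    prior = np.full(K, 1.0 / K)

    for _ in range(n_iter):
        # E-step: posterior of each concept for every (feature, word) pair.
        log_px = np.stack(
            [log_gaussian(X, means[k], covs[k]) for k in range(K)], axis=1
        )  # (N, K)
        log_r = (
            np.log(prior + 1e-12)[None, None, :]
            + log_px[:, None, :]
            + np.log(word_given_z.T + 1e-12)[None, :, :]
        )  # (N, V, K)
        log_r -= log_r.max(axis=2, keepdims=True)
        R = np.exp(log_r)
        R /= R.sum(axis=2, keepdims=True)
        W = C[:, :, None] * R  # posteriors weighted by observed word counts

        # M-step: re-estimate Gaussians, word distributions, and concept priors.
        u = W.sum(axis=1)  # (N, K)
        mass = u.sum(axis=0) + 1e-12  # (K,)
        means = (u.T @ X) / mass[:, None]
        for k in range(K):
            diff = X - means[k]
            covs[k] = (u[:, k, None] * diff).T @ diff / mass[k] + reg * np.eye(D)
        word_given_z = W.sum(axis=0).T + 1e-12  # (K, V)
        word_given_z /= word_given_z.sum(axis=1, keepdims=True)
        prior = mass / mass.sum()

    return prior, means, covs, word_given_z


def concept_vector(x, prior, means, covs):
    """Semantic concept vector P(z | x) for one feature vector x of shape (D,)."""
    log_p = np.array(
        [log_gaussian(x[None, :], means[k], covs[k])[0] for k in range(len(prior))]
    )
    log_p += np.log(prior + 1e-12)
    p = np.exp(log_p - log_p.max())
    return p / p.sum()
```

Calling `fit_hidden_concepts(X, C, K)` and then `concept_vector(X[i], prior, means, covs)` yields one probability vector per work, which could then be stored or transmitted as in step (d). The number of concepts `K` and the iteration count are free choices in this sketch.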
Abstract
Systems and Methods for multi-modal or multimedia image retrieval are provided. Automatic image annotation is achieved based on a probabilistic semantic model in which visual features and textual words are connected via a hidden layer comprising the semantic concepts to be discovered, to explicitly exploit the synergy between the two modalities. The association of visual features and textual words is determined in a Bayesian framework to provide confidence of the association. A hidden concept layer which connects the visual feature(s) and the words is discovered by fitting a generative model to the training image and annotation words. An Expectation-Maximization (EM) based iterative learning procedure determines the conditional probabilities of the visual features and the textual words given a hidden concept class. Based on the discovered hidden concept layer and the corresponding conditional probabilities, the image annotation and the text-to-image retrieval are performed using the Bayesian framework.
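As a rough illustration of how annotation and text-to-image retrieval might use the learned quantities, the sketch below assumes the concept priors, Gaussian parameters, and word-given-concept probabilities are already available. Annotation scores each word by summing P(w | z) P(z | x) over concepts; retrieval scores each image by summing p(x | z) P(z | w). The function names and the toy parameter values are hypothetical and chosen only for demonstration.

```python
import numpy as np


def log_gaussian(X, mean, cov):
    """Log density of each row of X under a Gaussian N(mean, cov)."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def concept_log_likelihoods(X, means, covs):
    """Matrix of log p(x_i | z_k), shape (N, K)."""
    return np.stack([log_gaussian(X, m, c) for m, c in zip(means, covs)], axis=1)


def annotate(x, prior, means, covs, word_given_z, top_n=3):
    """Image annotation: rank words by P(w | x) = sum_k P(w | z_k) P(z_k | x)."""
    log_post = concept_log_likelihoods(x[None, :], means, covs)[0] + np.log(prior)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    word_probs = post @ word_given_z  # (V,)
    order = np.argsort(-word_probs)[:top_n]
    return order, word_probs[order]


def rank_images_for_word(X, word, prior, means, covs, word_given_z):
    """Text-to-image retrieval: rank images by log sum_k p(x | z_k) P(z_k | w)."""
    z_given_w = prior * word_given_z[:, word]
    z_given_w = z_given_w / z_given_w.sum()
    log_terms = concept_log_likelihoods(X, means, covs) + np.log(z_given_w + 1e-300)
    m = log_terms.max(axis=1, keepdims=True)
    scores = m[:, 0] + np.log(np.exp(log_terms - m).sum(axis=1))  # log-sum-exp
    return np.argsort(-scores), scores


# Toy parameters (assumed values): two concepts, three vocabulary words.
prior = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [5.0, 5.0]])
covs = np.stack([np.eye(2), np.eye(2)])
word_given_z = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])

words, probs = annotate(np.array([0.1, -0.2]), prior, means, covs, word_given_z)
images = np.array([[0.0, 0.0], [5.0, 5.0], [2.5, 2.5]])
order, scores = rank_images_for_word(images, 2, prior, means, covs, word_given_z)
```

Both directions reuse the same hidden concept layer, so the posterior weights double as a confidence measure for each association.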
20 Claims
16. A Bayesian model for associating words with an image, comprising a non-transitory medium storing parameters representing a hidden concept layer formulated based on at least one joint probability distribution which models a probability that a respective word annotates a respective visual feature that connects a semantic feature layer and a visual feature layer which connects a visual feature layer and a word layer which is discovered by fitting a generative model to a training set comprising image and annotation words, wherein the conditional probabilities of the visual features and the annotation words given a hidden concept class are determined based on an Expectation-Maximization (EM) based iterative learning procedure, the visual features being generated from a plurality of respective Gaussian distributions, respectively corresponding to a semantic concept, each visual feature having a conditional probability density function selectively dependent on a covariance matrix of visual features belonging to the respective semantic concept.
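One common way to write a model of this shape is given below as a hedged reading of the claim language, not a quotation from the specification. The symbols are assumptions introduced here: f_i is a visual feature vector of dimension L, w^j a word, z_k one of K hidden concepts, and μ_k and Σ_k the mean and covariance matrix of the visual features belonging to concept z_k.

```latex
% Joint probability that word w^j annotates visual feature f_i,
% with the two layers connected through the hidden concepts z_k:
P(f_i, w^j) = \sum_{k=1}^{K} P(z_k)\, p_F(f_i \mid z_k)\, P_V(w^j \mid z_k)

% Gaussian conditional density of a visual feature given concept z_k:
p_F(f_i \mid z_k) = \frac{1}{(2\pi)^{L/2}\, |\Sigma_k|^{1/2}}
  \exp\!\left( -\tfrac{1}{2} (f_i - \mu_k)^{\top} \Sigma_k^{-1} (f_i - \mu_k) \right)
```

Under this reading, the stored parameters would be P(z_k), μ_k, Σ_k, and P_V(w^j | z_k) for every concept.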
18. A system for representing a correspondence of first records comprising information having a first information type with second records comprising information having a second information type, each of the first records and second records comprising semantic information, and at least one of the first and second types comprising non-semantic information, comprising:
an input configured to receive first records and second records;
at least one automated data processor configured to determine the correspondence; and
a memory, configured to store information representing a Bayesian model comprising a hidden concept layer which connects a non-semantic information feature layer and a semantic information feature layer formulated based on at least one joint probability distribution which models a probability that respective semantic information annotates respective non-semantic information, wherein the hidden concept layer is discovered by fitting a generative model to a training set comprising non-textual information and annotation semantic information, wherein conditional probabilities of the non-textual information features and the annotation semantic labels given a hidden concept class are determined based on an Expectation-Maximization (EM) based iterative learning procedure, and the non-semantic features are generated from a plurality of respective Gaussian distributions, each respectively corresponding to a semantic concept variable, and having a conditional probability density function selectively dependent on a covariance matrix of non-semantic features belonging to the respective semantic concept variable.
Specification