System and method for image annotation and multi-modal image retrieval using probabilistic semantic models
First Claim
1. A computer-implemented method of retrieving media, comprising:
(a) creating a probabilistic framework relating media types within a mixed media work to implicit concepts;
(b) creating an index of a set of media based on implicit concepts within the probabilistic framework;
(c) receiving a query expressed in the form of a media exemplar;
(d) determining a set of concepts expressed in the media exemplar;
(e) searching the index of the set of media for elements representing similar implicit concepts to those expressed in the media exemplar; and
(f) outputting at least one representation or identifier of the elements representing similar implicit concepts to those expressed in the media exemplar;
wherein said outputting comprises outputting a representation or identifiers of a plurality of elements, further comprising:
ranking the plurality of elements based on at least a similarity of the respective element to implicit concepts expressed in the media exemplar;
wherein said determining comprises probabilistically determining a set of semantic concepts inherent in the media exemplar, based on correlations of features in respective multimedia works having predetermined semantic concepts associated therewith;
further comprising determining a concept vector for the media exemplar;
wherein the probabilistic framework comprises a Bayesian model for associating words with an image having visual features, comprising a hidden concept layer which connects a visual feature layer and a word layer which is discovered by fitting a generative model to a training set comprising images having the visual features and annotation words, wherein the conditional probabilities of the visual features and the annotation words given a hidden concept class are determined based on an Expectation-Maximization (EM) based iterative learning procedure.
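Steps (d) through (f) and the ranking clause can be illustrated with a minimal sketch. It assumes each indexed element and the query exemplar have already been reduced to a concept vector (a probability distribution over hidden concepts), and it assumes cosine similarity as the ranking measure; the claim itself does not name a similarity function. The identifiers `INDEX`, `cosine`, and `rank_by_concepts` are hypothetical and used only for illustration.

```python
import math

# Hypothetical index: media identifier -> concept vector (assumed precomputed).
INDEX = {
    "img_beach": [0.8, 0.1, 0.1],
    "img_forest": [0.1, 0.8, 0.1],
    "img_city": [0.1, 0.1, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_concepts(index, query_vector, top_k=3):
    """Return up to top_k (media_id, score) pairs, most similar concept vector first."""
    scored = [(cosine(vec, query_vector), media_id) for media_id, vec in index.items()]
    # Sort by descending score; break ties by identifier so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(media_id, score) for score, media_id in scored[:top_k]]


# Concept vector assumed to have been determined for the query exemplar.
results = rank_by_concepts(INDEX, [0.7, 0.2, 0.1], top_k=2)
```

Other measures, such as a divergence between the two distributions, would fit the same structure; cosine similarity is chosen here only because it is simple to state.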
Abstract
Systems and methods for multi-modal or multimedia image retrieval are provided. Automatic image annotation is achieved based on a probabilistic semantic model in which visual features and textual words are connected via a hidden layer comprising the semantic concepts to be discovered, to explicitly exploit the synergy between the two modalities. The association of visual features and textual words is determined in a Bayesian framework to provide a confidence measure for the association. A hidden concept layer, which connects the visual features and the words, is discovered by fitting a generative model to the training images and annotation words. An Expectation-Maximization (EM) based iterative learning procedure determines the conditional probabilities of the visual features and the textual words given a hidden concept class. Based on the discovered hidden concept layer and the corresponding conditional probabilities, image annotation and text-to-image retrieval are performed using the Bayesian framework.
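The EM procedure and the Bayesian annotation step described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: visual features are assumed to be quantized into discrete tokens, each training image is assumed to be generated by a single hidden concept, and a small smoothing constant `alpha` is added to avoid zero probabilities. The source does not specify the form of the conditional distributions, and every function name and the toy data below are hypothetical.

```python
import math
import random


def normalize(xs):
    """Scale non-negative numbers so they sum to 1."""
    total = sum(xs)
    return [x / total for x in xs]


def softmax(log_scores):
    """Convert unnormalized log-probabilities into a distribution (numerically stable)."""
    top = max(log_scores)
    return normalize([math.exp(s - top) for s in log_scores])


def weighted_counts(resp, z, rows, alpha):
    """Smoothed, normalized expected token counts for concept z: estimates P(token | z)."""
    return normalize([
        alpha + sum(r[z] * row[j] for r, row in zip(resp, rows))
        for j in range(len(rows[0]))
    ])


def concept_log_scores(p_z, tables_and_counts):
    """For each concept z: log P(z) + sum over observed tokens of count * log P(token | z)."""
    return [
        math.log(p_z[z]) + sum(
            c * math.log(p)
            for table, counts in tables_and_counts
            for c, p in zip(counts, table[z])
        )
        for z in range(len(p_z))
    ]


def fit_em(visual, words, k, iters=50, seed=0, alpha=1e-3):
    """Fit P(z), P(visual token | z), P(word | z) by EM on paired count vectors."""
    rng = random.Random(seed)
    # Start from random soft assignments of images to concepts.
    resp = [normalize([rng.random() + 0.1 for _ in range(k)]) for _ in visual]
    for _ in range(iters):
        # M-step: re-estimate parameters from the current responsibilities.
        p_z = normalize([alpha + sum(r[z] for r in resp) for z in range(k)])
        p_f = [weighted_counts(resp, z, visual, alpha) for z in range(k)]
        p_w = [weighted_counts(resp, z, words, alpha) for z in range(k)]
        # E-step: posterior over concepts for each image, using both modalities.
        resp = [
            softmax(concept_log_scores(p_z, [(p_f, v), (p_w, w)]))
            for v, w in zip(visual, words)
        ]
    return p_z, p_f, p_w


def annotate(visual_counts, p_z, p_f, p_w):
    """P(word | image) = sum_z P(word | z) * P(z | visual features)."""
    post = softmax(concept_log_scores(p_z, [(p_f, visual_counts)]))
    return [
        sum(post[z] * p_w[z][w] for z in range(len(p_z)))
        for w in range(len(p_w[0]))
    ]


# Deliberately trivial toy data: 4 visual tokens, vocabulary ["sky", "grass"].
VISUAL = [[3, 3, 0, 0], [3, 3, 0, 0], [0, 0, 3, 3], [0, 0, 3, 3]]
WORDS = [[1, 0], [1, 0], [0, 1], [0, 1]]
p_z, p_f, p_w = fit_em(VISUAL, WORDS, k=2)
sky_like = annotate([3, 3, 0, 0], p_z, p_f, p_w)
grass_like = annotate([0, 0, 3, 3], p_z, p_f, p_w)
```

The values returned by `annotate` are word probabilities for an unannotated image, so they can serve directly as the confidence of each candidate annotation. Continuous visual features would call for a different conditional model, such as a Gaussian per concept, in place of the discrete tables used here.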
15 Claims
1. A computer-implemented method of retrieving media, comprising:
(a) creating a probabilistic framework relating media types within a mixed media work to implicit concepts;
(b) creating an index of a set of media based on implicit concepts within the probabilistic framework;
(c) receiving a query expressed in the form of a media exemplar;
(d) determining a set of concepts expressed in the media exemplar;
(e) searching the index of the set of media for elements representing similar implicit concepts to those expressed in the media exemplar; and
(f) outputting at least one representation or identifier of the elements representing similar implicit concepts to those expressed in the media exemplar;
wherein said outputting comprises outputting a representation or identifiers of a plurality of elements, further comprising:
ranking the plurality of elements based on at least a similarity of the respective element to implicit concepts expressed in the media exemplar;
wherein said determining comprises probabilistically determining a set of semantic concepts inherent in the media exemplar, based on correlations of features in respective multimedia works having predetermined semantic concepts associated therewith;
further comprising determining a concept vector for the media exemplar;
wherein the probabilistic framework comprises a Bayesian model for associating words with an image having visual features, comprising a hidden concept layer which connects a visual feature layer and a word layer which is discovered by fitting a generative model to a training set comprising images having the visual features and annotation words, wherein the conditional probabilities of the visual features and the annotation words given a hidden concept class are determined based on an Expectation-Maximization (EM) based iterative learning procedure.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. An apparatus adapted to retrieve media, comprising:
(a) at least one memory, adapted for storing therein a probabilistic framework relating media types within a mixed media work to implicit concepts;
(b) at least one memory, adapted for storing an index of a set of media based on implicit concepts within the probabilistic framework;
(c) an input, adapted to receive a query expressed in the form of a media exemplar;
(d) at least one processor, adapted to: access the at least one memory adapted for storing a probabilistic framework, access the at least one memory adapted for storing an index of a set of media, receive the query from the input, determine a set of concepts expressed in the media exemplar, and search the index of the set of media for elements representing similar implicit concepts to those expressed in the media exemplar; and
(e) an output adapted to present at least one of the elements representing similar implicit concepts to those expressed in the media exemplar;
wherein the probabilistic framework comprises a Bayesian model for associating words with an image having visual features, comprising a hidden concept layer which connects a visual feature layer and a word layer which is discovered by fitting a generative model to a training set comprising images having the visual features and annotation words, wherein the conditional probabilities of the visual features and the annotation words given a hidden concept class are determined based on an Expectation-Maximization (EM) based iterative learning procedure.
15. A non-transitory computer readable medium storing therein instructions for controlling a programmable processor to perform the steps of:
(a) creating a probabilistic framework relating media types within a mixed media work to implicit concepts;
(b) indexing a set of media based on implicit concepts within the probabilistic framework, and storing the index in a memory;
(c) receiving a query expressed as a media exemplar;
(d) determining a set of concepts expressed in the media exemplar;
(e) searching the stored index of the set of media for elements representing similar implicit concepts to the media exemplar; and
(f) outputting a representation or identifier of at least one of the elements representing similar implicit concepts to the media exemplar;
wherein the probabilistic framework comprises a Bayesian model stored in a memory for associating words with an image having visual features, comprising a hidden concept layer which connects a visual feature layer and a word layer which is discovered by fitting a generative model to a training set comprising images having the visual features and annotation words, wherein the conditional probabilities of the visual features and the annotation words given a hidden concept class are determined based on an Expectation-Maximization (EM) based iterative learning procedure.
Specification