System and method for image annotation and multi-modal image retrieval using probabilistic semantic models comprising at least one joint probability distribution
First Claim
1. A method of retrieving media, comprising:
(a) defining a probabilistic framework organizing media with respect to concepts represented in the respective media comprising at least one hidden layer;
(b) automatically mapping elements of a set of media to the probabilistic framework, using at least one processor, based on implicit concepts represented within each element of the set of media, the probabilistic framework comprising at least one joint probability distribution which models a probability that a symbol belonging to a respective concept is an annotation symbol of respective media;
(c) automatically determining, using at least one processor, at least one implicit concept represented in a received query;
(d) automatically determining, using at least one processor, a probabilistic correspondence of elements of the set of media with the determined at least one implicit concept; and
(e) outputting at least one representation or identifier of at least one element of the set of media selectively in dependence on at least the determined probabilistic correspondence.
Abstract
Systems and methods for multi-modal or multimedia image retrieval are provided. Automatic image annotation is achieved based on a probabilistic semantic model in which visual features and textual words are connected via a hidden layer comprising the semantic concepts to be discovered, to explicitly exploit the synergy between the two modalities. The association of visual features and textual words is determined in a Bayesian framework, which provides a confidence measure for the association. A hidden concept layer connecting the visual features and the words is discovered by fitting a generative model to the training images and their annotation words. An Expectation-Maximization (EM) based iterative learning procedure determines the conditional probabilities of the visual features and the textual words given a hidden concept class. Based on the discovered hidden concept layer and the corresponding conditional probabilities, image annotation and text-to-image retrieval are performed within the Bayesian framework.
63 Citations
20 Claims
1. A method of retrieving media, comprising:
(a) defining a probabilistic framework organizing media with respect to concepts represented in the respective media comprising at least one hidden layer;
(b) automatically mapping elements of a set of media to the probabilistic framework, using at least one processor, based on implicit concepts represented within each element of the set of media, the probabilistic framework comprising at least one joint probability distribution which models a probability that a symbol belonging to a respective concept is an annotation symbol of respective media;
(c) automatically determining, using at least one processor, at least one implicit concept represented in a received query;
(d) automatically determining, using at least one processor, a probabilistic correspondence of elements of the set of media with the determined at least one implicit concept; and
(e) outputting at least one representation or identifier of at least one element of the set of media selectively in dependence on at least the determined probabilistic correspondence.
- View Dependent Claims (2, 3, 4, 5, 6, 19)
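Steps (c)-(e) of claim 1 can be illustrated with a small sketch, under the assumption that both the query and each media element have already been mapped (step (b)) to posterior distributions over the hidden concepts. The scoring rule used here, the probability that the query and the element share a hidden concept, is an illustrative choice; the claim does not fix a specific correspondence measure.

```python
import numpy as np

def retrieve_by_concepts(query_concepts, media_concepts, top_k=3):
    """Rank media elements by probabilistic correspondence to a query.

    query_concepts: (K,) posterior P(z_k | query) over hidden concepts.
    media_concepts: (num_media, K) posterior P(z_k | element) per element.
    Correspondence = sum_k P(z_k | query) * P(z_k | element), i.e. the
    probability the two were generated from the same hidden concept.
    Returns the best-first element indices and their scores.
    """
    scores = media_concepts @ query_concepts
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]
```

An element concentrated on the same concept as the query is output first, satisfying the selective-output condition of step (e).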
7. A method of ranking multimedia works with respect to a query, comprising:
(a) defining a semantic Bayesian framework representing an association of a multimedia work content with a plurality of semantic concepts, comprising at least one hidden layer formulated based on at least one joint probability distribution which models a probability that a word belonging to a respective semantic concept is an annotation word of a respective multimedia work;
(b) automatically mapping a set of multimedia works to the semantic Bayesian framework dependent on semantic concepts represented in respective multimedia works, using at least one processor which automatically determines a set of annotation words associated with the respective multimedia works;
(c) automatically extracting at least one implicit semantic concept from a received query seeking elements of the set of multimedia works corresponding to at least one implicit semantic concept, using at least one processor;
(d) automatically determining elements of the mapped set of multimedia works corresponding to the at least one extracted implicit semantic concept, using at least one processor; and
(e) outputting the corresponding multimedia works ranked in accordance with at least a correspondence to the at least one extracted implicit semantic concept.
- View Dependent Claims (8, 9, 20)
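A text-to-image ranking in the spirit of claim 7 can be sketched as follows, assuming the per-image concept mixtures P(z|image) and the word emissions P(w|z) have already been learned (for instance by EM). Scoring images by the sum of log word likelihoods is an illustrative choice, not mandated by the claim.

```python
import numpy as np

def rank_images(query_word_ids, Pz_given_image, Pw_given_z):
    """Rank images for a text query under a hidden-concept model.

    query_word_ids: indices of the query words in the vocabulary.
    Pz_given_image: (num_images, K) concept mixture per image.
    Pw_given_z:     (K, vocab) word emission per hidden concept.
    Score = sum over query words of log P(w | image), where
    P(w | image) = sum_k P(w | z_k) P(z_k | image).
    """
    # Marginalize the hidden concepts to get per-image word likelihoods.
    Pw_given_image = Pz_given_image @ Pw_given_z
    log_scores = np.log(Pw_given_image[:, query_word_ids] + 1e-12).sum(axis=1)
    return np.argsort(-log_scores)  # best-first image indices
```

An image whose dominant concept emits the query word ranks ahead of one whose concepts rarely emit it.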
10. A method of extracting probable implicit semantic concepts of a multimedia work, comprising:
(a) defining a probabilistic framework relating a correspondence of automatically derived non-semantic content features of each of a plurality of multimedia works with semantic content features of each respective multimedia work, comprising at least one hidden concept layer comprising at least one joint probability distribution which models a probability that a symbol belonging to a respective semantic concept is an appropriate annotation of a respective multimedia work;
(b) receiving an input comprising at least non-semantic content of a multimedia work; and
(c) automatically producing as an output by a processor, a semantic concept vector associated with the input, representing probable implicit semantic concepts represented by the input, based on at least a correspondence of the non-semantic content features of the input with the semantic concepts of the probabilistic framework.
- View Dependent Claims (11)
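The semantic concept vector of step (c) follows from Bayes' rule once the framework's priors and feature likelihoods are known. The sketch below assumes a single discretized visual feature for simplicity; the model described later in claim 12 works with continuous feature vectors.

```python
import numpy as np

def concept_vector(feature_id, Pz, Pf_given_z):
    """Posterior concept vector P(z | f) for one discretized visual feature.

    Pz:         (K,) prior P_Z(z_k) over hidden concepts.
    Pf_given_z: (K, num_features) likelihood P_F(f | z_k).
    By Bayes' rule: P(z_k | f) proportional to P_Z(z_k) * P_F(f | z_k).
    """
    unnorm = Pz * Pf_given_z[:, feature_id]
    return unnorm / unnorm.sum()
```

Each entry of the returned vector is the probability that the corresponding hidden concept is implicitly represented by the input.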
12. A method of extracting probable implicit semantic concepts of a multimedia work, comprising:
(a) defining a probabilistic framework relating a correspondence of non-semantic content features of each of a plurality of multimedia works with semantic content features of each respective multimedia work, comprising at least one hidden concept layer;
(b) receiving an input comprising at least non-semantic content; and
(c) automatically presenting as an output a semantic concept vector associated with the input, representing probable implicit semantic concepts represented by the input, based on at least a correspondence of the non-semantic content features of the input with the probabilistic framework,
wherein the probabilistic framework comprises a Bayesian model for associating words with an image, wherein the hidden concept layer, which connects a visual feature layer and a word layer, is discovered by fitting a generative model to a training set comprising images and annotation words, wherein the conditional probabilities of the visual features and the annotation words given a hidden concept class are determined based on an Expectation-Maximization (EM) based iterative learning procedure, and wherein:
ƒ_i, i∈[1,N], denotes a visual feature vector of images in a training database, where N is the size of the database;
w_j, j∈[1,M], denotes the distinct textual words in a training annotation word set, where M is the size of the annotation vocabulary in the training database;
the visual features of images in the database, ƒ_i = [ƒ_i1, ƒ_i2, . . . , ƒ_iL], i∈[1,N], are known i.i.d. samples from an unknown distribution, having a visual feature dimension L;
the specific visual feature annotation word pairs (ƒ_i, w_j), i∈[1,N], j∈[1,M], are known i.i.d. samples from an unknown distribution, associated with an unobserved semantic concept variable z∈Z={z_1, . . . , z_K}, in which each observation of one visual feature ƒ∈F={ƒ_1, ƒ_2, . . . , ƒ_N} belongs to one or more concept classes z_k, and each observation of one word w∈V={w_1, w_2, . . . , w_M} in one image ƒ_i belongs to one concept class;
the observation pairs (ƒ_i, w_j) are assumed to be generated independently, and the pairs of random variables (ƒ_i, w_j) are assumed to be conditionally independent given the respective hidden concept z_k, such that
P(ƒ_i, w_j | z_k) = P_ℑ(ƒ_i | z_k) P_V(w_j | z_k);
the visual feature and word distribution is treated as a randomized data generation process, wherein a probability of a concept is represented as P_Z(z_k);
a visual feature ƒ_i∈F is selected with probability P_ℑ(ƒ_i | z_k); and
a textual word w_j∈V is selected with probability P_V(w_j | z_k), from which an observed pair (ƒ_i, w_j) is obtained, such that a joint probability model is expressed as
P(ƒ_i, w_j) = Σ_k P_Z(z_k) P_ℑ(ƒ_i | z_k) P_V(w_j | z_k).
- View Dependent Claims (13, 14, 15, 16, 17, 18)
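The joint probability model at the end of claim 12 marginalizes the hidden concept out of the factored generative process. The sketch below evaluates it for all feature/word pairs at once, again assuming discretized features so that P_ℑ can be tabulated.

```python
import numpy as np

def joint_probability(Pz, Pf_given_z, Pw_given_z):
    """Joint P(f_i, w_j) = sum_k P_Z(z_k) P_F(f_i|z_k) P_V(w_j|z_k).

    Pz:         (K,) concept prior.
    Pf_given_z: (K, N) feature likelihood per concept.
    Pw_given_z: (K, M) word likelihood per concept.
    Returns an (N, M) matrix over all feature/word pairs.
    """
    # einsum sums over the hidden concept index k for every (n, m) pair.
    return np.einsum('k,kn,km->nm', Pz, Pf_given_z, Pw_given_z)
```

Because each factor is a proper distribution, the resulting matrix sums to 1, which makes the conditional-independence factorization easy to sanity-check.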
Specification