Content retrieval based on semantic association
Abstract
A method and system which enable a user to query a multimedia archive in one media modality and automatically retrieve correlating data in another media modality without the need for manually associating the data items through a data structure. The correlation method finds the maximum correlation between the data items without being affected by the distribution of the data in the respective subspace of each modality. Once the direction of correlation is disclosed, extracted features can be transferred from one subspace to another.
21 Claims
1. A multimedia system comprising:
a query module generating a query in a plurality of media modalities;
a database including a plurality of matrices, each matrix corresponding to one of the media modalities, wherein each matrix builds a correlation between the corresponding media modality and another media modality;
an object detection module extracting a first plurality of object features from the query and a second plurality of object features from the database wherein the first plurality of object features and the second plurality of object features are extracted from media representing different modalities; and
a processor coupled to the object detection module, the processor being trained on the plurality of matrices of the database to maximize a bi-directional correlation of cross-modality media using sample data, the processor determining a correlation between the first plurality of object features and the second plurality of object features and to retrieve those items from the database which have a correlation at least equal to a predetermined maximum degree of correlation.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 21
10. A multimedia system comprising:
a query module capable of generating a query in a plurality of media modalities;
a database capable of storing data representing a plurality of media modalities;
an object detection module capable of extracting a first plurality of object features from the query and a second plurality of object features from the database wherein the first plurality of object features and the second plurality of object features are extracted from media representing different modalities; and
a processor coupled to the object detection modules, wherein the processor is arranged to determine a correlation between the first plurality of object features and the second plurality of object features and to retrieve those items from the database which have a correlation at least equal to a predetermined maximum degree of correlation,
wherein prior to retrieval, the system is trained to correlate cross-modality media using sample data, wherein the training produces orthogonal matrices $A = C_{xx}^{-1/2} U$ and $B = C_{yy}^{-1/2} V$ wherein $\det(A) = \det(B) = 1$ and $C_{xx} = E\{(X - m_x)(X - m_x)^T\}$, $C_{yy} = E\{(Y - m_y)(Y - m_y)^T\}$, $C_{xy} = E\{(X - m_x)(Y - m_y)^T\}$, $K = C_{xx}^{-1/2} \cdot C_{xy} \cdot C_{yy}^{-1/2} = U \cdot S \cdot V^T$ and the correlation between $AX$ representing a first feature set in a first modality and $BY$ representing a second feature set in a second modality is greatest, thereby enabling a transfer of features from the first modality to the second modality, and
wherein $A$ and $B$ are orthogonal matrices, $X$ and $Y$ are feature sets from different modalities, $C_{xx}$, $C_{yy}$, and $C_{xy}$ are covariance matrices, $m_x$ and $m_y$ are mean vectors, and $U$, $S$, and $V$ are obtained from singular value decomposition.
Dependent claims: 11
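The training formulas recited in claim 10 can be illustrated with a short NumPy sketch. This is an illustration under stated assumptions, not the patented implementation: the function names `inv_sqrt` and `train_cross_modal`, the row-per-sample layout, and the small eigenvalue floor `eps` are all hypothetical, and the sketch does not enforce the recited condition det(A) = det(B) = 1.

```python
import numpy as np

def inv_sqrt(C, eps=1e-8):
    """Inverse square root of a symmetric positive semi-definite matrix.

    eps is a hypothetical floor on the eigenvalues, added only to keep
    the sketch numerically stable; it is not part of the claim.
    """
    w, Q = np.linalg.eigh(C)
    w = np.maximum(w, eps)
    return (Q / np.sqrt(w)) @ Q.T

def train_cross_modal(X, Y):
    """Estimate A and B from paired samples.

    X: (n, p) features in the first modality, one sample per row.
    Y: (n, q) features in the second modality, paired row by row with X.
    Returns A (p, k), B (q, k), singular values S (k,), and means mx, my,
    where k = min(p, q).
    """
    n = X.shape[0]
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my

    # Sample estimates of the covariance matrices Cxx, Cyy, Cxy.
    Cxx = Xc.T @ Xc / n
    Cyy = Yc.T @ Yc / n
    Cxy = Xc.T @ Yc / n

    Cxx_is = inv_sqrt(Cxx)
    Cyy_is = inv_sqrt(Cyy)

    # K = Cxx^(-1/2) Cxy Cyy^(-1/2) = U S V^T
    K = Cxx_is @ Cxy @ Cyy_is
    U, S, Vt = np.linalg.svd(K, full_matrices=False)

    A = Cxx_is @ U      # A = Cxx^(-1/2) U
    B = Cyy_is @ Vt.T   # B = Cyy^(-1/2) V
    return A, B, S, mx, my
```

With samples stored as rows, the projection the claim writes as AX corresponds to `(X - mx) @ A` in this sketch, and BY to `(Y - my) @ B`. The entries of `S` are then the sample correlations between matching columns of the two projections.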
12. A method of retrieving at least one item of interest to a user from a multimedia archive comprising the steps of:
generating a query in a plurality of media modalities;
generating a plurality of matrices, each matrix corresponding to one of the media modalities, wherein each matrix builds a correlation between the corresponding media modality and another media modality;
training the plurality of matrices to maximize a bi-directional correlation of cross-modality media using a sample data;
extracting a first plurality of object features from the query, the object features representing a first modality;
extracting a second plurality of object features from items in the multimedia archive, the object features representing a second modality, the archive including the plurality of matrices;
determining a correlation between the first plurality of object features and the second plurality of object features using the plurality of matrices;
retrieving those items from the archive which have object features having a correlation with the object features in the query at least equal to a predetermined maximum degree of correlation.
Dependent claims: 13, 14, 15
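The determining and retrieving steps of claim 12 can be sketched as follows, assuming trained matrices and mean vectors are already available. Several choices here are assumptions rather than anything the claims specify: the function name `retrieve`, the use of cosine similarity between projected feature vectors as the correlation score, and the reading of the "predetermined ... degree of correlation" as a simple numeric `threshold`.

```python
import numpy as np

def retrieve(query_x, archive_Y, A, B, mx, my, threshold):
    """Return archive indices whose score meets the threshold, best first.

    query_x:   (p,) feature vector extracted from the query (first modality).
    archive_Y: (m, q) feature vectors of archive items (second modality).
    A, B:      (p, k) and (q, k) trained projection matrices.
    mx, my:    mean vectors used to center each modality.
    threshold: hypothetical stand-in for the predetermined degree of correlation.
    """
    q = (query_x - mx) @ A        # query projected into the shared space
    P = (archive_Y - my) @ B      # archive items projected into the same space

    # Cosine similarity as the correlation score (an assumption of this sketch).
    tiny = 1e-12
    q_unit = q / (np.linalg.norm(q) + tiny)
    P_unit = P / (np.linalg.norm(P, axis=1, keepdims=True) + tiny)
    scores = P_unit @ q_unit

    hits = np.flatnonzero(scores >= threshold)
    order = np.argsort(-scores[hits])
    return hits[order], scores
```

Returning the full `scores` array alongside the hits makes it easy to inspect how the threshold was applied.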
16. A method of retrieving at least one item of interest to a user from a multimedia archive comprising the steps of:
generating a query;
extracting a first plurality of object features from the query, the object features representing a first modality;
extracting a second plurality of object features from items in the multimedia archive, the object features representing a second modality;
determining a correlation between the first plurality of object features and the second plurality of object features; and
retrieving those items from the archive which have object features having a correlation with the object features in the query at least equal to a predetermined maximum degree of correlation using sample data to generate correlation matrices to correlate cross-modality media,
wherein the matrices generated are represented by $A = C_{xx}^{-1/2} U$ and $B = C_{yy}^{-1/2} V$ and wherein $\det(A) = \det(B) = 1$ and $C_{xx} = E\{(X - m_x)(X - m_x)^T\}$, $C_{yy} = E\{(Y - m_y)(Y - m_y)^T\}$, $C_{xy} = E\{(X - m_x)(Y - m_y)^T\}$, $K = C_{xx}^{-1/2} \cdot C_{xy} \cdot C_{yy}^{-1/2} = U \cdot S \cdot V^T$ and the correlation between $AX$ representing a first feature set in a first modality and $BY$ representing a second feature set in a second modality is greatest, thereby enabling a transfer of features from the first modality to the second modality, and
wherein $A$ and $B$ are orthogonal matrices, $X$ and $Y$ are feature sets from different modalities, $C_{xx}$, $C_{yy}$, and $C_{xy}$ are covariance matrices, $m_x$ and $m_y$ are mean vectors, and $U$, $S$, and $V$ are obtained from singular value decomposition.
Dependent claims: 17
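A brief worked derivation may help show why the matrices recited in claims 10 and 16 relate the two feature spaces. It rests on assumptions that the claims do not state: that $C_{xx}$ and $C_{yy}$ are invertible with symmetric inverse square roots, that $U$ and $V$ have orthonormal columns, and that the projections are taken as $A^T(X - m_x)$ and $B^T(Y - m_y)$ for column feature vectors (the claims write these simply as $AX$ and $BY$).

```latex
\begin{aligned}
\operatorname{Cov}\!\left(A^T X\right)
  &= A^T C_{xx} A
   = U^T C_{xx}^{-1/2}\, C_{xx}\, C_{xx}^{-1/2} U
   = U^T U = I, \\
\operatorname{Cov}\!\left(B^T Y\right)
  &= B^T C_{yy} B
   = V^T C_{yy}^{-1/2}\, C_{yy}\, C_{yy}^{-1/2} V
   = V^T V = I, \\
\operatorname{Cov}\!\left(A^T X,\, B^T Y\right)
  &= A^T C_{xy} B
   = U^T C_{xx}^{-1/2}\, C_{xy}\, C_{yy}^{-1/2} V
   = U^T K V = S .
\end{aligned}
```

Under these assumptions each projected component has unit variance, so the diagonal entries of $S$ (the singular values of $K$) are the correlations between matching components of the two projections, with the largest singular value giving the greatest attainable correlation.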
18. Computer-executable process steps, the computer-executable process steps being stored on a computer-readable medium enabling a user to retrieve media of interest from a database of multimedia comprising:
a query generation step for obtaining a query from the user, the query being in a first media modality;
a matrices generating step generating a plurality of matrices, each matrix corresponding to one of the media modalities, wherein each matrix builds a correlation between the corresponding media modality and another media modality;
a training step training the plurality of matrices to maximize a bi-directional correlation of cross-modality media using a sample data;
a first extracting step for extracting a first plurality of object features from the query;
a second extracting step for extracting a second plurality of object features from items in the multimedia archive, the object features representing a second media modality, the archive including the plurality of matrices;
a correlation calculation step for determining a correlation between the first plurality of object features and the second plurality of object features using the plurality of matrices; and
a retrieval step for retrieving those items from the database which have object features having a correlation with the object features in the query at least equal to a predetermined maximum degree of correlation.
19. A system for retrieving at least one item of interest to a user from a multimedia archive comprising:
means for generating a query in a first media modality;
means for generating a plurality of matrices, each matrix corresponding to one of the media modalities, wherein each matrix builds a correlation between the corresponding media modality and another media modality;
means for training the plurality of matrices to maximize a bi-directional correlation of cross-modality media using a sample data;
means for extracting a first plurality of object features from the query;
means for extracting a second plurality of object features from items in the multimedia archive, the archive including the plurality of matrices;
means for determining a correlation between the first plurality of object features and the second plurality of object features, the second plurality of object features being extracted from a second media modality using the plurality of matrices; and
means for retrieving those items from the archive which have object features having a correlation with the object features in the query at least equal to a predetermined maximum degree of correlation.
20. A method for retrieving a query in a first media modality, when only a result of the query, in a second media modality, is initially known, comprising the steps of:
retrieving a stored matrix, B, for transforming features in the second modality into feature space that is correlated with the first modality, wherein the matrix B was produced during a training procedure to correlate items in the first modality A with items in the second modality B, and vice-versa, such that $A = C_{xx}^{-1/2} U$ and $B = C_{yy}^{-1/2} V$ wherein $\det(A) = \det(B) = 1$ and $C_{xx} = E\{(X - m_x)(X - m_x)^T\}$, $C_{yy} = E\{(Y - m_y)(Y - m_y)^T\}$, $C_{xy} = E\{(X - m_x)(Y - m_y)^T\}$, $K = C_{xx}^{-1/2} \cdot C_{xy} \cdot C_{yy}^{-1/2} = U \cdot S \cdot V^T$ and the correlation between $AX$ representing a first feature set in the first modality and $BY$ representing a second feature set in the second modality is greatest;
extracting object features from items in the second modality;
calculating $AY$ for the second modality;
extracting object features from items in the first modality, stored in a multimedia database;
calculating $AX$ for each of the items;
correlating $AX$ and $AY$; and
retrieving the $X$ having the greatest correlation between $AX$ and $BY$, and
wherein $A$ and $B$ are orthogonal matrices, $X$ and $Y$ are feature sets from different modalities, $C_{xx}$, $C_{yy}$, and $C_{xy}$ are covariance matrices, $m_x$ and $m_y$ are mean vectors, and $U$, $S$, and $V$ are obtained from singular value decomposition.
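The reverse lookup of claim 20, where only the second-modality result is known and the best-matching first-modality item is sought, might be sketched as below. The sketch follows the final retrieving step, which compares AX with BY. The function name `best_match`, the row-per-sample layout, and the cosine-similarity score are assumptions made for illustration, and the trained matrices and means are taken as given.

```python
import numpy as np

def best_match(y, candidates_X, A, B, mx, my):
    """Pick the first-modality candidate most correlated with a known result.

    y:            (q,) features of the known item in the second modality.
    candidates_X: (m, p) features of stored first-modality items.
    A, B:         (p, k) and (q, k) trained projection matrices.
    mx, my:       mean vectors used to center each modality.
    Returns the index of the best candidate and all candidate scores.
    """
    by = (y - my) @ B                 # BY: known result in the shared space
    AX = (candidates_X - mx) @ A      # AX: each candidate in the shared space

    # Cosine similarity stands in for the correlation measure (assumption).
    tiny = 1e-12
    by_unit = by / (np.linalg.norm(by) + tiny)
    AX_unit = AX / (np.linalg.norm(AX, axis=1, keepdims=True) + tiny)
    scores = AX_unit @ by_unit

    return int(np.argmax(scores)), scores
```

Unlike a thresholded search, this returns a single candidate, matching the claim's wording of retrieving the X having the greatest correlation.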
Specification