Enhanced max margin learning on multimodal data mining in a multimedia database
First Claim
1. A method comprising:
representing each of a plurality of images in a database as information in an image space;
associating a label word set, from an annotation word space, with each of the plurality of images, to define a plurality of training instances, each respective training instance comprising a respective image and a respective associated label word set, and having at least one constraint;
computing a feature vector in a feature space for each of the plurality of images;
automatically clustering the respective feature vectors in the feature space into a plurality of clusters, grouping similar feature vectors together within a common cluster, and determining a visual representative for each of the plurality of clusters;
structuring the annotation word space, to produce a structured annotation word space, based on at least the clustering of the respective features in the feature space and an association of respective associated label word sets with respective images, using at least one automated optimization processor configured to perform an enhanced max-margin learning optimization in a dual space, dependent on inner products in a joint feature space of the feature vectors and the structured annotation word space, to minimize a prediction error of associated label words of the annotation word space for the plurality of training instances;
storing information representing the structured annotation word space in a memory after the optimization; and
receiving a query comprising at least one of a query image and a query semantic expression, and producing, or identifying in response, a response comprising at least one of a response image and a response semantic expression, selectively dependent on the structured annotation word space in the memory after the optimization.
Abstract
Multimodal data mining in a multimedia database is addressed as a structured prediction problem, wherein mapping from input to the structured and interdependent output variables is learned. A system and method for multimodal data mining is provided, comprising defining a multimodal data set comprising image information; representing image information of a data object as a set of feature vectors in a feature space; clustering in the feature space to group similar features; associating a non-image representation with a respective image data object based on the clustering; determining a joint feature representation of a respective data object as a mathematical weighted combination of a set of components of the joint feature representation; optimizing a weighting for a plurality of components of the mathematical weighted combination with respect to a prediction error between a predicted classification and a training classification; and employing the mathematical weighted combination for automatically classifying a new data object.
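The clustering step named in the abstract can be illustrated with a minimal sketch. The record does not say which clustering algorithm is used, so plain k-means is assumed here. The function names `kmeans` and `visual_representatives` are hypothetical, and choosing the cluster member nearest the centroid as the "visual representative" of claim 1 is an assumption made only for illustration.

```python
import numpy as np

def kmeans(features, k, n_iter=50):
    """Plain Lloyd's k-means (assumed algorithm). Returns (labels, centroids)."""
    features = np.asarray(features, dtype=float)
    # Assumption: seed with the first k feature vectors so the run is deterministic.
    centroids = features[:k].copy()
    labels = np.zeros(len(features), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every feature vector to every centroid.
        d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        updated = centroids.copy()
        for j in range(k):
            members = features[labels == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids

def visual_representatives(features, labels, centroids):
    """Index of the member closest to each centroid (-1 for an empty cluster)."""
    features = np.asarray(features, dtype=float)
    reps = []
    for j, centroid in enumerate(centroids):
        idx = np.flatnonzero(labels == j)
        if len(idx) == 0:
            reps.append(-1)
            continue
        dist = ((features[idx] - centroid) ** 2).sum(axis=1)
        reps.append(int(idx[dist.argmin()]))
    return reps
```

In this sketch each row of `features` stands for one image's feature vector; a real system would compute those vectors from image content, which the abstract does not detail.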
769 Citations
20 Claims
1. A method comprising:
representing each of a plurality of images in a database as information in an image space;
associating a label word set, from an annotation word space, with each of the plurality of images, to define a plurality of training instances, each respective training instance comprising a respective image and a respective associated label word set, and having at least one constraint;
computing a feature vector in a feature space for each of the plurality of images;
automatically clustering the respective feature vectors in the feature space into a plurality of clusters, grouping similar feature vectors together within a common cluster, and determining a visual representative for each of the plurality of clusters;
structuring the annotation word space, to produce a structured annotation word space, based on at least the clustering of the respective features in the feature space and an association of respective associated label word sets with respective images, using at least one automated optimization processor configured to perform an enhanced max-margin learning optimization in a dual space, dependent on inner products in a joint feature space of the feature vectors and the structured annotation word space, to minimize a prediction error of associated label words of the annotation word space for the plurality of training instances;
storing information representing the structured annotation word space in a memory after the optimization; and
receiving a query comprising at least one of a query image and a query semantic expression, and producing, or identifying in response, a response comprising at least one of a response image and a response semantic expression, selectively dependent on the structured annotation word space in the memory after the optimization.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A method, comprising:
defining a multimodal data set comprising objects having image information and semantic labels of the image information in a semantic space, comprising a plurality of training instances, each training instance comprising an object and at least one associated semantic label, and having at least one constraint;
representing the image information as a set of feature vectors in an image feature space by automatically processing the multimodal data set on at least one automated processor;
automatically clustering the objects, with the at least one automated processor, based on the set of feature vectors in the image feature space, to group objects having similar image features together within common clusters;
structuring the semantic space with at least one automated optimization processor, to produce a structured semantic space, based on at least the clustering of the objects in the image feature space, and an association of respective semantic labels with respective objects, configured to perform an enhanced max-margin learning optimization in a dual space, dependent on inner products in a joint feature space of the feature vectors and the structured semantic space, to minimize a prediction error for the training instances having the at least one constraint;
storing information defining the structured semantic space in at least one memory after the optimization; and
receiving a query comprising at least one of a query image and a query semantic expression, and producing or identifying in response, a response comprising at least one of a response image and a response semantic expression, selectively dependent on the structured semantic space in the at least one memory.
Dependent claims: 17, 18, 19
20. An apparatus, comprising:
a database comprising a plurality of images each representing information in an image space;
a label word set, in an annotation word space, associated with each of the plurality of images;
at least a portion of the plurality of images being training instances, each training instance comprising image information, a label word set, and at least one constraint;
a feature vector in a feature space, for each of the plurality of images;
at least one processor configured to:
automatically cluster the feature space into a plurality of clusters, each respective cluster grouping similar feature vectors together within a common cluster;
automatically structure the annotation word space, to produce a structured annotation word space, based on at least the clustering in the feature space and an association of respective label word sets with respective images, using an optimization algorithm comprising an enhanced max-margin learning optimization in a dual space, dependent on inner products in a joint feature space of the feature vectors and the structured annotation word space, to minimize a prediction error for the training instances having the at least one constraint; and
receive at least one of a query image and a query semantic expression, and produce in response, at least one of a response image and a response semantic expression, selectively dependent on the structured annotation word space;
a memory configured to store information representing the structured annotation word space after the optimization; and
an output configured to present the at least one of the response image and the response semantic expression dependent on the structured annotation word space in the memory.
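Each independent claim recites an optimization in a dual space that depends on inner products in a joint feature space. The sketch below shows why a dual formulation can work with inner products alone. It assumes a joint feature map formed as the outer product of a label-indicator vector and an image feature vector, which is a common choice in structured prediction but is not stated in the claims. Under that assumption the joint inner product factorizes into a label inner product times a feature inner product, so the joint vector never has to be built. The names `joint_feature` and `joint_kernel` are hypothetical.

```python
import numpy as np

def joint_feature(x, y):
    """Assumed joint feature map: flattened outer product of labels y and features x."""
    return np.outer(y, x).ravel()

def joint_kernel(x1, y1, x2, y2):
    """Inner product in the assumed joint space, computed without forming the map.

    For the outer-product map, <Phi(x1, y1), Phi(x2, y2)> = <y1, y2> * <x1, x2>.
    """
    return float(np.dot(y1, y2) * np.dot(x1, x2))
```

A dual solver would evaluate `joint_kernel` on pairs of training instances; how the claimed optimization builds its constraints and coefficients is not described in this section, so that part is left out.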
Specification