ENHANCED MAX MARGIN LEARNING ON MULTIMODAL DATA MINING IN A MULTIMEDIA DATABASE
First Claim
1. A method comprising:
representing each of a plurality of images in a database as information in an image space;
associating an annotation word set, from a structured annotation word space, with each of the plurality of images;
computing a feature vector in a feature space for each of the plurality of images based on at least the associated annotation word set from the structured annotation word space;
automatically clustering the feature space into a plurality of clusters, grouping similar feature vectors together within a common cluster; and
determining a visual representative for each of the plurality of clusters.
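The clustering and visual-representative steps recited above can be illustrated with a short sketch. This is a minimal stand-in, not the patented method: it assumes plain k-means over the feature vectors and picks, as the "visual representative" of each cluster, the actual feature vector nearest the cluster centroid. All function names here are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means over feature vectors X (n x d). Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each feature vector to its nearest centroid
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        # recompute each centroid as the mean of its members
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

def visual_representatives(X, centroids, labels):
    """For each cluster, return the index of the member vector closest to its centroid."""
    reps = []
    for j, c in enumerate(centroids):
        members = np.where(labels == j)[0]
        best = members[np.linalg.norm(X[members] - c, axis=1).argmin()]
        reps.append(int(best))
    return reps
```

The representative is chosen from the actual data (a medoid-style choice) rather than the centroid itself, since the claim calls for a visual representative drawn from the clustered images.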
Abstract
Multimodal data mining in a multimedia database is addressed as a structured prediction problem, wherein a mapping from the input to structured, interdependent output variables is learned. A system and method for multimodal data mining is provided, comprising: defining a multimodal data set comprising image information; representing image information of a data object as a set of feature vectors in a feature space; clustering in the feature space to group similar features; associating a non-image representation with a respective image data object based on the clustering; determining a joint feature representation of a respective data object as a mathematical weighted combination of a set of components of the joint feature representation; optimizing a weighting for a plurality of components of the mathematical weighted combination with respect to a prediction error between a predicted classification and a training classification; and employing the mathematical weighted combination for automatically classifying a new data object.
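The abstract's core loop — a weighted joint feature representation whose weights are optimized against the prediction error between a predicted and a training classification — can be sketched as follows. This is an illustrative stand-in only, assuming a structured-perceptron-style update in place of the patent's max-margin optimization; the joint feature map and all names are hypothetical.

```python
import numpy as np

def joint_feature(x, y, num_labels):
    """Joint feature map phi(x, y): input features placed in the block for label y."""
    phi = np.zeros((num_labels, len(x)))
    phi[y] = x
    return phi.ravel()

def train_weights(samples, num_labels, dim, epochs=10, lr=0.1):
    """Perceptron-style updates that reduce the error between the predicted
    and the training classification (a stand-in for max-margin training)."""
    w = np.zeros(num_labels * dim)
    for _ in range(epochs):
        for x, y in samples:
            scores = [w @ joint_feature(x, c, num_labels) for c in range(num_labels)]
            y_hat = int(np.argmax(scores))
            if y_hat != y:  # prediction error: move weights toward the true output
                w += lr * (joint_feature(x, y, num_labels)
                           - joint_feature(x, y_hat, num_labels))
    return w

def classify(w, x, num_labels):
    """Classify a new object by maximizing the weighted joint feature score."""
    return int(np.argmax([w @ joint_feature(x, c, num_labels)
                          for c in range(num_labels)]))
```

The learned weight vector plays the role of the "weighting for a plurality of components" in the abstract: classification of a new data object is the argmax of the weighted combination over candidate outputs.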
20 Claims
1. A method comprising:
representing each of a plurality of images in a database as information in an image space;
associating an annotation word set, from a structured annotation word space, with each of the plurality of images;
computing a feature vector in a feature space for each of the plurality of images based on at least the associated annotation word set from the structured annotation word space;
automatically clustering the feature space into a plurality of clusters, grouping similar feature vectors together within a common cluster; and
determining a visual representative for each of the plurality of clusters.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
16. A method, comprising:
defining a multimodal data set comprising objects having image information and semantic annotations of the image information in a structured semantic space;
representing the image information as a set of feature vectors in a feature space;
clustering the objects based on the feature vectors in the feature space, to group objects having similar features together; and
determining representative image information for each cluster.
(Dependent claims: 17)
18. The method according to claim 17, further comprising classifying a new object in the joint feature space.
20. An apparatus, comprising:
a database comprising a plurality of images each representing information in an image space;
an annotation word set, from a structured annotation word space, associated with each of the plurality of images; and
a feature vector in a feature space, for each of the plurality of images, defined based on at least the associated annotation word set from the structured annotation word space;
at least one processor configured to:
automatically cluster the feature space into a plurality of clusters, each respective cluster grouping similar feature vectors together within a common cluster; and
determine a visual representative for each of the plurality of clusters.
Specification