Enhanced max margin learning on multimodal data mining in a multimedia database

US 10,007,679 B2
Filed: 12/29/2014
Issued: 06/26/2018
Est. Priority Date: 08/08/2008
Status: Active Grant



Abstract

Multimodal data mining in a multimedia database is addressed as a structured prediction problem, wherein mapping from input to the structured and interdependent output variables is learned. A system and method for multimodal data mining is provided, comprising defining a multimodal data set comprising image information; representing image information of a data object as a set of feature vectors in a feature space; clustering in the feature space to group similar features; associating a non-image representation with a respective image data object based on the clustering; determining a joint feature representation of a respective data object as a mathematical weighted combination of a set of components of the joint feature representation; optimizing a weighting for a plurality of components of the mathematical weighted combination with respect to a prediction error between a predicted classification and a training classification; and employing the mathematical weighted combination for automatically classifying a new data object.
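The abstract frames annotation as structured prediction: a joint feature representation Φ(x, y) of an image x and a label word set y is scored by a weighted combination w · Φ(x, y), and the weights are tuned against the prediction error on training data. The Python sketch below illustrates that idea under stated assumptions and is not the patent's enhanced max-margin (dual-space) procedure. It assumes Φ is the flattened outer product of an image feature vector and a 0/1 word-indicator vector, uses Hamming distance as the prediction error, enumerates every label set (feasible only for a toy vocabulary), and swaps in plain primal subgradient descent on the margin-rescaled structured hinge loss. All function names and parameters are hypothetical.

```python
from itertools import product


def joint_feature(x, y):
    """Phi(x, y): flattened outer product of image features x and a 0/1 word-indicator vector y (assumed form)."""
    return [xi * yj for xi in x for yj in y]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hamming(y, y_other):
    # Prediction error between two label sets: number of words on which they disagree.
    return sum(a != b for a, b in zip(y, y_other))


def all_label_sets(vocab_size):
    # Every possible label word set as a 0/1 vector; only feasible for tiny vocabularies.
    return [list(bits) for bits in product((0, 1), repeat=vocab_size)]


def predict(w, x, candidates):
    # Structured prediction: the label set whose weighted joint feature score is highest.
    return max(candidates, key=lambda y: dot(w, joint_feature(x, y)))


def train(data, vocab_size, epochs=50, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularized, margin-rescaled structured hinge loss (primal form)."""
    w = [0.0] * (len(data[0][0]) * vocab_size)
    candidates = all_label_sets(vocab_size)
    for _ in range(epochs):
        for x, y_true in data:
            # Loss-augmented inference: the label set that most violates the margin.
            y_hat = max(candidates,
                        key=lambda y: hamming(y_true, y) + dot(w, joint_feature(x, y)))
            phi_true = joint_feature(x, y_true)
            phi_hat = joint_feature(x, y_hat)
            # Step along the subgradient; the hinge part is zero when y_hat == y_true.
            w = [wi - lr * (reg * wi + ph - pt)
                 for wi, ph, pt in zip(w, phi_hat, phi_true)]
    return w
```

With a two-word vocabulary and two training images, for example `train([([1.0, 0.0, 1.0], [1, 0]), ([0.0, 1.0, 1.0], [0, 1])], vocab_size=2)`, the learned weights predict each training image's own label set. A realistic vocabulary would need a decomposable or approximate loss-augmented inference step in place of the enumeration.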

769 Citations

20 Claims

1. A method comprising:
- representing each of a plurality of images in a database as information in an image space;
  
  associating a label word set, from an annotation word space, with each of the plurality of images, to define a plurality of training instances, each respective training instance comprising a respective image and a respective associated label word set, and having at least one constraint;
  
  computing a feature vector in a feature space for each of the plurality of images;
  
  automatically clustering the respective feature vectors in the feature space into a plurality of clusters, grouping similar feature vectors together within a common cluster, and determining a visual representative for each of the plurality of clusters;
  
  structuring the annotation word space, to produce a structured annotation word space, based on at least the clustering of the respective features in the feature space and an association of respective associated label word sets with respective images, using at least one automated optimization processor configured to perform an enhanced max-margin learning optimization in a dual space, dependent on inner products in a joint feature space of the feature vectors and the structured annotation word space, to minimize a prediction error of associated label words of the annotation word space for the plurality of training instances;
  
  storing information representing the structured annotation word space in a memory after the optimization; and
  
  receiving a query comprising at least one of a query image and a query semantic expression, and producing, or identifying in response, a response comprising at least one of a response image and a response semantic expression, selectively dependent on the structured annotation word space in the memory after the optimization.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
- - 2. The method according to claim 1, wherein the visual representative is a centroid of a respective cluster in the image space, further comprising determining a representative annotation word set for the determined visual representative for at least one respective cluster of the plurality of clusters.
  - 3. The method according to claim 1, wherein the visual representative is a selected image of the plurality of images, further comprising selecting representative annotation words for a respective cluster dependent on the selected image.
  - 4. The method according to claim 1, wherein each respective image of the plurality of images has a plurality of image blocks, each image block being associated with a respective image block label word set and a respective image block feature vector.
  - 5. The method according to claim 1, wherein the structured annotation word space comprises interdependent annotation words.
  - 6. The method according to claim 1, further comprising optimizing the joint feature space by selecting weighting coefficients for a linear combination of a plurality of joint feature mapping vectors representing a relationship between each of the plurality of images and the associated label word set, based on at least one optimization criterion.
  - 7. The method according to claim 1, wherein the query comprises a query semantic expression word from the annotation word space image, and the response selectively dependent on the structured annotation word space, comprises a response image.
  - 8. The method according to claim 1, further comprising receiving a query image and automatically outputting the query semantic expression comprising a set of annotation words from the structured annotation word space describing the query image.
  - 9. The method according to claim 8, further comprising:
    - partitioning the query image into a plurality of blocks;
      
      computing a feature vector in the feature space for each of the plurality of blocks;
      
      computing a similarity of each feature vector with a plurality of visual representatives;
      
      selecting a set of most relevant visual representatives based on at least the computed similarities;
      
      determining a score relating a list of annotation words from the structured annotation word space and each of the selected most relevant visual representatives;
      
      merging and ranking the list of annotation words according to the respective determined score for each annotation word;
      
      defining the output set of annotation words as a subset of the list of annotation words which represent the highest ranked annotation words.
  - 10. The method according to claim 9, wherein said determining a score relating a list of annotation words from the structured annotation word space and each of the selected most relevant visual representatives is performed prior to receiving the input image.
  - 11. The method according to claim 1, further comprising receiving the query image, and selecting at least one image related to the query image based on a relation of the query image to the selected at least one image in the joint feature space.
  - 12. The method according to claim 1, further comprising:
    - determining a score representing a quantitative relation between each visual representative and the query;
      
      selecting a subset of the visual representatives most relevant to the query dependent on the determined score;
      
      computing a similarity score between the selected subset of visual representatives and a plurality of images in the database;
      
      merging and sorting the plurality of images in the database based on at least the computed similarity scores; and
      
      determining a most relevant subset of the plurality of images based on the merged and sorted plurality of images in the database based on at least the computed similarity scores, wherein the most relevant subset comprises the response image.
  - 13. The method according to claim 12, wherein said determining a score representing the quantitative relation between each visual representative and the query is performed prior to receiving the query.
  - 14. The method according to claim 1, further comprising:
    - receiving a query image;
      
      determining at least one visual representative image for the query image;
      
      determining a set of annotation words for each of the at least one visual representative image based on at least the structured annotation word space;
      
      determining a subset of the plurality of images in the database which have a highest relevance to the determined set of annotation words for the at least one visual representative image; and
      
      outputting the determined subset of the plurality of images in the database as the response image.
  - 15. The method according to claim 1, further comprising:
    - determining at least one selected visual representative image corresponding to the query;
      
      determining a set of annotation words for each of the at least one selected visual representative image, dependent on at least the structured annotation word space;
      
      determining a subset of the plurality of images in the database which have a highest relevance to the determined set of annotation words for the determined at least one selected visual representative image;
      
      outputting the determined subset of the plurality of images in the database as the response image.
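Claims 9 and 12 recite, step by step, how a query image is annotated and how database images are retrieved through the visual representatives. The sketch below walks through both under illustrative assumptions that the claims do not specify: images are small 2-D grayscale grids, each block's feature is its (mean, standard deviation), similarity is 1 / (1 + Euclidean distance), the per-representative word scores and representative-to-query scores are passed in as precomputed inputs (claims 10 and 13 recite computing these before the query arrives), and per-image similarities are merged by taking the maximum. The names `annotate`, `retrieve`, `block`, `top_reps`, `top_words` and `top_images` are hypothetical.

```python
import math


def block_features(image, block):
    """Tile a 2-D grayscale image into block x block squares; feature = (mean, std) of each tile (assumed)."""
    feats = []
    for r in range(0, len(image) - block + 1, block):
        for c in range(0, len(image[0]) - block + 1, block):
            vals = [image[i][j] for i in range(r, r + block) for j in range(c, c + block)]
            mean = sum(vals) / len(vals)
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            feats.append((mean, std))
    return feats


def similarity(u, v):
    # Map Euclidean distance into (0, 1]; identical features score 1.0.
    return 1.0 / (1.0 + math.dist(u, v))


def annotate(image, representatives, rep_word_scores, block=2, top_reps=1, top_words=2):
    """Claim 9 sketch: blocks -> nearest visual representatives -> merged, ranked annotation words."""
    totals = {}
    for f in block_features(image, block):
        # Select the visual representatives most similar to this block.
        sims = sorted(((similarity(f, rep), k) for k, rep in enumerate(representatives)),
                      reverse=True)
        for sim, k in sims[:top_reps]:
            # Merge word lists, weighting each precomputed word score by the block's similarity.
            for word, score in rep_word_scores[k].items():
                totals[word] = totals.get(word, 0.0) + sim * score
    # Rank by merged score (ties broken alphabetically) and keep the highest-ranked words.
    ranked = sorted(totals, key=lambda w: (-totals[w], w))
    return ranked[:top_words]


def retrieve(rep_query_scores, representatives, db_features, top_reps=1, top_images=2):
    """Claim 12 sketch: query-relevant representatives -> image similarities -> merged, sorted results."""
    best = sorted(range(len(representatives)), key=lambda k: -rep_query_scores[k])[:top_reps]
    merged = {}
    for k in best:
        for img_id, feat in db_features.items():
            # Keep each database image's best similarity over the selected representatives.
            s = similarity(representatives[k], feat)
            merged[img_id] = max(merged.get(img_id, 0.0), s)
    ranked = sorted(merged, key=lambda i: (-merged[i], i))
    return ranked[:top_images]
```

Precomputing the representative-level scores keeps query-time work proportional to the number of blocks and representatives, which is the practical point of claims 10 and 13.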

16. A method, comprising:
- defining a multimodal data set comprising objects having image information and semantic labels of the image information in a semantic space, comprising a plurality of training instances, each training instance comprising an object and at least one associated semantic label, and having at least one constraint;
  
  representing the image information as a set of feature vectors in an image feature space by automatically processing the multimodal data set on at least one automated processor;
  
  automatically clustering the objects, with the at least one automated processor, based on the set of feature vectors in the image feature space, to group objects having similar image features together within common clusters;
  
  structuring the semantic space with at least one automated optimization processor, to produce a structured semantic space, based on at least the clustering of the objects in the image feature space, and an association of respective semantic labels with respective objects, configured to perform an enhanced max-margin learning optimization in a dual space, dependent on inner products in a joint feature space of the feature vectors and the structured semantic space, to minimize a prediction error for the training instances having the at least one constraint; storing information defining the structured semantic space in at least one memory after the optimization; and
  
  receiving a query comprising at least one of a query image and a query semantic expression, and producing or identifying in response, a response comprising at least one of a response image and a response semantic expression, selectively dependent on the structured semantic space in the at least one memory.
- Dependent Claims (17, 18, 19)
- - 17. The method according to claim 16, further comprising:
    - determining representative image information for each respective cluster;
      
      determining representative semantic annotations in the semantic space for the representative image information for each respective cluster; and
      
      using the representative semantic annotations to retrieve at least one object from an automated database storing the multimodal data set.
  - 18. The method according to claim 16, further comprising classifying the query based on at least a relation of a feature vector representing the query image to the structured semantic space.
  - 19. The method according to claim 18, further comprising automatically annotating an object represented in the query with semantic annotations based on the structured annotation semantic space.
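Claims 16 and 17 cluster the objects by image features, determine representative image information for each cluster, attach representative semantic annotations to it, and use those annotations to retrieve objects. A minimal sketch, assuming Lloyd's k-means seeded with the first k points, the cluster centroid as the representative image information (the centroid option of claim 2), and the most frequent member labels as the representative annotations. The patent text shown here does not name a clustering algorithm, and every function name below is hypothetical.

```python
import math
from collections import Counter


def kmeans(points, k, iters=20):
    """Lloyd's k-means; the first k points seed the centroids (an assumed, deterministic choice)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each object joins its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members (the cluster's representative).
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def cluster_annotations(labels, assign, k, top=2):
    """Representative annotations per cluster: the most frequent labels among its members."""
    reps = []
    for c in range(k):
        counts = Counter(w for ws, a in zip(labels, assign) if a == c for w in ws)
        reps.append([w for w, _ in counts.most_common(top)])
    return reps


def retrieve_by_word(word, rep_words, assign, object_ids):
    """Return objects whose cluster carries the word among its representative annotations."""
    hits = {c for c, ws in enumerate(rep_words) if word in ws}
    return [oid for oid, a in zip(object_ids, assign) if a in hits]
```

Seeding from the first k points keeps the example deterministic; a practical system would more likely use randomized or k-means++ seeding and repeated restarts.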

20. An apparatus, comprising:
- a database comprising a plurality of images each representing information in an image space;
  
  a label word set, in an annotation word space, associated with each of the plurality of images;
  
  at least a portion of the plurality of images being training instances, each training instance comprising image information, a label word set, and at least one constraint;
  
  a feature vector in a feature space, for each of the plurality of images;
  
  at least one processor configured to:
  
  automatically cluster the feature space into a plurality of clusters, each respective cluster grouping similar feature vectors together within a common cluster;
  
  automatically structure the annotation word space, to produce a structured annotation word space, based on at least the clustering in the feature space and an association of respective label word sets with respective images, using an optimization algorithm comprising an enhanced max-margin learning optimization in a dual space, dependent on inner products in a joint feature space of the feature vectors and the structured annotation word space, to minimize a prediction error for the training instances having the at least one constraint; and
  
  receive at least one of a query image and a query semantic expression, and produce in response at least one of a response image and a response semantic expression, selectively dependent on the structured annotation word space;
  
  a memory configured to store information representing the structured annotation word space after the optimization; and
  
  an output configured to present the at least one of the response image and the response semantic expression dependent on the structured annotation word space in the memory.
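Claim 20, like claims 1 and 16, performs the max-margin optimization in a dual space that depends on inner products in the joint feature space. In a kernelized structured max-margin dual, the training examples enter the objective only through such inner products. Assuming, for illustration only, that Φ(x, y) is the tensor (outer) product of an image feature vector x and a label indicator vector y, each inner product factors as ⟨Φ(x, y), Φ(x', y')⟩ = ⟨x, x'⟩ · ⟨y, y'⟩, so it can be computed without materializing Φ. The sketch checks that identity; the joint feature map and the function names are assumptions, not the patent's definitions.

```python
def outer_flat(x, y):
    # Assumed joint feature map: Phi(x, y) = x (outer product) y, flattened row-major.
    return [a * b for a in x for b in y]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def joint_kernel(x, y, x2, y2):
    """<Phi(x, y), Phi(x2, y2)> computed without building Phi: <x, x2> * <y, y2>."""
    return dot(x, x2) * dot(y, y2)


def dual_gram(samples):
    """Inner products between all (image features, label vector) pairs, as a kernelized dual would consume them."""
    return [[joint_kernel(x, y, x2, y2) for (x2, y2) in samples] for (x, y) in samples]
```

The factored form costs O(d + V) per pair instead of O(d · V) for the explicit product, and ⟨x, x'⟩ could itself be replaced by any image kernel.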


Current Assignee
The Research Foundation for The State University of New York (State University of New York)
Original Assignee
The Research Foundation for The State University of New York (State University of New York)
Inventors
Guo, Zhen; Zhang, Zhongfei
Primary Examiner(s)
Moyer, Andrew
Assistant Examiner(s)
Rosario, Dennis

Application Number
US14/583,893
Publication Number
US 20150186423A1
Time in Patent Office
1,275 Days
CPC Class Codes

G06F 16/00   Information retrieval; Data...

G06F 16/40   of multimedia data, e.g. sl...

G06F 16/45   Clustering; Classification

G06F 16/5838   using colour

G06F 17/10   Complex mathematical operat...

G06F 18/00   Pattern recognition

G06F 18/23   Clustering techniques

G06F 18/2411   based on the proximity to a...

G06F 18/253   of extracted features

G06V 10/764   using classification, e.g. ...
