ENHANCED MAX MARGIN LEARNING ON MULTIMODAL DATA MINING IN A MULTIMEDIA DATABASE
First Claim
1. A method comprising:
representing each of a plurality of images in a database as information in an image space;
associating an annotation word set, from a structured annotation word space, with each of the plurality of images;
computing a feature vector in a feature space for each of the plurality of images based on at least the associated annotation word set from the structured annotation word space;
automatically clustering the feature space into a plurality of clusters, grouping similar feature vectors together within a common cluster; and
determining a visual representative for each of the plurality of clusters.
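The clustering and visual-representative steps recited above can be illustrated with a short sketch. This is a minimal stand-in, not the patented method: it assumes plain k-means over the feature vectors and picks, as the "visual representative" of each cluster, the actual feature vector nearest the cluster centroid. All function names here are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means over feature vectors X (n x d). Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each feature vector to its nearest centroid
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        # recompute each centroid as the mean of its members
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

def visual_representatives(X, centroids, labels):
    """For each cluster, return the index of the member vector closest to its centroid."""
    reps = []
    for j, c in enumerate(centroids):
        members = np.where(labels == j)[0]
        best = members[np.linalg.norm(X[members] - c, axis=1).argmin()]
        reps.append(int(best))
    return reps
```

The representative is chosen from the actual data (a medoid-style choice) rather than the centroid itself, since the claim calls for a visual representative drawn from the clustered images.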
Abstract
Multimodal data mining in a multimedia database is addressed as a structured prediction problem, wherein a mapping from the input to structured, interdependent output variables is learned. A system and method for multimodal data mining is provided, comprising: defining a multimodal data set comprising image information; representing image information of a data object as a set of feature vectors in a feature space; clustering in the feature space to group similar features; associating a non-image representation with a respective image data object based on the clustering; determining a joint feature representation of a respective data object as a mathematical weighted combination of a set of components of the joint feature representation; optimizing a weighting for a plurality of components of the mathematical weighted combination with respect to a prediction error between a predicted classification and a training classification; and employing the mathematical weighted combination for automatically classifying a new data object.
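The abstract's core loop — a weighted joint feature representation whose weights are optimized against the prediction error between a predicted and a training classification — can be sketched as follows. This is an illustrative stand-in only, assuming a structured-perceptron-style update in place of the patent's max-margin optimization; the joint feature map and all names are hypothetical.

```python
import numpy as np

def joint_feature(x, y, num_labels):
    """Joint feature map phi(x, y): input features placed in the block for label y."""
    phi = np.zeros((num_labels, len(x)))
    phi[y] = x
    return phi.ravel()

def train_weights(samples, num_labels, dim, epochs=10, lr=0.1):
    """Perceptron-style updates that reduce the error between the predicted
    and the training classification (a stand-in for max-margin training)."""
    w = np.zeros(num_labels * dim)
    for _ in range(epochs):
        for x, y in samples:
            scores = [w @ joint_feature(x, c, num_labels) for c in range(num_labels)]
            y_hat = int(np.argmax(scores))
            if y_hat != y:  # prediction error: move weights toward the true output
                w += lr * (joint_feature(x, y, num_labels)
                           - joint_feature(x, y_hat, num_labels))
    return w

def classify(w, x, num_labels):
    """Classify a new object by maximizing the weighted joint feature score."""
    return int(np.argmax([w @ joint_feature(x, c, num_labels)
                          for c in range(num_labels)]))
```

The learned weight vector plays the role of the "weighting for a plurality of components" in the abstract: classification of a new data object is the argmax of the weighted combination over candidate outputs.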
20 Claims
1. A method comprising:
representing each of a plurality of images in a database as information in an image space;
associating an annotation word set, from a structured annotation word space, with each of the plurality of images;
computing a feature vector in a feature space for each of the plurality of images based on at least the associated annotation word set from the structured annotation word space;
automatically clustering the feature space into a plurality of clusters, grouping similar feature vectors together within a common cluster; and
determining a visual representative for each of the plurality of clusters.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
16. A method, comprising:
defining a multimodal data set comprising objects having image information and semantic annotations of the image information in a structured semantic space;
representing the image information as a set of feature vectors in a feature space;
clustering the objects based on the feature vectors in the feature space, to group objects having similar features together; and
determining representative image information for each cluster.
(Dependent claims: 17)
18. The method according to claim 17, further comprising classifying a new object in the joint feature space.
20. An apparatus, comprising:
a database comprising a plurality of images each representing information in an image space;
an annotation word set, from a structured annotation word space, associated with each of the plurality of images; and
a feature vector in a feature space, for each of the plurality of images, defined based on at least the associated annotation word set from the structured annotation word space;
at least one processor configured to:
automatically cluster the feature space into a plurality of clusters, each respective cluster grouping similar feature vectors together within a common cluster; and
determine a visual representative for each of the plurality of clusters.
Specification