Enhanced max margin learning on multimodal data mining in a multimedia database

US 8,923,630 B2
Filed: 05/28/2013
Issued: 12/30/2014
Est. Priority Date: 08/08/2008
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Multimodal data mining in a multimedia database is addressed as a structured prediction problem, wherein mapping from input to the structured and interdependent output variables is learned. A system and method for multimodal data mining is provided, comprising defining a multimodal data set comprising image information; representing image information of a data object as a set of feature vectors in a feature space; clustering in the feature space to group similar features; associating a non-image representation with a respective image data object based on the clustering; determining a joint feature representation of a respective data object as a mathematical weighted combination of a set of components of the joint feature representation; optimizing a weighting for a plurality of components of the mathematical weighted combination with respect to a prediction error between a predicted classification and a training classification; and employing the mathematical weighted combination for automatically classifying a new data object.
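As a rough illustration of the pipeline the abstract describes, the following minimal Python sketch clusters toy feature vectors, forms a joint feature from each (cluster, annotation word) pair, and learns the weights of the weighted combination from prediction errors. Everything in it is an assumption made for illustration: the helper names (`kmeans`, `joint_feature`, `train`, `predict`), the toy data, and in particular the perceptron-style weight update, which is a simple stand-in and not the max margin optimization recited in the claims.

```python
def nearest(centroids, v):
    """Index of the centroid closest to v (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(centroids[k], v)))


def kmeans(vectors, centroids, iters=10):
    """Plain Lloyd's k-means starting from the given initial centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for v in vectors:
            groups[nearest(centroids, v)].append(v)
        for k, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def joint_feature(cluster_id, word):
    """Hypothetical joint feature: one indicator per (cluster, word) pair."""
    return {(cluster_id, word): 1.0}


def score(weights, cluster_id, word):
    """Weighted combination of the joint feature components."""
    return sum(weights.get(name, 0.0) * value
               for name, value in joint_feature(cluster_id, word).items())


def predict(weights, centroids, vector, vocabulary):
    """Annotation word with the highest score for the vector's cluster."""
    cluster_id = nearest(centroids, vector)
    return max(vocabulary, key=lambda word: score(weights, cluster_id, word))


def train(samples, centroids, vocabulary, epochs=5):
    """Perceptron-style updates driven by prediction errors (stand-in only)."""
    weights = {}
    for _ in range(epochs):
        for vector, truth in samples:
            guess = predict(weights, centroids, vector, vocabulary)
            if guess != truth:
                cluster_id = nearest(centroids, vector)
                for name, value in joint_feature(cluster_id, truth).items():
                    weights[name] = weights.get(name, 0.0) + value
                for name, value in joint_feature(cluster_id, guess).items():
                    weights[name] = weights.get(name, 0.0) - value
    return weights


# Toy "image" feature vectors paired with annotation words (made-up data).
samples = [([0.0, 0.1], "sky"), ([0.1, 0.0], "sky"),
           ([1.0, 0.9], "grass"), ([0.9, 1.0], "grass")]
vocabulary = ["tree", "sky", "grass"]
centroids = kmeans([v for v, _ in samples], [[0.0, 0.0], [1.0, 1.0]])
weights = train(samples, centroids, vocabulary)

print(predict(weights, centroids, [0.2, 0.1], vocabulary))  # sky
print(predict(weights, centroids, [0.8, 1.0], vocabulary))  # grass
```

The indicator feature keeps the example small; a richer joint feature could also encode dependencies among annotation words, which is what makes the output space structured.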

9 Citations

16 Claims

1. A data mining method, comprising:
- receiving a set of multimodal data objects comprising semantically interrelated information of a first type and a second type, each being of a different type selected from the group consisting of image information, audio information, video information, and semantic information;
  
  representing at least the first type of information of the multimodal data objects as feature vectors within a feature space comprising the first type of information and the second type of information, and the semantic interrelation between the first type of information and the second type of information;
  
  clustering the feature vectors into classified clusters according to at least one semantic clustering criterion by at least one automated processor, to thereby determine a classification of the respective feature vectors;
  
  associating data objects with respective members of the set of multimodal data objects by the at least one automated processor, based on the clustering, the associated data objects comprising information of a third type semantically interrelated to the second type of information, selected from the group consisting of images, audio, video and semantic information, wherein the type of information of the third type is distinct from the type of information of the first type;
  
  estimating a joint feature representation of the set of multimodal data objects and the associated data objects by the at least one automated processor;
  
  optimizing the joint feature representation by the at least one automated processor to provide a structured output space of interdependent objects, based on at least a prediction error criterion, by iteratively solving a dual problem by selectively partitioning data objects into a working set and a non-working set, comprising;
  
  moving the data objects in the non-working set that can be moved without changing an objective function to the working set, and moving the data objects in the working set that can be moved with a decrease in the objective function to the non-working set;
  
  receiving a query represented according to the first type of information; and
  
  identifying data objects from the set of multimodal data objects that correspond to the query by the at least one automated processor, based on at least the structured output space of interdependent multimodal objects.
  - 2. The method according to claim 1, wherein the set of multimodal data objects comprise image information and annotations of the image information.
  - 3. The method according to claim 1, wherein the first type of information comprises image information and the second type of information comprises semantic information.
  - 4. The method according to claim 1, wherein the first type of information comprises image information and the second type of information comprises audio information.
  - 5. The method according to claim 1, wherein the first type of information comprises semantic information and the second type of information comprises audio information.
  - 6. The method according to claim 1, wherein the multimodal data objects comprise semantic information, image information and audio information.
  - 7. The method according to claim 1, wherein the first type of information comprises semantic information, the third type of information comprises image information, wherein the query comprises a textual word query; and said identifying comprises retrieving multimodal data objects comprising image information which corresponds the textual word query.
  - 8. The method according to claim 1, wherein the first type of information comprises at least one of image information, audio information and video information, and the third type of information comprises semantic information which represents an annotation of the first type of information for the respective multimodal data object after the associating.
  - 9. The method according to claim 1, further comprising providing within the set of multimodal data objects labeled examples which each comprise a non-semantic representation of an item and at least one semantic label variable within the structured output space of interdependent objects.
  - 10. The method according to claim 1, wherein said optimizing comprises representing a respective multimodal data object as a mathematical weighted combination of a set of joint feature representation components, and optimizing a weighting for a plurality of components of the mathematical weighted combination with respect to a prediction error between a predicted classification and a training classification based on a training example, by iteratively solving a Lagrange dual problem by partitioning the Lagrange multipliers into an active set and an inactive set, wherein the Lagrange multiplier for a member of the active set is greater than or equal to zero and the Lagrange multiplier for a member of the inactive set is zero, moving members of the active set having zero-valued Lagrange multipliers to the inactive set without changing an objective function and moving members of the inactive set to the active set which result in a decrease in the objective function.

11. A data mining system, comprising:
- an input configured to receive receiving a set of multimodal data objects comprising semantically interrelated information of a first type and information of a second type, each of the first type and the second type being different and being selected from the group consisting of image information, audio information, video information, and semantic information;
  
  an automated processor, configured to;
  
  represent at least the first type of information of the multimodal data objects as feature vectors within a feature space comprising the first type of information and the second type of information, and the semantic interrelation between the first type of information and the second type of information;
  
  cluster the feature vectors according to at least one clustering criterion, to thereby determine a classification of the respective feature vectors;
  
  associate data objects comprising information of a third type semantically interrelated to the second type of information, selected from the group consisting of images, audio, video and semantic information, wherein the type of information of the third type is distinct from the type of information of the first type, with respective multimodal data objects based on the clustering;
  
  estimate a joint feature representation of the set of multimodal data objects and the associated data objects;
  
  optimize the joint feature representation to provide a structured output space of interdependent objects, based on at least a prediction error criterion, by iteratively solving a dual problem by selectively partitioning data objects into a working set and a non-working set, comprising;
  
  moving the data objects in the non-working set that can be moved without changing an objective function to the working set, and moving the data objects in the working set that can be moved with a decrease in the objective function to the non-working set;
  
  receive a query represented according to the first type of information; and
  
  identify data objects from the set of multimodal data objects that correspond to the query by the at least one automated processor, based on at least the structured output space of interdependent multimodal objects; and
  
  an output port from the automated processor, configured to communicate at least one of the identified data objects and identifiers of the identified data objects.
  - 12. The system according to claim 11, wherein the multimodal data set comprises at least one of image information and audio information as the first type of information, and semantic information representing annotations of the multimodal data objects as the second type of information.
  - 13. The system according to claim 11, wherein the first type of information comprises at least one of image information, audio information and video information, and the second type of information comprises semantic information which represents an annotation of the first type of information for the respective multimodal data object after the association.
  - 14. The system according to claim 11, wherein the automated processor is configured to optimize by representing a respective multimodal data object as a mathematical weighted combination of a set of joint feature representation components, and optimizing a weighting for a plurality of components of the mathematical weighted combination with respect to a prediction error between a predicted classification and a training classification based on a training example, by iteratively solving a Lagrange dual problem by partitioning the Lagrange multipliers into an active set and an inactive set, wherein the Lagrange multiplier for a member of the active set is greater than or equal to zero and the Lagrange multiplier for a member of the inactive set is zero, moving members of the active set having zero-valued Lagrange multipliers to the inactive set without changing an objective function and moving members of the inactive set to the active set which result in a decrease in the objective function.
  - 15. The system according to claim 11, wherein the automated processor is configured to optimize the joint feature representation by at least solving a Lagrange dual problem by partitioning Lagrange multipliers into an active set and an inactive set, wherein the active set has Lagrange multipliers greater than or equal to zero and the inactive set has Lagrange multipliers which are zero, iteratively moving members of the active set having zero-valued Lagrange multipliers to the inactive set without changing an objective function and moving members of the inactive set to the active set which result in a decrease in the objective function.

16. A data mining method, comprising:
- receiving a set of multimodal data objects comprising both semantic information and semantically interrelated image information;
  
  representing the multimodal data objects as feature vectors;
  
  clustering the feature vectors in a feature space according to a semantic clustering criterion within a feature space comprising the image information and the semantic information by at least one automated processor, to thereby determine a classification of the respective feature vectors;
  
  storing data in a memory representing a joint feature representation of the multimodal data objects to provide a structured output space of interdependent objects, by representing respective multimodal data objects as a mathematical weighted combination of a set of joint feature representation components, and optimizing a weighting for a plurality of components of the mathematical weighted combination with respect to a prediction error between a predicted classification and a training classification based on a training example, by iteratively solving a Lagrange dual problem to partition the Lagrange multipliers into an active set and an inactive set, wherein the Lagrange multiplier for a member of the active set being greater than or equal to zero and the Lagrange multiplier for a member of the inactive set being zero, moving members of the active set having zero-valued Lagrange multipliers to the inactive set without changing an objective function and moving members of the inactive set to the active set which result in a decrease in the objective function;
  
  receiving a query represented according to the semantic information or semantically interrelated image information; and
  
  identifying data objects from the set of multimodal data objects that correspond to the query by the at least one automated processor, based on at least the structured output space of interdependent multimodal objects.
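The partition of Lagrange multipliers into an active set and an inactive set, recited in claims 10 and 14 to 16, can be sketched on a generic dual problem of the form: minimize ½ αᵀQα − bᵀα subject to α ≥ 0. The Python sketch below is an assumption-laden illustration and not the patented procedure: it uses projected coordinate descent, assumes Q is symmetric with positive diagonal entries, and the names `objective`, `gradient`, and `solve_nonneg_qp` are hypothetical.

```python
def objective(Q, b, alpha):
    """Dual objective 0.5 * a^T Q a - b^T a."""
    n = len(alpha)
    quad = sum(alpha[i] * Q[i][j] * alpha[j] for i in range(n) for j in range(n))
    return 0.5 * quad - sum(b[i] * alpha[i] for i in range(n))


def gradient(Q, b, alpha, i):
    """Partial derivative of the objective with respect to alpha[i]."""
    return sum(Q[i][j] * alpha[j] for j in range(len(alpha))) - b[i]


def solve_nonneg_qp(Q, b, sweeps=100, tol=1e-9):
    """Projected coordinate descent with an explicit active/inactive partition."""
    n = len(b)
    alpha = [0.0] * n   # every multiplier starts at zero ...
    active = set()      # ... so every index starts in the inactive set
    for _ in range(sweeps):
        changed = False
        # Inactive -> active: raising the multiplier from zero would
        # decrease the objective (negative partial derivative).
        for i in range(n):
            if i not in active and gradient(Q, b, alpha, i) < -tol:
                active.add(i)
                changed = True
        # Exact one-dimensional minimization for each active multiplier,
        # clipped at zero to respect the constraint alpha >= 0.
        for i in sorted(active):
            new_value = max(0.0, alpha[i] - gradient(Q, b, alpha, i) / Q[i][i])
            if abs(new_value - alpha[i]) > tol:
                alpha[i] = new_value
                changed = True
        # Active -> inactive: a zero-valued multiplier can be moved
        # without changing the objective.
        for i in [i for i in active if alpha[i] == 0.0]:
            active.discard(i)
        if not changed:
            break
    return alpha, active


Q = [[2.0, 1.0], [1.0, 2.0]]
b = [1.0, -1.0]
alpha, active = solve_nonneg_qp(Q, b)
print(alpha, active)            # [0.5, 0.0] {0}
print(objective(Q, b, alpha))   # -0.25
```

In this toy problem the unconstrained minimizer would make the second multiplier negative, so it stays at zero in the inactive set and only the first multiplier is optimized.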

Current Assignee: The Research Foundation for The State University of New York (State University of New York)
Original Assignee: The Research Foundation for The State University of New York (State University of New York)
Inventors: Guo, Zhen; Zhang, Zhongfei
Primary Examiner(s): Bella, Matthew
Assistant Examiner(s): Rosario, Dennis
Application Number: US13/903,018
Publication Number: US 20130251248A1
Time in Patent Office: 581 Days
Field of Search: 382/225, 382/224, 382/159, 706/12
US Class Current: 382/225
CPC Class Codes:
G06F 16/00   Information retrieval; Data...
G06F 16/40   of multimedia data, e.g. sl...
G06F 16/45   Clustering; Classification
G06F 16/5838   using colour
G06F 17/10   Complex mathematical operat...
G06F 18/00   Pattern recognition
G06F 18/23   Clustering techniques
G06F 18/2411   based on the proximity to a...
G06F 18/253   of extracted features
G06V 10/764   using classification, e.g. ...