Enhanced max margin learning on multimodal data mining in a multimedia database
First Claim
1. A method for multimodal data mining, comprising:
- defining a multimodal data set comprising image information;
representing image information of a data object as a set of feature vectors in a feature space representing a plurality of constraints for each training example, wherein the feature vectors comprise a joint feature representation associated with Lagrange multipliers, the feature vectors being partitioned into a dual variable set comprising two partitions and having non-image representations associated with the respective image data object;
clustering in the feature space to group similar features;
associating a non-image representation with a respective data object based on the clustering;
determining a joint feature representation of a respective data object as a mathematical weighted combination of a set of components of the joint feature representation;
optimizing a weighting for a plurality of components of the mathematical weighted combination with respect to a prediction error between a predicted classification and a training classification by iteratively solving a Lagrange dual problem, with an automated data processor, by partitioning the Lagrange multipliers into an active set and an inactive set, wherein the Lagrange multiplier for a member of the active set is greater than or equal to zero and the Lagrange multiplier for a member of the inactive set is zero, the iteratively solving comprising moving members of the active set having zero-valued Lagrange multipliers to the inactive set without changing an objective function, and moving members of the inactive set to the active set which result in a decrease in the objective function; and
employing the mathematical weighted combination for automatically classifying a new data object,wherein;
the set of feature vectors in the feature space represents a plurality of constraints for each training example, the feature vectors comprise joint feature representation defined by Φ
, having a Lagrange multiplier μ
i, y for each constraint to form the Lagrangian, wherein i and j denote different elements of a respective set, yi is an annotation and a member of the set Y, xi and xj are each annotations and members of the set X, superscript T denotes a transpose matrix, n is the number of elements in the respective set, y represents a prediction of yi from the set Yi, {tilde over (y)} is an operator of y, α
=Σ
i, yμ
i, yΦ
i,yi, y, l( y,yi) is a loss function defined as the number of the different entries in vectors y and yi, Φ
i,yi, y=Φ
i(yi)−
Φ
i( y), and a kernel function K((xi, y),(xj,{tilde over (y)}))=<
Φ
i,yi, y, Φ
j,yj, y>
, the feature vectors being partitioned into a dual variable set μ
comprising two partitions, μ
B and μ
N and non-image representations S associated with the respective image data object, the dual variable set μ
having i examples such that μ
=[μ
1T . . . μ
nT]T and S=[S1T . . . SnT]T, wherein the lengths of μ and
S are the same, and Ai is defined as a vector which has the same length as that of μ
, where Ai, y=1 and Aj, y=0 for j≠
i, such that A=[A1 . . . An]T, matrix D represents a kernel matrix where each entry is K((xi, y), (xj,{tilde over (y)})), and C represents a vector where each entry is a constant C;
the feature vectors comprise a dual variable set μ
comprising labeled examples which is decomposed into two partitions, μ
B and μ
N ; and
said optimizing comprises iteratively solving for each member of the set;
3 Assignments
0 Petitions
Accused Products
Abstract
Multimodal data mining in a multimedia database is addressed as a structured prediction problem, wherein mapping from input to the structured and interdependent output variables is learned. A system and method for multimodal data mining is provided, comprising defining a multimodal data set comprising image information; representing image information of a data object as a set of feature vectors in a feature space; clustering in the feature space to group similar features; associating a non-image representation with a respective image data object based on the clustering; determining a joint feature representation of a respective data object as a mathematical weighted combination of a set of components of the joint feature representation; optimizing a weighting for a plurality of components of the mathematical weighted combination with respect to a prediction error between a predicted classification and a training classification; and employing the mathematical weighted combination for automatically classifying a new data object.
6 Citations
20 Claims
-
1. A method for multimodal data mining, comprising:
-
defining a multimodal data set comprising image information; representing image information of a data object as a set of feature vectors in a feature space representing a plurality of constraints for each training example, wherein the feature vectors comprise a joint feature representation associated with Lagrange multipliers, the feature vectors being partitioned into a dual variable set comprising two partitions and having non-image representations associated with the respective image data object; clustering in the feature space to group similar features; associating a non-image representation with a respective data object based on the clustering; determining a joint feature representation of a respective data object as a mathematical weighted combination of a set of components of the joint feature representation; optimizing a weighting for a plurality of components of the mathematical weighted combination with respect to a prediction error between a predicted classification and a training classification by iteratively solving a Lagrange dual problem, with an automated data processor, by partitioning the Lagrange multipliers into an active set and an inactive set, wherein the Lagrange multiplier for a member of the active set is greater than or equal to zero and the Lagrange multiplier for a member of the inactive set is zero, the iteratively solving comprising moving members of the active set having zero-valued Lagrange multipliers to the inactive set without changing an objective function, and moving members of the inactive set to the active set which result in a decrease in the objective function; and employing the mathematical weighted combination for automatically classifying a new data object, wherein; the set of feature vectors in the feature space represents a plurality of constraints for each training example, the feature vectors comprise joint feature representation defined by Φ
, having a Lagrange multiplier μ
i,y for each constraint to form the Lagrangian, wherein i and j denote different elements of a respective set, yi is an annotation and a member of the set Y, xi and xj are each annotations and members of the set X, superscript T denotes a transpose matrix, n is the number of elements in the respective set,y represents a prediction of yi from the set Yi, {tilde over (y)} is an operator of y, α
=Σ
i,y μ
i,y Φ
i,yi ,y , l(y ,yi) is a loss function defined as the number of the different entries in vectorsy and yi, Φ
i,yi,y =Φ
i(yi)−
Φ
i(y ), and a kernel function K((xi,y ),(xj,{tilde over (y)}))=<
Φ
i,yi,y , Φ
j,yj,y >
, the feature vectors being partitioned into a dual variable set μ
comprising two partitions, μ
B and μ
N and non-image representations S associated with the respective image data object, the dual variable set μ
having i examples such that μ
=[μ
1T . . . μ
nT]T and S=[S1T . . . SnT]T, wherein the lengths of μ and
S are the same, and Ai is defined as a vector which has the same length as that of μ
, where Ai,y =1 and Aj,y =0 for j≠
i, such that A=[A1 . . . An]T, matrix D represents a kernel matrix where each entry is K((xi,y ), (xj,{tilde over (y)})), and C represents a vector where each entry is a constant C;the feature vectors comprise a dual variable set μ
comprising labeled examples which is decomposed into two partitions, μ
B and μ
N ; andsaid optimizing comprises iteratively solving for each member of the set; - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
-
-
14. A system for multimodal data mining, comprising:
-
an input adapted to receive a multimodal data set comprising image information; an automated processor, configured to; represent image information of a data object as a set of feature vectors in a feature space, representing a plurality of constraints for each training example, wherein the feature vectors comprise joint feature representation associated with Lagrange multipliers, the feature vectors being partitioned into a dual variable set comprising two partitions and having non-image representations associated with the respective image data object; perform clustering in the feature space to group similar features; associate a non-image representation with a respective image data object based on the clustering; determine a joint feature representation of a respective data object as a mathematical weighted combination of a set of components of the joint feature representation; optimize a weighting for a plurality of components of the mathematical weighted combination with respect to a prediction error between a predicted classification and a training classification, by iteratively solving a Lagrange dual problem, with an automated data processor, by partitioning the Lagrange multipliers into an active set and an inactive set, wherein the Lagrange multiplier for a member of the active set is greater than or equal to zero and the Lagrange multiplier for a member of the inactive set is zero, the iteratively solving comprising moving members of the active set having zero-valued Lagrange multipliers to the inactive set without changing an objective function, and moving members of the inactive set to the active set which result in a decrease in the objective function; and an output from the automated processor, configured to communicate a classification of a new data object based on the mathematical weighted combination, wherein; the set of feature vectors in the feature space represents a plurality of constraints for each training example, the feature vectors comprise joint feature representation defined by Φ
, having a Lagrange multiplier μ
i,y for each constraint to form the Lagrangian, wherein i and j denote different elements of a respective set, yi is an annotation and a member of the set Y, xi and xj are each annotations and members of the set X, superscript T denotes a transpose matrix, n is the number of elements in the respective set,y represents a prediction of yi from the set Yi, {tilde over (y)} is an operator of y, α
=Σ
i,y μ
i,y Φ
i,yi ,y , l(y ,yi) is a loss function defined as the number of the different entries in vectorsy and yi, Φ
i,yi,y =Φ
i(yi)−
Φ
i(y ), and a kernel function K((xi,y ),(xj,{tilde over (y)}))=<
Φ
i,yi,y >
, the feature vectors being partitioned into a dual variable set μ
comprising two partitions, μ
B and μ
N and non-image representations S associated with the respective image data object, the dual variable set μ
having i examples such that μ
=[μ
1T . . . μ
nT]T and S=[S1T . . . SnT]T, wherein the lengths of μ and
S are the same, and A1 is defined as a vector which has the same length as that of μ
, where Ai,y =1 and Aj,y =0 for j≠
i, such that A=[A1 . . . An]T, matrix D represents a kernel matrix where each entry is K((xi,y ), (xj,{tilde over (y)})), and C represents a vector where each entry is a constant C;the feature vectors comprise a dual variable set μ
comprising labeled examples which is decomposed into two partitions, μ
B and μ
N ; andsaid optimizing comprises iteratively solving for each member of the set; - View Dependent Claims (15, 16)
-
-
17. A method for multimodal data processing, comprising:
-
representing image information of a data object as a set of feature vectors in a feature space, representing a plurality of constraints for each training example, wherein the feature vectors comprise joint feature representation associated with Lagrange multipliers, the feature vectors being partitioned into a dual variable set comprising two partitions and having non-image representations associated with the respective image data object; clustering data objects having similar features in the feature space together; associating non-image information with a respective image data object based on the clustering; representing a respective data object as a mathematical weighted combination of a set of joint feature representation components; optimizing a weighting for a plurality of components of the mathematical weighted combination with respect to a prediction error between a predicted classification and a training classification, by iteratively solving a Lagrange dual problem, with an automated data processor, by partitioning the Lagrange multipliers into an active set and an inactive set, wherein the Lagrange multiplier for a member of the active set is greater than or equal to zero and the Lagrange multiplier for a member of the inactive set is zero, the iteratively solving comprising moving members of the active set having zero-valued Lagrange multipliers to the inactive set without changing an objective function, and moving members of the inactive set to the active set which result in a decrease in the objective function; and employing the mathematical weighted combination for automatically classifying a new data object, wherein; the set of feature vectors in the feature space represents a plurality of constraints for each training example, the feature vectors comprise joint feature representation defined by Φ
, having a Lagrange multiplier μ
i,y for each constraint to form the Lagrangian, wherein i and j denote different elements of a respective set, yi is an annotation and a member of the set Y, xi and xj are each annotations and members of the set X, superscript T denotes a transpose matrix, n is the number of elements in the respective set,y represents a prediction of yi from the set Yi, {tilde over (y)} is an operator of y, α
=Σ
i,y μ
i,y Φ
i,yi ,y , l(y ,yi) is a loss function defined as the number of the different entries in vectorsy and yi, Φ
i,yi,y =Φ
i(yi)−
Φ
i(y ), and a kernel function K((xi,y ),(xj,{tilde over (y)}))=<
Φ
i,yi,y >
, the feature vectors being partitioned into a dual variable set μ
comprising two partitions, μ
B and μ
N and non-image representations S associated with the respective image data object, the dual variable set μ
having i examples such that μ
=[μ
1T . . . μ
nT]T and S=[S1T . . . SnT]T, wherein the lengths of μ and
S are the same, and A1 is defined as a vector which has the same length as that of μ
, where Ai,y =1 and Aj,y =0 for j≠
i, such that A=[A1 . . . An]T, matrix D represents a kernel matrix where each entry is K((xi,y ), (xj,{tilde over (y)})), and C represents a vector where each entry is a constant C;the feature vectors comprise a dual variable set μ
comprising labeled examples which is decomposed into two partitions, μ
B and μ
N; andsaid optimizing comprises iteratively solving for each member of the set; - View Dependent Claims (18, 19, 20)
-
Specification