Support vector machine—recursive feature elimination (SVM-RFE)
DC CAFCFirst Claim
Patent Images
1. A computer-implemented method comprising:
- (a) inputting into a computer processor programmed to execute a support vector machine a set of training examples having known labels with regard to two or more classes, each training example described by a vector of feature values for a plurality of features, the support vector machine comprising a decision function having a plurality of weights, wherein each feature has a corresponding weight;
(b) training the support vector machine by optimizing the plurality of weights so that a cost function is minimized and support vectors comprising a subset of the training examples are defined, wherein the decision function is based on the support vectors;
(c) computing ranking criteria using the optimized plurality of weights, wherein the ranking criterion estimates for each feature the effect on the cost function of removing that feature, and wherein features having the smallest effect on the cost function have the smallest ranking criteria;
(d) eliminating one or more features corresponding to the smallest ranking criteria to yield a reduced set of features;
(e) repeating steps (c) through (d) for the reduced set of features for a plurality of iterations until a subset of features of predetermined size remains, wherein the subset of features comprises determinative features for separating the set of training examples into the two or more classes; and
(f) generating at a printer or display device an output comprising a listing of the determinative features.
3 Assignments
Litigations
2 Petitions
Accused Products
Abstract
Identification of a determinative subset of features from within a group of features is performed by training a support vector machine using training samples with class labels to determine a value of each feature, where features are removed based on their the value. One or more features having the smallest values are removed and an updated kernel matrix is generated using the remaining features. The process is repeated until a predetermined number of features remain which are capable of accurately separating the data into different classes.
-
Citations
38 Claims
-
1. A computer-implemented method comprising:
-
(a) inputting into a computer processor programmed to execute a support vector machine a set of training examples having known labels with regard to two or more classes, each training example described by a vector of feature values for a plurality of features, the support vector machine comprising a decision function having a plurality of weights, wherein each feature has a corresponding weight; (b) training the support vector machine by optimizing the plurality of weights so that a cost function is minimized and support vectors comprising a subset of the training examples are defined, wherein the decision function is based on the support vectors; (c) computing ranking criteria using the optimized plurality of weights, wherein the ranking criterion estimates for each feature the effect on the cost function of removing that feature, and wherein features having the smallest effect on the cost function have the smallest ranking criteria; (d) eliminating one or more features corresponding to the smallest ranking criteria to yield a reduced set of features; (e) repeating steps (c) through (d) for the reduced set of features for a plurality of iterations until a subset of features of predetermined size remains, wherein the subset of features comprises determinative features for separating the set of training examples into the two or more classes; and (f) generating at a printer or display device an output comprising a listing of the determinative features. - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. A computer-program product embodied on a non-transitory computer-readable medium comprising instructions for executing a support vector machine, and further comprising instructions for:
-
training the support vector machine to determine a value for each feature in a group of features provided by a set of training samples having known labels corresponding to two or more classes, wherein training comprises generating a kernel matrix of components, each component comprising a dot product of two training samples, and adjusting a multiplier corresponding to each training sample to minimize a cost function; eliminating at least one feature with a minimum value from the group to provide a reduced group of features; generating an updated kernel matrix using the reduced group of features while keeping the multiplier unchanged; determining an updated value for each feature in the reduced group of features; repeating the steps of eliminating at least one feature, generating an updated kernel matrix and determining an updated value until a predetermined number of features remains; and outputting a ranked list of features. - View Dependent Claims (8, 9, 10, 11, 12, 30, 31, 37, 38)
-
-
13. A non-transitory machine-readable medium comprising a plurality of instructions, which in response to being executed, result in a computing system:
-
(a) training a support vector machine by generating a kernel matrix from a set of training samples and adjusting a multiplier corresponding to each training sample to minimize a cost function; (b) determining a weight value for each feature in a group of features that describe the training samples; (c) eliminating at least one feature with a minimum weight value from the group; (d) generating an updated kernel matrix while keeping the multiplier unchanged; (e) updating the weight value for each feature of the group based on the updated kernel matrix; (f) repeating steps (c) through (e) until a predetermined number of features remains to generate a ranked list of features; and (g) generating a report to a printer or display device comprising the ranked list of features. - View Dependent Claims (14, 15, 16, 17)
-
-
18. A non-transitory machine-readable medium comprising a plurality of instructions, which in response to being executed, result in a computing system:
-
identifying a determinative subset of features that are most correlated to patterns in sample data by; retrieving a training data having class labels with respect to the patterns from a memory in communication with a computer processor programmed for executing a support vector machine comprising a kernel; calculating a kernel matrix using the training data to determine a value for each feature in the group of features; eliminating at least one feature with a minimum value from the group; calculating an updated kernel matrix, each component of the updated kernel matrix comprising a dot product of two training samples provided by at least a part of the training data that corresponds to the eliminated feature; determining an updated value for each remaining feature of the group of features based on the updated kernel matrix; repeating steps eliminating, calculating an updated kernel matrix and determining an updated value for a plurality of iterations until a pre-determined number of features in the group remain; and generating an output comprising a ranked list of features, wherein the features in the ranked list comprise the determinative subset of features for predicting patterns in new data. - View Dependent Claims (19, 20, 21, 28, 29)
-
-
22. A computer-implemented method comprising:
-
(a) inputting into a computer processor programmed to execute a support vector machine a set of training examples having known labels with regard to two or more classes, each training example described by a vector of feature values for a plurality of features, the support vector machine comprising a decision function having a plurality of weights, wherein each feature has a corresponding weight; (b) training the support vector machine by optimizing the plurality of weights so that a cost function is minimized and support vectors comprising a subset of the training examples are defined, wherein the decision function is based on the support vectors; (c) computing ranking criteria using the optimized plurality of weights, wherein the ranking criterion estimates for each feature the effect on the cost function of removing that feature, and wherein features having the smallest effect on the cost function have the smallest ranking criteria; (d) eliminating one or more features corresponding to the smallest ranking criteria to yield a reduced set of features; (e) repeating steps (c) through (d) for the reduced set of features for a plurality of iterations until a subset of features of predetermined size remains, wherein the subset of features comprises determinative features for separating the set of training examples into the two or more classes; and (f) transferring a listing of the determinative features to a media. - View Dependent Claims (23, 24, 25, 26, 27)
-
-
32. A non-transitory machine-readable medium comprising a plurality of instructions, which in response to being executed, result in a computing system:
-
(a) training a support vector machine by generating a kernel matrix from a set of training samples and adjusting a multiplier corresponding to each training sample to minimize a cost function; (b) determining a weight value for each feature in a group of features that describe the training samples; (c) eliminating at least one feature with a minimum weight value from the group; (d) generating an updated kernel matrix while keeping the multiplier unchanged; (e) updating the weight value for each feature of the group based on the updated kernel matrix; (f) repeating steps (c) through (e) until a predetermined number of features remains to generate a ranked list of features; and (g) transferring the ranked list of features to a media. - View Dependent Claims (33, 34, 35, 36)
-
Specification