Effective multi-class support vector machine classification
First Claim
1. In a computer-based system, a method of training a multi-category classifier using a binary SVM algorithm, said method comprising:
storing a plurality of user-defined categories in a memory of a computer;
analyzing a plurality of training examples for each category so as to identify one or more features associated with each category;
calculating at least one feature vector for each of said examples;
transforming each of said at least one feature vectors using a first mathematical function so as to provide desired information about each of said training examples; and
building a SVM classifier for each one of said plurality of categories, wherein said process of building a SVM classifier comprises:
assigning each of said examples in a first category to a first class and all other examples belonging to other categories to a second class, wherein if any one of said examples belongs to both said first category and another category, such examples are assigned to the first class only;
optimizing at least one tunable parameter of a SVM classifier for said first category, wherein said SVM classifier is trained using said first and second classes after the at least one tunable parameter has been optimized; and
optimizing a second mathematical function that converts the output of the binary SVM classifier into a probability of category membership;
calculating a solution for the SVM classifier for the first category using predetermined initial value(s) for said at least one tunable parameter; and
testing said solution for said first category to determine if the solution is characterized by either over-generalization or over-memorization;
wherein the SVM classifier is used on real world data, the probability of category membership of the real world data being output to at least one of a user, another system, and another process;
wherein the test to determine whether said SVM classifier solution for said first category is characterized by either over-generalization or over-memorization is based on a difference between a harmonic mean of first and second estimated probabilities on the one hand, and an arithmetic mean of said first and second estimated probabilities on the other hand;
wherein the first estimated probability is indicative of class membership and the second estimated probability is indicative of non-class membership for training examples.
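The class-assignment step recited above, in which an example belonging to both the first category and another category is placed in the first (positive) class only, can be sketched as follows. The function name and the toy category sets are illustrative, not from the patent:

```python
def assign_classes(examples, first_category):
    """One-vs-rest split: an example whose categories include
    `first_category` goes to the first (positive) class, even if it
    also belongs to other categories; all remaining examples go to
    the second (negative) class."""
    return [+1 if first_category in categories else -1
            for categories in examples]

docs = [{"sports"}, {"finance"}, {"sports", "finance"}, {"politics"}]
print(assign_classes(docs, "sports"))  # [1, -1, 1, -1]
```

Note that the third document, which is labeled with both "sports" and "finance", contributes a positive example when training the "sports" classifier rather than being discarded or duplicated into the negative class.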
9 Assignments
Abstract
An improved method of classifying examples into multiple categories using a binary support vector machine (SVM) algorithm. In one preferred embodiment, the method includes the following steps: storing a plurality of user-defined categories in a memory of a computer; analyzing a plurality of training examples for each category so as to identify one or more features associated with each category; calculating at least one feature vector for each of the examples; transforming each of the at least one feature vectors so as to reflect information about all of the training examples; and building a SVM classifier for each one of the plurality of categories, wherein the process of building a SVM classifier further includes: assigning each of the examples in a first category to a first class and all other examples belonging to other categories to a second class, wherein if any one of the examples belongs to another category as well as the first category, such examples are assigned to the first class only; optimizing at least one tunable parameter of a SVM classifier for the first category, wherein the SVM classifier is trained using the first and second classes; and optimizing a function that converts the output of the binary SVM classifier into a probability of category membership.
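The function that converts a raw SVM score into a probability of category membership is commonly a fitted sigmoid in the style of Platt scaling. Below is a minimal sketch under that assumption; the initialization, step count, and learning rate are illustrative choices, not the patent's method:

```python
import math

def sigmoid_prob(s, A, B):
    # Map a raw SVM margin score s to an estimated probability
    # q(C|s) = 1 / (1 + exp(A*s + B)).
    return 1.0 / (1.0 + math.exp(A * s + B))

def fit_sigmoid(scores, labels, steps=2000, lr=0.05):
    # Fit A and B by gradient descent on the cross-entropy of
    # (score, label) pairs; labels are 0 or 1. In practice the
    # pairs come from held-out data to avoid biased probabilities.
    A, B = -1.0, 0.0
    for _ in range(steps):
        gA = gB = 0.0
        for s, y in zip(scores, labels):
            p = sigmoid_prob(s, A, B)
            gA += (y - p) * s   # d(cross-entropy)/dA
            gB += (y - p)       # d(cross-entropy)/dB
        A -= lr * gA
        B -= lr * gB
    return A, B
```

Given scores that separate the classes (say negative scores for non-members, positive for members), the fitted sigmoid assigns probabilities near 0 and 1 at the extremes while remaining smooth near the decision boundary.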
90 Citations
29 Claims
1. In a computer-based system, a method of training a multi-category classifier using a binary SVM algorithm, said method comprising:
storing a plurality of user-defined categories in a memory of a computer;
analyzing a plurality of training examples for each category so as to identify one or more features associated with each category;
calculating at least one feature vector for each of said examples;
transforming each of said at least one feature vectors using a first mathematical function so as to provide desired information about each of said training examples; and
building a SVM classifier for each one of said plurality of categories, wherein said process of building a SVM classifier comprises:
assigning each of said examples in a first category to a first class and all other examples belonging to other categories to a second class, wherein if any one of said examples belongs to both said first category and another category, such examples are assigned to the first class only;
optimizing at least one tunable parameter of a SVM classifier for said first category, wherein said SVM classifier is trained using said first and second classes after the at least one tunable parameter has been optimized; and
optimizing a second mathematical function that converts the output of the binary SVM classifier into a probability of category membership;
calculating a solution for the SVM classifier for the first category using predetermined initial value(s) for said at least one tunable parameter; and
testing said solution for said first category to determine if the solution is characterized by either over-generalization or over-memorization;
wherein the SVM classifier is used on real world data, the probability of category membership of the real world data being output to at least one of a user, another system, and another process;
wherein the test to determine whether said SVM classifier solution for said first category is characterized by either over-generalization or over-memorization is based on a difference between a harmonic mean of first and second estimated probabilities on the one hand, and an arithmetic mean of said first and second estimated probabilities on the other hand;
wherein the first estimated probability is indicative of class membership and the second estimated probability is indicative of non-class membership for training examples.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13.
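The over-generalization/over-memorization test compares a harmonic mean against an arithmetic mean of the two probability estimates. Since the arithmetic mean always dominates the harmonic mean, with equality only when the two estimates agree, the gap between them measures how far the class-membership and non-class-membership estimates diverge. A minimal sketch, with the function name and threshold left as illustrative choices:

```python
def fit_quality_gap(p1, p2):
    """Difference between the arithmetic and harmonic means of two
    probability estimates (class membership and non-class
    membership).  The gap is zero when the estimates agree and
    grows as they diverge, which can signal an over-generalized or
    over-memorized SVM solution; the acceptance threshold is a
    design choice."""
    am = (p1 + p2) / 2.0
    hm = (2.0 * p1 * p2 / (p1 + p2)) if (p1 + p2) > 0 else 0.0
    return am - hm

print(round(fit_quality_gap(0.9, 0.9), 2))  # 0.0  (estimates agree)
print(round(fit_quality_gap(0.9, 0.1), 2))  # 0.32 (estimates diverge)
```

In terms of the claim's notation, `p1` plays the role of q(C|s) and `p2` the role of 1.0 − q(C|−s) for a training example with SVM score s.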
14. A computer-readable medium for storing instructions that when executed by a computer perform a method of training a multi-category classifier using a binary SVM algorithm, the method comprising:
storing a plurality of user-defined categories in a memory of a computer;
analyzing a plurality of training examples for each category so as to identify one or more features associated with each category;
calculating at least one feature vector for each of said examples;
transforming each of said at least one feature vectors using a first mathematical function so as to provide desired information about each of said training examples; and
building a SVM classifier for each one of said plurality of categories, wherein said process of building a SVM classifier comprises:
assigning each of said examples in a first category to a first class and all other examples belonging to other categories to a second class, wherein if any one of said examples belongs to both said first category and another category, such examples are assigned to the first class only;
optimizing at least one tunable parameter of a SVM classifier for said first category, wherein said SVM classifier is trained using said first and second classes after the at least one tunable parameter has been optimized; and
optimizing a second mathematical function that converts the output of the binary SVM classifier into a probability of category membership;
calculating a solution for the SVM classifier for the first category using predetermined initial value(s) for said at least one tunable parameter; and
testing said solution for said first category to determine if the solution is characterized by either over-generalization or over-memorization;
wherein the SVM classifier is used on real world data, the probability of category membership of the real world data being output to at least one of a user, another system, and another process;
wherein the test to determine whether said SVM classifier solution for said first category is characterized by either over-generalization or over-memorization is based on a difference between a harmonic mean of first and second estimated probabilities on the one hand, and an arithmetic mean of said first and second estimated probabilities on the other hand;
wherein the first estimated probability is indicative of class membership and the second estimated probability is indicative of non-class membership for training examples.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26.
27. In a computer-based system, a method of training a multi-category classifier using a binary SVM algorithm, said method comprising:
storing a plurality of user-defined categories in a memory of a computer;
analyzing a plurality of training examples for each category so as to identify one or more features associated with each category;
calculating at least one feature vector for each of said examples;
transforming each of said at least one feature vectors using a first mathematical function so as to provide desired information about each of said training examples; and
building a SVM classifier for each one of said plurality of categories;
determining whether said first category has more than a predetermined number of training examples assigned to it, wherein if the number of training examples assigned to said first category does not exceed said predetermined number, the process of building a SVM classifier for said first category is aborted;
wherein said process of building a SVM classifier comprises, in the following order:
assigning each of said examples in a first category to a first class and all other examples belonging to other categories to a second class, wherein if any one of said examples belongs to both said first category and another category, such examples are assigned to the first class only;
optimizing at least one tunable parameter of a SVM classifier for said first category, wherein said SVM classifier is trained using said first and second classes after the at least one tunable parameter has been optimized; and
optimizing a second mathematical function that converts the output of the binary SVM classifier into a probability of category membership;
wherein the SVM classifier is used on documents, the probability of category membership of the documents being output to at least one of a user, another system, and another process;
wherein said at least one tunable parameter of said SVM classifier is optimized using a method comprising the steps of:
allocating a subset of the training examples assigned to said first category to a “holdout” set, wherein said subset of training examples is left out of said training step;
calculating a solution for the SVM classifier for the first category using predetermined initial value(s) for said at least one tunable parameter; and
testing said solution for said first category to determine if the solution is characterized by either over-generalization or over-memorization;
wherein said test to determine whether said SVM classifier solution for said first category is characterized by either over-generalization or over-memorization is based on a relationship between SVM classifier scores s and −s produced by said SVM classifier, a first estimated probability indicative of class membership and a second estimated probability indicative of non-class membership for training examples with an SVM classifier score s, as provided by probability equations q(C|s) and 1.0 − q(C|−s), respectively;
wherein the test to determine whether said SVM classifier solution for said first category is characterized by either over-generalization or over-memorization is based on a difference between a harmonic mean of said first and second estimated probabilities, on the one hand, and an arithmetic mean of said first and second estimated probabilities, on the other hand;
wherein said at least one tunable parameter comprises two tunable parameters for said SVM classifier, one for a positive class, and one for a negative class.
Dependent claims: 28, 29.
Specification