Effective multi-class support vector machine classification
First Claim
1. In a computer-based system, a method of training a multi-category classifier using a binary SVM algorithm, said method comprising:
storing a plurality of user-defined categories in a memory of a computer;
analyzing a plurality of training examples for each category so as to identify one or more features associated with each category;
calculating at least one feature vector for each of said examples;
transforming each of said at least one feature vectors using a first mathematical function so as to provide desired information about each of said training examples; and
building a SVM classifier for each one of said plurality of categories, wherein said process of building a SVM classifier comprises:
assigning each of said examples in a first category to a first class and all other examples belonging to other categories to a second class, wherein if any one of said examples belongs to both said first category and another category, such examples are assigned to the first class only;
optimizing at least one tunable parameter of a SVM classifier for said first category, wherein said SVM classifier is trained using said first and second classes after the at least one tunable parameter has been optimized; and
optimizing a second mathematical function that converts the output of the binary SVM classifier into a probability of category membership;
calculating a solution for the SVM classifier for the first category using predetermined initial value(s) for said at least one tunable parameter; and
testing said solution for said first category to determine if the solution is characterized by either over-generalization or over-memorization;
wherein the SVM classifier is used on real world data, the probability of category membership of the real world data being output to at least one of a user, another system, and another process;
wherein the test to determine whether said SVM classifier solution for said first category is characterized by either over-generalization or over-memorization is based on a difference between a harmonic mean of first and second estimated probabilities on the one hand, and an arithmetic mean of said first and second estimated probabilities on the other hand;
wherein the first estimated probability is indicative of class membership and the second estimated probability is indicative of non-class membership for training examples.
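The class-assignment step recited above, in which an example belonging to both the first category and another category is placed in the first (positive) class only, can be sketched as follows. The function name and the toy category sets are illustrative, not from the patent:

```python
def assign_classes(examples, first_category):
    """One-vs-rest split: an example whose categories include
    `first_category` goes to the first (positive) class, even if it
    also belongs to other categories; all remaining examples go to
    the second (negative) class."""
    return [+1 if first_category in categories else -1
            for categories in examples]

docs = [{"sports"}, {"finance"}, {"sports", "finance"}, {"politics"}]
print(assign_classes(docs, "sports"))  # [1, -1, 1, -1]
```

Note that the third document, which is labeled with both "sports" and "finance", contributes a positive example when training the "sports" classifier rather than being discarded or duplicated into the negative class.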
9 Assignments
Abstract
An improved method of classifying examples into multiple categories using a binary support vector machine (SVM) algorithm. In one preferred embodiment, the method includes the following steps: storing a plurality of user-defined categories in a memory of a computer; analyzing a plurality of training examples for each category so as to identify one or more features associated with each category; calculating at least one feature vector for each of the examples; transforming each of the at least one feature vectors so as to reflect information about all of the training examples; and building a SVM classifier for each one of the plurality of categories, wherein the process of building a SVM classifier further includes: assigning each of the examples in a first category to a first class and all other examples belonging to other categories to a second class, wherein if any one of the examples belongs to another category as well as the first category, such examples are assigned to the first class only; optimizing at least one tunable parameter of a SVM classifier for the first category, wherein the SVM classifier is trained using the first and second classes; and optimizing a function that converts the output of the binary SVM classifier into a probability of category membership.
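The function that converts a raw SVM score into a probability of category membership is commonly a fitted sigmoid in the style of Platt scaling. Below is a minimal sketch under that assumption; the initialization, step count, and learning rate are illustrative choices, not the patent's method:

```python
import math

def sigmoid_prob(s, A, B):
    # Map a raw SVM margin score s to an estimated probability
    # q(C|s) = 1 / (1 + exp(A*s + B)).
    return 1.0 / (1.0 + math.exp(A * s + B))

def fit_sigmoid(scores, labels, steps=2000, lr=0.05):
    # Fit A and B by gradient descent on the cross-entropy of
    # (score, label) pairs; labels are 0 or 1. In practice the
    # pairs come from held-out data to avoid biased probabilities.
    A, B = -1.0, 0.0
    for _ in range(steps):
        gA = gB = 0.0
        for s, y in zip(scores, labels):
            p = sigmoid_prob(s, A, B)
            gA += (y - p) * s   # d(cross-entropy)/dA
            gB += (y - p)       # d(cross-entropy)/dB
        A -= lr * gA
        B -= lr * gB
    return A, B
```

Given scores that separate the classes (say negative scores for non-members, positive for members), the fitted sigmoid assigns probabilities near 0 and 1 at the extremes while remaining smooth near the decision boundary.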
90 Citations
29 Claims
1. In a computer-based system, a method of training a multi-category classifier using a binary SVM algorithm, said method comprising:
storing a plurality of user-defined categories in a memory of a computer;
analyzing a plurality of training examples for each category so as to identify one or more features associated with each category;
calculating at least one feature vector for each of said examples;
transforming each of said at least one feature vectors using a first mathematical function so as to provide desired information about each of said training examples; and
building a SVM classifier for each one of said plurality of categories, wherein said process of building a SVM classifier comprises:
assigning each of said examples in a first category to a first class and all other examples belonging to other categories to a second class, wherein if any one of said examples belongs to both said first category and another category, such examples are assigned to the first class only;
optimizing at least one tunable parameter of a SVM classifier for said first category, wherein said SVM classifier is trained using said first and second classes after the at least one tunable parameter has been optimized; and
optimizing a second mathematical function that converts the output of the binary SVM classifier into a probability of category membership;
calculating a solution for the SVM classifier for the first category using predetermined initial value(s) for said at least one tunable parameter; and
testing said solution for said first category to determine if the solution is characterized by either over-generalization or over-memorization;
wherein the SVM classifier is used on real world data, the probability of category membership of the real world data being output to at least one of a user, another system, and another process;
wherein the test to determine whether said SVM classifier solution for said first category is characterized by either over-generalization or over-memorization is based on a difference between a harmonic mean of first and second estimated probabilities on the one hand, and an arithmetic mean of said first and second estimated probabilities on the other hand;
wherein the first estimated probability is indicative of class membership and the second estimated probability is indicative of non-class membership for training examples.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13.
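The over-generalization/over-memorization test compares a harmonic mean against an arithmetic mean of the two probability estimates. Since the arithmetic mean always dominates the harmonic mean, with equality only when the two estimates agree, the gap between them measures how far the class-membership and non-class-membership estimates diverge. A minimal sketch, with the function name and threshold left as illustrative choices:

```python
def fit_quality_gap(p1, p2):
    """Difference between the arithmetic and harmonic means of two
    probability estimates (class membership and non-class
    membership).  The gap is zero when the estimates agree and
    grows as they diverge, which can signal an over-generalized or
    over-memorized SVM solution; the acceptance threshold is a
    design choice."""
    am = (p1 + p2) / 2.0
    hm = (2.0 * p1 * p2 / (p1 + p2)) if (p1 + p2) > 0 else 0.0
    return am - hm

print(round(fit_quality_gap(0.9, 0.9), 2))  # 0.0  (estimates agree)
print(round(fit_quality_gap(0.9, 0.1), 2))  # 0.32 (estimates diverge)
```

In terms of the claim's notation, `p1` plays the role of q(C|s) and `p2` the role of 1.0 − q(C|−s) for a training example with SVM score s.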
14. A computer-readable medium for storing instructions that when executed by a computer perform a method of training a multi-category classifier using a binary SVM algorithm, the method comprising:
storing a plurality of user-defined categories in a memory of a computer;
analyzing a plurality of training examples for each category so as to identify one or more features associated with each category;
calculating at least one feature vector for each of said examples;
transforming each of said at least one feature vectors using a first mathematical function so as to provide desired information about each of said training examples; and
building a SVM classifier for each one of said plurality of categories, wherein said process of building a SVM classifier comprises:
assigning each of said examples in a first category to a first class and all other examples belonging to other categories to a second class, wherein if any one of said examples belongs to both said first category and another category, such examples are assigned to the first class only;
optimizing at least one tunable parameter of a SVM classifier for said first category, wherein said SVM classifier is trained using said first and second classes after the at least one tunable parameter has been optimized; and
optimizing a second mathematical function that converts the output of the binary SVM classifier into a probability of category membership;
calculating a solution for the SVM classifier for the first category using predetermined initial value(s) for said at least one tunable parameter; and
testing said solution for said first category to determine if the solution is characterized by either over-generalization or over-memorization;
wherein the SVM classifier is used on real world data, the probability of category membership of the real world data being output to at least one of a user, another system, and another process;
wherein the test to determine whether said SVM classifier solution for said first category is characterized by either over-generalization or over-memorization is based on a difference between a harmonic mean of first and second estimated probabilities on the one hand, and an arithmetic mean of said first and second estimated probabilities on the other hand;
wherein the first estimated probability is indicative of class membership and the second estimated probability is indicative of non-class membership for training examples.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26.
27. In a computer-based system, a method of training a multi-category classifier using a binary SVM algorithm, said method comprising:
storing a plurality of user-defined categories in a memory of a computer;
analyzing a plurality of training examples for each category so as to identify one or more features associated with each category;
calculating at least one feature vector for each of said examples;
transforming each of said at least one feature vectors using a first mathematical function so as to provide desired information about each of said training examples; and
building a SVM classifier for each one of said plurality of categories;
determining whether said first category has more than a predetermined number of training examples assigned to it, wherein if the number of training examples assigned to said first category does not exceed said predetermined number, the process of building a SVM classifier for said first category is aborted;
wherein said process of building a SVM classifier comprises, in the following order:
assigning each of said examples in a first category to a first class and all other examples belonging to other categories to a second class, wherein if any one of said examples belongs to both said first category and another category, such examples are assigned to the first class only;
optimizing at least one tunable parameter of a SVM classifier for said first category, wherein said SVM classifier is trained using said first and second classes after the at least one tunable parameter has been optimized; and
optimizing a second mathematical function that converts the output of the binary SVM classifier into a probability of category membership;
wherein the SVM classifier is used on documents, the probability of category membership of the documents being output to at least one of a user, another system, and another process;
wherein said at least one tunable parameter of said SVM classifier is optimized using a method comprising the steps of:
allocating a subset of the training examples assigned to said first category to a “holdout” set, wherein said subset of training examples is left out of said training step;
calculating a solution for the SVM classifier for the first category using predetermined initial value(s) for said at least one tunable parameter; and
testing said solution for said first category to determine if the solution is characterized by either over-generalization or over-memorization;
wherein said test to determine whether said SVM classifier solution for said first category is characterized by either over-generalization or over-memorization is based on a relationship between SVM classifier scores s and −s produced by said SVM classifier, a first estimated probability indicative of class membership and a second estimated probability indicative of non-class membership for training examples with an SVM classifier score s, as provided by probability equations q(C|s) and 1.0 − q(C|−s), respectively;
wherein the test to determine whether said SVM classifier solution for said first category is characterized by either over-generalization or over-memorization is based on a difference between a harmonic mean of said first and second estimated probabilities, on the one hand, and an arithmetic mean of said first and second estimated probabilities, on the other hand;
wherein said at least one tunable parameter comprises two tunable parameters for said SVM classifier, one for a positive class, and one for a negative class.
Dependent claims: 28, 29.
Specification