Method for boosting the performance of machine-learning classifiers
Abstract
A novel statistical learning procedure that can be applied to many machine-learning applications is presented. Although this boosting learning procedure is described with respect to its applicability to face detection, it can be applied to speech recognition, text classification, image retrieval, document routing, online learning and medical diagnosis classification problems.
108 Citations
15 Claims
1. A computer-implemented process for using feature selection to obtain a strong classifier from a combination of weak classifiers, comprising using a computer to perform the following process actions:
(a) inputting a set of training examples, a prescribed maximum number of weak classifiers, a cost function capable of measuring the overall cost, and an acceptable maximum cost;
(b) computing a set of weak classifiers, each classifier being associated with a particular feature of the training examples;
(c) determining which of the set of weak classifiers is the most significant classifier;
(d) adding said most significant classifier to a current set of optimal weak classifiers;
(e) determining which of the current set of optimal weak classifiers is the least significant classifier;
(f) computing the overall cost for the current set of optimal weak classifiers using the cost function;
(g) conditionally removing the least significant classifier from the current set of optimal weak classifiers;
(h) computing the overall cost for the current set of optimal weak classifiers less the least significant classifier using the cost function;
(i) determining whether the removal of the least significant classifier results in a lower overall cost;
(j) whenever it is determined that the removal of the least significant classifier results in a lower overall cost, eliminating the least significant classifier;
(k) recomputing each classifier in the current set of optimal weak classifiers associated with a feature added subsequent to the eliminated classifier while keeping the earlier optimal weak classifiers unchanged;
(l) repeating actions (f) through (k) until it is determined that the removal of the least significant classifier does not result in a lower overall cost, and then reinstating the last identified least significant classifier to the current set of optimal weak classifiers;
(m) determining whether the number of weak classifiers in the current set of optimal weak classifiers equals the prescribed maximum number of weak classifiers or the last computed overall cost for the current set of optimal weak classifiers is less than the acceptable maximum cost; and
(n) whenever it is determined that the number of weak classifiers in the current set of optimal weak classifiers does not equal the prescribed maximum number of weak classifiers and the last computed overall cost for the current set of optimal weak classifiers exceeds the acceptable maximum cost, repeating actions (c) through (m) until it is determined that the number of weak classifiers in the current set of optimal weak classifiers does equal the prescribed maximum number of weak classifiers or the last computed overall cost for the current set of optimal weak classifiers becomes less than the acceptable maximum cost, then outputting the sum of the individual weak classifiers as the trained strong classifier.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
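As a rough illustration, actions (a) through (n) describe a forward-add / conditional-backward-remove feature selection loop in the FloatBoost style: greedily add the most significant weak classifier, then repeatedly drop the least significant one while doing so lowers the overall cost. The Python sketch below is a simplified, hypothetical rendering, not the patented implementation: the weak classifiers are single-feature threshold stumps, the cost function is the exponential loss common in boosting, the re-learning of later classifiers in action (k) is omitted, and all names (`float_boost`, `make_stump`, and so on) are illustrative.

```python
import numpy as np

def make_stump(X, y, j):
    """Action (b): a weak classifier tied to feature j (a threshold stump)."""
    t = float(np.median(X[:, j]))
    pred = np.where(X[:, j] > t, 1.0, -1.0)
    s = 1.0 if np.mean(pred == y) >= 0.5 else -1.0  # flip polarity if worse than chance
    return (j, t, s)

def stump_predict(X, stump):
    j, t, s = stump
    return s * np.where(X[:, j] > t, 1.0, -1.0)

def cost(X, y, classifiers):
    """The 'overall cost': exponential loss of the summed weak classifiers."""
    F = np.zeros(len(X))
    for h in classifiers:
        F += stump_predict(X, h)
    return float(np.mean(np.exp(-y * F)))

def float_boost(X, y, max_weak, max_cost):
    candidates = [make_stump(X, y, j) for j in range(X.shape[1])]  # (b)
    S = []                            # current set of optimal weak classifiers
    for _ in range(10 * max_weak):    # safety cap for this sketch
        # (c)-(d): add the most significant candidate (lowest resulting cost)
        S.append(min(candidates, key=lambda h: cost(X, y, S + [h])))
        # (e)-(l): conditionally remove the least significant member
        while len(S) > 1:
            base = cost(X, y, S)
            idx = min(range(len(S)), key=lambda i: cost(X, y, S[:i] + S[i + 1:]))
            if cost(X, y, S[:idx] + S[idx + 1:]) < base:
                del S[idx]            # (j); re-learning later stumps, action (k), is omitted
            else:
                break                 # (l): the least significant classifier is kept
        # (m)-(n): stop when the set is full or the cost is acceptable
        if len(S) >= max_weak or cost(X, y, S) < max_cost:
            break
    return S                          # the strong classifier is the sum of these stumps
```

On toy data that a single feature separates, the loop typically terminates after one round, since one stump already drives the exponential loss below the acceptable maximum cost.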
15. A system for detecting a person's face in an input image and identifying a face pose range into which the face pose exhibited by the detected face falls, the system comprising:
a general purpose computing device; and
a computer program comprising program modules executable by the computing device, wherein the computing device is directed by the program modules of the computer program to:
create a database comprising a plurality of training feature characterizations, each of which characterizes the face of a person at a known face pose or a non-face; and
train a plurality of detectors arranged in a pyramidal architecture to determine whether a portion of an input image depicts a person's face having a face pose falling within a face pose range associated with one of the detectors using the training feature characterizations;
wherein said detectors using a greater number of feature characterizations are arranged at the bottom of the pyramid, and said detectors arranged to detect finer ranges of face pose are arranged at the bottom of the pyramid; and
wherein the program module to train a plurality of detectors comprises sub-modules to:
(a) input a set of training examples, a prescribed maximum number of weak classifiers, a cost function capable of measuring the overall cost, and an acceptable maximum cost;
(b) compute a set of weak classifiers, each classifier being associated with a particular feature of the training examples;
(c) determine which of the set of weak classifiers is the most significant classifier;
(d) add said most significant classifier to a current set of optimal weak classifiers;
(e) determine which of the current set of optimal weak classifiers is the least significant classifier;
(f) compute the overall cost for the current set of optimal weak classifiers using the cost function;
(g) conditionally remove the least significant classifier from the current set of optimal weak classifiers;
(h) compute the overall cost for the current set of optimal weak classifiers less the least significant classifier using the cost function;
(i) determine whether the removal of the least significant classifier results in a lower overall cost;
(j) whenever it is determined that the removal of the least significant classifier results in a lower overall cost, eliminate the least significant classifier;
(k) recompute each classifier in the current set of optimal weak classifiers associated with a feature added subsequent to the eliminated classifier while keeping the earlier optimal weak classifiers unchanged;
(l) repeat actions (f) through (k) until it is determined that the removal of the least significant classifier does not result in a lower overall cost, and then reinstate the last identified least significant classifier to the current set of optimal weak classifiers;
(m) determine whether the number of weak classifiers in the current set of optimal weak classifiers equals the prescribed maximum number of weak classifiers or the last computed overall cost for the current set of optimal weak classifiers is less than the acceptable maximum cost; and
(n) whenever it is determined that the number of weak classifiers in the current set of optimal weak classifiers does not equal the prescribed maximum number of weak classifiers and the last computed overall cost for the current set of optimal weak classifiers exceeds the acceptable maximum cost, repeat actions (c) through (m) until it is determined that the number of weak classifiers in the current set of optimal weak classifiers does equal the prescribed maximum number of weak classifiers or the last computed overall cost for the current set of optimal weak classifiers becomes less than the acceptable maximum cost, then output the sum of the individual weak classifiers as the trained strong classifier.
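The pyramidal arrangement recited above can be pictured as a coarse-to-fine tree: one full-pose-range detector at the top, and detectors covering progressively finer pose ranges (and using more features) toward the bottom, so that most non-face windows are rejected cheaply at a coarse layer. The sketch below illustrates only that control flow under stated assumptions; `PoseDetector`, `make_detector`, and `detect_pose` are hypothetical names, and the `classify` functions stand in for trained detectors.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

@dataclass
class PoseDetector:
    """One node of the detector pyramid, covering a range of face poses."""
    pose_range: Tuple[float, float]          # (min, max) face-pose angle, degrees
    classify: Callable[[float], bool]        # window -> face in this pose range?
    children: List["PoseDetector"] = field(default_factory=list)

def detect_pose(window, node) -> Optional[Tuple[float, float]]:
    """Coarse-to-fine search: descend only through detectors that accept."""
    if not node.classify(window):
        return None                          # rejected early at a coarse layer
    if not node.children:
        return node.pose_range               # bottom of the pyramid: finest range
    for child in node.children:
        found = detect_pose(window, child)
        if found is not None:
            return found
    return None

def make_detector(lo, hi, children=()):
    # Stand-in "trained" detector: accepts any window whose true pose is in range.
    return PoseDetector((lo, hi),
                        lambda pose, lo=lo, hi=hi: lo <= pose < hi,
                        list(children))

# A three-layer pyramid: full range at top, halves below, quarters at the bottom.
pyramid = make_detector(-90, 90, [
    make_detector(-90, 0, [make_detector(-90, -45), make_detector(-45, 0)]),
    make_detector(0, 90, [make_detector(0, 45), make_detector(45, 90)]),
])
```

The design point is early rejection: only windows that survive a coarse, cheap detector incur the cost of the finer, more feature-heavy detectors lower in the pyramid.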
Specification