Method for building classifier models for event classes via phased rule induction
Abstract
A method for learning signatures of a target class using sequential-covering, phased rule induction. The method balances recall and precision for the target class. A first phase aims for high recall by inducing rules with high support and a reasonable level of accuracy. A second phase improves precision by learning rules that remove false positives from the collection of records covered by the first-phase rules, while keeping overall recall at a desirable level. The method constructs a mechanism to assign a prediction probability score to each classification decision. The model includes a set of positive rules that predict presence of the target class, a set of negative rules that predict absence of the target class, and a set of prediction score values corresponding to each pair-wise combination of positive and negative rules. The two-phase method is extensible to a multiphase approach.
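The decision procedure the abstract describes can be sketched as follows. This is a minimal, hypothetical illustration, not the patented implementation: rules are modeled as predicates over a record, and the names `p_rules`, `n_rules`, `scores`, and `threshold` are illustrative assumptions.

```python
def classify(record, p_rules, n_rules, scores, threshold=0.5):
    """Predict membership in the target class for one record.

    scores[(i, j)] is the prediction probability assigned to the
    pair-wise combination of positive rule i and negative rule j.
    """
    for i, p_rule in enumerate(p_rules):
        if not p_rule(record):
            continue  # this positive rule does not cover the record
        for j, n_rule in enumerate(n_rules):
            if n_rule(record):
                # A negative rule fires: consult the pair-wise score and
                # keep the prediction only if it clears the threshold.
                return scores.get((i, j), 0.0) >= threshold
        return True  # covered by a positive rule, no negative rule fires
    return False  # no positive rule covers the record
```

For example, with `p_rules = [lambda r: r["x"] > 5]`, `n_rules = [lambda r: r["y"] < 0]`, and `scores = {(0, 0): 0.8}`, a record covered by both rules is still classified positive because 0.8 exceeds the default threshold.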
20 Claims
1. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for predicting a target class within a dataset, the method steps comprising:
determining positive rules predicting the presence of a plurality of examples of the target class;
determining negative rules predicting the absence of the target class among the plurality of examples of the target class predicted to be present; and
applying a classifier model to the dataset for determining the presence of the target class according to the positive rules and negative rules.
6. The program storage device of claim 5, wherein the recall is a predefined portion of the examples in the target class.
7. The program storage device of claim 5, wherein the precision is a predefined fraction of correctly predicted target class examples among all predicted examples.
8. The program storage device of claim 1, wherein the presence prediction achieves at least a predefined recall and the absence prediction achieves at least a predefined precision.
9. The program storage device of claim 8, wherein achieving the predefined precision includes the steps of:
collecting the examples predicted by the presence prediction; and
predicting a false positive example among the examples predicted by the presence prediction.
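The two collection steps above can be sketched as a second-phase training-set builder. This is a hypothetical helper, assuming labeled training data: examples covered by the positive rules are gathered, and the false positives among them are relabeled as the targets the second phase learns to predict.

```python
def second_phase_training_set(examples, labels, p_rules):
    """Collect examples covered by any positive rule and relabel them:
    1 marks a false positive (to be predicted by negative rules),
    0 marks a true positive of the target class."""
    covered, fp_labels = [], []
    for ex, y in zip(examples, labels):
        if any(rule(ex) for rule in p_rules):
            covered.append(ex)
            fp_labels.append(0 if y == 1 else 1)
    return covered, fp_labels
```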
10. A method for learning a classifier model which determines examples of a target class in a dataset comprising the steps of:
learning a plurality of positive rules supporting a plurality of examples of the target class;
learning a plurality of negative rules removing a plurality of false positive examples among the examples supported by the positive rules;
weighing an effect of each negative rule on each positive rule; and
applying the classifier model to the dataset for determining the presence of the target class according to the plurality of positive rules, the plurality of negative rules, and the effect of each negative rule on each positive rule.
determining a contribution of each negative rule; and
comparing each contribution to a predefined description length and, upon determining that the contribution is greater than the description length, ending the learning of negative rules.
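The stopping test above resembles a minimum-description-length criterion. A hypothetical sketch of the control loop, where `contribution` is an assumed callback that evaluates a candidate rule's cost:

```python
def learn_n_rules(candidates, description_length, contribution):
    """Accept candidate negative rules in order; stop as soon as a
    candidate's contribution exceeds the predefined description length."""
    n_rules = []
    for rule in candidates:
        if contribution(rule) > description_length:
            break  # further rules cost more than they explain
        n_rules.append(rule)
    return n_rules
```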
14. The method of claim 10, wherein a weight of each effect corresponds to a probability of a given supported example belonging to the target class.
15. The method of claim 10, wherein a negative rule/positive rule combination having a low weight is ignored by the classifier model.
16. A method for learning a classifier model which predicts the presence of a target class in a dataset comprising the steps of:
learning a plurality of P-Rules supporting a plurality of examples of the target class;
learning a plurality of N-Rules removing a plurality of false positive examples among the examples supported by the P-Rules;
assigning a probabilistic score to each N-Rule/P-Rule combination; and
applying the classifier model to the dataset for determining the presence of the target class according to the plurality of P-Rules, the plurality of N-Rules, and the probabilistic scores, wherein each probabilistic score is compared to a threshold to recover at least one example of the target class removed by the plurality of N-Rules.
learning a plurality of P-Rules individually supporting at least a first predefined number of examples and collectively supporting at least a second predefined number of examples;
learning a plurality of P-Rules having at least a predefined accuracy upon determining that the second predefined number of examples are supported; and
learning the N-Rules upon determining that a subsequent P-Rule has an accuracy less than a predefined accuracy.
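The phase-switch logic in the steps above can be sketched as a single control loop. This is a hypothetical illustration: `support` and `accuracy` are assumed evaluation callbacks for a candidate rule, not terms defined by the patent.

```python
def learn_p_rules(candidate_rules, min_support, collective_support,
                  min_accuracy, support, accuracy):
    """Phase one: accept rules individually supporting >= min_support
    examples until collective_support examples are covered in total;
    thereafter require min_accuracy, and stop (handing over to N-Rule
    learning) when a candidate falls below that accuracy."""
    p_rules, covered = [], 0
    for rule in candidate_rules:
        if covered < collective_support:
            if support(rule) >= min_support:
                p_rules.append(rule)
                covered += support(rule)
        else:
            if accuracy(rule) < min_accuracy:
                break  # switch to the N-Rule (second) phase
            p_rules.append(rule)
    return p_rules
```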
18. The method of claim 16, wherein the step of learning the N-Rules further comprises the steps of:
determining a cost for an N-Rule; and
comparing the cost to a predefined description length and, upon determining that the cost is greater than the description length, ending the learning of N-Rules.
19. The method of claim 16, wherein the step of learning the P-Rules is via sequential covering.
20. The method of claim 16, wherein the step of learning the N-Rules is via sequential covering.
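Claims 19 and 20 invoke sequential covering. A generic sketch of that strategy, where `induce_one_rule` is a placeholder for whatever single-rule learner is used: each induced rule removes the examples it covers before the next rule is learned.

```python
def sequential_covering(examples, labels, induce_one_rule):
    """Repeatedly induce one rule, then drop the examples it covers,
    until no positive examples remain or no rule can be found."""
    rules = []
    while any(labels):  # positive (label 1) examples remain
        rule = induce_one_rule(examples, labels)
        if rule is None:
            break
        rules.append(rule)
        remaining = [(ex, y) for ex, y in zip(examples, labels)
                     if not rule(ex)]
        if not remaining:
            break
        examples, labels = map(list, zip(*remaining))
    return rules
```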
Specification