
Decision-tree-based symbolic rule induction system for text categorization

  • US 6,519,580 B1
  • Filed: 06/08/2000
  • Issued: 02/11/2003
  • Est. Priority Date: 06/08/2000
  • Status: Expired due to Fees
First Claim

1. A computer implemented method for generating decision-tree-based symbolic rule induction for general text categorization, wherein the text categorization provides superior performance in the areas of precision, recall, multiple categorization, provision of confidence levels, training speed, and insight and control, said method comprising the steps of:

  • accepting training data, the data comprising a set of training documents, wherein a document is any data item containing text;

    creating a set TR of representations of the set of training documents, the representation being suitable for rule induction and being in terms of counts of occurrences of features in documents;

    for each category C, generating R(C), a decision-tree-based set of symbolic rules;

    combining the generated rule sets R(C) for all categories C into a single rule set, “RuleSet”;

    computing a confidence level for each rule in RuleSet;

    adding the computed confidence level to the corresponding rule; and

    generating a final RuleSet comprising rules and corresponding confidence levels.
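
The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the whitespace tokenizer, the Gini-based split criterion, the depth limit, and the Laplace-smoothed confidence estimated on the training documents are all assumptions made for illustration, and every function name (`featurize`, `induce_rules`, `build_rule_set`, `classify`) is hypothetical.

```python
from collections import Counter

def featurize(text):
    """Represent a document as counts of word occurrences (assumed tokenizer)."""
    return Counter(text.lower().split())

def gini(labels):
    """Gini impurity of a list of boolean labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in sorted({f for r in rows for f in r}):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best

def induce_rules(rows, labels, conds=(), depth=3):
    """Grow a small tree; each leaf with a positive majority becomes one rule."""
    split = best_split(rows, labels) if depth > 0 else None
    if split is None:
        return [conds] if sum(labels) * 2 > len(labels) else []
    f, t = split
    rules = []
    for op in ("<=", ">"):
        sub = [(r, y) for r, y in zip(rows, labels)
               if (r[f] <= t if op == "<=" else r[f] > t)]
        rules += induce_rules([r for r, _ in sub], [y for _, y in sub],
                              conds + ((f, op, t),), depth - 1)
    return rules

def matches(conds, row):
    """True if the count representation satisfies every condition of a rule."""
    return all(row[f] <= t if op == "<=" else row[f] > t for f, op, t in conds)

def build_rule_set(docs):
    """docs: list of (text, set_of_categories). Returns [(conds, category, confidence)]."""
    TR = [featurize(text) for text, _ in docs]           # set TR of representations
    categories = sorted({c for _, cats in docs for c in cats})
    rule_set = []
    for c in categories:                                  # R(C) for each category C
        labels = [c in cats for _, cats in docs]
        for conds in induce_rules(TR, labels):
            covered = [y for r, y in zip(TR, labels) if matches(conds, r)]
            # Assumed confidence: Laplace-smoothed precision on the training set.
            conf = (sum(covered) + 1) / (len(covered) + 2)
            rule_set.append((conds, c, conf))             # combined into one RuleSet
    return rule_set

def classify(rule_set, text):
    """Return {category: confidence} for every rule that fires on the text."""
    row = featurize(text)
    result = {}
    for conds, c, conf in rule_set:
        if matches(conds, row):
            result[c] = max(conf, result.get(c, 0.0))
    return result

# Illustrative toy data, not taken from the patent.
docs = [
    ("wheat prices rise", {"grain"}),
    ("wheat harvest strong", {"grain"}),
    ("oil prices fall", {"oil"}),
    ("oil output cut", {"oil"}),
]
rule_set = build_rule_set(docs)
print(classify(rule_set, "oil tanker"))
```

Because one rule set is induced per category and `classify` reports every category whose rule fires, a document can receive several categories at once, each with its own confidence level.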
