Text categorization toolkit

  • US 6,212,532 B1
  • Filed: 10/22/1998
  • Issued: 04/03/2001
  • Est. Priority Date: 10/22/1998
  • Status: Expired due to Fees
First Claim

1. A module information extraction system capable of extracting information from natural language documents, the system including a plurality of interchangeable modules, the system comprising:

  • a data preparation module for preparing a first set of raw data having class labels to be tested, the data preparation module being selected from a first type of the interchangeable modules;

    a feature extraction module for extracting features from the raw data received from the data preparation module and storing the features in a vector format, the feature extraction module being selected from a second type of the interchangeable modules;

    a core classification module for applying a learning algorithm to the stored vector format and producing therefrom a resulting classifier, the core classification module being selected from a third type of the interchangeable modules; and

    a testing module for comparing the resulting classifier to a set of preassigned classes, the testing module being selected from a fourth type of the interchangeable modules, wherein the testing module tests a second set of raw data having class labels received by the data preparation module to determine whether the class labels of the second set of raw data correspond to the resulting classifier.
