Text-classification system and method

  • US 7,016,895 B2
  • Filed: 02/25/2003
  • Issued: 03/21/2006
  • Est. Priority Date: 07/05/2002
  • Status: Expired due to Fees
First Claim

1. A computer-executed method for classifying a target document in the form of a digitally encoded natural-language text into one or more of two or more different classes, comprising the steps of:

  • (a) for each of a plurality of terms composed of non-generic words and, optionally, proximately arranged word groups in the target document, selecting a term as a descriptive term if the term has an above-threshold selectivity value in at least one library of texts in a field, where the selectivity value of the term in the library of texts in the field is related to the frequency of occurrence of that term in said library, relative to the frequency of occurrence of the same term in one or more other libraries of texts in one or more other fields, respectively,
  • (b) determining, for each of a plurality of sample texts, a match score related to the number of descriptive terms present in or derived from that text that match those in the target document, where each of the plurality of sample texts has an associated classification identifier that identifies the one or more different classes to which that text belongs,
  • (c) selecting one or more of the sample texts having the highest match scores,
  • (d) recording the one or more classification identifiers associated with the one or more sample texts having the highest match scores, and
  • (e) associating the one or more classification identifiers from step (d) with the target document, thereby to classify the target document as belonging to one or more classes represented by at least one of the classification identifiers from step (d).
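
Steps (a) through (e) can be illustrated with a minimal Python sketch. It is not the patented implementation; it only shows one way the claimed steps could fit together. Several choices are assumptions not stated in the claim: each text is reduced to a set of terms, selectivity is taken as a plain ratio of document frequencies with a small floor to avoid division by zero, the match score is a simple count of shared descriptive terms, and the names `selectivity`, `descriptive_terms`, `classify`, `threshold`, `top_n`, and `floor`, together with the toy data and class labels, are hypothetical.

```python
from collections import Counter

def selectivity(term, field, libraries, floor=0.01):
    """Ratio of a term's document frequency in one field's library to its
    highest document frequency in any other library (assumed formula)."""
    def doc_freq(name):
        texts = libraries[name]
        return sum(term in text for text in texts) / len(texts)
    own = doc_freq(field)
    others = max((doc_freq(f) for f in libraries if f != field), default=0.0)
    return own / max(others, floor)  # floor avoids division by zero

def descriptive_terms(target_terms, libraries, threshold=2.0):
    """Step (a): keep terms whose selectivity exceeds the threshold
    in at least one library."""
    return {
        term for term in target_terms
        if any(selectivity(term, f, libraries) > threshold for f in libraries)
    }

def classify(target_terms, samples, libraries, threshold=2.0, top_n=2):
    desc = descriptive_terms(target_terms, libraries, threshold)      # (a)
    scored = [(len(desc & terms), ids) for terms, ids in samples]     # (b)
    scored.sort(key=lambda pair: pair[0], reverse=True)               # (c)
    top = scored[:top_n]
    votes = Counter(cid for _, ids in top for cid in ids)             # (d)
    return [cid for cid, _ in votes.most_common()]                    # (e)

# Toy data (hypothetical): two field libraries, each a list of term sets.
libraries = {
    "medical": [{"catheter", "stent", "device"}, {"catheter", "balloon"}],
    "software": [{"compiler", "parser", "device"}, {"parser", "cache"}],
}
# Sample texts paired with their classification identifiers.
samples = [
    ({"catheter", "balloon", "guidewire"}, ["balloon-catheters"]),
    ({"parser", "compiler"}, ["compilers"]),
    ({"catheter", "stent"}, ["stents"]),
]
target = {"catheter", "balloon", "device"}

print(descriptive_terms(target, libraries))
print(classify(target, samples, libraries))
```

In this toy data, "device" occurs equally often in both libraries, so its selectivity ratio is 1 and it is not selected as a descriptive term, whereas "catheter" and "balloon" occur only in the medical library. The two best-matching sample texts then supply the classification identifiers assigned to the target.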
