Test classification system and method

  • US 6,137,911 A
  • Filed: 06/16/1997
  • Issued: 10/24/2000
  • Est. Priority Date: 06/16/1997
  • Status: Expired due to Fees
First Claim

1. A method of automatically classifying a text entity which comprises a plurality of terms into one or more clusters of a plurality of clusters which characterize a corpus of text in corresponding subject areas, each cluster having a plurality of text entities related to a particular corresponding subject area, the method comprising:

    forming a list of terms sorted by order of occurrence from the corpus;

    determining, for each of the clusters, a value of statistical weight of significance of terms of the list in said each cluster by examining distributions of the terms inside of the cluster and outside of the cluster, said determining comprising calculating a weight of significance of terms in said each cluster, and assigning a weight of zero to terms which are not statistically significant in said each cluster;

    constructing a vector for each cluster, the vector having element values corresponding to the weights of significance of the terms in the cluster;

    calculating, for each cluster, statistical signatures of the cluster from its corresponding vector;

    determining, from the statistical signatures, a score for the text entity for each cluster indicating the relevance of the text entity to the cluster; and

    classifying the text entity into one or more clusters based upon said scores.
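The claimed steps can be sketched in Python. This is a minimal, hypothetical illustration, not the patentee's implementation: the claim does not say which statistical test decides significance, what the "statistical signatures" are, or how scores are turned into cluster assignments. The sketch assumes (a) "order of occurrence" means frequency of occurrence, (b) a one-sided two-proportion z-test with an assumed cutoff `Z_THRESHOLD = 1.645`, (c) the signature is the weight vector scaled to unit length, (d) the score is a dot product with the entity's relative term frequencies, and (e) a hypothetical `ratio` cutoff for assigning an entity to more than one cluster. All function names and the toy data are illustrative.

```python
import math
from collections import Counter

# Illustrative sketch only. The significance test, threshold, signature form,
# scoring rule and multi-cluster cutoff are all assumptions, not taken from the claim.

Z_THRESHOLD = 1.645  # assumed cutoff: one-sided 5% level of a normal z-test


def term_list(corpus):
    """Terms sorted by descending occurrence count across the corpus (ties alphabetical)."""
    counts = Counter(t for doc in corpus for t in doc)
    return sorted(counts, key=lambda t: (-counts[t], t))


def significance_weights(terms, clusters, cid, z_threshold=Z_THRESHOLD):
    """Weight vector for cluster `cid`, one entry per term in `terms`.

    Compares each term's share of tokens inside the cluster with its share in
    all other clusters; terms that are not significantly over-represented
    inside the cluster get weight 0. Assumes at least two non-empty clusters.
    """
    inside = Counter(t for doc in clusters[cid] for t in doc)
    outside = Counter(t for other, docs in clusters.items() if other != cid
                      for doc in docs for t in doc)
    n_in, n_out = sum(inside.values()), sum(outside.values())
    weights = []
    for t in terms:
        p_in, p_out = inside[t] / n_in, outside[t] / n_out
        pooled = (inside[t] + outside[t]) / (n_in + n_out)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_in + 1 / n_out))
        z = (p_in - p_out) / se if se > 0 else 0.0
        weights.append(z if z > z_threshold else 0.0)  # zero out non-significant terms
    return weights


def signature(weights):
    """Assumed signature: the weight vector scaled to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights] if norm else list(weights)


def score(entity, terms, sig):
    """Relevance of a tokenized text entity: dot product of its relative term
    frequencies with the cluster signature."""
    counts = Counter(entity)
    total = sum(counts.values()) or 1
    return sum(s * counts[t] / total for t, s in zip(terms, sig))


def classify(entity, terms, signatures, ratio=0.5):
    """Clusters whose score is at least `ratio` times the best positive score."""
    scores = {cid: score(entity, terms, sig) for cid, sig in signatures.items()}
    best = max(scores.values(), default=0.0)
    if best <= 0:
        return []
    return sorted((c for c, s in scores.items() if s >= ratio * best),
                  key=lambda c: -scores[c])


# Toy, pre-tokenized clusters (hypothetical data).
clusters = {
    "sports": [["ball", "team", "score", "game"], ["team", "coach", "game", "ball"],
               ["ball", "goal", "team", "win"]],
    "finance": [["stock", "market", "price", "trade"], ["market", "bank", "stock", "rate"],
                ["price", "stock", "bank", "fund"]],
}
corpus = [doc for docs in clusters.values() for doc in docs]
terms = term_list(corpus)
signatures = {cid: signature(significance_weights(terms, clusters, cid)) for cid in clusters}
print(classify(["team", "ball", "game"], terms, signatures))  # ['sports']
```

Because weights of non-significant terms are set to zero, each signature is sparse: a term that is spread evenly across clusters contributes nothing to any score. The `ratio` parameter is just one simple way to let a text entity fall into more than one cluster, as the claim allows.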

  • 5 Assignments