Document classification method and apparatus therefor

  • US 6,094,653 A
  • Filed: 12/29/1997
  • Issued: 07/25/2000
  • Est. Priority Date: 12/25/1996
  • Status: Expired due to Fees
First Claim

1. A document classification system comprising:

  • a category memory section storing document categories;

    a word cluster distribution memory section storing word cluster distributions for each of the document categories;

    a word distribution memory section storing classification words and classification word distributions for each of the word clusters;

    a learning section connected to each said memory section that prepares the word cluster distributions in each of the document categories and provides the word cluster distributions to said word cluster distribution memory section, and that prepares the word distributions in each of the word clusters and provides the word distributions to said word distribution memory section; and

    a document classification section that classifies a document based on linear combination models, there being one of the linear combination models for each of the document categories, each of the linear combination models linearly combining a respective one of the word distributions times a respective one of the word cluster distributions for each of the classification words in the document and having the form:

    ##EQU9## where P(W|c) is a probability that the document W is in the document category c, P(w|ki) is a probability of appearance of the classification word w in the word cluster ki, P(ki|c) is a probability of appearance of the word cluster ki in the document category c, and n is a number of the classification words in the document W.
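
The classification step recited in claim 1 can be illustrated with a minimal sketch. The equation itself is not reproduced in this listing, so the code below rests on stated assumptions rather than on the patent's exact formula: for each classification word w, the per-category score is taken to be the sum over word clusters ki of P(w|ki) times P(ki|c), and the scores of the n classification words are assumed to be multiplied together (computed here as a sum of logarithms). The function names, the dictionary layout, and the toy probabilities are all hypothetical.

```python
import math


def log_likelihood(words, cluster_given_category, word_given_cluster):
    """Log-score of a document under one category's linear combination model.

    cluster_given_category: {cluster: P(ki|c)} for a single category c.
    word_given_cluster:     {cluster: {word: P(w|ki)}} shared by all categories.
    Assumption: per-word scores are multiplied across the document.
    """
    # Words that appear in no cluster are treated as non-classification
    # words and skipped (an assumption of this sketch).
    vocabulary = set()
    for distribution in word_given_cluster.values():
        vocabulary.update(distribution)

    total = 0.0
    for w in words:
        if w not in vocabulary:
            continue
        # Linear combination: sum over clusters of P(w|ki) * P(ki|c).
        mix = sum(
            p_k * word_given_cluster.get(k, {}).get(w, 0.0)
            for k, p_k in cluster_given_category.items()
        )
        if mix == 0.0:
            return float("-inf")  # the category cannot generate this word
        total += math.log(mix)
    return total


def classify(words, categories, word_given_cluster):
    """Return the category whose model gives the document the highest score."""
    return max(
        categories,
        key=lambda c: log_likelihood(words, categories[c], word_given_cluster),
    )


# Toy data, invented for illustration only.
word_given_cluster = {
    "k1": {"goal": 0.5, "match": 0.5},
    "k2": {"stock": 0.5, "bond": 0.5},
}
categories = {
    "sports": {"k1": 0.9, "k2": 0.1},
    "finance": {"k1": 0.2, "k2": 0.8},
}

label = classify(["goal", "stock", "match"], categories, word_given_cluster)
```

In this sketch the two dictionaries play the roles of the word distribution memory section and the word cluster distribution memory section; how the learning section actually estimates them is not covered here.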
