
Method and system for calculating phrase-document importance

  • US 6,549,897 B1
  • Filed: 12/17/1998
  • Issued: 04/15/2003
  • Est. Priority Date: 10/09/1998
  • Status: Expired due to Fees
First Claim

1. A method in a computer system for generating a weight for a phrase within one of a plurality of documents, each document having terms, the phrase having component terms, the method comprising:

  • for each term, providing a term frequency that represents the number of occurrences of that term in the plurality of documents;

  • estimating a document frequency for the phrase based on an estimated phrase probability of the phrase, the document frequency being the number of the plurality of the documents that contain the phrase, the estimated phrase probability being an estimation of the probability that any phrase in documents that contain each component term is the phrase, the phrase probability being derived from term probabilities of the component terms, the term probability of a component term being a ratio of an average of the provided term frequencies for the component terms per document that contains that component term to an average number of terms per document;

  • estimating a total phrase frequency for the phrase based on an average phrase frequency for the phrase times the estimated document frequency for the phrase, the average phrase frequency being derived from the phrase probability of the phrase and the average number of terms per document; and

  • combining the estimated document frequency with the estimated total phrase frequency to generate the weight of the phrase.
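
The steps recited in the claim can be illustrated with a minimal Python sketch. The claim fixes only the term probability (average term frequency per document containing the term, divided by the average number of terms per document) and the product of average phrase frequency and estimated document frequency. Everything else here is an assumption made for illustration and is not taken from the patent: the function name `estimate_phrase_weight`, treating the phrase probability as the product of the term probabilities, the "at least one occurrence" model used for the document frequency, and the logarithmic formula used for the final combination.

```python
import math
from collections import Counter


def estimate_phrase_weight(docs, phrase):
    """Estimate a weight for `phrase` (a list of component terms).

    `docs` is a list of token lists. Returns a dict of intermediate
    estimates, or None if no document contains every component term.
    """
    num_docs = len(docs)
    avg_terms_per_doc = sum(len(d) for d in docs) / num_docs

    # Step 1: term frequency = total occurrences across all documents.
    term_freq = Counter(t for d in docs for t in d)
    # Number of documents that contain each term at least once.
    term_doc_freq = Counter(t for d in docs for t in set(d))

    docs_with_all = sum(1 for d in docs if set(phrase) <= set(d))
    if docs_with_all == 0:
        return None

    # Term probability as described in the claim; the product across
    # component terms is an ASSUMPTION (independence of terms).
    phrase_prob = 1.0
    for term in phrase:
        avg_tf = term_freq[term] / term_doc_freq[term]
        phrase_prob *= min(1.0, avg_tf / avg_terms_per_doc)

    # Step 2 (ASSUMPTION): a document of average length contains the
    # phrase with probability 1 - (1 - p)^avg_len, applied to the
    # documents that contain every component term.
    if phrase_prob < 1.0:
        p_contains = -math.expm1(avg_terms_per_doc * math.log1p(-phrase_prob))
    else:
        p_contains = 1.0
    est_doc_freq = docs_with_all * p_contains

    # Step 3: average phrase frequency (ASSUMPTION: expected count given
    # at least one occurrence) times the estimated document frequency.
    avg_phrase_freq = phrase_prob * avg_terms_per_doc / p_contains
    est_total_phrase_freq = avg_phrase_freq * est_doc_freq

    # Step 4 (ASSUMPTION): a placeholder log-based combination.
    weight = math.log(1 + num_docs / est_doc_freq) * math.log(1 + est_total_phrase_freq)

    return {
        "phrase_probability": phrase_prob,
        "document_frequency": est_doc_freq,
        "average_phrase_frequency": avg_phrase_freq,
        "total_phrase_frequency": est_total_phrase_freq,
        "weight": weight,
    }


corpus = [
    ["new", "york", "city"],
    ["new", "car"],
    ["york", "minster"],
    ["new", "york", "new", "york"],
]
result = estimate_phrase_weight(corpus, ["new", "york"])
```

Note that the sketch never counts actual occurrences of the phrase: both the document frequency and the total phrase frequency are estimated from per-term statistics alone, which is the point of the claimed method.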
