Method and apparatus for automatically identifying keywords within a document

  • US 6,470,307 B1
  • Filed: 06/23/1997
  • Issued: 10/22/2002
  • Est. Priority Date: 06/23/1997
  • Status: Expired due to Term
First Claim

1. A method of generating a plurality of human intelligible keywords from an electronic, stored document including phrases, stop words delimiting the phrases, and punctuation, the method comprising the steps of:

  • a) providing features selected to be indicative of word/phrase significance, providing a training document and a set of human intelligible keywords dependent upon the training document, and producing training results in dependence upon the document and the human intelligible keywords, the training results including parameter values indicative of feature weighting for weighting the provided features in order to determine a measure of word/phrase significance;

    b) using a computer to select from the document raw phrases comprised of one or more contiguous words excluding stop words, by utilizing stop words, or stop words and punctuation, to determine raw phrases to be selected; and

    c) using a form of the raw phrases, generating the plurality of human intelligible keywords by evaluating the selected raw phrases based on the provided features and the parameter values, wherein the step of selecting raw phrases is performed in dependence upon the training results and in the absence of part-of-speech tagging and a lexicon of target human intelligible keywords.
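
For illustration only, steps b) and c) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the stop-word list, the three features (frequency, earliness, length), the function names, and the weight values are all hypothetical. In particular, the weights stand in for the parameter values that step a) would produce from a training document, and the training step itself is not shown. Consistent with the claim, the sketch uses neither part-of-speech tagging nor a keyword lexicon.

```python
import re

# Hypothetical stop-word list; the claim does not enumerate one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for", "on", "by", "with"}


def raw_phrases(text, stop_words=STOP_WORDS):
    """Step b): collect runs of contiguous words bounded by stop words or punctuation."""
    phrases = []
    # Punctuation delimits phrases, so split on it before looking at words.
    for segment in re.split(r"[^\w\s'-]+", text.lower()):
        current = []
        for word in segment.split():
            if word in stop_words:
                # A stop word closes the phrase collected so far.
                if current:
                    phrases.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(" ".join(current))
    return phrases


def score_phrases(phrases, weights):
    """Step c): weighted sum of simple features (assumed, not taken from the patent)."""
    total = len(phrases)
    stats = {}
    for index, phrase in enumerate(phrases):
        entry = stats.setdefault(phrase, {"count": 0, "first": index})
        entry["count"] += 1
    scores = {}
    for phrase, entry in stats.items():
        features = {
            "frequency": entry["count"] / total,         # share of all raw phrases
            "earliness": 1.0 - entry["first"] / total,   # earlier first occurrence scores higher
            "length": len(phrase.split()),               # number of words in the phrase
        }
        scores[phrase] = sum(weights[name] * value for name, value in features.items())
    return scores


def top_keywords(text, weights, k=3):
    """Return the k highest-scoring raw phrases, ties broken alphabetically."""
    phrases = raw_phrases(text)
    if not phrases:
        return []
    scores = score_phrases(phrases, weights)
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


# Placeholder weights standing in for trained parameter values from step a).
EXAMPLE_WEIGHTS = {"frequency": 1.0, "earliness": 0.5, "length": 0.2}
```

Calling `top_keywords(document_text, EXAMPLE_WEIGHTS)` returns the highest-scoring candidate phrases. In a trained system the weights would be fitted so that the ranking reproduces the human-chosen keywords of the training document.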
