Method and apparatus for highlighting and categorizing documents using coded word tokens

  • US 5,526,443 A
  • Filed: 11/09/1995
  • Issued: 06/11/1996
  • Est. Priority Date: 10/06/1994
  • Status: Expired due to Term
First Claim

1. A method for highlighting and categorizing images from a document using a sequence of word tokens representing words of the document, the word tokens comprising character shape code classes, each word of the document being represented by only one word token, the method comprising the steps of:

  • eliminating predetermined character shape code classes from said sequence of word tokens;

  • removing predetermined common function word tokens from said sequence of word tokens to form a reduced sequence of word tokens using a pattern matching technique and a stop token list;

  • determining word token frequency appearance rates for the word tokens of the reduced sequence;

  • ranking said frequency of appearance rates;

  • determining nth or more most frequently appearing word tokens based on the ranked frequency of appearance rates;

  • highlighting words of the document corresponding to the nth or more most frequently appearing word tokens.
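
The steps recited above can be illustrated with a minimal Python sketch. It is one possible reading for illustration, not the patented implementation. The function name `top_token_positions`, its parameters, and the shape code alphabet in the usage note (`A` for ascender shapes, `x` for x-height shapes, `.` for punctuation) are hypothetical. The sketch assumes that "eliminating" a shape code class means stripping those codes from each token, that the "pattern matching technique" is an exact match against the stop token list, and that "nth or more most frequently appearing" means the tokens holding the top n ranks.

```python
from collections import Counter


def top_token_positions(tokens, drop_classes, stop_tokens, n):
    """Return the positions of words whose token ranks among the n most frequent.

    tokens       -- one shape-code string per word of the document, in order
    drop_classes -- shape code classes to eliminate (assumed: e.g. punctuation)
    stop_tokens  -- tokens of common function words (the stop token list)
    n            -- how many of the top-ranked tokens to keep
    """
    # Step 1: eliminate the predetermined shape code classes from every token.
    cleaned = ["".join(c for c in t if c not in drop_classes) for t in tokens]

    # Step 2: remove function-word tokens by exact match against the stop list,
    # remembering each surviving token's word position in the document.
    reduced = [(i, t) for i, t in enumerate(cleaned) if t and t not in stop_tokens]
    if not reduced:
        return []

    # Step 3: frequency of appearance rate for each token of the reduced sequence.
    counts = Counter(t for _, t in reduced)
    rates = {t: c / len(reduced) for t, c in counts.items()}

    # Step 4: rank the rates, highest first (ties broken alphabetically).
    ranked = sorted(rates, key=lambda t: (-rates[t], t))

    # Step 5: keep the tokens that hold the top n ranks.
    top = set(ranked[:n])

    # Step 6: positions of the words to highlight.
    return [i for i, t in reduced if t in top]
```

For example, with `tokens = ["AAx", "xxA", "Axxx", "xA", "AAx", "xxA", "Axxx.", "xxA"]`, `drop_classes = {"."}` and `stop_tokens = {"AAx", "xA"}`, the call `top_token_positions(tokens, drop_classes, stop_tokens, 1)` selects word positions 1, 5 and 7, which all carry the most frequent remaining token `xxA`. Because different words can share the same shape code token, a selection like this may highlight more than one distinct word.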

  • 3 Assignments