
Document categorization by word length distribution analysis

  • US 5,909,680 A
  • Filed: 09/09/1996
  • Issued: 06/01/1999
  • Est. Priority Date: 09/09/1996
  • Status: Expired due to Term
First Claim

1. A computer-implemented method for categorizing digitized documents comprising the steps of:

  • providing an electronic representation of an image of a document;

  • developing word length distribution information of said image from said electronic representation wherein said word length distribution information includes a document feature vector characterizing said document, said document feature vector comprises elements representative of distribution of estimates of word lengths, said elements comprise conditional probabilities of words of A characters proximate to words of B characters, for a plurality of values of A and B; and

  • categorizing said document responsive to said word length distribution information and word length distribution information for representative categories of documents.
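The sketch below illustrates the feature vector and categorization steps of the claim in Python. It is not the patentee's implementation, and it rests on several assumptions. Word lengths are taken as already estimated from the page image, and the image segmentation step is not shown. "Proximate" is read as "immediately following." Lengths are clamped to a hypothetical `max_len` so the vector has a fixed size. For categorization, the sketch substitutes nearest-neighbour matching by Euclidean distance, because the claim does not name a comparison method. The names `feature_vector` and `categorize` are illustrative only.

```python
import math
from collections import Counter


def feature_vector(word_lengths, max_len=10):
    """Flattened matrix of P(next word has A chars | current word has B chars).

    Assumption: "proximate" means the immediately following word.
    Row index is B (current word length), column index is A (next word length).
    """
    # Clamp estimated lengths into 1..max_len (hypothetical cap) for a fixed-size vector.
    lengths = [min(max(n, 1), max_len) for n in word_lengths]
    # Count adjacent (B, A) pairs and how often each length B starts a pair.
    pair_counts = Counter(zip(lengths, lengths[1:]))
    b_counts = Counter(lengths[:-1])
    vec = []
    for b in range(1, max_len + 1):
        total = b_counts[b]
        for a in range(1, max_len + 1):
            # Conditional probability; zero when length B never starts a pair.
            vec.append(pair_counts[(b, a)] / total if total else 0.0)
    return vec


def categorize(doc_vec, category_vecs):
    """Return the category whose representative vector is nearest.

    Euclidean distance is a stand-in; the claim only says "responsive to".
    """
    return min(category_vecs, key=lambda c: math.dist(doc_vec, category_vecs[c]))
```

Each category's representative vector could, for example, be the `feature_vector` of a typical training document or the mean over several. In the sketch, every row of the matrix sums to 1 whenever the corresponding length B occurs in the document.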
