Method and apparatus for determining the frequency of words in a document without document image decoding

  • US 5,325,444 A
  • Filed: 10/29/1993
  • Issued: 06/28/1994
  • Est. Priority Date: 11/19/1991
  • Status: Expired due to Term
First Claim

1. A method for determining a frequency of occurrence of word units in an electronic document image having words represented as an undecoded content, comprising the steps of:

  • segmenting the document image into word units without decoding the document image content, each word unit corresponding to a word in said document image;

  • deriving a word shape representation of selected word units in the document image without detecting or identifying any characters making up the word corresponding to the selected word units;

  • identifying equivalence classes of the selected word units in the document image by clustering the ones of the selected word units having similar word shape representations; and

  • quantifying the word units in each equivalence class.
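
The last two steps of the claim (clustering similar word shapes into equivalence classes, then counting each class) can be illustrated with a minimal sketch. It is not the patented implementation. It assumes segmentation and shape derivation have already produced one numeric descriptor per word unit. The descriptor layout (width, height, ascender count, descender count), the Euclidean distance measure, the `tolerance` parameter, and the function names `cluster_word_shapes` and `class_frequencies` are all hypothetical choices made for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def cluster_word_shapes(shapes, tolerance=1.0):
    """Greedily group shape descriptors into equivalence classes.

    Each class is a list of indices into `shapes`. A word unit joins the
    first class whose representative (its first member) lies within
    `tolerance`; otherwise it starts a new class. No characters are decoded.
    """
    classes = []  # list of (representative descriptor, member indices)
    for index, shape in enumerate(shapes):
        for representative, members in classes:
            same_length = len(representative) == len(shape)
            if same_length and dist(representative, shape) <= tolerance:
                members.append(index)
                break
        else:
            # No existing class was close enough, so open a new one.
            classes.append((shape, [index]))
    return [members for _, members in classes]


def class_frequencies(shapes, tolerance=1.0):
    """Return the size of each equivalence class, largest first."""
    sizes = (len(members) for members in cluster_word_shapes(shapes, tolerance))
    return sorted(sizes, reverse=True)


# Hypothetical descriptors: (width, height, ascender count, descender count).
shapes = [
    (40.0, 12.0, 2, 0),
    (40.5, 12.0, 2, 0),
    (22.0, 10.0, 0, 1),
    (39.8, 12.2, 2, 0),
    (22.3, 10.1, 0, 1),
    (60.0, 12.0, 3, 1),
]

print(cluster_word_shapes(shapes))  # [[0, 1, 3], [2, 4], [5]]
print(class_frequencies(shapes))    # [3, 2, 1]
```

Greedy first-fit assignment keeps the sketch short, but its result depends on the order in which word units are visited. A real system would likely use a more robust clustering method and a richer shape representation.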
