Method and apparatus for determining the frequency of phrases in a document without document image decoding

  • US 5,369,714 A
  • Filed: 11/19/1991
  • Issued: 11/29/1994
  • Est. Priority Date: 11/19/1991
  • Status: Expired due to Term
First Claim

1. A method for determining a frequency of occurrence of significant word sequences in an undecoded electronic document text image, comprising the steps of:

    segmenting the document image into word units;

    determining at least one significant morphological image characteristic of selected word units in the document image;

    identifying equivalence classes of the selected word units in the document image by clustering the ones of the selected word units with similar morphological image characteristics, each equivalence class being assigned a label;

    equating the equivalence class labels to said selected word units arranged in the order in which the selected word units appear in the document image to form a master-sequence of equivalence class labels, said master-sequence including the equivalence class labels of the selected word units in the document image arranged in the order in which the selected word units appear in the document image, said master-sequence being comprised of sub-sequences;

    evaluating said equivalence class label sub-sequences to determine the frequency of each equivalence class label sub-sequence, and

    outputting to an optical or electrical output device a list of significant phrases corresponding to the equivalence class label sub-sequences without having determined their content beyond the at least one significant morphological image characteristic.
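The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes segmentation has already produced one feature tuple per word unit (here a hypothetical bounding-box width and height), and it swaps in a simple greedy tolerance match for the clustering step. The function names, the `tol` parameter, and the sub-sequence length limits are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical morphological characteristic: (width, height) of a word box.
Feature = Tuple[float, ...]


def cluster_labels(features: Sequence[Feature], tol: float = 2.0) -> List[int]:
    """Label each word unit with an equivalence class, in document order.

    Greedy stand-in for clustering: a word joins the first class whose
    representative differs by at most `tol` in every feature; otherwise it
    starts a new class. The returned list is the master-sequence of labels.
    """
    representatives: List[Feature] = []
    labels: List[int] = []
    for feat in features:
        for label, rep in enumerate(representatives):
            if len(rep) == len(feat) and all(
                abs(a - b) <= tol for a, b in zip(feat, rep)
            ):
                labels.append(label)
                break
        else:
            # No existing class matched: open a new equivalence class.
            representatives.append(feat)
            labels.append(len(representatives) - 1)
    return labels


def subsequence_frequencies(
    master: Sequence[int], min_len: int = 2, max_len: int = 3
) -> Counter:
    """Count every contiguous label sub-sequence of length min_len..max_len."""
    counts: Counter = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(master) - n + 1):
            counts[tuple(master[i:i + n])] += 1
    return counts


def significant_phrases(master: Sequence[int], min_count: int = 2):
    """Return (sub-sequence, frequency) pairs seen at least min_count times."""
    counts = subsequence_frequencies(master)
    hits = [(seq, c) for seq, c in counts.items() if c >= min_count]
    # Most frequent first; ties broken by longer sub-sequence, then by labels.
    return sorted(hits, key=lambda item: (-item[1], -len(item[0]), item[0]))


# Made-up word-box measurements for seven word units, in reading order.
word_features = [(40, 12), (25, 12), (41, 12), (25, 13), (60, 12), (40, 11), (26, 12)]
master_sequence = cluster_labels(word_features)
phrases = significant_phrases(master_sequence)
print(master_sequence)  # [0, 1, 0, 1, 2, 0, 1]
print(phrases)          # [((0, 1), 3)]
```

The output identifies a repeated phrase only by its label sequence and frequency, never by decoded characters, which mirrors the claim's requirement that content is not determined beyond the morphological characteristic. A real system would map each reported sub-sequence back to the corresponding image regions for display.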

  • 4 Assignments