
Automatic method of generating thematic summaries from a document image without performing character recognition

  • US 5,848,191 A
  • Filed: 12/14/1995
  • Issued: 12/08/1998
  • Est. Priority Date: 12/14/1995
  • Status: Expired due to Term
First Claim

1. A processor implemented method of generating a thematic summary of a document image without performing character recognition using a processor, the document including a first multiplicity of sentences and a second multiplicity of word occurrences, the processor implementing the method by executing instructions stored in electronic form in a memory coupled to the processor, the processor implemented method comprising the steps of:

    a) analyzing the document image to identify sentence boundaries;

    b) analyzing the document image to identify a plurality of word image equivalence classes, each word image equivalence class including at least one word occurrence of the second multiplicity of word occurrences;

    c) selecting as thematic word images a first number of word image equivalence classes, the first number being less than a second number of thematic sentences to be extracted;

    d) scoring each sentence of the first multiplicity of sentences based upon occurrence of thematic word images in each sentence; and

    e) selecting the second number of thematic sentences from the first multiplicity of sentences based upon the score of each sentence.
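Steps c) through e) can be illustrated with a small sketch. It assumes steps a) and b) have already been carried out, so each sentence is represented as a list of integer word-image equivalence-class IDs (a hypothetical representation; no character recognition is involved, only class labels). The claim does not say how the thematic classes are chosen. As a simple stand-in, this sketch assumes the most frequent classes are picked. The function name `thematic_summary` and its parameters are illustrative only, not taken from the patent.

```python
from collections import Counter
from typing import List, Sequence


def thematic_summary(sentences: Sequence[Sequence[int]],
                     num_sentences: int,
                     num_thematic: int) -> List[int]:
    """Return indices of the selected thematic sentences, in document order.

    Hypothetical input: each sentence is a list of word-image
    equivalence-class IDs produced by steps a) and b).
    """
    # Claim constraint: the number of thematic word images must be
    # less than the number of thematic sentences to be extracted.
    if num_thematic >= num_sentences:
        raise ValueError("num_thematic must be less than num_sentences")

    # Step c) (assumption): take the most frequent equivalence classes
    # across the whole document as the thematic word images.
    counts = Counter(cls for sent in sentences for cls in sent)
    thematic = {cls for cls, _ in counts.most_common(num_thematic)}

    # Step d): score each sentence by how many of its word occurrences
    # belong to a thematic class.
    scores = [sum(1 for cls in sent if cls in thematic) for sent in sentences]

    # Step e): keep the highest-scoring sentences; ties go to the
    # earlier sentence. Return them in their original order.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:num_sentences])


if __name__ == "__main__":
    doc = [[1, 2, 3], [4, 1, 5], [6, 7], [1, 2, 8], [9]]
    print(thematic_summary(doc, num_sentences=3, num_thematic=2))
```

On the toy document above, classes 1 and 2 are the most frequent and become the thematic word images. The sentence scores are 2, 1, 0, 2, 0, so sentences 0, 1 and 3 are selected. Returning the chosen sentences in document order, rather than score order, keeps the extracted summary readable.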

  • 5 Assignments