
Apparatus, method, and computer program for analyzing document layout

  • US 7,627,176 B2
  • Filed: 07/05/2005
  • Issued: 12/01/2009
  • Est. Priority Date: 03/04/2005
  • Status: Expired due to Fees
First Claim

1. A computer-readable medium storing a program for analyzing layout of text on a document image to extract text blocks for character recognition purposes, the program causing a computer to function as:

  • an extraction condition memory storing a plurality of extraction conditions for use in extracting text blocks from a given document image;

  • a text block extractor to extract a first set of non-overlapping text blocks from the given document image in accordance with one of the extraction conditions stored in said extraction condition memory, the text block extractor to also extract a second set of non-overlapping text blocks from the same document image in a different way from the first set, in accordance with another of the extraction conditions; and

  • a text block consolidator to produce a consolidated set of text blocks by performing character recognition on each text block extracted by said text block extractor, evaluating validity of each text block based on a result of the character recognition, creating a consolidation source set by finding a text block of the first set which overlaps with a text block of the second set, adding both of those text blocks to the consolidation source set, and repeating operations of finding a text block of the first and second sets which overlaps with any of the text blocks belonging to the consolidated set and adding the found text block to the consolidation source set, and selecting a most valid combination of non-overlapping text blocks from among the text blocks belonging to the consolidation source set, based on the validity of each text block that has been evaluated.
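The consolidation step described in the claim can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. Text blocks are modeled as axis-aligned rectangles. The OCR-based validity is a precomputed score held in a hypothetical `validity` field. The "most valid combination" is taken to mean the non-overlapping subset with the highest summed validity. Blocks that overlap nothing in the other set are passed through unchanged. All names (`Block`, `consolidation_sources`, `best_combination`, `consolidate`) are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Block:
    """Axis-aligned text block. `validity` stands in for a score derived
    from character recognition (hypothetical field, assumed precomputed)."""
    name: str
    x0: int
    y0: int
    x1: int
    y1: int
    validity: float


def overlaps(a: Block, b: Block) -> bool:
    """True if the two rectangles intersect with positive area."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def consolidation_sources(first, second):
    """Build consolidation source sets: seed with a first-set block that
    overlaps a second-set block, then keep adding any block from either
    set that overlaps a block already in the group."""
    remaining = set(first) | set(second)
    groups = []
    for seed in first:
        if seed not in remaining or not any(overlaps(seed, b) for b in second):
            continue
        group = {seed}
        remaining.discard(seed)
        grew = True
        while grew:  # repeat until no further overlapping block is found
            grew = False
            for blk in list(remaining):
                if any(overlaps(blk, g) for g in group):
                    group.add(blk)
                    remaining.discard(blk)
                    grew = True
        groups.append(group)
    return groups


def best_combination(group):
    """Return the non-overlapping subset with the highest total validity.
    Brute force over subsets (assumed criterion); only practical for small groups."""
    items = sorted(group, key=lambda b: b.name)
    best, best_score = [], float("-inf")
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            if any(overlaps(a, b) for a, b in combinations(combo, 2)):
                continue  # skip combinations containing overlapping blocks
            score = sum(b.validity for b in combo)
            if score > best_score:
                best, best_score = list(combo), score
    return best


def consolidate(first, second):
    """Select the best blocks from each group and merge them with ungrouped blocks."""
    groups = consolidation_sources(first, second)
    grouped = set().union(*groups)
    result = [b for g in groups for b in best_combination(g)]
    # Assumption: blocks that overlap nothing are kept unchanged.
    result += [b for b in [*first, *second] if b not in grouped]
    return sorted(result, key=lambda b: b.name)


# Example: condition 1 merges two columns into one block "A";
# condition 2 splits them into "B1" and "B2".
first = [Block("A", 0, 0, 100, 20, 0.4), Block("C", 0, 50, 100, 70, 0.9)]
second = [Block("B1", 0, 0, 45, 20, 0.9), Block("B2", 55, 0, 100, 20, 0.8)]
print([b.name for b in consolidate(first, second)])  # ['B1', 'B2', 'C']
```

Here the two-column split (B1 + B2, total 1.7) beats the merged block A (0.4), and C passes through because nothing in the second set overlaps it. Because each input set is internally non-overlapping, the overlaps inside a group form a bipartite graph. For large groups, the exponential brute-force search could therefore be replaced by a maximum-weight independent set computed via a min-cut formulation.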
