Method and apparatus for complex column segmentation by major white region pattern matching

  • US 5,757,963 A
  • Filed: 06/07/1995
  • Issued: 05/26/1998
  • Est. Priority Date: 09/30/1994
  • Status: Expired due to Fees
First Claim

1. A method for logically identifying document elements in a complex column document image, comprising the steps of:

    identifying major background regions in the document image;

    generating an ordered data string corresponding to a type and a location of the major background regions in the document image;

    comparing the ordered data string with a finite state machine to determine an optimal path from at least one candidate path for the ordered data string that best aligns with the finite state machine; and

    identifying a columnar layout based on the identified optimal path, wherein the step of identifying the columnar layout based on the identified optimal path comprises:

    selecting a current candidate path from the at least one candidate path;

    identifying editing costs in the current candidate path;

    correcting the current candidate path, wherein correcting the current candidate path comprises:

    deleting any insertions;

    inserting any deletions; and

    correcting any substitutions, wherein the step of inserting any deletions comprises:

    identifying matched major background regions in the document image;

    locating at least one missing major background region;

    selecting adjacent matched major background regions for each of the at least one missing major background region;

    determining a type of the at least one missing major background region based on the finite state machine;

    searching the document image for the at least one missing major background region using reduced threshold values in the background regions identifying step;

    adding the at least one missing major background region to the current candidate path if the at least one missing major background region is returned by the identifying major background regions step;

    selecting a next candidate path if the at least one missing major background region is not found and making a next best optimal path the current optimal path and repeating the identifying matched major background regions step through the selecting a next candidate path step until each of the at least one candidate path has been evaluated; and

    setting the optimal path to the current candidate path if the at least one missing major background region is returned;

    identifying major background regions based on the optimal path in the document image;

    replacing four major background region margins around the major background regions based on the optimal path; and

    selecting closed loops of the major background regions to identify at least one column of document elements in the columnar layout of the document image.
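
The comparing step of the claim (aligning an ordered data string with a finite state machine and scoring insertions, deletions, and substitutions) can be illustrated with a small minimum-edit-cost search. The sketch below is only an illustration under stated assumptions, not the patented implementation: the region symbols (T, L, G, R, B), the state names, the unit costs, and the use of Dijkstra's algorithm are all hypothetical choices that the claim does not specify.

```python
import heapq
from itertools import count

# Hypothetical FSM for a page with one or more gutters. Symbols are assumed
# labels for major background regions: T(op), L(eft), G(utter), R(ight), B(ottom).
TWO_COLUMN_FSM = {
    ("q0", "T"): "q1",
    ("q1", "L"): "q2",
    ("q2", "G"): "q3",
    ("q3", "G"): "q3",   # further gutters are allowed
    ("q3", "R"): "q4",
    ("q4", "B"): "q5",
}


def align_to_fsm(symbols, transitions, start, accepting,
                 ins_cost=1, del_cost=1, sub_cost=1):
    """Return (cost, ops) for the cheapest alignment of `symbols` with the FSM.

    Each op is (kind, input_index, fsm_symbol_or_input_symbol), where kind is
    "match", "substitution", "insertion" (extra symbol in the input) or
    "deletion" (symbol the FSM expects but the input lacks).
    Returns None if no accepting state can be reached.
    """
    tie = count()  # tie-breaker so heap entries never compare nodes
    heap = [(0, next(tie), (0, start), None, None)]
    settled = {}   # node -> (parent node, op that led here)
    while heap:
        cost, _, node, parent, op = heapq.heappop(heap)
        if node in settled:
            continue
        settled[node] = (parent, op)
        i, state = node
        if i == len(symbols) and state in accepting:
            ops = []
            while settled[node][0] is not None:  # walk back to the start
                node, step_op = settled[node]
                ops.append(step_op)
            return cost, ops[::-1]
        if i < len(symbols):
            # Insertion: skip an input symbol without moving in the FSM.
            heapq.heappush(heap, (cost + ins_cost, next(tie), (i + 1, state),
                                  node, ("insertion", i, symbols[i])))
        for (src, sym), dst in transitions.items():
            if src != state:
                continue
            if i < len(symbols):
                same = sym == symbols[i]
                heapq.heappush(heap, (cost + (0 if same else sub_cost),
                                      next(tie), (i + 1, dst), node,
                                      ("match" if same else "substitution", i, sym)))
            # Deletion: take an FSM transition without consuming input.
            heapq.heappush(heap, (cost + del_cost, next(tie), (i, dst),
                                  node, ("deletion", i, sym)))
    return None


# A data string whose gutter was not detected: the alignment reports one deletion.
cost, ops = align_to_fsm(["T", "L", "R", "B"], TWO_COLUMN_FSM, "q0", {"q5"})
print(cost, ops)
```

In this toy run the reported deletion names the type of the missing region (G) and its position between the matched L and R regions, which corresponds to the information the claim uses when it searches the image again with reduced threshold values.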
