Document page analyzer and method

  • US 5,848,184 A
  • Filed: 06/30/1995
  • Issued: 12/08/1998
  • Est. Priority Date: 03/15/1993
  • Status: Expired due to Term
First Claim

1. A method for analyzing a digital image of a document page, comprising the steps of:

    detecting and removing one or more lines from the digital image, if any are present;

    segmenting the line-removed image into one or more coarse blocks;

    performing run-length smoothing on the coarse blocks;

    performing connected component analysis on each of the smoothed coarse blocks to produce at least one connected component;

    determining a bounding box for each of the connected components;

    merging any overlapping bounding boxes to produce one or more finer blocks;

    finding the block features of the digital image;

    finding the page features of the digital image;

    creating a geometric structure tree for the digital image;

    said step of creating further comprising:

    a) grouping the coarse blocks produced by said segmenting step into horizontal bands;

    b) first determining column boundaries of each of the horizontal bands;

    c) assigning finer blocks produced by the merging step to the horizontal bands;

    d) second determining a geometric structure tree for all of the finer blocks in each of the horizontal bands;

    e) merging the geometric structure trees produced by the second determining step; and

    logically transforming the geometric structure to produce a rearranged page image arranged in a proper reading order.
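
The middle steps of the claim (run-length smoothing, connected component analysis, bounding boxes, and merging of overlapping boxes) can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names, the horizontal-only smoothing, the 4-connectivity, the inclusive (left, top, right, bottom) box format, and the threshold value are all assumptions chosen for illustration. The image is assumed to be a list of rows of 0 (white) and 1 (black) pixels.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive; assumed format


def smooth_rows(image: List[List[int]], threshold: int) -> List[List[int]]:
    """Horizontal run-length smoothing: fill white gaps of at most
    `threshold` pixels that lie between two black pixels in a row."""
    out = [row[:] for row in image]
    for row in out:
        last_black = None
        for x, value in enumerate(row):
            if value:
                if last_black is not None and 0 < x - last_black - 1 <= threshold:
                    for k in range(last_black + 1, x):
                        row[k] = 1
                last_black = x
    return out


def component_boxes(image: List[List[int]]) -> List[Box]:
    """Bounding box of every 4-connected component of black pixels."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            if not image[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            stack = [(x, y)]
            left, top, right, bottom = x, y, x, y
            while stack:  # iterative flood fill
                cx, cy = stack.pop()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < width and 0 <= ny < height \
                            and image[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


def overlaps(a: Box, b: Box) -> bool:
    """True when two inclusive boxes share at least one pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def merge_overlapping(boxes: List[Box]) -> List[Box]:
    """Repeatedly union overlapping boxes until none overlap."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        result: List[Box] = []
        for box in merged:
            for i, other in enumerate(result):
                if overlaps(box, other):
                    result[i] = (min(box[0], other[0]), min(box[1], other[1]),
                                 max(box[2], other[2]), max(box[3], other[3]))
                    changed = True
                    break
            else:
                result.append(box)
        merged = result
    return merged


# Tiny made-up example: three separate marks on a 3 x 9 "page".
page = [
    [1, 0, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0, 0, 0, 0],
]
finer_blocks = merge_overlapping(component_boxes(smooth_rows(page, threshold=2)))
print(finer_blocks)
```

In this sketch the smoothing joins the two marks at the start of the first row because the gap between them is within the threshold, so they yield a single bounding box. The repeated passes in `merge_overlapping` matter because a union of two boxes can newly overlap a third box that neither touched on its own.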
