
Polygon-based method for automatic extraction of selected text in a digitized document

  • US 5,048,099 A
  • Filed: 05/21/1990
  • Issued: 09/10/1991
  • Est. Priority Date: 05/21/1990
  • Status: Expired due to Term
First Claim

1. A method of extracting marked text regions defined by hand drawn closed curves on a paper document, comprising the steps of:

  • raster scanning said document;

    generating as the document is being scanned, a bit map representation of the document such that whenever a transition is detected, contour vectorization is used including contour pixel tracing and piecewise linear approximation to convert the bit map representation into a collection of closed polygons formed by a series of vectors;

    separating the collection of polygons into inner and outer groups of contours, wherein the separation criterion for said groups is obtained by summing the cross products given by the equation:

    ##EQU4##

    sorting all of the inner and outer contours according to location in their respective groups;

    determining segmentation points by subtracting the lower coordinate (LYi) from the upper coordinate (UYi+1) and obtaining a positive value in each group of sorted contours such that overlapping contours are geometrically related to one another;

    establishing polygon blocks for all polygons having a geometrical relationship in the horizontal direction;

    using contour linking to examine the geometrical relationships of the coordinates of both the outer and inner contours of polygons to determine if they are geometrically overlapping;

    scanning the list of linked polygons to locate external inner contours of polygons; and

    extracting the inner contours of polygons formed by said hand drawn closed curves from said bit map representative document using the angular sum of the centerpoint of an outer contour.
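
The separation and segmentation steps of the claim can be illustrated with a short sketch. The claim's equation survives here only as the placeholder ##EQU4##, so the code substitutes the standard shoelace sum of cross products, whose sign gives a polygon's winding direction. That substitution, the convention that a positive sum marks an outer contour, the image-style coordinates (y increasing downward, so UY is the smallest y and LY the largest), and all function names are assumptions made for illustration, not the patent's own formulation.

```python
def cross_product_sum(polygon):
    """Sum of x_i*y_(i+1) - x_(i+1)*y_i around a closed polygon.

    This is twice the signed area (shoelace formula); it stands in for
    the claim's elided equation and is an assumption.
    """
    total = 0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]  # wrap around to close the polygon
        total += x0 * y1 - x1 * y0
    return total


def split_inner_outer(polygons):
    """Group contours by the sign of the sum (positive = outer is assumed)."""
    outer = [p for p in polygons if cross_product_sum(p) > 0]
    inner = [p for p in polygons if cross_product_sum(p) < 0]
    return outer, inner


def vertical_extent(polygon):
    """Return (UY, LY): top and bottom y of the contour, with y growing downward."""
    ys = [y for _, y in polygon]
    return min(ys), max(ys)


def segmentation_points(polygons):
    """Sort contours by top coordinate and mark each index i+1 where
    UY(i+1) - LY(i) is positive, i.e. where consecutive contours do not
    overlap vertically."""
    ordered = sorted(polygons, key=lambda p: vertical_extent(p)[0])
    breaks = []
    for i in range(len(ordered) - 1):
        _, ly_i = vertical_extent(ordered[i])
        uy_next, _ = vertical_extent(ordered[i + 1])
        if uy_next - ly_i > 0:
            breaks.append(i + 1)
    return ordered, breaks
```

Contours between two consecutive break indices overlap vertically and would fall into the same polygon block. The sketch compares only consecutive contours, as the claim's wording does; a contour much taller than its neighbours would need a running maximum of LY instead.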

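The final extraction step relies on an angular sum taken at a center point. One common reading is the winding-angle test: the signed angles subtended by a polygon's edges, as seen from a point, sum to roughly ±2π when the point is inside and to roughly 0 when it is outside. The sketch below follows that reading. Taking the bounding-box center as the "centerpoint", testing it against a hand-drawn loop, and the names `angular_sum`, `center_point`, and `is_enclosed` are all assumptions for illustration.

```python
import math


def angular_sum(point, polygon):
    """Sum of signed angles subtended at `point` by each polygon edge."""
    px, py = point
    total = 0.0
    n = len(polygon)
    for i in range(n):
        # Vectors from the point to the two endpoints of edge i.
        ax, ay = polygon[i][0] - px, polygon[i][1] - py
        bx, by = polygon[(i + 1) % n][0] - px, polygon[(i + 1) % n][1] - py
        # atan2(cross, dot) is the signed angle from vector a to vector b.
        total += math.atan2(ax * by - ay * bx, ax * bx + ay * by)
    return total


def center_point(polygon):
    """Center of the contour's bounding box (an assumed definition)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)


def is_enclosed(outer_contour, loop):
    """True if the contour's center lies inside the loop polygon.

    The sum is near 2*pi in magnitude inside and near 0 outside, so pi
    is used as the decision threshold.
    """
    return abs(angular_sum(center_point(outer_contour), loop)) > math.pi
```

For example, a character contour whose center falls inside a marking loop would be kept, while one outside the loop would be skipped. Points lying exactly on the loop boundary are degenerate for this test and are not handled here.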