LINE SEGMENTATION METHOD APPLICABLE TO DOCUMENT IMAGES CONTAINING HANDWRITING AND PRINTED TEXT CHARACTERS OR SKEWED TEXT LINES

  • US 20150063699A1
  • Filed: 08/30/2013
  • Published: 03/05/2015
  • Est. Priority Date: 08/30/2013
  • Status: Active Grant
First Claim

1. A method for segmenting a binary document image containing multiple printed lines of text to obtain segmented lines of printed text, comprising:

    (a) performing a connected component analysis on the document image to generate a plurality of connected components;

    (b) computing a bounding box and centroid for each of the plurality of connected components;

    (c) based on heights of the bounding boxes of the connected components, categorizing the plurality of connected components into three categories including small objects, regular text objects, and large objects;

    (d) performing cluster analysis on vertical positions of the centroids of the connected components in the category of regular text objects, using a number (N) of text lines in the document image as a number of cluster centers for the cluster analysis, to calculate N cluster centers which represent central vertical positions of the N text lines;

    (e) classifying each connected component obtained in step (a) as belonging to a text line based on vertical distances between the centroid of the connected component and the central vertical positions of the text lines calculated in step (d), and copying the connected component into one of N object boards designated for that text line, wherein each object board is a template having a size identical to a size of the document image, each object board being designated for one of the N lines of text of the document image; and

    (f) removing extra spaces in each of the N object boards to obtain N text line segments.
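
Steps (a) through (f) can be illustrated with a minimal pure-Python sketch. It is not the patented implementation. The claim does not specify several details, so the following are assumptions made only for illustration: 8-connectivity in step (a), height thresholds of 0.5 and 2 times the median bounding-box height in step (c), a simple one-dimensional k-means as the cluster analysis in step (d), and cropping each object board to the extent of its foreground pixels as the "removing extra spaces" of step (f). All function names are hypothetical.

```python
from collections import deque
from statistics import median

def connected_components(img):
    """Step (a): 8-connected foreground components of a binary image (rows of 0/1)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, pixels = deque([(y, x)]), []
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                        for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                            if img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(pixels)
    return comps

def describe(pixels):
    """Step (b): bounding box (top, left, bottom, right) and centroid of a component."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return {"pixels": pixels,
            "bbox": (min(ys), min(xs), max(ys), max(xs)),
            "height": max(ys) - min(ys) + 1,
            "cy": sum(ys) / len(ys), "cx": sum(xs) / len(xs)}

def categorize(comp, med_height):
    """Step (c): small / regular / large by bounding-box height (assumed thresholds)."""
    if comp["height"] < 0.5 * med_height:
        return "small"
    if comp["height"] > 2 * med_height:
        return "large"
    return "regular"

def kmeans_1d(values, n, iters=50):
    """Step (d): 1-D k-means over centroid rows; returns n sorted cluster centers."""
    values = sorted(values)
    # Assumed initialization: spread the starting centers evenly over the sorted values.
    centers = [values[(2 * i + 1) * len(values) // (2 * n)] for i in range(n)]
    for _ in range(iters):
        groups = [[] for _ in range(n)]
        for v in values:
            groups[min(range(n), key=lambda k: abs(v - centers[k]))].append(v)
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return sorted(centers)

def crop(board):
    """Step (f): drop the empty rows and columns surrounding the copied components."""
    rows = [y for y, r in enumerate(board) if any(r)]
    if not rows:
        return []
    cols = [x for x in range(len(board[0])) if any(r[x] for r in board)]
    return [r[cols[0]:cols[-1] + 1] for r in board[rows[0]:rows[-1] + 1]]

def segment_lines(img, n_lines):
    """Run steps (a)-(f) and return one cropped line image per text line."""
    comps = [describe(p) for p in connected_components(img)]
    med = median(c["height"] for c in comps)
    regular = [c for c in comps if categorize(c, med) == "regular"]
    centers = kmeans_1d([c["cy"] for c in regular], n_lines)
    h, w = len(img), len(img[0])
    # Step (e): one object board per line, each the same size as the document image.
    boards = [[[0] * w for _ in range(h)] for _ in range(n_lines)]
    for c in comps:  # every component, not only the regular text objects
        line = min(range(n_lines), key=lambda k: abs(c["cy"] - centers[k]))
        for y, x in c["pixels"]:
            boards[line][y][x] = 1
    return [crop(b) for b in boards]
```

Calling `segment_lines(img, n_lines)` on a binary image given as a list of 0/1 rows returns `n_lines` cropped line images ordered from top to bottom. As in step (d), the number of text lines must be known in advance; the sketch does not estimate it.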
