×

Fast character segmentation of skewed text lines for optical character recognition

  • US 5,172,422 A
  • Filed: 05/13/1991
  • Issued: 12/15/1992
  • Est. Priority Date: 05/13/1991
  • Status: Expired due to Term
First Claim
Patent Images

1. A method for segmenting individual characters in a bit map of a document comprising successive rows of characters, said bit map comprising plural horizontal rows and vertical columns of "ON" and "OFF" pixels, in preparation for performing optical character recognition, said method comprising:

  • dividing said bit map into plural vertical blocks;

    determining from horizontal projects of each of said vertical blocks the initial top and bottom bounds by;

    a) constructing for each of said vertical blocks a top horizontal projection array listing for each character row in the vertical block the top pixel row thereof;

    b) constructing for each of said vertical blocks a bottom horizontal projection array listing for each character row in the vertical block the bottom pixel row thereof; and

    c) deriving from said top and bottom projection arrays the top and bottom pixel row of each character row in said vertical block;

    determining from horizontal projections of reach of said vertical blocks initial top and bottom bound of one character in a successive one of said character rows in a current one of said vertical blocks;

    finding a left boundary of said on character by searching from left to right within a restricted portion of said bit map between the initial top and bottom bounds of said one character for a first pixel column having a t least a first threshold number of "ON" pixels and defining said left boundary to be said first pixel column having at least said threshold number of "ON" pixels;

    finding a fright boundary of said one character by searching from left to right from said left boundary of said character within said restricted portion of said bit map between the initial top and bottom bounds of said on character for a first pixel column having no more than a second threshold number of "ON" pixels and defining said right boundary to be said first pixel column having no more than said second threshold number of "ON" pixels;

    finding a top boundary of said one character by searching down from said initial top bound of said one character within a restricted portion of said bit map between said left and right boundaries of said one character for a first pixel row containing at least said first threshold number of"ON" pixels; and

    defining as said initial top bound of said one character the highest of the top pixel rows of the corresponding character rows in adjacent vertical blocks to the left and right of si current vertical block containing said character; and

    finding a bottom boundary of said one character by searching up from said initial bottom bound of said one character within said restricted portion of said bit map between said left and right boundaries of said one character for a first pixel row containing at least said first threshold number of "ON" pixies.

View all claims
  • 1 Assignment
Timeline View
Assignment View
    ×
    ×