Method and system for preprocessing an image for optical character recognition

  • US 8,194,983 B2
  • Filed: 05/13/2010
  • Issued: 06/05/2012
  • Est. Priority Date: 05/13/2010
  • Status: Expired due to Fees
First Claim

1. A method of preprocessing an image for optical character recognition (OCR), wherein the image comprises Arabic text and non-text items, the method comprising:

    determining a plurality of components associated with at least one of the Arabic text and the non-text items, wherein a component comprises a set of connected pixels;

    calculating a first set of characteristic parameters associated with the plurality of components; and

    merging the plurality of components based on the first set of characteristic parameters to form at least one of at least one sub-word and at least one word;

    calculating a second set of characteristic parameters associated with the at least one of each sub-word and each word, wherein the second set of characteristic parameters is one of a line height, a word spacing, and a line spacing;

    grouping at least two sub-words based on the second set of characteristic parameters to form one of at least one sub-word and at least one word;

    segmenting the at least one sub-word and the at least one word into at least one horizontal line based on at least one of a line height and a line spacing;

    identifying at least one component associated with the at least one horizontal line comprising a height greater than a factor of the line height;

    determining a center of each horizontal line of the at least one horizontal line, wherein the center is a mid point between a top edge and a bottom edge of each horizontal line;

    calculating a distance between at least one of the center and the top edge, and the center and the bottom edge; and

    determining orientation of the image based on the distance.
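
The geometric steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`connected_components`, `group_into_lines`, `line_metrics`), the use of 8-connectivity, the overlap-based line grouping, and the default `factor` value are all assumptions made for illustration. The claim does not state how the distance is turned into an orientation decision, so the sketch stops at computing the distances.

```python
from collections import deque

def connected_components(img):
    """Return bounding boxes (top, left, bottom, right) of 8-connected
    foreground pixels in a binary image given as a list of rows."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def group_into_lines(boxes):
    """Assumed grouping rule: boxes whose vertical extents overlap
    belong to the same horizontal line. Returns (top, bottom, members)."""
    lines = []
    for box in sorted(boxes):  # sorted by top edge first
        top, _, bottom, _ = box
        if lines and top <= lines[-1][1]:
            lines[-1][1] = max(lines[-1][1], bottom)
            lines[-1][2].append(box)
        else:
            lines.append([top, bottom, [box]])
    return [(t, b, m) for t, b, m in lines]

def line_metrics(line, factor=0.5):
    """Compute the line center (midpoint of top and bottom edges), the
    center-to-edge distances, and the components taller than
    factor * line height. The factor value is hypothetical."""
    top, bottom, members = line
    height = bottom - top + 1
    center = (top + bottom) / 2
    tall = [b for b in members if (b[2] - b[0] + 1) > factor * height]
    return {"center": center, "to_top": center - top,
            "to_bottom": bottom - center, "tall": tall}

# Tiny binary image: two components on the first line, one on the second.
img = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
]
boxes = connected_components(img)
lines = group_into_lines(boxes)
metrics = line_metrics(lines[0], factor=0.8)
```

In this sketch the merging and grouping steps of the claim are collapsed into a single overlap test; a fuller version would first merge components into sub-words and words using the characteristic parameters (line height, word spacing, line spacing) before segmenting lines.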
