Method of optical character recognition using feature recognition and baseline estimation

  • US 7,454,063 B1
  • Filed: 09/22/2005
  • Issued: 11/18/2008
  • Est. Priority Date: 09/22/2005
  • Status: Expired due to Fees
First Claim

1. A method of optical character recognition, comprising the steps of:

    a) receiving an image with text;

    b) identifying all locations of words in the text;

    c) bounding each word in a bounding box;

    d) identifying each line in the text;

    e) grouping the bounding boxes by each line of text identified in step (d);

    f) calculating a directional vertical derivative of a pixellation density function of the words contained in each bounding box;

    g) identifying the highest value point in the directional vertical derivative of the pixellation density function for each bounding box;

    h) identifying the bottom points of vertical lines intersecting the highest value point in the pixellation density function for each bounding box, wherein each vertical line terminates at the base of the crest that includes the highest value point;

    i) approximating a first baseline for each word, each first baseline intersecting a bottom point;

    j) calculating a median anticipated baseline for each word, wherein calculating the median anticipated baseline comprises extending each baseline approximated in step (i) a user-definable fixed distance past the final bounding box on each line of text, identifying an endpoint of each approximated baseline, calculating the statistical median of all endpoints, and approximating a median anticipated baseline through the calculated median endpoints;

    k) verifying the first baseline for each word is the standard baseline;

    l) determining the standard baseline if the difference between the first baseline approximated in step (i) for each bounding box and the median anticipated baseline approximated in step (j) is greater than a user-definable number of pixels in the y-direction on an x-y plane for each bounding box;

    m) determining the thickness of the standard baseline for each word;

    n) parsing each word into regions in which a feature may exist;

    o) identifying all vertical strokes in the regions identified in step (n);

    p) identifying the high value vertical peaks in each region identified in step (o);

    q) identifying low value vertical peaks in each region identified in step (o);

    r) performing at least one of a statistical analysis, a physical analysis, a geometric analysis, and a linguistic analysis on at least one region identified in step (n) to identify all feature regions not identified in steps (o) and (p);

    s) locating and identifying ornaments;

    t) associating each ornament with its corresponding feature;

    u) identifying characters by comparing the identified feature and associated ornaments to a user-definable database; and

    v) displaying the results of step (u) in a user-definable format.
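
The baseline steps (f) through (l) can be illustrated with a minimal Python sketch. It is one possible reading of the claim language, not the patented implementation. It assumes that each bounding box is a binary grid (1 = ink), that the "pixellation density function" is the ink count per row, and that baselines are horizontal, so the endpoint extrapolation of step (j) reduces to taking the median of the per-word baseline rows. The function names and the `tolerance` parameter are hypothetical.

```python
from statistics import median


def row_density(box):
    """Assumed pixellation density: number of ink pixels (1s) in each row."""
    return [sum(row) for row in box]


def estimate_baseline_row(box):
    """Steps (f)-(i), simplified: return the row just above the sharpest
    top-to-bottom drop in ink density, taken here as the first baseline."""
    density = row_density(box)
    # Directional vertical derivative: ink lost when moving one row down.
    drops = [density[y] - density[y + 1] for y in range(len(density) - 1)]
    return max(range(len(drops)), key=drops.__getitem__)


def correct_baselines(baselines, tolerance):
    """Steps (j)-(l), simplified: replace any word baseline that differs
    from the line's median baseline by more than `tolerance` pixels."""
    anticipated = median(baselines)
    return [anticipated if abs(b - anticipated) > tolerance else b
            for b in baselines]


# A toy word box with an ascender (rows 0-1), a body (rows 2-4),
# and a descender (rows 5-6).
word = [
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
]
print(estimate_baseline_row(word))                       # 4
print(correct_baselines([40, 41, 40, 55, 39], tolerance=3))  # [40, 41, 40, 40, 39]
```

In the toy example the density falls from 6 to 1 between rows 4 and 5, so row 4 is taken as the baseline. In the second call, the word whose baseline sits at row 55 is pulled back to the line median of 40, while the others stay within the assumed 3-pixel tolerance and are left unchanged.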
