Automatic extraction of character ground truth data from images

  • US 8,755,595 B1
  • Filed: 07/19/2011
  • Issued: 06/17/2014
  • Est. Priority Date: 07/19/2011
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method comprising:

    rendering a transcription in a plurality of fonts and orientations to obtain a set of candidate word templates with associated character bounding boxes, wherein the transcription is provided as part of coarse ground truth data associated with a corresponding image, and wherein the transcription includes a textual translation of at least a portion of the image;

    selecting a word template from the set of candidate word templates, wherein the selected word template corresponds to a word patch from the image;

    evaluating the character bounding boxes, of the selected word template, in a plurality of orientations about each respective character from the word patch to obtain a set of candidate character templates, including orienting the character bounding boxes within an allowable orientation range for the character bounding boxes;

    selecting, for each respective character from the word patch, a character template from the set of candidate character templates, wherein each selected character template corresponds to the respective character from the word patch; and

    providing character ground truth data comprising the selected character templates oriented to correspond to the word patch, as training data for recognizing the characters of the word patch from the image.
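
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say how templates are rendered or how a template is compared with the word patch, so the `render`, `word_score`, and `char_score` callables, the `Box` and `WordTemplate` structures, and all toy values below are hypothetical stand-ins chosen only to show the control flow (render candidates, select a word template, re-orient each character box within an allowable range, select one character template per character).

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Character bounding box: centre, size, rotation in degrees (assumed layout)."""
    cx: float
    cy: float
    w: float
    h: float
    angle: float = 0.0


@dataclass(frozen=True)
class WordTemplate:
    """One rendering of the transcription (hypothetical structure)."""
    text: str
    font: str
    angle: float
    char_boxes: Tuple[Box, ...]


def candidate_word_templates(text, fonts, angles, render):
    """Render the transcription in every font/orientation combination."""
    return [render(text, font, angle) for font in fonts for angle in angles]


def select_word_template(candidates, word_score):
    """Pick the candidate that scores highest against the word patch."""
    return max(candidates, key=word_score)


def refine_char_boxes(template, offsets, max_offset, char_score):
    """Try each character box at several orientations inside the allowable
    range and keep the best-scoring orientation for each character."""
    # Offsets outside the allowable range are discarded; 0 is always tried.
    allowed = sorted({0.0, *(d for d in offsets if abs(d) <= max_offset)})
    selected = []
    for index, box in enumerate(template.char_boxes):
        candidates = [replace(box, angle=box.angle + d) for d in allowed]
        selected.append(max(candidates, key=lambda b: char_score(index, b)))
    return selected


# --- Toy stand-ins (assumptions, not part of the claim) ---------------------
def toy_render(text, font, angle):
    # Fixed-width 10x10 boxes laid out left to right, all at the word angle.
    boxes = tuple(
        Box(cx=10.0 * i + 5.0, cy=5.0, w=10.0, h=10.0, angle=angle)
        for i in range(len(text))
    )
    return WordTemplate(text, font, angle, boxes)


# Pretend the word patch shows "cat" in a serif font, tilted about 10 degrees,
# with slightly different tilts per character.
TRUE_FONT, TRUE_WORD_ANGLE = "serif", 10.0
TRUE_CHAR_ANGLES = [8.0, 10.0, 12.0]


def toy_word_score(template):
    return (template.font == TRUE_FONT) - abs(template.angle - TRUE_WORD_ANGLE) / 100.0


def toy_char_score(index, box):
    return -abs(box.angle - TRUE_CHAR_ANGLES[index])


candidates = candidate_word_templates(
    "cat", ["sans", "serif"], [0.0, 10.0, 20.0], toy_render
)
best = select_word_template(candidates, toy_word_score)
ground_truth = refine_char_boxes(
    best, [-4.0, -2.0, 0.0, 2.0, 4.0, 30.0], 5.0, toy_char_score
)
```

In this toy run the 30 degree offset is rejected because it falls outside the 5 degree allowable range, and `ground_truth` holds one re-oriented box per character. In a real system the scoring callables would compare rendered pixels with the image, which the sketch deliberately leaves abstract.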
