Automatic extraction of character ground truth data from images

US 8,755,595 B1
Filed: 07/19/2011
Issued: 06/17/2014
Est. Priority Date: 07/19/2011
Status: Expired due to Fees

2 Assignments

0 Petitions

Abstract

Embodiments for automatic extraction of character ground truth data from images are disclosed. A transcription may be rendered in a plurality of fonts and orientations to obtain a set of candidate word templates with associated character bounding boxes. A word template may be selected from the set of candidate word templates, wherein the selected word template corresponds to a word patch from an image. The character bounding boxes, of the selected word template, may be evaluated in a plurality of orientations about each respective character from the word patch to obtain a set of candidate character templates. For each respective character from the word patch, a character template may be selected from the set of candidate character templates, wherein each selected character template corresponds to the respective character from the word patch. Character ground truth data comprising the selected character templates oriented to correspond to the word patch may be provided as training data for recognizing the characters of the word patch from the image.
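The abstract repeatedly reduces the task to one step: score a rendered template against image pixels in several orientations and keep the best match. The claims speak only of a "correlation score", so the sketch below assumes normalized cross-correlation (NCC) as that score and treats an "orientation" as an in-plane rotation angle. Both are illustrative assumptions, not the patent's stated method, and the helper names (`ncc`, `rotate_nearest`, `best_orientation`) are hypothetical. It depends only on NumPy.

```python
import numpy as np


def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation of two same-shaped grayscale arrays, in [-1, 1]."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    # A constant array has no variance; treat it as uncorrelated.
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def rotate_nearest(img: np.ndarray, degrees: float) -> np.ndarray:
    """Rotate img about its centre with nearest-neighbour sampling, keeping its shape."""
    h, w = img.shape
    theta = np.deg2rad(degrees)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for every output pixel, look up the source pixel it came from.
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    src_x = np.rint(src_x).astype(int)
    src_y = np.rint(src_y).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)
    out[inside] = img[src_y[inside], src_x[inside]]  # pixels mapped from outside stay 0
    return out


def best_orientation(template, patch, angles):
    """Score the template at each candidate angle; return (best_angle, best_score)."""
    scores = {angle: ncc(rotate_nearest(template, angle), patch) for angle in angles}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

For example, `best_orientation(template, patch, [-10, -5, 0, 5, 10])` returns the angle whose rotated template correlates most strongly with the patch. NCC is a reasonable stand-in here because it is unaffected by uniform brightness and contrast differences between a clean rendering and a photographed word.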

18 Citations

19 Claims

1. A computer-implemented method comprising:
- rendering a transcription in a plurality of fonts and orientations to obtain a set of candidate word templates with associated character bounding boxes, wherein the transcription is provided as part of coarse ground truth data associated with a corresponding image, and wherein the transcription includes a textual translation of at least a portion of the image;
  
  selecting a word template from the set of candidate word templates, wherein the selected word template corresponds to a word patch from the image;
  
  evaluating the character bounding boxes, of the selected word template, in a plurality of orientations about each respective character from the word patch to obtain a set of candidate character templates, including orienting the character bounding boxes within an allowable orientation range for the character bounding boxes;
  
  selecting, for each respective character from the word patch, a character template from the set of candidate character templates, wherein each selected character template corresponds to the respective character from the word patch; and
  
  providing character ground truth data comprising the selected character templates oriented to correspond to the word patch, as training data for recognizing the characters of the word patch from the image.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The method of claim 1 further comprising:
    - receiving the coarse ground truth data comprising the image and a transcription of the word patch, wherein the image includes a word bounding box identifying a location of the word patch in the image.
  - 3. The method of claim 2 further comprising:
    - cropping the word patch from the image based on the word bounding box;
      
      converting the cropped word patch to grayscale; and
      
      scaling the grayscaled word patch to a common height.
  - 4. The method of claim 1 wherein the rendering comprises:
    - for each font of the plurality of fonts:
      
      rendering the transcription in a first font of the plurality of fonts; and
      
      evaluating the rendered transcription in a plurality of orientations to determine an orientation of the rendered transcription that most closely correlates to the word patch.
  - 5. The method of claim 4 wherein the evaluating the rendered transcription comprises:
    - determining a correlation score for each orientation of the rendered transcription, wherein the correlation score comprises an indication of how closely the orientation of the rendered transcription correlates to the word patch; and
      
      selecting the orientation with the highest correlation score as one of the candidate word templates.
  - 6. The method of claim 5, wherein the selecting a word template comprises:
    - selecting the word template from the set of candidate word templates, wherein the selected word template has a highest correlation score amongst the correlation scores for the candidate word templates.
  - 7. The method of claim 1 wherein the selecting comprises:
    - determining a correlation score for each orientation of the character bounding boxes, wherein the correlation score comprises an indication of how closely the oriented character bounding box correlates to the respective character of the word patch; and
      
      selecting the oriented character bounding box with the highest correlation score for the character template.
  - 8. The method of claim 1 wherein the providing comprises:
    - formatting the character templates based on the selected word template for output as the character ground truth data; and
      
      providing the character ground truth data for training an optical character recognition (OCR) engine to recognize the word patch from the image.
  - 9. The method of claim 1 wherein the providing comprises:
    - making a determination as to whether the character ground truth data exceeds a threshold score, wherein the threshold score indicates a minimum allowable level of correlation between the character ground truth data and the word patch of the image;
      
      providing, when the threshold score is exceeded, the character ground truth data as the training data; and
      
      rejecting, when the threshold score is not exceeded, the character ground truth data.
  - 10. The method of claim 1 further comprising:
    - training an OCR engine to recognize word patches from a plurality of images based on the training data.
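Claims 2 and 3 describe the input side: crop the word patch using the word bounding box from the coarse ground truth, convert it to grayscale, and scale it to a common height. A minimal NumPy sketch follows. The `(left, top, right, bottom)` box convention, the ITU-R BT.601 luma weights, nearest-neighbour resampling, and the 32-pixel `COMMON_HEIGHT` are all assumptions, since the claims fix none of them; the function names are hypothetical.

```python
import numpy as np

COMMON_HEIGHT = 32  # hypothetical common height in pixels


def crop_word_patch(image: np.ndarray, box) -> np.ndarray:
    """Crop using box = (left, top, right, bottom); right and bottom are exclusive (assumed)."""
    left, top, right, bottom = box
    return image[top:bottom, left:right]


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Weighted sum of R, G, B with BT.601 luma weights; returns an H x W float array."""
    return rgb[..., :3].astype(float) @ np.array([0.299, 0.587, 0.114])


def scale_to_height(gray: np.ndarray, height: int = COMMON_HEIGHT) -> np.ndarray:
    """Nearest-neighbour resize to the given height, preserving the aspect ratio."""
    h, w = gray.shape
    new_w = max(1, round(w * height / h))
    rows = (np.arange(height) * h / height).astype(int)  # source row for each output row
    cols = (np.arange(new_w) * w / new_w).astype(int)    # source column for each output column
    return gray[rows[:, None], cols[None, :]]


def preprocess(image: np.ndarray, box, height: int = COMMON_HEIGHT) -> np.ndarray:
    """Crop, grayscale, and rescale one word patch (claims 2 and 3, as interpreted here)."""
    return scale_to_height(to_grayscale(crop_word_patch(image, box)), height)
```

Bringing every patch to the same height keeps correlation scores comparable across words captured at different sizes; the width simply follows from the aspect ratio.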

11. A system comprising:
- one or more processors;
  
  a memory coupled to the one or more processors;
  
  a template generator configured to:
  
  render a transcription in a plurality of fonts and orientations to obtain a set of candidate word templates with associated character bounding boxes, wherein the transcription is provided as part of coarse ground truth data associated with a corresponding image, and wherein the transcription includes a textual translation of at least a portion of the image, and select a word template from the set of candidate word templates, wherein the selected word template corresponds to a word patch from the image;
  
  a character box refiner configured to:
  
  evaluate the character bounding boxes, of the selected word template, in a plurality of orientations about each respective character from the word patch to obtain a set of candidate character templates, including orienting the character bounding boxes within an allowable orientation range for the character bounding boxes, and select, for each respective character from the word patch, a character template from the set of candidate character templates, wherein each selected character template corresponds to the respective character from the word patch;
  
  and an output formatter configured to:
  
  provide character ground truth data comprising the selected character templates oriented to correspond to the word patch, as training data for recognizing the characters of the word patch from the image.
- Dependent Claims (12, 13, 14, 15, 16, 17)
- - 12. The system of claim 11 further comprising:
    - an input formatter implemented on the processors and configured to extract the word patch from the image based on a word bounding box indicating the location of the word patch within the image.
  - 13. The system of claim 11 wherein the template generator is configured to:
    - score the orientation of each character bounding box such that the score indicates a level of correlation between the oriented character bounding box and the corresponding character from the word patch; and
      
      select, for each character template, the oriented character bounding box such that the score for the selected oriented character bounding box is greater than a score for any other orientation of the same character bounding box.
  - 14. The system of claim 12 wherein the template generator is configured to:
    - score the candidate word templates, for each of the plurality of fonts, at one or more of the plurality of orientations; and
      
      select the candidate word template with a highest score, based on the font and orientation of the candidate word template, as the selected word template.
  - 15. The system of claim 12 further comprising:
    - a training module configured to train an optical character recognition (OCR) engine to recognize the word patch from the image based on the training data.
  - 16. The system of claim 15 wherein the output formatter is configured to:
    - determine a score based on a comparison of the character ground truth data to the word patch from the image; and
      
      provide the character ground truth data to the training module if the score exceeds a threshold of correlation.
  - 17. The system of claim 15 wherein the output formatter is configured to:
    - perform a consistency check to determine a consistency between the character ground truth data corresponding to a first word patch from the image and the character ground truth data corresponding to a second word patch from the image.
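Claims 13 and 16 describe refining each character bounding box by scoring it in several orientations and keeping the best, then passing the result on only if a correlation threshold is exceeded. The sketch below is one hypothetical reading of those steps. It assumes the selected word template is already aligned with the word patch at the same pixel size, models the "allowable orientation range" as integer shifts of at most `max_shift` pixels, uses normalized cross-correlation as the score, and compares the mean per-character score with an arbitrary `threshold` of 0.8. None of these choices comes from the claims, and the function names are illustrative.

```python
from itertools import product

import numpy as np


def ncc(a, b):
    """Normalized cross-correlation of two same-shaped arrays (assumed correlation score)."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def refine_char_box(template, patch, box, max_shift=2):
    """Shift one (left, top, right, bottom) box within +/-max_shift px; keep the best-scoring box."""
    l, t, r, b = box
    glyph = template[t:b, l:r]  # the rendered character inside its template box
    h, w = patch.shape
    best_box, best_score = box, -1.0
    for dx, dy in product(range(-max_shift, max_shift + 1), repeat=2):
        nl, nt, nr, nb = l + dx, t + dy, r + dx, b + dy
        if nl < 0 or nt < 0 or nr > w or nb > h:
            continue  # the shifted box would leave the word patch
        score = ncc(glyph, patch[nt:nb, nl:nr])
        if score > best_score:
            best_box, best_score = (nl, nt, nr, nb), score
    return best_box, best_score


def refine_and_gate(template, patch, boxes, threshold=0.8, max_shift=2):
    """Refine every character box, then accept the set only if the mean score exceeds threshold."""
    refined = [refine_char_box(template, patch, bx, max_shift) for bx in boxes]
    overall = float(np.mean([score for _, score in refined]))
    if overall > threshold:
        return [bx for bx, _ in refined], overall
    return None, overall  # rejected: too weakly correlated to use as training data
```

Keeping the search window small mirrors the claimed allowable orientation range: a character box may drift only slightly from where the word template placed it, which helps stop a glyph from latching onto its neighbour.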

18. An apparatus comprising a non-transitory computer readable medium encoding instructions thereon that, when executed by a processor, cause the processor to perform operations comprising:
- rendering a transcription in a plurality of fonts and orientations to obtain a set of candidate word templates with associated character bounding boxes, wherein the transcription is provided as part of coarse ground truth data associated with a corresponding image, and wherein the transcription includes a textual translation of at least a portion of the image;
  
  selecting a word template from the set of candidate word templates, wherein the selected word template corresponds to a word patch from the image;
  
  evaluating the character bounding boxes, of the selected word template, in a plurality of orientations about each respective character from the word patch to obtain a set of candidate character templates, including orienting the character bounding boxes within an allowable orientation range for the character bounding boxes;
  
  selecting, for each respective character from the word patch, a character template from the set of candidate character templates, wherein each selected character template corresponds to the respective character from the word patch; and
  
  providing character ground truth data comprising the selected character templates oriented to correspond to the word patch, as training data for recognizing the characters of the word patch from the image.
- Dependent Claims (19)
- - 19. The apparatus of claim 18 wherein selecting a word template from the set of candidate word templates is based on a highest score from amongst a plurality of scores for each of the candidate word templates, wherein each score indicates a level of correlation between the font and orientation and the word patch.
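Claim 19, like claims 5, 6 and 14, picks the word template whose font and orientation score highest against the word patch. The toy sketch below renders a transcription from hypothetical 3x3 bitmap "fonts", records each character's bounding box while rendering, and keeps the best-scoring font under an assumed normalized cross-correlation score. A real system would rasterize actual font files and also search orientations; both are left out here, and `FONTS`, `render`, and `select_word_template` are illustrative names only.

```python
import numpy as np

# Hypothetical 3x3 bitmap "fonts" that cover only the letters I and T.
FONTS = {
    "block": {"I": ["###", ".#.", "###"], "T": ["###", ".#.", ".#."]},
    "thin": {"I": [".#.", ".#.", ".#."], "T": ["###", "#..", "#.."]},
}


def glyph_array(rows):
    """Turn a list of '#'/'.' strings into a float bitmap (1 = ink)."""
    return np.array([[1.0 if c == "#" else 0.0 for c in row] for row in rows])


def render(text, font, gap=1):
    """Render text left to right; return (image, boxes) with boxes as (left, top, right, bottom)."""
    glyphs = [glyph_array(font[ch]) for ch in text]
    height = max(g.shape[0] for g in glyphs)
    width = sum(g.shape[1] for g in glyphs) + gap * (len(glyphs) - 1)
    image = np.zeros((height, width))
    boxes, x = [], 0
    for g in glyphs:
        gh, gw = g.shape
        image[:gh, x:x + gw] = g
        boxes.append((x, 0, x + gw, gh))  # the box is known exactly because we drew it
        x += gw + gap
    return image, boxes


def ncc(a, b):
    """Normalized cross-correlation of two same-shaped arrays (assumed correlation score)."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def select_word_template(text, patch, fonts=FONTS):
    """Render text in every font and return (font_name, score, boxes) for the best match."""
    candidates = []
    for name, font in fonts.items():
        image, boxes = render(text, font)
        if image.shape != patch.shape:
            continue  # this sketch does not rescale renderings to the patch size
        candidates.append((ncc(image, patch), name, boxes))
    if not candidates:
        return None
    score, name, boxes = max(candidates, key=lambda c: c[0])
    return name, score, boxes
```

Because the renderer itself produces the character boxes, the winning word template arrives with per-character boxes already attached, which is the starting point the claimed per-character refinement needs.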

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Bissacco, Alessandro, Chaudhury, Krishnendu
Primary Examiner(s)
Ahmed, Samir
Assistant Examiner(s)
LE, TOTAM HA

Application Number

US13/186,173
Time in Patent Office

1,064 Days
Field of Search

382/159
US Class Current

382/159
CPC Class Codes

G06F 18/214 Generating training pattern...