Automatic extraction of character ground truth data from images
2 Assignments
0 Petitions
Abstract
Embodiments for automatic extraction of character ground truth data from images are disclosed. A transcription may be rendered in a plurality of fonts and orientations to obtain a set of candidate word templates with associated character bounding boxes. A word template may be selected from the set of candidate word templates, wherein the selected word template corresponds to a word patch from an image. The character bounding boxes, of the selected word template, may be evaluated in a plurality of orientations about each respective character from the word patch to obtain a set of candidate character templates. For each respective character from the word patch, a character template may be selected from the set of candidate character templates, wherein each selected character template corresponds to the respective character from the word patch. Character ground truth data comprising the selected character templates, oriented to correspond to the word patch, may be provided as training data for recognizing the characters of the word patch from the image.
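The rendering step can be pictured with a small sketch. It is not the patented implementation: it assumes hypothetical fixed-width font metrics (`FONT_METRICS`) in place of a real font rasterizer, and it represents each character bounding box as a four-corner polygon rotated about the word center. The names `WordTemplate`, `rotate`, and `render_candidates` are illustrative only.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float]

# Hypothetical font metrics (advance width, height) in pixels. A real template
# generator would rasterize the transcription with an actual font engine.
FONT_METRICS = {"sans": (10.0, 16.0), "serif": (11.0, 17.0), "condensed": (8.0, 16.0)}


@dataclass
class WordTemplate:
    font: str
    angle_deg: float
    char_boxes: list[list[Point]]  # one four-corner polygon per character


def rotate(p: Point, angle_deg: float, origin: Point) -> Point:
    """Rotate point p by angle_deg about origin."""
    a = math.radians(angle_deg)
    x, y = p[0] - origin[0], p[1] - origin[1]
    return (origin[0] + x * math.cos(a) - y * math.sin(a),
            origin[1] + x * math.sin(a) + y * math.cos(a))


def render_candidates(transcription: str, fonts: list[str],
                      angles: list[float]) -> list[WordTemplate]:
    """Build one candidate word template per (font, orientation) pair."""
    templates = []
    for font in fonts:
        advance, height = FONT_METRICS[font]
        # Axis-aligned character boxes laid out left to right.
        boxes = [[(i * advance, 0.0), ((i + 1) * advance, 0.0),
                  ((i + 1) * advance, height), (i * advance, height)]
                 for i in range(len(transcription))]
        center = (len(transcription) * advance / 2, height / 2)
        for angle in angles:
            # Rotate every character box about the word center.
            rotated = [[rotate(p, angle, center) for p in box] for box in boxes]
            templates.append(WordTemplate(font, angle, rotated))
    return templates


templates = render_candidates("CAT", ["sans", "serif"], [-10.0, 0.0, 10.0])
print(len(templates))  # 6: two fonts times three orientations
```

Each candidate keeps its character boxes alongside the font and angle, so a later selection step can compare whole-word templates against the word patch and still know where every character sits.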
18 Citations
19 Claims
1. A computer-implemented method comprising:
rendering a transcription in a plurality of fonts and orientations to obtain a set of candidate word templates with associated character bounding boxes, wherein the transcription is provided as part of coarse ground truth data associated with a corresponding image, and wherein the transcription includes a textual translation of at least a portion of the image;
selecting a word template from the set of candidate word templates, wherein the selected word template corresponds to a word patch from the image;
evaluating the character bounding boxes, of the selected word template, in a plurality of orientations about each respective character from the word patch to obtain a set of candidate character templates, including orienting the character bounding boxes within an allowable orientation range for the character bounding boxes;
selecting, for each respective character from the word patch, a character template from the set of candidate character templates, wherein each selected character template corresponds to the respective character from the word patch; and
providing character ground truth data comprising the selected character templates oriented to correspond to the word patch, as training data for recognizing the characters of the word patch from the image.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10)
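The selection and refinement steps of claim 1 can be sketched as follows, under assumptions the claim does not state: the word patch is a binary image (1 = ink), a template is scored by a simple ink-agreement count (+1 per covered ink pixel, -1 per covered background pixel) rather than whatever similarity measure the disclosure actually uses, and the allowable orientation range is a symmetric ±`max_angle` sampled every `step` degrees. All function names here are hypothetical.

```python
import math
from typing import Callable

Point = tuple[float, float]
Box = list[Point]        # four corners, in order
Patch = list[list[int]]  # binary word patch: patch[y][x] == 1 for ink


def center(box: Box) -> Point:
    return (sum(p[0] for p in box) / len(box), sum(p[1] for p in box) / len(box))


def rotate_about(p: Point, angle_deg: float, c: Point) -> Point:
    a = math.radians(angle_deg)
    x, y = p[0] - c[0], p[1] - c[1]
    return (c[0] + x * math.cos(a) - y * math.sin(a),
            c[1] + x * math.sin(a) + y * math.cos(a))


def inside(pt: Point, box: Box) -> bool:
    """Convex-polygon test: pt lies on the same side of every edge."""
    crosses = [(box[(i + 1) % 4][0] - box[i][0]) * (pt[1] - box[i][1])
               - (box[(i + 1) % 4][1] - box[i][1]) * (pt[0] - box[i][0])
               for i in range(4)]
    return all(c >= 0 for c in crosses) or all(c <= 0 for c in crosses)


def agreement(boxes: list[Box], patch: Patch) -> int:
    """Assumed score: +1 per covered ink pixel, -1 per covered background pixel."""
    score = 0
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            if any(inside((x + 0.5, y + 0.5), b) for b in boxes):
                score += 1 if value else -1
    return score


def select_word_template(candidates: list[list[Box]], patch: Patch) -> list[Box]:
    """Pick the candidate word template that best agrees with the word patch."""
    return max(candidates, key=lambda boxes: agreement(boxes, patch))


def refine_characters(boxes: list[Box], score: Callable[[Box], float],
                      max_angle: float = 15.0,
                      step: float = 5.0) -> list[tuple[float, Box]]:
    """Rotate each character box about its own center within
    [-max_angle, max_angle] and keep the best-scoring orientation.
    Ties go to the smaller rotation."""
    n = int(max_angle // step)
    angles = [k * step for k in range(-n, n + 1)]
    refined = []
    for box in boxes:
        c = center(box)
        options = [(a, [rotate_about(p, a, c) for p in box]) for a in angles]
        refined.append(max(options, key=lambda o: (score(o[1]), -abs(o[0]))))
    return refined


# 7x7 patch containing an upright 3x5 block of ink.
patch = [[1 if 2 <= x <= 4 and 1 <= y <= 5 else 0 for x in range(7)] for y in range(7)]
word = select_word_template([[[(2, 1), (5, 1), (5, 6), (2, 6)]],
                             [[(5, 1), (7, 1), (7, 6), (5, 6)]]], patch)
refined = refine_characters(word, lambda b: agreement([b], patch))
print(refined[0][0])  # 0.0: the upright box already covers all ink and no background
```

Passing the character score as a callable keeps the orientation search independent of the similarity measure, so a correlation-based score could be substituted without touching the search over the allowable range.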
11. A system comprising:
one or more processors;
a memory coupled to the one or more processors;
a template generator configured to:
render a transcription in a plurality of fonts and orientations to obtain a set of candidate word templates with associated character bounding boxes, wherein the transcription is provided as part of coarse ground truth data associated with a corresponding image, and wherein the transcription includes a textual translation of at least a portion of the image, and
select a word template from the set of candidate word templates, wherein the selected word template corresponds to a word patch from the image;
a character box refiner configured to:
evaluate the character bounding boxes, of the selected word template, in a plurality of orientations about each respective character from the word patch to obtain a set of candidate character templates, including orienting the character bounding boxes within an allowable orientation range for the character bounding boxes, and
select, for each respective character from the word patch, a character template from the set of candidate character templates, wherein each selected character template corresponds to the respective character from the word patch; and
an output formatter configured to:
provide character ground truth data comprising the selected character templates oriented to correspond to the word patch, as training data for recognizing the characters of the word patch from the image.
(Dependent claims: 12, 13, 14, 15, 16, 17)
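Claim 11 splits the method across a template generator, a character box refiner, and an output formatter. One hypothetical way to wire those three components is sketched below. The class names mirror the claim language, but the injected callables (`render`, `word_score`, `orient`, `char_score`), the default angle set, and the JSON layout are assumptions for illustration, not a documented format.

```python
import json
from dataclasses import dataclass
from typing import Callable

Point = tuple[float, float]
Box = list[Point]


@dataclass
class TemplateGenerator:
    render: Callable[[str], list[list[Box]]]  # hypothetical: candidate word templates
    word_score: Callable[[list[Box]], float]  # hypothetical: match against the word patch

    def generate(self, transcription: str) -> list[Box]:
        # Render all candidates, then keep the best-matching word template.
        return max(self.render(transcription), key=self.word_score)


@dataclass
class CharacterBoxRefiner:
    orient: Callable[[Box, float], Box]  # hypothetical: rotate a box about its center
    char_score: Callable[[Box], float]   # hypothetical: match against the character
    angles: tuple[float, ...] = (-10.0, -5.0, 0.0, 5.0, 10.0)  # assumed allowable range

    def refine(self, boxes: list[Box]) -> list[Box]:
        # For each character, keep the best-scoring orientation.
        return [max((self.orient(b, a) for a in self.angles), key=self.char_score)
                for b in boxes]


class OutputFormatter:
    def format(self, image_id: str, transcription: str, boxes: list[Box]) -> str:
        # One record per character: its label and its oriented polygon.
        records = [{"char": ch, "polygon": [list(p) for p in box]}
                   for ch, box in zip(transcription, boxes)]
        return json.dumps({"image": image_id, "characters": records})


def extract_ground_truth(image_id: str, transcription: str, generator: TemplateGenerator,
                         refiner: CharacterBoxRefiner, formatter: OutputFormatter) -> str:
    word = generator.generate(transcription)
    return formatter.format(image_id, transcription, refiner.refine(word))


# Usage with stand-in callables; a real system would plug in rendering and image scoring.
candidates = [[[(0, 0), (1, 0), (1, 2), (0, 2)], [(1, 0), (2, 0), (2, 2), (1, 2)]],
              [[(0, 0), (3, 0), (3, 2), (0, 2)], [(3, 0), (6, 0), (6, 2), (3, 2)]]]
generator = TemplateGenerator(render=lambda t: candidates,
                              word_score=lambda boxes: -boxes[-1][1][0])  # prefer narrower
refiner = CharacterBoxRefiner(orient=lambda box, angle: box, char_score=lambda box: 0.0)
print(extract_ground_truth("img-1", "HI", generator, refiner, OutputFormatter()))
```

Injecting the scoring and orientation functions keeps each component testable on its own, which loosely matches the claim's separation of generation, refinement, and formatting.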
18. An apparatus comprising a non-transitory computer readable medium encoding instructions thereon that, when executed by a processor, cause the processor to perform operations comprising:
rendering a transcription in a plurality of fonts and orientations to obtain a set of candidate word templates with associated character bounding boxes, wherein the transcription is provided as part of coarse ground truth data associated with a corresponding image, and wherein the transcription includes a textual translation of at least a portion of the image;
selecting a word template from the set of candidate word templates, wherein the selected word template corresponds to a word patch from the image;
evaluating the character bounding boxes, of the selected word template, in a plurality of orientations about each respective character from the word patch to obtain a set of candidate character templates, including orienting the character bounding boxes within an allowable orientation range for the character bounding boxes;
selecting, for each respective character from the word patch, a character template from the set of candidate character templates, wherein each selected character template corresponds to the respective character from the word patch; and
providing character ground truth data comprising the selected character templates oriented to correspond to the word patch, as training data for recognizing the characters of the word patch from the image.
(Dependent claims: 19)
Specification