Automatic training of character templates using a transcription and a two-dimensional image source model
4 Assignments
Abstract
A technique for automatically training a set of character templates using unsegmented training samples uses as input a two-dimensional (2D) image of characters, called glyphs, as the source of training samples, a transcription associated with the 2D image as a source of labels for the glyph samples, and an explicit, formal 2D image source model that models as a grammar the structural and functional features of a set of 2D images that may be used as the source of training data. The input transcription may be a literal transcription associated with the 2D input image, or it may be nonliteral, for example containing logical structure tags for document formatting, such as found in markup languages. The technique uses spatial positioning information about the 2D image modeled by the 2D image source model and uses labels in the transcription to determine labeled glyph positions in the 2D image that identify locations of glyph samples. The character templates are produced using the input 2D image and the labeled glyph positions without assigning pixels to glyph samples prior to training. In one implementation, the 2D image source model is a regular grammar having the form of a finite state transition network, and the transcription is also represented as a finite state network. The two networks are merged to produce a transcription-image network, which is used to decode the input 2D image to produce labeled glyph positions that identify training data samples in the 2D image. In one implementation of the template construction process, a pixel scoring technique is used to produce character templates contemporaneously from blocks of training data samples aligned at glyph positions.
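The merge step described above can be illustrated as a product construction between an image source network and a transcription network. This is a minimal Python sketch under stated assumptions, not the patent's data structures. Each image-network transition is assumed to carry a message string, which is empty for non-glyph moves such as spacing, together with a 2D displacement. The transcription network is assumed to be a list of (source, destination, label) transitions. `Transition`, `merge_networks`, and the toy networks are hypothetical names. A glyph transition survives only when some transcription transition carries the same label, so the labels emitted along any path through the merged network follow the transcription's order.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Transition:
    src: str
    dst: str
    message: str   # "" means the transition emits no transcription label
    dx: int = 0    # hypothetical 2D displacement (spatial positioning data)
    dy: int = 0


TTransition = Tuple[str, str, str]  # transcription transition: (src, dst, label)


def merge_networks(image_net: List[Transition],
                   trans_net: List[TTransition]) -> List[Transition]:
    """Product construction: merged states are 'image_state|transcription_state'."""
    trans_states = sorted({s for s, _, _ in trans_net} | {d for _, d, _ in trans_net})
    merged: List[Transition] = []
    for t in image_net:
        if t.message == "":
            # Non-glyph moves advance only the image network.
            for q in trans_states:
                merged.append(Transition(f"{t.src}|{q}", f"{t.dst}|{q}", "", t.dx, t.dy))
        else:
            # Glyph moves must be matched by a transcription transition with the same label.
            for q_src, q_dst, label in trans_net:
                if label == t.message:
                    merged.append(Transition(f"{t.src}|{q_src}", f"{t.dst}|{q_dst}",
                                             t.message, t.dx, t.dy))
    return merged


# Toy example: one text-line state with glyph self-loops and a space self-loop.
image_net = [Transition("L", "L", "a", dx=10), Transition("L", "L", "b", dx=12),
             Transition("L", "L", "", dx=5)]
trans_net = [("t0", "t1", "a"), ("t1", "t2", "b")]  # linear transcription "ab"
merged = merge_networks(image_net, trans_net)
```

Final states and transition probabilities are omitted to keep the sketch short. A full merge would also carry over the image network's stochastic weights and require both components to end in final states.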
167 Citations
32 Claims
1. A method of operating a machine to train a set of character templates for use in a recognition system;
the machine including a processor and a memory device for storing data;
the data stored in the memory device including instruction data which the processor executes to operate the machine;
the processor being connected to the memory device for accessing and executing the instruction data stored therein;
the method comprising:
operating the processor to receive a two-dimensional (2D) image source of glyph samples having a vertical dimension size larger than a single line;
each glyph sample being an image instance of a respective one of a plurality of characters in a character set;
the set of character templates being trained representing respective ones of the plurality of characters in the character set;
operating the processor to receive a transcription network in the form of a finite state network data structure indicating a transcription associated with the 2D image source of glyph samples;
the transcription including an ordered arrangement of transcription labels;
the transcription network indicating the ordered arrangement of the transcription labels in the transcription as at least one transcription path through the transcription network;
operating the processor to access a two-dimensional (2D) image source network in the form of a stochastic finite state network data structure, stored in the memory device of the machine;
the 2D image source network modeling as a grammar a spatial image structure of a set of 2D images, each including a plurality of glyphs;
a first one of the set of 2D images being modeled as at least one path through the 2D image source network that indicates an ideal image consistent with the spatial image structure of the first image;
the at least one path indicating path data items associated therewith and accessible by the processor;
the path data items indicating image positions and glyph labels paired therewith of respective ones of the plurality of glyphs included in the first image;
the 2D image source of glyph samples being one of the images included in the set of 2D images modeled by the 2D image source network;
operating the processor to merge the 2D image source network with the transcription network to produce a transcription-image network;
the transcription-image network being a modified form of the 2D image source network wherein, when the transcription is associated with the first image, the transcription-image network models the first image as at least one complete transcription-image path through the transcription-image network that indicates an ideal image consistent with the spatial image structure of the first image and that further indicates the path data items, the transcription-image path further indicating a sequence of message strings consistent with the ordered arrangement of the transcription labels indicated by the at least one transcription path through the transcription network;
operating the processor to perform a decoding operation on the 2D image source of glyph samples using the transcription-image network to produce at least one complete transcription-image path indicating an ideal image consistent with the spatial image structure of the 2D image source of glyph samples;
operating the processor to produce training samples using the path data items associated with the at least one complete transcription-image path;
each training sample including a 2D image position in the 2D image source of glyph samples indicating an image position therein and a glyph label paired therewith; and
operating the processor to produce the set of character templates using the training samples.
Dependent claims: 2–12.
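The decoding step of claim 1 searches the transcription-image network for a best complete path. The sketch below is a simplified Viterbi-style dynamic program and makes two assumptions. First, the network has already been unrolled into an acyclic graph. Second, each transition carries a precomputed log-likelihood score. In a real decoder that score would come from matching a template against the image at the transition's position, and that matching is omitted here. `best_path` and the toy edges are hypothetical names. The code uses the standard-library `graphlib.TopologicalSorter`, which requires Python 3.9 or later.

```python
from collections import defaultdict
from graphlib import TopologicalSorter
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str, str, float]  # (source, destination, glyph label, log-likelihood score)


def best_path(edges: List[Edge], start: str,
              final: str) -> Optional[Tuple[float, List[Edge]]]:
    """Best-scoring path from start to final in an acyclic network, or None if unreachable."""
    incoming: Dict[str, List[Edge]] = defaultdict(list)
    predecessors: Dict[str, Set[str]] = defaultdict(set)
    for edge in edges:
        src, dst = edge[0], edge[1]
        incoming[dst].append(edge)
        predecessors[dst].add(src)
        predecessors.setdefault(src, set())  # every node must appear in the graph

    best: Dict[str, float] = {start: 0.0}
    back: Dict[str, Edge] = {}
    # static_order() yields each node only after all of its predecessors.
    for node in TopologicalSorter(predecessors).static_order():
        for edge in incoming[node]:
            src, _, _, score = edge
            if src in best and best[src] + score > best.get(node, float("-inf")):
                best[node] = best[src] + score
                back[node] = edge

    if final not in best:
        return None
    path: List[Edge] = []
    node = final
    while node != start:  # follow back-pointers to recover the path
        edge = back[node]
        path.append(edge)
        node = edge[0]
    path.reverse()
    return best[final], path


# Two competing branches for the first glyph slot, then one for the second.
edges = [("s", "m", "a", -1.0), ("s", "m", "o", -3.0), ("m", "f", "b", -0.5)]
score, path = best_path(edges, "s", "f")
```

The labels along the returned path (`a`, then `b`), together with each transition's position, are the path data items from which training samples are produced.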
13. A method of operating a machine to train a set of character templates for use in a recognition system;
the machine including a processor and a memory device for storing data;
the data stored in the memory device including instruction data which the processor executes to operate the machine;
the processor being connected to the memory device for accessing and executing the instruction data stored therein;
the method comprising:
operating the processor to determine glyph sample pixel positions identifying respective ones of glyph samples occurring in a two-dimensional (2D) image source of glyph samples having a vertical dimension size larger than a single line of glyphs;
each glyph sample included in the 2D image source of glyph samples being an image instance of a respective one of a plurality of characters in a character set;
each one of the set of character templates being trained representing a respective one of the plurality of characters in the character set;
the processor, in determining the glyph sample pixel position of each glyph sample, using a two-dimensional (2D) image source model that models as a grammar a spatial image structure of a set of images that includes the 2D image source of glyph samples;
the 2D image source model including spatial positioning data modeling spatial positioning of the plurality of glyphs occurring in the 2D image source of glyph samples;
the processor using the spatial positioning data to determine the glyph sample pixel position identifying a respective glyph sample;
operating the processor to produce a glyph label to be respectively paired with the glyph sample pixel position of a respective glyph sample;
the respectively paired glyph label indicating a respective one of the characters in the character set;
the processor, in producing the respectively paired glyph label, using mapping data included in the 2D image source model mapping a respective one of the glyphs to a glyph label indicating the character in the character set;
the processor, further in producing the respectively paired glyph label, using a transcription associated with the 2D image source of glyph samples and including an ordered arrangement of transcription labels; and
operating the processor to produce the set of character templates indicating respective ones of the characters in the character set using the glyph sample pixel positions identifying the glyph samples occurring in the 2D image source of glyph samples with their respectively paired glyph labels.
Dependent claims: 14–25.
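Claim 13 derives glyph sample pixel positions from the model's spatial positioning data and pairs each position with a label. The minimal sketch below assumes a decoded path is a list of steps. Each step carries a glyph label taken from the mapping data, which is empty for spacing or line-feed moves, and a displacement. It also assumes the glyph origin is the current position before that step's displacement is applied. `PathStep` and `label_glyph_positions` are hypothetical names. Comparing the labels against the transcription is a simplification of how the transcription constrains labeling.

```python
from typing import List, NamedTuple, Tuple


class PathStep(NamedTuple):
    glyph: str  # glyph label from the model's mapping data ("" for non-glyph moves)
    dx: int     # horizontal displacement applied after this step (e.g., set width)
    dy: int     # vertical displacement applied after this step (e.g., line feed)


def label_glyph_positions(path: List[PathStep], transcription: List[str],
                          origin: Tuple[int, int] = (0, 0)) -> List[Tuple[int, int, str]]:
    """Return (x, y, label) training samples and check them against the transcription."""
    x, y = origin
    samples: List[Tuple[int, int, str]] = []
    for step in path:
        if step.glyph:
            # Assumed convention: a glyph's origin is the position before moving.
            samples.append((x, y, step.glyph))
        x, y = x + step.dx, y + step.dy
    labels = [label for _, _, label in samples]
    if labels != transcription:
        raise ValueError(f"path labels {labels} disagree with transcription {transcription}")
    return samples


# "To" on one line, a space, a line feed back to the margin, then "a" on the next line.
path = [PathStep("T", 12, 0), PathStep("o", 9, 0), PathStep("", 5, 0),
        PathStep("", -26, 20), PathStep("a", 9, 0)]
samples = label_glyph_positions(path, ["T", "o", "a"])
```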
26. A method of operating a machine to train a set of character templates for use in a recognition system;
each of the character templates being based on a character template model defining character image positioning referred to as the sidebearing model of character image positioning;
the machine including a processor and a memory device for storing data;
the data stored in the memory device including instruction data which the processor executes to operate the machine;
the processor being connected to the memory device for accessing and executing the instruction data stored therein;
the method comprising:
operating the processor to receive a two-dimensional (2D) image source of glyph samples;
the 2D image source of glyph samples having a vertical dimension size larger than a single line;
each glyph occurring in the 2D image source of glyph samples being an image instance of a respective one of a plurality of characters in a character set;
each one of the set of character templates being trained representing a respective one of the plurality of characters in the character set;
operating the processor to access a two-dimensional (2D) image source model stored in the memory device of the machine;
the 2D image source model modeling as a grammar a set of two-dimensional (2D) images having a common spatial image structure;
the 2D image source of glyph samples being one of the set of 2D images modeled by the 2D image source model;
the 2D image source model including spatial positioning data modeling spatial positioning of the plurality of glyphs occurring in the 2D image source of glyph samples;
operating the processor to determine, for each respective glyph occurring in the 2D image source of glyph samples, a glyph sample image origin position of the respective glyph therein using the spatial positioning data included in the 2D image source model;
operating the processor to produce a glyph label respectively paired with each glyph sample image origin position;
each respectively paired glyph label indicating the character in the character set represented by the respective glyph;
the processor, in producing each respectively paired glyph label, using mapping data included in the 2D image source model mapping respective ones of the glyphs occurring in the 2D image source of glyph samples to respectively paired glyph labels, each indicating the character in the character set represented by the respective glyph;
the processor, further in producing each respectively paired glyph label, using a transcription associated with the 2D image source of glyph samples, the transcription including an ordered arrangement of transcription labels;
the processor using the transcription labels and the mapping data to pair a glyph label with a respective glyph sample image origin position of a respective glyph occurring in the 2D image source of glyph samples; and
operating the processor to produce the set of character templates indicating the characters in the character set using the 2D image source of glyph samples, the glyph sample image origin positions and the respectively paired glyph labels;
the processor determining, for each character template, a collection of sample image regions included in the 2D image source of glyph samples using the glyph sample image origin positions and the respectively paired glyph labels;
the processor producing the set of character templates using the collections of sample image regions by assigning foreground pixel color values to selected template pixel positions in respective ones of the character templates;
one of the selected template pixel positions in a first one of the set of character templates being selected on the basis of template contribution measurements computed using sample pixel positions included in the collection of sample image regions for the character represented by the first character template;
each character template having a characteristic image positioning property such that, when a second character template is positioned in an image with an image origin position thereof displaced from the image origin position of a preceding first character template by a character set width thereof, and when a first bounding box entirely containing the first character template overlaps in the image with a second bounding box entirely containing the second character template, the first and second character templates have substantially nonoverlapping foreground pixels.
Dependent claims: 27.
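The sidebearing property at the end of claim 26 can be checked directly on a pair of templates. The sketch below represents each template as a hypothetical set of foreground pixel offsets relative to its origin, with x increasing to the right and y increasing downward, so negative y lies above the baseline. It places the second template one set width to the right of the first. It then reports whether the bounding boxes overlap and whether any foreground pixels collide. The function names and glyph shapes are invented for illustration.

```python
from typing import FrozenSet, Tuple

Pixel = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (min_x, min_y, max_x, max_y), inclusive


def bbox(pixels: FrozenSet[Pixel]) -> Box:
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_overlap(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def sidebearing_check(first: FrozenSet[Pixel], set_width: int,
                      second: FrozenSet[Pixel]) -> Tuple[bool, bool]:
    """Place `second` set_width pixels right of `first`'s origin.
    Returns (bounding boxes overlap, foreground pixels overlap)."""
    shifted = frozenset((x + set_width, y) for x, y in second)
    return boxes_overlap(bbox(first), bbox(shifted)), bool(first & shifted)


# An "f"-like template whose hook overhangs to the right of its set width of 2...
f_like = frozenset({(0, 0), (0, -1), (0, -2), (1, -3), (2, -3)})
# ...followed by a short glyph with a negative left sidebearing.
low_glyph = frozenset({(-1, 0), (0, 0), (0, -1)})
result = sidebearing_check(f_like, 2, low_glyph)
```

Here `result` is `(True, False)`: the bounding boxes overlap but the foreground pixels do not, which is the situation the claim describes. A segmentation-based model that requires disjoint boxes would reject this pair.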
28. A machine for use in training a set of character templates for use in a recognition operation;
the machine comprising:
a first signal source for providing image data defining a first image;
image input circuitry connected to the first signal source for receiving the image data defining the first image therefrom;
a second signal source for providing non-image data;
input circuitry connected to the second signal source for receiving the non-image data therefrom;
a processor connected to the image input circuitry for receiving the image data defining the first image therefrom and further connected to the input circuitry for receiving the non-image data therefrom; and
memory for storing data;
the data stored in the memory including instruction data indicating instructions the processor can execute;
the processor being further connected to the memory for accessing the data stored therein;
wherein the processor, in executing the instructions stored in the memory, receives from the image input circuitry a two-dimensional (2D) image source of glyph samples;
the 2D image source of glyph samples having a vertical dimension size larger than a single line of glyphs;
each glyph included in the 2D image source of glyph samples being an image instance of a respective one of a plurality of characters in a character set;
the set of character templates being trained representing respective ones of the plurality of characters in the character set;
receives from the input circuitry a transcription associated with the 2D image source of glyph samples including an ordered arrangement of transcription labels; and
receives from the input circuitry a two-dimensional (2D) image source model modeling as a grammar a spatial image structure of a set of 2D images including the 2D image source of glyph samples;
the 2D image source model including spatial positioning data indicating spatial positioning information about the plurality of glyphs occurring in the 2D image source of glyph samples;
the 2D image source model indicating mapping data mapping a respective one of the glyphs occurring in the 2D image source of glyph samples to a glyph label indicating a character in the character set;
wherein the processor, further in executing the instructions stored in the memory, determines a glyph sample pixel position of each of a plurality of glyph samples occurring in the 2D image source of glyph samples using the spatial positioning information included in the 2D image source model;
produces a glyph label respectively paired with each respective one of the glyph sample pixel positions and indicating a respective one of the characters in the character set;
the processor producing the respectively paired glyph label using the mapping data indicated by the 2D image source model and using the ordered arrangement of transcription labels included in the transcription; and
produces the set of character templates using the 2D image source of glyph samples and using the glyph sample pixel positions and the respectively paired glyph labels.
Dependent claims: 29–32.
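The template-construction step can be sketched as greedy pixel scoring over all templates at once. This loosely follows the abstract's description of producing templates contemporaneously from sample blocks aligned at glyph positions, but it is a simplified illustration and not the patent's procedure. The score `hits - min_fraction * n` counts foreground sample pixels at a template offset and subtracts a threshold proportional to the number of samples. It is a hypothetical stand-in for the template contribution measurement, and `build_templates`, `window`, and `min_fraction` are illustrative names. Once a pixel is selected, the contributing sample pixels are cleared from a working copy of the image. This prevents overlapping sample regions of other characters from claiming the same image pixels.

```python
from typing import Dict, List, Set, Tuple

Image = List[List[int]]        # 1 = foreground (black), 0 = background
Sample = Tuple[int, int, str]  # (x, y, label): a labeled glyph origin
Offset = Tuple[int, int]       # template pixel position relative to the origin


def build_templates(image: Image, samples: List[Sample], window: int,
                    min_fraction: float = 0.5) -> Dict[str, Set[Offset]]:
    """Greedy pixel scoring across all templates (simplified, hypothetical score)."""
    img = [row[:] for row in image]  # working copy; selected sample pixels get cleared
    h, w = len(img), len(img[0])
    origins: Dict[str, List[Tuple[int, int]]] = {}
    for x, y, label in samples:
        origins.setdefault(label, []).append((x, y))
    templates: Dict[str, Set[Offset]] = {c: set() for c in origins}
    offsets = [(dx, dy) for dy in range(-window, window + 1)
               for dx in range(-window, window + 1)]

    def inside(x: int, y: int) -> bool:
        return 0 <= x < w and 0 <= y < h

    def score(c: str, dx: int, dy: int) -> float:
        # Foreground sample pixels at this offset, minus a per-sample threshold.
        hits = sum(img[y + dy][x + dx] for x, y in origins[c] if inside(x + dx, y + dy))
        return hits - min_fraction * len(origins[c])

    while True:
        candidates = [(score(c, dx, dy), c, dx, dy)
                      for c in origins for dx, dy in offsets
                      if (dx, dy) not in templates[c]]
        best = max(candidates, default=None)
        if best is None or best[0] <= 0:
            return templates
        _, c, dx, dy = best
        templates[c].add((dx, dy))  # assign a foreground template pixel
        for x, y in origins[c]:     # clear the sample pixels that contributed to it
            if inside(x + dx, y + dy):
                img[y + dy][x + dx] = 0


# Toy page: two instances of a 2-pixel-tall "i" plus one stray pixel below the first.
page = [
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 0],
]
templates = build_templates(page, [(1, 1, "i"), (5, 1, "i")], window=1)
```

The stray pixel appears in only one of the two "i" samples, so its offset scores zero and is not added to the template. No pixels are assigned to glyph samples before training. Only the labeled origins and the page itself are used.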
Specification