Document image decoding using text line column-based heuristic scoring
First Claim
1. A method for operating a processor-controlled machine to decode a text line image;
- the machine including a processor and a memory device for storing data;
the data stored in the memory device including instruction data the processor executes to operate the machine;
the processor being connected to the memory device for accessing and executing the instruction data stored therein;
the method comprising;
receiving an input text line image indicating a bitmapped image region including a plurality of image glyphs each indicating a character symbol;
obtaining a plurality of character templates and character labels stored in the memory device of the machine;
each character template indicating a two-dimensional bitmapped image of a character symbol;
each character template further indicating a character label identifying the character symbol represented by the character template;
producing a one-dimensional (1D) image analogue data structure using pixel counts of image foreground pixels in columns of the image portion of the input text line image;
producing a plurality of 1D template analogue data structures using pixel counts of template foreground pixels in columns of a respective one of the plurality of character templates;
computing a plurality of template-image heuristic scores using the 1D image analogue data structure and the plurality of 1D template analogue data structures;
each template-image heuristic score indicating an estimated measurement of a match between one of the plurality of character templates and a two-dimensional region of the image portion of the input text line image; and
performing a dynamic programming operation using a decoding trellis data structure indicating a stochastic finite state network including nodes and transitions between nodes indicating a model of expected spatial arrangements of character symbols in the input text line image;
the dynamic programming operation using the plurality of template-image heuristic scores to decode the input text line image and produce the character labels of the character symbols represented by the image glyphs included therein.
5 Assignments
0 Petitions
Accused Products
Abstract
In a text recognition system that uses a stochastic finite state network to model a document image layout, the computational efficiency of text line decoding is improved. In a typical implementation, the dynamic programming operation that accomplishes decoding uses actual scores computed between two-dimensional (2D) bitmapped character template images and the (2D) bitmapped observed image. Scoring measures the degree of a match between a character template and the observed image. Computation of these actual scores is replaced with the simpler computation of column-based (i.e., one-dimensional) heuristic scores. Because the column-based heuristic scores can be shown to be a true upper bound on actual template-image scores, the heuristic scores are accurate enough to use in place of actual scoring during text line decoding. The heuristic scores essentially reduce the expensive two-dimensional computation of the actual template-image scores required by prior decoding methods to a simpler but accurate one-dimensional computation.
-
Citations
10 Claims
-
1. A method for operating a processor-controlled machine to decode a text line image;
- the machine including a processor and a memory device for storing data;
the data stored in the memory device including instruction data the processor executes to operate the machine;
the processor being connected to the memory device for accessing and executing the instruction data stored therein;
the method comprising;receiving an input text line image indicating a bitmapped image region including a plurality of image glyphs each indicating a character symbol;
obtaining a plurality of character templates and character labels stored in the memory device of the machine;
each character template indicating a two-dimensional bitmapped image of a character symbol;
each character template further indicating a character label identifying the character symbol represented by the character template;
producing a one-dimensional (1D) image analogue data structure using pixel counts of image foreground pixels in columns of the image portion of the input text line image;
producing a plurality of 1D template analogue data structures using pixel counts of template foreground pixels in columns of a respective one of the plurality of character templates;
computing a plurality of template-image heuristic scores using the 1D image analogue data structure and the plurality of 1D template analogue data structures;
each template-image heuristic score indicating an estimated measurement of a match between one of the plurality of character templates and a two-dimensional region of the image portion of the input text line image; and
performing a dynamic programming operation using a decoding trellis data structure indicating a stochastic finite state network including nodes and transitions between nodes indicating a model of expected spatial arrangements of character symbols in the input text line image;
the dynamic programming operation using the plurality of template-image heuristic scores to decode the input text line image and produce the character labels of the character symbols represented by the image glyphs included therein.- View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
producing the plurality of 1D template analogue data structures includes producing a 1D template pixel count data structure for each character template; - each 1D template pixel count data structure indicating a plurality of counts of template column foreground pixels in respective columns in the character template;
producing the 1D image analogue data structure includes producing a 1D image pixel count data structure including a plurality of counts of image column foreground pixels in respective columns in the image portion; and
computing each one of the plurality of template-image heuristic scores includes computing a sum of a plurality of minimum pixel counts;
each minimum pixel count being the minimum of the count of the template column foreground pixels in a column of the character template and the count of the image column foreground pixels in a column of the image portion.
- the machine including a processor and a memory device for storing data;
-
3. The method of claim 1 for operating a processor-controlled machine to decode a text line image wherein
each character template includes at least two foreground template levels and a background template level; - each foreground template level including a plurality of foreground pixels and indicating a template level weighting factor associated with the foreground template level;
each of the plurality of 1D template analogue data structures includes, for each of the foreground template levels of the character template, information indicating a foreground pixel template level column count for a respective one of the columns thereof weighted by the template level weighting factor associated therewith;
the 1D image analogue data structure is a 1D image pixel count data structure indicating a plurality of counts of image foreground pixels in respective columns of the image portion; and
computing each one of the plurality of template-image heuristic scores includes computing a sum of a plurality of column pixel counts using the 1D image pixel count data structure and the 1D template analogue data structure.
- each foreground template level including a plurality of foreground pixels and indicating a template level weighting factor associated with the foreground template level;
-
4. The method of claim 3 wherein producing each of the plurality of 1D template analogue data structures includes producing a plurality of 1D template column lookup tables;
- each 1D template column lookup table representing data about at least one column of template pixels at all template levels of the character template;
producing the plurality of 1D template column lookup tables includingselecting a plurality of template columns;
one column in each template level having a column location in common with all other template columns;
computing a template foreground pixel count of foreground pixels at each template level in the plurality of selected template columns;
for each image foreground pixel count of image foreground pixels, assigning the image foreground pixels to foreground template levels according to the template foreground pixel count at each template level, in decreasing order of the weighting factor associated with the template level until all image foreground pixels have been assigned;
multiplying the weighting factor associated with each template level by a number of foreground pixels assigned to that template level to produce a template level image score; and
summing all template level image scores to produce an image pixel lookup score for the image foreground pixel count; and
wherein computing each one of the plurality of template-image heuristic scores includes for a plurality of consecutive entries in the 1D image pixel count data structure each indicating a count of image foreground pixels, using the count of image foreground pixels to retrieve the image pixel lookup score from the plurality of 1D template column lookup tables; and
summing the image pixel lookup scores to produce the template-image heuristic score.
- each 1D template column lookup table representing data about at least one column of template pixels at all template levels of the character template;
-
5. The method of claim 1 for operating a processor-controlled machine to decode a text line image wherein
producing the plurality of 1D template analogue data structures includes producing a 1D template pixel sums data structure for each character template; - each 1D template pixel sums data structure indicating a plurality of combined template column counts of template column foreground pixels;
each combined template column count indicating a count of foreground pixels in groups of at least two consecutive columns of the character template;
producing the 1D image analogue data structure includes producing a 1D image pixel sums data structure including a plurality of combined image column counts of image column foreground pixels;
each combined image column count indicating a count of foreground pixels in groups of at least two adjacent column of pixels in the image portion such that the 1D image pixel sums data structure includes a combined image column count for every column of pixels in the image portion; and
computing each of the plurality of template-image heuristic scores includes determining, for each combined template column count in the 1D template pixel sums data structure, a minimum of the combined template column count of the template column foreground pixels and the combined image column count of the image column foreground pixels; and
summing the minima to produce the template-image heuristic score;
wherein computing the plurality of template-image heuristic scores includes computing one template-image heuristic score for each 1D template pixel sums data structure at each column position in the image portion.
- each 1D template pixel sums data structure indicating a plurality of combined template column counts of template column foreground pixels;
-
6. The method of claim 1 for operating a processor-controlled machine to decode a text line image wherein
producing the plurality of 1D template analogue data structures includes producing a 1D template pixel sums data structure for each character template; - each 1D template pixel sums data structure indicating a plurality of combined template column counts of template column foreground pixels;
each combined template column count indicating a count of foreground pixels in groups of at least two consecutive columns of the character template;
producing the 1D image analogue data structure includes computing a plurality of combined image column counts of image column foreground pixels to produce a 1D image pixel sums data structure;
computing each combined image column count includingproducing a count of foreground pixels in at least every two adjacent column of the image portion;
determining a maximum count of foreground pixels for each pair of consecutive counts of foreground pixels; and
storing the maximum count of foreground pixels as a combined image column count in the 1D image pixel sums data structure;
the 1D image pixel sums data structure including a combined image column count for every other one of the columns of the image portion; and
computing each of the plurality of template-image heuristic scores includes computing a first template-image heuristic score using the 1D template pixel sums data structure indicating a first character template at a first column position in the image portion;
the computing step including determining, for each combined template column count in the 1D template pixel sums data structure, a minimum of the combined template column count of the template column foreground pixels and the combined image column count of the image column foreground pixels, and summing the minima to produce the template-image heuristic score; and
assigning the first template-image heuristic score as the template-image heuristic score for the first character template at a next adjacent column position in the image portion;
wherein computing the plurality of template-image heuristic scores includes computing one template-image heuristic score for each 1D template pixel sums data structure at every other column position in the image portion.
- each 1D template pixel sums data structure indicating a plurality of combined template column counts of template column foreground pixels;
-
7. The method of claim 1 for operating a processor-controlled machine to decode a text line image wherein
each character template includes at least two foreground template levels and a background template level; - each foreground template level including a plurality of foreground pixels and indicating a template level weighting factor associated with the foreground template level;
producing the plurality of 1D template analogue data structures includes producing a 1D template pixel sums data structure for each character template;
each of the plurality of 1D template analogue data structures indicating, for each of the foreground template levels of the character template, information indicating a foreground template level combined column pixel count weighted by the template level weighting factor associated therewith;
each foreground template level combined column pixel count indicating a count of foreground pixels in groups of at least two consecutive columns of the character template;
producing the 1D image analogue data structure includes producing a 1D image pixel sums data structure including a plurality of combined image column counts of image column foreground pixels;
each combined image column count indicating a count of foreground pixels in groups of at least two adjacent column of pixels in the image portion such that the 1D image pixel sums data structure includes a combined image column count for every column of pixels in the image portion; and
computing each one of the plurality of template-image heuristic scores includes for a plurality of entries in the 1D image pixel sums data structure, using the combined image column counts of image column foreground pixels to produce an image pixel score; and
summing the image pixel scores to produce the template-image heuristic score;
wherein computing the plurality of template-image heuristic scores includes computing one template-image heuristic score for each 1D template pixel sums data structure at each column position in the image portion.
- each foreground template level including a plurality of foreground pixels and indicating a template level weighting factor associated with the foreground template level;
-
8. The method of claim 1 for operating a processor-controlled machine to decode a text line image wherein performing the dynamic programming operation includes
determining an estimated best path through the decoding trellis using the plurality of template-image heuristic scores; - and
for each node on the estimated best path, computing an actual template-image matching score between the character template indicated by the character label and an image region indicated by the node;
re-scoring the node to indicate the actual template-image matching score in place of the template-image heuristic score; and
repeating the dynamic programming operation until all nodes on the estimated best path indicate actual template-image matching scores;
the dynamic programming operation using the nodes on the best path to produce the character labels of the character symbols represented by the image glyphs.
- and
-
9. An article of manufacture for use in a machine that includes a memory device for storing data;
- a storage medium access device for accessing a medium that stores data; and
a processor connected for accessing the data stored in the memory device and for receiving data from the storage medium access device;
the article comprising;a data storage medium that can be accessed by the storage medium access device when the article is used in the machine; and
data stored in the data storage medium so that the storage medium access device can provide the stored data to the processor when the article is used in the machine;
the stored data comprising instruction data indicating instructions the processor can execute;
the processor, in executing the instructions, receiving an input text line image indicating a bitmapped image region including a plurality of image glyphs each indicating a character symbol;
the processor, further in executing the instructions, obtaining a plurality of character templates and character labels stored in the memory device of the machine;
each character template indicating a two-dimensional bitmapped image of a character symbol;
each character template further indicating a character label identifying the character symbol represented by the character template; and
the processor, still further in executing the instructions, for each image, producing a one-dimensional (1D) image analogue data structure using pixel counts of image foreground pixels in columns of the image portion of the input text line image;
the processor, further in executing the instructions, for each image, producing a plurality of 1D template analogue data structures;
each 1D template analogue data structure using pixel counts of template foreground pixels in columns of a respective one of the plurality of character templates;
the processor, further in executing the instructions, computing a plurality of template-image heuristic scores using the 1D image analogue data structure and the plurality of 1D template analogue data structures;
each template-image heuristic score indicating an estimated measurement of a match between one of the plurality of character templates and a two-dimensional region of the image portion of the input text line image; and
the processor, still further in executing the instructions, performing a dynamic programming operation using a decoding trellis data structure indicating a stochastic finite state network including nodes and transitions between nodes indicating a model of expected spatial arrangements of character symbols in the input text line image;
the dynamic programming operation using the plurality of template-image heuristic scores to decode the input text line image and produce the character labels of the character symbols represented by the image glyphs included therein.
- a storage medium access device for accessing a medium that stores data; and
-
10. In an image recognition computer-implemented method for analyzing a bitmap text line image into a combination of character symbol templates selected from a library of templates on the basis of at least one complete path computed through a decoding trellis of a Markov source, the improvement comprising:
-
assigning a heuristic column-based template-image matching score to each transition in the decoding trellis according to a character symbol template associated with the transition;
the heuristic column-based template-image matching score for a selected character symbol template indicating an estimated measurement of a match between one of the character symbol templates and a two-dimensional region of the bitmap text line image and being a substitute for an actual score; and
performing a repeated sequence of a dynamic programming operation followed by a re-scoring operation until a stopping condition is met;
the dynamic programming operation producing a complete path of nodes and transitions through the decoding trellis using the heuristic column-based template-image matching scores assigned to the transitions;
the re-scoring operation assigning an actual score to each transition in the complete path having a heuristic column-based template-image matching score;
the decoding trellis having the actual scores available to subsequent iterations of the dynamic programming operation.
-
Specification