Document image decoding using an integrated stochastic language model
First Claim
1. A method for operating a processor-controlled machine to perform a decoding operation to decode a text line image using a language model;
- the method comprising the steps of;
receiving an input text line image including a plurality of image glyphs each indicating a character symbol;
representing the input text line image as an image network data structure indicating a plurality of nodes and branches between nodes;
each node indicating a location of an image glyph;
each branch leading into a node being associated with a character symbol identifying the image glyph;
the plurality of nodes and branches indicating a plurality of possible paths through the image network;
each path indicating a possible transcription of the input text line image;
assigning a language model score computed from a language model to each branch in the image network according to the character symbol associated with the branch;
the language model score indicating a validity measurement for a character symbol sequence ending with the character symbol associated with the branch;
performing a repeated sequence of a best path search operation followed by a network expansion operation until a stopping condition is met;
the best path search operation producing a complete path of branches and nodes through the image network using the language model scores assigned to the branches;
the network expansion operation including adding at least one context node and context branch to the image network;
the context node having a character history associated therewith;
the context branch indicating an updated language model score for the character history ending with the character symbol associated with the context branch;
the image network with the context node and context branch added thereto being available to a subsequent execution of the best path search operation; and
when the stopping condition is met, producing the transcription of the character symbols represented by the image glyphs of the input text line image using the character symbols associated with the branches of the complete path.
7 Assignments
0 Petitions
Accused Products
Abstract
A text recognition system represents the decoded message of a document image as a path through an image network. A method for integrating a language model into the network selectively expands the network to accommodate the language model only for certain ones of the paths in the network, effectively managing the memory storage requirements and computational complexities of integrating the language model efficiently into the network. The language model generates probability distributions indicating the probability of a certain character occurring in a string, given one or more previous characters in the string. Selectively expanding the image network is achieved by initially using upper bounds on the language model probabilities on the branches of an unexpanded image network. A best path search operation is then performed to determine an estimated best path through the image network using these upper bound scores. After decoding, only the nodes on the estimated best path are expanded with new nodes and with branches incoming to the new nodes that accommodate new language model scores reflecting actual character histories in place of the upper bound scores. Decoding and selectively expanding the image network are repeated until the final output transcription of the text image has been produced.
139 Citations
20 Claims
-
1. A method for operating a processor-controlled machine to perform a decoding operation to decode a text line image using a language model;
- the method comprising the steps of;
receiving an input text line image including a plurality of image glyphs each indicating a character symbol;
representing the input text line image as an image network data structure indicating a plurality of nodes and branches between nodes;
each node indicating a location of an image glyph;
each branch leading into a node being associated with a character symbol identifying the image glyph;
the plurality of nodes and branches indicating a plurality of possible paths through the image network;
each path indicating a possible transcription of the input text line image;
assigning a language model score computed from a language model to each branch in the image network according to the character symbol associated with the branch;
the language model score indicating a validity measurement for a character symbol sequence ending with the character symbol associated with the branch;
performing a repeated sequence of a best path search operation followed by a network expansion operation until a stopping condition is met;
the best path search operation producing a complete path of branches and nodes through the image network using the language model scores assigned to the branches;
the network expansion operation including adding at least one context node and context branch to the image network;
the context node having a character history associated therewith;
the context branch indicating an updated language model score for the character history ending with the character symbol associated with the context branch;
the image network with the context node and context branch added thereto being available to a subsequent execution of the best path search operation; and
when the stopping condition is met, producing the transcription of the character symbols represented by the image glyphs of the input text line image using the character symbols associated with the branches of the complete path. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
the language model is a stochastic N-gram language model indicating a language model weight for a selected character symbol, v, given a sequence of N− - 1 character symbols preceding the selected character symbol; and
the language model score is an upper bound score computed from an upper bound function using the stochastic language model;
the upper bound function producing an upper bound on the language model score using all possible character symbol sequences of N−
1 character symbols preceding the selected character symbol represented in the image network.
- the method comprising the steps of;
-
7. The method of claim 6 for operating a processor-controlled machine wherein the upper bound function is defined as
-
B ) = max A P ( v k AB ) , where q is the upper bound score, B is the sequence of j preceding character symbols, 0≦
j≦
N−
1, and A ranges over all (N−
j−
1)-long sequences of characters.
-
-
8. The method of claim 1 for operating a processor-controlled machine to decode a text line image using a language model wherein
each node in the image network data structure has a node order determined by a history string length of the character history associated therewith; - and
the network expansion operation adds a context node for every node in the complete path having a node order less than a maximum order;
the context node having a node order one higher than the node order of the node from which the context node is created.
- and
-
9. The method of claim 8 for operating a processor-controlled machine wherein the stopping condition is met when each node in the complete path produced by the best path search operation is of maximum order.
-
10. The method of claim 1 for operating a processor-controlled machine to decode a text line image using a language model wherein the best path search operation is a dynamic programming procedure;
- the dynamic programming procedure using a set of likelihood functions combined with the language model scores to compute the most likely path into each node at each image point in the image network.
-
11. The method of claim 1 for operating a processor-controlled machine to decode a text line image using a language model wherein producing the complete path of nodes and branches includes computing maximum cumulative path scores at image positions in the image network using the language model scores for the character symbols assigned by the language model to the branches;
- the best path search operation maximizing the cumulative path score at each image position.
-
12. The method of claim 11 wherein
each node in the image network data structure has a node order determined by a history string length of the character history associated therewith; -
the network expansion operation adds a context node for every node in the complete path having a node order less than a maximum order;
the context node having a node order one higher than the node order of the node from which the context node is created;
the context node having a text line image location identical to the text line image position of the node from which the context node is created; and
computing maximum cumulative path scores by the best path search operation includes, at each image position in the text line image and for each possible character symbol and for each node and context node at each image position, computing a next image position for the character symbol in the text line image;
computing a cumulative path score for a path including an incoming branch to a highest order node at the next image position;
comparing the cumulative path score to a prior maximum cumulative path score for the highest order node at the next image position to determine an updated maximum cumulative path score for the next image position; and
storing the updated maximum cumulative path score with the highest order node at the next image position.
-
-
13. The method of claim 1 for operating a processor-controlled machine to decode a text line image using a language model wherein
the image network includes a plurality of character templates and character labels; - each character template indicating a bitmapped image of a character symbol and a character label identifying the character symbol represented by the character template; and
the best path search operation further includes computing a plurality of matching scores each indicating a measurement of a match between one of the plurality of character templates and a two-dimensional region of the input text line image; and
computing maximum cumulative path scores at image positions in the image network using the matching scores and the language model scores for the character symbols assigned by the language model to the branches;
the best path search operation using the maximum cumulative path scores to produce the complete path.
- each character template indicating a bitmapped image of a character symbol and a character label identifying the character symbol represented by the character template; and
-
14. The method of claim 1 for operating a processor-controlled machine to decode a text line image using a language model wherein the image network data structure is a stochastic finite state image network that models an expected spatial arrangement of character symbols in the input text line image as a regular grammar.
-
15. The method of claim 14 for operating a processor-controlled machine wherein stochastic finite state image network is a Markov source.
-
16. In an image recognition computer-implemented method for analyzing a bitmap text line image into a combination of character symbol templates selected from a library of templates on the basis of at least one complete path computed through a decoding graph of a Markov source, the improvement comprising:
-
assigning a language model score computed from a language model to each transition in the decoding graph according to a character symbol associated with the transition;
the language model score indicating a validity measurement for a character symbol sequence ending with the character symbol associated with the branch; and
performing a repeated sequence of a best path search operation followed by a network expansion operation until a stopping condition is met;
the best path search operation producing a complete path of nodes and transitions through the decoding graph using the language model scores assigned to the transitions;
the network expansion operation producing an expanded decoding graph including a context node for each node included in the complete path;
the network expansion operation assigning to a transition incoming to a context node an updated language model score, computed from the language model, for a sequence of character symbols ending with a character symbol associated with the incoming transition;
the expanded decoding graph being available to subsequent executions of the best path search operation.
-
-
17. An article of manufacture for use in a machine that includes a memory device for storing data;
- a storage medium access device for accessing a medium that stores data; and
a processor connected for accessing the data stored in the memory device and for receiving data from the storage medium access device;
the article comprising;a data storage medium that can be accessed by the storage medium access device when the article is used in the machine; and
data stored in the data storage medium so that the storage medium access device can provide the stored data to the processor when the article is used in the machine;
the stored data comprising instruction data indicating instructions the processor can execute;
the processor, in executing the instructions, receiving an input text line image including a plurality of image glyphs each indicating a character symbol;
the processor, further in executing the instructions, representing the input text line image as an image network data structure indicating a plurality of nodes and branches between nodes;
each node indicating a location of an image glyph;
each branch leading into a node being associated with a character symbol identifying the image glyph;
the plurality of nodes and branches indicating a plurality of possible paths through the image network;
each path indicating a possible transcription of the input text line image;
the processor, still further in executing the instructions, assigning a language model score computed from a language model to each branch in the image network according to the character symbol associated with the branch;
the language model score indicating a validity measurement for a character symbol sequence ending with the character symbol associated with the branch;
the processor, further in executing the instructions, performing a repeated sequence of a best path search operation followed by a network expansion operation until a stopping condition is met;
the best path search operation producing a complete path of nodes and branches between nodes through the image network using the language model scores for the character symbols assigned to the branches;
the network expansion operation including adding at least one context node and context branch to the image network;
the context node having a character history associated therewith;
the context branch indicating an updated language model score for the character history ending with the character symbol associated with the context branch;
the image network with the context node and context branch added thereto being available to a subsequent execution of the best path search operation; and
the processor, still further in executing the instructions, when the stopping condition is met, producing the transcription of the character symbols represented by the image glyphs of the input text line image using the character symbols associated with the branches of the complete path. - View Dependent Claims (18, 19, 20)
the processor, in executing the instruction for performing the network expansion operation, adds a context node for every node in the complete path having a node order less than a maximum order;
the context node having a node order one higher than the node order of the node from which the context node is created;
the context node having a text line image location identical to the text line image position of the node from which the context node is created; and
the processor, in executing the instructions for computing maximum cumulative path scores, at each image position in the text line image and for each possible character symbol and for each node and context node at each image position, computes a next image position for the character symbol in the text line image;
computes a cumulative path score for a path including an incoming branch to a highest order node at the next image position;
compares the cumulative path score to a prior maximum cumulative path score for the highest order node at the next image position to determine an updated maximum cumulative path score for the next image position; and
stores the updated maximum cumulative path score with the highest order node at the next image position.
- a storage medium access device for accessing a medium that stores data; and
Specification