Automatic training of layout parameters in a 2D image model
Abstract
A two-dimensional (2D) image model models the layout structure of a class of document images as an image grammar and includes production rules having explicit layout parameters as data items that indicate information about the spatial relationships among image constituents occurring in images included in the class. The parameters are explicitly represented in the grammar rules in a manner that permits them to be automatically trained by a training operation that makes use of sample document images from the class of modeled documents. After each sample image is aligned with the 2D grammar, document-specific measurements about the spatial relationships between image constituents are taken from the image. Optimal values for the layout parameters are then computed from the measurement data collected from all samples. An illustrated implementation of the 2D image model takes the form of a stochastic context-free attribute grammar in which synthesized and inherited attributes and synthesis and inheritance functions are associated with each production rule in the grammar. The attributes indicate physical spatial locations of image constituents in the image, and a set of parameterized functions, in which the coefficients are the layout parameters, compute the attributes as a function of a characteristic of an image constituent of the production rule. The measurement data is taken from an annotated parse tree produced for each training image by the grammar. A trained grammar can then be used, for example, for document recognition and layout analysis operations on any document in the class of documents modeled by the grammar.
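The measurement step described in the abstract can be sketched as a walk over an annotated parse tree. This is only an illustration under stated assumptions: the `Box` and `ParseNode` types, the rule name `"hconcat"`, and the choice of measurement (width of the first constituent paired with the horizontal offset of the second) are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Box:
    """Axis-aligned bounding box in image coordinates (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self) -> float:
        return self.right - self.left


@dataclass
class ParseNode:
    """One node of an annotated parse tree (hypothetical structure)."""
    rule: str                      # name of the production rule applied here
    box: Box                       # image position annotated on this node
    children: List["ParseNode"] = field(default_factory=list)


def collect_measurements(node: ParseNode, rule: str) -> List[Tuple[float, float]]:
    """Return (first.width, second.left - first.left) for every binary
    application of `rule` found in the tree rooted at `node`."""
    found: List[Tuple[float, float]] = []
    if node.rule == rule and len(node.children) == 2:
        first, second = node.children
        found.append((first.box.width, second.box.left - first.box.left))
    for child in node.children:
        found.extend(collect_measurements(child, rule))
    return found


# A tiny hand-built tree standing in for the output of the alignment step.
a = ParseNode("glyph", Box(0, 0, 10, 12))
b = ParseNode("glyph", Box(12, 0, 20, 12))
pair = ParseNode("hconcat", Box(0, 0, 20, 12), [a, b])
c = ParseNode("glyph", Box(23, 0, 30, 12))
line = ParseNode("hconcat", Box(0, 0, 30, 12), [pair, c])

measurements = collect_measurements(line, "hconcat")
```

In a training run, one such list would be gathered per sample image and the lists pooled before any parameter value is computed.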
25 Claims
-
1. A method for operating a processor-controlled machine to determine an unknown value of a text image layout parameter used with a two-dimensional (2D) image model;
- the machine including a signal source for receiving data;
memory for storing data; and
a processor connected for accessing instruction data which is stored in the memory for operating the machine;
the processor being further connected for receiving data from the signal source; and
connected for storing data in the memory;
the method comprising:
obtaining a data structure indicating a 2D image model modeling as an image grammar an image layout structure common to a class of 2D text images;
the 2D image model including a production rule indicating that first and second image constituents occurring in a 2D text image consistent with the image layout structure produce a third image constituent occurring therein;
the production rule including a text image layout parameter that indicates the spatial relationship between the first and second image constituents;
a value of the text image layout parameter being unknown;
receiving a plurality of input two-dimensional (2D) text image data structures from the signal source;
each input 2D text image represented by the plurality of input 2D text image data structures having the image layout structure common to the class of 2D text images and including at least one occurrence of first and second image constituents;
for each respective input 2D text image, producing a data structure, using the 2D image model, indicating first and second image positions in the input 2D text image identifying respective locations of the first and second image constituents therein; and
obtaining document-specific measurement data about the first and second image constituents from the data structure; and
computing a value for the text image layout parameter using the document-specific measurement data obtained from the data structures for the respective input 2D text images;
the value computed for the text image layout parameter representing a class-specific value for all text images in the class of 2D input text images being modeled by the 2D image model.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
the production rule specifies the spatial relationship between the first and second image constituents as a mathematical function of a characteristic of at least one of the first and second image constituents;
- the mathematical function including the text image layout parameter as a parameter therein;
obtaining document-specific measurement data from the data structure includes determining a value for the mathematical function indicating the spatial relationship between the first and second image constituents in a respective input 2D text image; and
computing a value for the text image layout parameter using the document-specific measurement data includes computing the value using the values for the mathematical function measured from each respective input training image.
-
3. The method for determining an unknown value of a text image layout parameter according to claim 2 wherein the characteristic of the at least one of the first and second image constituents is a physical characteristic including at least one of a size, magnitude, dimension, proportion, and extent of the image constituent.
-
4. The method for determining an unknown value of a text image layout parameter according to claim 2 wherein
obtaining document-specific measurement data indicating the spatial relationship between the first and second image constituents identified therein includes obtaining data specified by the mathematical function to produce an equation having the text image layout parameter as an unknown parameter thereof;
- obtaining document-specific measurement data producing at least one equation for each respective input 2D text image;
processing the plurality of input 2D text images producing a plurality of equations, each having the text image layout parameter as an unknown parameter thereof; and
computing a value of the text image layout parameter using the document-specific measurement data includes solving the plurality of equations to obtain the value.
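As a rough illustration of claim 4, suppose each input image contributes one equation of the form y = p * x, where x is a measured characteristic (for example a width), y is the measured spatial relationship (for example an offset), and p is the unknown layout parameter. That functional form, the sample numbers, and the name `solve_layout_parameter` are assumptions made for this sketch. The equations are solved here in the least squares sense, which is one possible reading of "solving the plurality of equations".

```python
from typing import Iterable, Tuple


def solve_layout_parameter(equations: Iterable[Tuple[float, float]]) -> float:
    """Each (x, y) pair stands for one equation y = p * x with unknown p.

    Returns the p that minimizes the sum of squared residuals
    (y - p * x)^2 over all equations: p = sum(x*y) / sum(x*x).
    """
    sum_xx = 0.0
    sum_xy = 0.0
    for x, y in equations:
        sum_xx += x * x
        sum_xy += x * y
    if sum_xx == 0.0:
        raise ValueError("no equation constrains the parameter")
    return sum_xy / sum_xx


# One hypothetical equation per training image.
equations = [(10.0, 12.0), (20.0, 24.0), (5.0, 6.0)]
p = solve_layout_parameter(equations)
```

With more equations than unknowns, the system generally has no exact solution, which is why a best-fit criterion is used in this sketch.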
-
5. The method for determining an unknown value of a text image layout parameter according to claim 2 wherein the mathematical function specified by the 2D image model having the text image layout parameter as a parameter thereof is a linear function of the text image layout parameter.
-
6. The method for determining an unknown value of a text image layout parameter according to claim 2 wherein the mathematical function having the text image layout parameter as a parameter thereof computes image coordinates of a bounding box of the third image constituent using the text image layout parameter and image coordinate data of bounding boxes of the first and second image constituents;
- the mathematical function including a plurality of text image layout parameters indicating a scaling and translation of the second image constituent relative to the first image constituent.
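A minimal sketch of the kind of function claim 6 describes, assuming the second constituent is scaled uniformly and then translated relative to the top-left corner of the first, and that the third constituent's bounding box is the smallest box enclosing both. The names `place_second` and `combine`, the parameters `scale`, `dx`, `dy`, and the enclosing-box rule are hypothetical choices for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in image coordinates (y grows downward)."""
    left: float
    top: float
    right: float
    bottom: float


def place_second(first: Box, second: Box,
                 scale: float, dx: float, dy: float) -> Box:
    """Scale `second` by `scale` and put its top-left corner at the
    top-left corner of `first` shifted by (dx, dy)."""
    width = (second.right - second.left) * scale
    height = (second.bottom - second.top) * scale
    left = first.left + dx
    top = first.top + dy
    return Box(left, top, left + width, top + height)


def combine(first: Box, placed: Box) -> Box:
    """Bounding box of the third constituent: the smallest box
    enclosing both the first box and the placed second box."""
    return Box(min(first.left, placed.left),
               min(first.top, placed.top),
               max(first.right, placed.right),
               max(first.bottom, placed.bottom))


base = Box(0, 0, 10, 10)
exponent = Box(0, 0, 8, 8)      # given in its own local coordinates
placed = place_second(base, exponent, scale=0.5, dx=10, dy=-2)
third = combine(base, placed)
```

Here `scale`, `dx`, and `dy` play the role of the layout parameters; in training they would be the unknowns estimated from measured boxes.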
-
7. The method for determining an unknown value of a text image layout parameter according to claim 1 wherein producing the data structure indicating the image positions identifying respective locations of the first and second image constituents in the input 2D text image includes
prior to producing the data structure, constraining the 2D image model to produce a restricted 2D image model that models only the layout structure of the input 2D text image; and
performing an alignment operation to align the restricted 2D image model with the input 2D text image.
-
8. The method for determining an unknown value of a text image layout parameter according to claim 7 further including receiving a transcription data structure associated with each of the plurality of 2D input text images;
- and wherein constraining the 2D image model to produce a restricted 2D image model uses the transcription data structure to produce the restricted 2D image model.
-
9. The method for determining an unknown value of a text image layout parameter according to claim 1 wherein computing the value for the text image layout parameter using the document-specific measurement data includes solving an optimization problem that determines a value for the text image layout parameter indicating the optimal spatial relationship between the first and second image constituents for the document-specific measurement data obtained from all of the respective input 2D text images.
-
10. The method for determining an unknown value of a text image layout parameter according to claim 9 wherein solving the optimization problem includes computing a maximum likelihood estimate as the value for the text image layout parameter.
-
11. The method for determining an unknown value of a text image layout parameter according to claim 10 wherein computing a maximum likelihood estimate as the value for the text image layout parameter includes using a least squares technique to compute the value.
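Claims 9 through 11 link an optimization problem, a maximum likelihood estimate, and a least squares technique. One standard way to see the link is worked out below. It is an illustrative assumption, not a derivation taken from the patent: each measurement is modeled as y_i = p x_i plus independent zero-mean Gaussian noise of variance sigma^2, with x_i a measured characteristic and p the single unknown layout parameter.

```latex
% Assumed model (not from the claims):
%   y_i = p x_i + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \text{ i.i.d.}
\begin{aligned}
\log L(p) &= -\frac{n}{2}\log\!\left(2\pi\sigma^2\right)
             - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - p\,x_i\right)^2 \\
\hat{p}_{\mathrm{ML}} &= \arg\max_{p}\,\log L(p)
             = \arg\min_{p}\,\sum_{i=1}^{n}\left(y_i - p\,x_i\right)^2 \\
0 &= \frac{d}{dp}\sum_{i=1}^{n}\left(y_i - p\,x_i\right)^2
   = -2\sum_{i=1}^{n} x_i\left(y_i - p\,x_i\right)
   \;\Longrightarrow\;
   \hat{p} = \frac{\sum_{i=1}^{n} x_i\,y_i}{\sum_{i=1}^{n} x_i^2}
\end{aligned}
```

Under these assumptions the maximum likelihood estimate and the least squares solution coincide, since the first term of the log-likelihood does not depend on p.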
-
12. The method for determining an unknown value of a text image layout parameter according to claim 1 wherein
the 2D image model models the class of 2D text images as a stochastic context-free attribute grammar;
- the production rule indicating a mathematical function representing the spatial relationship between the first and second image constituents as a function of a characteristic of at least one of the first and second image constituents;
the text image layout parameter being a parameter of the mathematical function; and
the data structure indicating the image positions identifying respective locations of the first and second image constituents in the input 2D text image is an annotated parse tree indicating the layout structure and message content of the input 2D text image.
-
13. The method for determining an unknown value of a text image layout parameter according to claim 1 wherein the 2D image model includes a plurality of text image layout parameters each indicating a spatial relationship between a respective pair of plural pairs of first and second image constituents;
- a value of at least a first one of the text image layout parameters being unknown; and
wherein the method further includes
prior to receiving the plurality of input 2D text images, receiving a signal from the signal source indicating a selected text image layout parameter selected from the plurality of text image layout parameters; and
wherein computing a value for the text image layout parameter includes computing a value for the selected text image layout parameter using the document-specific measurement data obtained from the data structures for the respective input 2D text images.
-
14. The method for determining an unknown value of a text image layout parameter according to claim 1 wherein the 2D image model that models the class of documents is a generative image grammar;
- the 2D generative image grammar being capable of synthesizing a 2D text image having a message content of an input message string arranged in the layout structure of the class of documents being modeled.
-
15. The method for determining an unknown value of a text image layout parameter according to claim 1 wherein the text image layout parameter indicates a scaling and translation of the second image constituent relative to the first constituent.
-
16. The method for determining an unknown value of a text image layout parameter according to claim 1 wherein the 2D image model that models the image layout structure common to a class of 2D text images represents each of the first and second image constituents as an image region defined by a bounding box;
- and wherein the production rule producing the third image constituent allows for the bounding boxes of the first and second image constituents to overlap.
-
17. A method for operating a processor-controlled machine to determine an unknown value of a text image layout parameter used with a two-dimensional (2D) image grammar;
- the machine including a signal source for receiving data;
memory for storing data; and
a processor connected for accessing instruction data which is stored in the memory for operating the machine;
the processor being further connected for receiving data from the signal source; and
connected for storing data in the memory;
the method comprising:
obtaining a data structure indicating a 2D image grammar modeling a class of 2D text images as a stochastic context-free attribute grammar;
the 2D image grammar including a production rule indicating that first and second image constituents occurring in a 2D text image included in the class produce a third image constituent occurring therein;
the production rule indicating a mathematical function representing a spatial relationship between the first and second image constituents as a function of a characteristic of at least one of the first and second image constituents;
the mathematical function including a text image layout parameter as a parameter therein;
the text image layout parameter indicating the spatial relationship between the first and second image constituents;
a value of the text image layout parameter being unknown;
receiving a plurality of input two-dimensional (2D) text image data structures from the signal source;
each input 2D text image represented by the plurality of input 2D text image data structures having the image layout structure common to the class of 2D text images and including at least one occurrence of first and second image constituents;
for each respective input 2D text image, producing an annotated parse tree representation of a layout and content of a respective input 2D text image using the 2D image grammar; and
obtaining from the annotated parse tree document-specific measurement data about the spatial relationship between the first and second image constituents and a document-specific value of the mathematical function indicated by the production rule producing the third image constituent from the first and second image constituents occurring in the 2D text image; and
constructing an overall function representing the document-specific measurement data and document-specific function values obtained for all input 2D text images as a function of the text image layout parameter; and
solving the overall function for an optimal value of the text image layout parameter.
- View Dependent Claims (18, 19, 20, 21, 22, 23)
prior to producing the annotated parse tree, constraining the 2D image grammar to produce a restricted 2D image grammar that models only the layout structure of the input 2D text image; and
performing an alignment operation to align the restricted 2D image grammar with the input 2D text image.
-
23. The method for determining an unknown value of a text image layout parameter according to claim 17 wherein the stochastic context-free attribute grammar that models the class of documents is a generative image grammar;
- the generative image grammar being capable of synthesizing a 2D text image having a message content of an input message string arranged in the layout structure of the class of documents being modeled.
-
24. An article of manufacture for use in a machine that includes a memory device for storing data;
- a storage medium access device for accessing a medium that stores data; and
a processor connected for accessing the data stored in the memory device and for receiving data from the storage medium access device;
the article comprising:
a data storage medium that can be accessed by the storage medium access device when the article is used in the machine; and
data stored in the data storage medium so that the storage medium access device can provide the stored data to the processor when the article is used in the machine;
the stored data comprising instruction data indicating instructions the processor can execute;
the processor, in executing the instructions, obtaining a data structure indicating a 2D image model modeling as an image grammar an image layout structure common to a class of 2D text images;
the 2D image model including a production rule indicating that first and second image constituents occurring in a 2D text image consistent with the image layout structure produce a third image constituent occurring therein;
the production rule including a text image layout parameter that indicates the spatial relationship between the first and second image constituents;
a value of the text image layout parameter being unknown;
the processor, further in executing the instructions, receiving a plurality of input two-dimensional (2D) text image data structures from the signal source;
each input 2D text image represented by the plurality of input 2D text image data structures having the image layout structure common to the class of 2D text images and including at least one occurrence of first and second image constituents;
the processor, still further in executing the instructions, for each respective input 2D text image, producing a data structure, using the 2D image model, indicating first and second image positions in the input 2D text image identifying respective locations of the first and second image constituents therein; and
obtaining document-specific measurement data about the first and second image constituents from the data structure;
the processor, still further in executing the instructions, computing a value for the text image layout parameter using the document-specific measurement data obtained from the data structures for the respective input 2D text images;
the value computed for the text image layout parameter representing a class-specific value for all text images in the class of 2D input text images being modeled by the 2D image model.
- View Dependent Claims (25)
Specification