Method and apparatus for recognizing a digitized form, extracting information from a filled-in form, and generating a corrected filled-in form
Abstract
Methods and apparatus for comparing blank forms represented in a digital format to digitized filled-in forms are described. Different errors are assigned different weights when attempting to correlate regions of blank and filled-in forms. Foreground pixels in the blank form that are not found in the corresponding portion of a filled-in form are given greater error significance than foreground pixels found in the filled-in form (e.g., pixels that may correspond to added text) that correspond to a background pixel value in the blank form. A virtual filled-in form including content, e.g., pixel values, from the filled-in form is generated from the content of the filled-in form and pixel value location mapping information determined by comparing the blank and filled-in forms. Various analyses are performed on a block basis, but in some embodiments the final pixel mapping to the virtual form is performed on a pixel-by-pixel rather than a block basis.
90 Citations
25 Claims
1. A method of processing a filled-in form and a blank form to generate a virtual filled-in form, the filled-in form including a first plurality of pixel values, each of the first plurality of pixel values corresponding to one location in a first array of row and column pixel locations, the blank form including a second plurality of pixel values, each of the second plurality of pixel values corresponding to one location in a second array of row and column pixel locations, the method comprising:
- processing the filled-in form to identify column borders defining a region in which a first predetermined majority of foreground pixel values are located;
- processing the filled-in form to identify row borders defining a region in which a second predetermined majority of foreground pixel values are located, wherein the identified column and row borders define a rectangular filled-in main image region;
- determining a main image aspect ratio as a function of the determined row and column borders;
- comparing the determined main image aspect ratio of the filled-in form to a main aspect ratio of said blank form to determine if the determined main image aspect ratio is within a predetermined difference threshold of the main aspect ratio of said blank form; and
- if it is determined that the aspect ratios are within the predetermined difference threshold, determining a scaling factor which can be used on one of the blank and filled-in forms to resize the main image area of one of the blank and filled-in forms so that the size of the main image area of the blank form and main image area of the filled-in form will match.
Dependent claims: 2-15.
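The steps of claim 1 can be illustrated with a minimal Python sketch. The claims do not say how the borders are located or which form is rescaled, so everything here is an assumption made for illustration: the function names (`majority_span`, `main_image_region`, `scaling_factor`), the choice to trim an equal share of foreground pixels from each side, the width/height definition of aspect ratio, and the default thresholds. Images are plain lists of rows in which 1 marks a foreground pixel.

```python
def majority_span(counts, fraction):
    """Return inclusive (lo, hi) indices keeping at least `fraction` of the mass.

    Assumption: the discarded foreground mass is split evenly between both ends.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(counts)
    if total == 0:
        raise ValueError("no foreground pixels")
    tail = total * (1.0 - fraction) / 2.0  # mass that may be dropped per end
    lo, dropped = 0, 0
    while dropped + counts[lo] <= tail:
        dropped += counts[lo]
        lo += 1
    hi, dropped = len(counts) - 1, 0
    while dropped + counts[hi] <= tail:
        dropped += counts[hi]
        hi -= 1
    return lo, hi


def main_image_region(image, col_fraction=0.95, row_fraction=0.95):
    """Return (top, bottom, left, right), inclusive, of the main image region."""
    col_counts = [sum(col) for col in zip(*image)]  # foreground per column
    row_counts = [sum(row) for row in image]        # foreground per row
    left, right = majority_span(col_counts, col_fraction)
    top, bottom = majority_span(row_counts, row_fraction)
    return top, bottom, left, right


def scaling_factor(filled_region, blank_region, max_ratio_diff=0.05):
    """Return the factor that resizes the filled-in region to the blank one,
    or None when the aspect ratios differ by more than the threshold."""
    def size(region):
        top, bottom, left, right = region
        return bottom - top + 1, right - left + 1

    f_height, f_width = size(filled_region)
    b_height, b_width = size(blank_region)
    if abs(f_width / f_height - b_width / b_height) > max_ratio_diff:
        return None
    return b_width / f_width
```

For example, a scan with one stray pixel in the corner and a solid 3-by-5 block yields a region that excludes the stray pixel when 80% of the foreground must be retained, and a blank form whose main region is 6 by 10 then gives a scaling factor of 2.0.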
16. A system for processing a filled-in form and a blank form to generate a virtual filled-in form, the filled-in form including a first plurality of pixel values, each of the first plurality of pixel values corresponding to one location in a first array of row and column pixel locations, the blank form including a second plurality of pixel values, each of the second plurality of pixel values corresponding to one location in a second array of row and column pixel locations, the system comprising:
- means for processing the filled-in form to identify column borders defining a region in which a first predetermined majority of foreground pixel values are located;
- means for processing the filled-in form to identify row borders defining a region in which a second predetermined majority of foreground pixel values are located, wherein the identified column and row borders define a rectangular filled-in main image region;
- means for determining a main image aspect ratio as a function of the determined row and column borders;
- means for comparing the determined main image aspect ratio of the filled-in form to a main aspect ratio of said blank form to determine if the determined main image aspect ratio is within a predetermined difference threshold of the main aspect ratio of said blank form; and
- means for determining a scaling factor which can be used on one of the blank and filled-in forms to resize the main image area of one of the blank and filled-in forms so that the size of the main image area of the blank form and main image area of the filled-in form will match if it is determined that the aspect ratios are within the predetermined difference threshold.
Dependent claim: 17.
18. A method of determining if a blank form matches a filled-in form, the blank and filled-in form having a plurality of corresponding regions, the method comprising:
- comparing corresponding regions of the blank form and filled-in form to detect a plurality of different types of errors including:
  - first type errors in which one or more foreground pixel values in the blank form are not present within a predetermined distance of one or more foreground pixel values in a corresponding region of a filled-in form;
  - second type errors in which one or more foreground pixel values in the filled-in form are not present within a predetermined distance of a foreground pixel value in a corresponding region of the blank form; and
- generating a form mismatch estimate value indicative of the mismatch between the blank form and the filled-in form as a function of detected first type errors and second type errors.
Dependent claims: 19-21.
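The two error types of claim 18 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the function names, the use of a square (Chebyshev) neighborhood for "within a predetermined distance", the weighted sum as the mismatch function, and the weights 4.0 and 1.0 are all hypothetical. The heavier weight on first type errors follows the abstract, which gives missing blank-form pixels greater significance than pixels added in the filled-in form. Both regions are assumed to be the same size, with 1 marking a foreground pixel.

```python
def has_foreground_near(image, r, c, max_dist):
    """True if `image` has a foreground pixel within `max_dist` rows and
    columns of (r, c). The square neighborhood is an assumption."""
    rows, cols = len(image), len(image[0])
    for rr in range(max(0, r - max_dist), min(rows, r + max_dist + 1)):
        for cc in range(max(0, c - max_dist), min(cols, c + max_dist + 1)):
            if image[rr][cc]:
                return True
    return False


def count_unmatched(source, target, max_dist):
    """Count foreground pixels of `source` with no nearby foreground in `target`."""
    return sum(
        1
        for r, row in enumerate(source)
        for c, value in enumerate(row)
        if value and not has_foreground_near(target, r, c, max_dist)
    )


def mismatch_estimate(blank, filled, max_dist=1, w_first=4.0, w_second=1.0):
    """Return (estimate, first_type_count, second_type_count).

    First type: blank-form foreground missing from the filled-in form.
    Second type: filled-in foreground absent from the blank form (e.g. added text).
    The weights are hypothetical illustration values.
    """
    first = count_unmatched(blank, filled, max_dist)
    second = count_unmatched(filled, blank, max_dist)
    return w_first * first + w_second * second, first, second
```

With these weights, a filled-in form that merely adds a pixel of text to an otherwise matching region scores far lower than one in which a printed line of the blank form is missing altogether.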
22. A method of processing a filled-in form and a blank form to generate a virtual filled-in form, the filled-in form including a first plurality of pixel values, each of the first plurality of pixel values corresponding to one location in a first array of row and column pixel locations, the blank form including a second plurality of pixel values, each of the second plurality of pixel values corresponding to one location in a second array of row and column pixel locations, wherein said blank form pixel values are grouped into rectangular blocks, each of the blank form and filled-in forms including main image areas in which a majority of the foreground pixels in the form are located, the method comprising:
- calculating a global offset indicating a displacement between the main image area of said blank form and the main image area of said filled-in form, said global offset indicating a displacement in terms of row and column offsets which, when applied to one of the images, can be used to make the two main image areas correspond to the same row and column locations after one of the images has been scaled so that the two main image areas are the same size; and
- for each block of the blank form, generating a block position offset indicating a block position offset which tends to minimize a total block error corresponding to the individual block, the block error being indicative of a location error that exists between the position of the block in the blank form and a set of pixel values in the filled-in form determined to correspond to said particular block based on at least a similarity of pixel values in said block and said set of pixel values in the filled-in form.
Dependent claims: 23-24.
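A rough Python sketch of the two offsets in claim 22 follows. The claim does not fix a search strategy or an error measure, so the exhaustive search over a small window, the weighted pixel-disagreement error, the tie-break toward the starting offset, the sign convention (offsets are added to blank-form coordinates to reach the scaled filled-in form), and all names and defaults are assumptions made for illustration. Regions are (top, bottom, left, right) tuples and blocks are (top, left, height, width) tuples.

```python
def global_offset(blank_region, filled_region, scale):
    """Row/column displacement from the blank main image area to the
    filled-in main image area after the filled-in form is scaled by `scale`."""
    b_top, _, b_left, _ = blank_region
    f_top, _, f_left, _ = filled_region
    return round(f_top * scale) - b_top, round(f_left * scale) - b_left


def block_error(blank, filled, block, offset, w_first=4, w_second=1):
    """Weighted count of pixel disagreements for one block at one offset.
    Pixels that fall outside the filled-in form are treated as background."""
    top, left, height, width = block
    d_row, d_col = offset
    error = 0
    for r in range(top, top + height):
        for c in range(left, left + width):
            fr, fc = r + d_row, c + d_col
            inside = 0 <= fr < len(filled) and 0 <= fc < len(filled[0])
            f_val = filled[fr][fc] if inside else 0
            if blank[r][c] and not f_val:
                error += w_first   # blank foreground missing in filled-in form
            elif f_val and not blank[r][c]:
                error += w_second  # extra foreground, possibly added text
    return error


def best_block_offset(blank, filled, block, start=(0, 0), search=2):
    """Try every offset within `search` pixels of `start` and return the one
    with the lowest block error, preferring offsets closest to `start`."""
    candidates = [
        (start[0] + dr, start[1] + dc)
        for dr in range(-search, search + 1)
        for dc in range(-search, search + 1)
    ]
    return min(
        candidates,
        key=lambda off: (
            block_error(blank, filled, block, off),
            abs(off[0] - start[0]) + abs(off[1] - start[1]),
        ),
    )
```

In this sketch the global offset would normally be passed as `start`, so that each block only searches a small neighborhood around the globally aligned position.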
25. A machine readable medium comprising machine executable instructions which, when executed, cause a machine to perform processing of a filled-in form and a blank form to generate a virtual filled-in form, the filled-in form including a first plurality of pixel values, each of the first plurality of pixel values corresponding to one location in a first array of row and column pixel locations, the blank form including a second plurality of pixel values, each of the second plurality of pixel values corresponding to one location in a second array of row and column pixel locations, the processing including the steps of:
- processing the filled-in form to identify column borders defining a region in which a first predetermined majority of foreground pixel values are located;
- processing the filled-in form to identify row borders defining a region in which a second predetermined majority of foreground pixel values are located, wherein the identified column and row borders define a rectangular filled-in main image region;
- determining a main image aspect ratio as a function of the determined row and column borders;
- comparing the determined main image aspect ratio of the filled-in form to a main aspect ratio of said blank form to determine if the determined main image aspect ratio is within a predetermined difference threshold of the main aspect ratio of said blank form; and
- if it is determined that the aspect ratios are within the predetermined difference threshold, determining a scaling factor which can be used on one of the blank and filled-in forms to resize the main image area of one of the blank and filled-in forms so that the size of the main image area of the blank form and main image area of the filled-in form will match.
Specification