Computer-implemented method for automatic extraction of data from printed forms
First Claim
1. A machine-implemented method for extracting character data from printed forms of the type having generally straight lines and data regions separated by the lines, the method utilizing a processor coupled to an image scanning device and a storage means, the method comprising the steps of:
(a) scanning a blank form so as to create a digital image of a first array of pixels representing the blank form;
(b) identifying line pixels in the first array which form generally straight lines of connected pixels;
(c) creating data masks in the first array at locations separated by the line pixels, the data masks corresponding to data regions in the blank form;
(d) scanning a filled-in form so as to create a digital image of a second array of pixels representing the filled-in form;
(e) identifying line pixels in the second array which form generally straight lines of connected line pixels;
(f) using line pixels in the second array and line pixels in the first array, calculating an offset of lines in the second array from lines in the first array;
(g) applying the data masks created in the first array to the second array by use of the calculated offset;
(h) determining if any connected character pixels extend across the perimeter of a data mask;
(i) in response to said determination, enlarging the data mask until substantially all connected character pixels are located within the enlarged data mask; and
(j) extracting data corresponding to character pixels from the data masks in the second array.
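Steps (h) and (i) can be illustrated with a short Python sketch. It is only one possible reading of the claim, not the patented implementation: it assumes the page is a list of rows of 0/1 values in which line pixels have already been removed, that a data mask is a hypothetical `(top, left, bottom, right)` tuple with inclusive bounds, and that "connected" means 8-connectivity. The function name `enlarge_mask` is invented for this illustration.

```python
from collections import deque


def enlarge_mask(char_pixels, mask):
    """Grow a data mask until every character pixel connected to a pixel
    inside the original mask also lies inside the returned mask.

    char_pixels: list of rows of 0/1 values (1 = character pixel).
    mask: (top, left, bottom, right), inclusive bounds (assumed convention).
    """
    rows, cols = len(char_pixels), len(char_pixels[0])
    top, left, bottom, right = mask

    # Seed the search with every character pixel inside the original mask.
    queue = deque(
        (r, c)
        for r in range(top, bottom + 1)
        for c in range(left, right + 1)
        if char_pixels[r][c]
    )
    seen = set(queue)

    while queue:
        r, c = queue.popleft()
        # Step (i): stretch the mask to cover each connected pixel reached.
        top, bottom = min(top, r), max(bottom, r)
        left, right = min(left, c), max(right, c)
        # Step (h): follow 8-connected neighbours, including any that lie
        # across the mask perimeter.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (
                    0 <= nr < rows
                    and 0 <= nc < cols
                    and char_pixels[nr][nc]
                    and (nr, nc) not in seen
                ):
                    seen.add((nr, nc))
                    queue.append((nr, nc))

    return top, left, bottom, right
```

For example, a vertical stroke that starts inside a mask and runs below its lower edge causes the mask's bottom bound to move down to the last pixel of the stroke, while an unconnected pixel elsewhere on the page leaves the mask unchanged.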
Abstract
A machine-implemented method for extracting character data from printed forms of the type having generally straight lines and data regions separated by lines utilizes a processor coupled to an image scanning device and a data storage device. The method includes the steps of scanning a blank form so as to create a digital image of a first array of pixels, identifying the pixels in the array which form generally straight lines of connected line pixels, creating data masks in the array at locations separated by the identified lines, the data masks corresponding to the data regions in the printed form, scanning a filled-in form so as to create a digital image of a second array of pixels, identifying pixels in the second array which form generally straight lines of connected pixels, calculating the offset of the lines in the second array from the lines in the first array, locating the data masks created in the first array into the second array by use of the calculated offset, and extracting data corresponding to character pixels from the data masks in the second array.
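The line identification, offset calculation, and mask placement described in the abstract might look roughly like the following Python sketch. It rests on several assumptions that are not in the source: images are lists of rows of 0/1 values, a "generally straight line" is approximated as a row or column containing a run of at least `min_run` connected dark pixels, the two scans differ only by a translation, and the first detected line in each scan is the same physical line. All function names are hypothetical.

```python
def longest_run(values):
    """Length of the longest run of consecutive non-zero values."""
    best = current = 0
    for value in values:
        current = current + 1 if value else 0
        best = max(best, current)
    return best


def line_positions(image, min_run):
    """Return (rows, cols) that contain a straight run of >= min_run pixels."""
    rows = [r for r, row in enumerate(image) if longest_run(row) >= min_run]
    cols = [c for c, col in enumerate(zip(*image)) if longest_run(col) >= min_run]
    return rows, cols


def line_offset(blank, filled, min_run):
    """Offset (dy, dx) of the filled-in scan's lines from the blank scan's.

    Assumes a pure translation and that the first horizontal and vertical
    lines found in each scan correspond to one another.
    """
    blank_rows, blank_cols = line_positions(blank, min_run)
    filled_rows, filled_cols = line_positions(filled, min_run)
    return filled_rows[0] - blank_rows[0], filled_cols[0] - blank_cols[0]


def shift_mask(mask, offset):
    """Move a (top, left, bottom, right) mask by the offset (dy, dx)."""
    top, left, bottom, right = mask
    dy, dx = offset
    return top + dy, left + dx, bottom + dy, right + dx
```

A mask defined on the blank form is placed on the filled-in scan with `shift_mask(mask, line_offset(blank, filled, min_run))`. A more robust variant would match several lines and take, say, the median difference rather than relying on the first line alone.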
9 Claims
Claim 1 is the independent claim, set out in full under First Claim. Claims 2, 3, 4, 5, 6, 7, 8 and 9 depend from it.
Specification