Methodology for OCR error checking through text image regeneration
First Claim
1. A method of reducing errors in optical character recognition procedures comprising the steps of:
(a) digitizing an image of an object, said image containing alpha-numeric characters;
(b) storing a bitmap of said digitized image as a scanned document file (SDF);
(c) performing an OCR step to obtain at least one candidate character;
(d) storing an indication of said at least one candidate character in a textural results file (TRF);
(e) determining the font of said digitized image;
(f) storing said determined font in a regeneration library file (RLF);
(g) generating a regenerated image file using said TRF and said RLF;
(h) comparing at least a portion of said regenerated image file with a corresponding portion of the bitmap of said digitized image stored in said scanned document file;
(i) outputting said TRF if the results of said comparison step indicate a match of at least said portion of said regenerated image file with said corresponding portion of said bitmap in said scanned document; and
(j) performing further processing to resolve the mismatch if a match is not found in step (i).
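The claimed steps can be sketched in Python as follows. This is a minimal illustration built on assumptions the claim does not state. Bitmaps are lists of 0/1 rows. Every glyph in the hypothetical `RLF` dictionary has the same fixed width. The TRF is a plain string. "A portion" is taken to mean one character cell. The names `regenerate`, `mismatch_fraction`, `mismatched_positions` and `verify`, and the `tolerance` threshold, are illustrative and do not come from the patent. Steps (a) through (f) are assumed to have already produced `sdf`, `trf` and the RLF.

```python
from typing import Dict, List, Optional, Tuple

Bitmap = List[List[int]]  # rows of 0/1 pixels

# Hypothetical RLF for one assumed font: fixed-width 3x3 glyphs.
RLF: Dict[str, Bitmap] = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}


def regenerate(trf: str, rlf: Dict[str, Bitmap]) -> Bitmap:
    """Step (g): render the TRF text by placing RLF glyphs side by side."""
    height = len(next(iter(rlf.values())))
    rows: Bitmap = [[] for _ in range(height)]
    for ch in trf:
        for r, glyph_row in enumerate(rlf[ch]):
            rows[r].extend(glyph_row)
    return rows


def mismatch_fraction(a: Bitmap, b: Bitmap) -> float:
    """Fraction of differing pixels; differently shaped bitmaps count as 1.0."""
    if len(a) != len(b) or any(len(x) != len(y) for x, y in zip(a, b)):
        return 1.0
    total = sum(len(row) for row in a)
    if total == 0:
        return 0.0
    diff = sum(p != q for x, y in zip(a, b) for p, q in zip(x, y))
    return diff / total


def mismatched_positions(sdf: Bitmap, trf: str, rlf: Dict[str, Bitmap],
                         tolerance: float) -> List[int]:
    """Step (h): compare each character cell of the regenerated image with the scan."""
    width = len(next(iter(rlf.values()))[0])
    regen = regenerate(trf, rlf)
    bad = []
    for i in range(len(trf)):
        cols = slice(i * width, (i + 1) * width)
        if mismatch_fraction([row[cols] for row in regen],
                             [row[cols] for row in sdf]) > tolerance:
            bad.append(i)
    return bad


def verify(sdf: Bitmap, trf: str, rlf: Dict[str, Bitmap],
           tolerance: float = 0.05) -> Tuple[Optional[str], List[int]]:
    """Steps (i) and (j): output the TRF on a match, else report where it failed."""
    if [len(r) for r in regenerate(trf, rlf)] != [len(r) for r in sdf]:
        # Character count or layout disagrees with the scan: no match possible.
        return None, list(range(len(trf)))
    bad = mismatched_positions(sdf, trf, rlf, tolerance)
    if not bad:
        return trf, []   # step (i): every compared cell matches
    return None, bad     # step (j): these cells need further processing
```

For example, if the scan shows `LIT` and the OCR step misreads the middle character as `T`, `verify` returns `(None, [1])`, pointing further processing at the second character only. A per-cell tolerance above zero is one way to absorb scanner noise; the value used here is arbitrary.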
Abstract
A method of reducing errors in optical character recognition (OCR) procedures includes digitizing an image of an object, the image containing alpha-numeric characters, and storing a bitmap of the digitized image as a scanned document file (SDF). An OCR step is performed to obtain at least one candidate character, and an indication of the at least one candidate character is stored in a textural results file (TRF). The font of the digitized image is determined and the font is stored in a regeneration library file (RLF). A regenerated image file is generated using the TRF and the RLF. At least a portion of the regenerated image file is compared with a corresponding portion of the bitmap of the digitized image stored in the SDF. The TRF is output if the results of the comparison indicate a match of at least the portion of the regenerated image file with the corresponding portion of the bitmap. If a match is not found, further processing is performed to resolve the mismatch.
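The abstract says the font is determined and stored in the RLF but not how it is determined. The sketch below assumes one plausible approach that is not taken from the patent: render the recognized text in each candidate font and keep the font whose rendering differs from the scan in the fewest pixels. The functions `render`, `pixel_distance` and `determine_font`, and the idea of a dictionary of candidate fonts, are hypothetical.

```python
from typing import Dict, List

Bitmap = List[List[int]]  # rows of 0/1 pixels
Font = Dict[str, Bitmap]  # character -> glyph bitmap (hypothetical format)


def render(text: str, font: Font) -> Bitmap:
    """Place the font's glyphs for `text` side by side (same-height glyphs assumed)."""
    height = len(next(iter(font.values())))
    rows: Bitmap = [[] for _ in range(height)]
    for ch in text:
        for r, glyph_row in enumerate(font[ch]):
            rows[r].extend(glyph_row)
    return rows


def pixel_distance(a: Bitmap, b: Bitmap) -> int:
    """Count differing pixels; differently shaped bitmaps get a penalty
    at least as large as any same-shape comparison could produce."""
    if len(a) != len(b) or any(len(x) != len(y) for x, y in zip(a, b)):
        return sum(map(len, a)) + sum(map(len, b))
    return sum(p != q for x, y in zip(a, b) for p, q in zip(x, y))


def determine_font(sdf: Bitmap, text: str, fonts: Dict[str, Font]) -> str:
    """Step (e), assumed approach: pick the font whose rendering best fits the scan."""
    # Only fonts that contain every recognized character can regenerate the text.
    usable = {name: f for name, f in fonts.items() if all(c in f for c in text)}
    if not usable:
        raise ValueError("no candidate font covers every recognized character")
    return min(usable, key=lambda name: pixel_distance(render(text, usable[name]), sdf))
```

The name returned by `determine_font` is what step (f) would record in the RLF. Choosing by pixel distance keeps font selection consistent with the regeneration-and-compare check that follows, though an OCR engine's own font classifier could serve equally well.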
14 Claims: independent claim 1 (reproduced above under First Claim) and dependent claims 2 through 14.
Specification