OCR image preprocessing method for image enhancement of scanned documents
First Claim
1. A process for enhancing an image of a scanned document having boundaries, said image including a plurality of lines of textual matter, said plurality of lines potentially having at least one skew relative to the boundaries of said image, requiring registration, containing inverted text matter, undesired dots, and vertical and horizontal lines, said process comprising
a step for providing a primary scan line run-length coded image of said document;
a step for determining said skew;
a step for reducing said skew;
a step for determining said registration;
a step for correcting said registration;
a step for detecting said inverted text matter;
a step for reversing said inverted text matter;
a step for detecting said undesired dots;
a step for deleting said undesired dots;
a step for detecting said vertical lines;
a step for eliminating said vertical lines;
a step for detecting said horizontal lines;
a step for eliminating said horizontal lines.
Abstract
The process for enhancing images of scanned documents identifies a variety of items in the scanned document that could make optical character recognition and other document image processing difficult or impossible. These items include skew, registration, specks, lines intersecting printed matter, reverse printing, and shaded printed matter. The process makes it possible to register such items, to correct skew and image registration, and to reverse inverted printing. Items to be removed are listed for deletion in one step. The image of the scanned document is stored and processed as a run-length coded image, except for some operations in which parts of the run-length coded image are converted to pixel image code for particular substeps of the process. The result of such a substep, e.g. a modified image, is then reconverted to run-length code.
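The round trip between run-length code and pixel code described in the abstract can be illustrated with a minimal Python sketch. The run representation used here, a list of (start column, length) pairs for the black runs of one scan line, and the function names `encode_row`, `decode_row` and `invert_region` are assumptions made for illustration; the abstract does not specify the coding format or any routine names.

```python
from typing import List, Tuple

# Assumed representation: one black run is (start_column, length).
Run = Tuple[int, int]


def encode_row(pixels: List[int]) -> List[Run]:
    """Run-length code one scan line: one entry per maximal run of black (1) pixels."""
    runs: List[Run] = []
    start = None
    for x, p in enumerate(pixels):
        if p and start is None:
            start = x                      # a black run begins here
        elif not p and start is not None:
            runs.append((start, x - start))  # the run ended at column x - 1
            start = None
    if start is not None:                  # run reaches the right edge
        runs.append((start, len(pixels) - start))
    return runs


def decode_row(runs: List[Run], width: int) -> List[int]:
    """Convert a run-length coded scan line back to pixel code."""
    row = [0] * width
    for start, length in runs:
        for x in range(start, start + length):
            row[x] = 1
    return row


def invert_region(runs: List[Run], width: int, lo: int, hi: int) -> List[Run]:
    """Hypothetical substep: decode, flip columns lo..hi-1, then re-encode."""
    row = decode_row(runs, width)
    for x in range(lo, hi):
        row[x] = 1 - row[x]
    return encode_row(row)
```

Only the substep works on pixels; its caller sees run-length code going in and run-length code coming out, which is the pattern the abstract describes.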
9 Claims
1. A process for enhancing an image of a scanned document having boundaries, said image including a plurality of lines of textual matter, said plurality of lines potentially having at least one skew relative to the boundaries of said image, requiring registration, containing inverted text matter, undesired dots, and vertical and horizontal lines, said process comprising
a step for providing a primary scan line run-length coded image of said document;
a step for determining said skew;
a step for reducing said skew;
a step for determining said registration;
a step for correcting said registration;
a step for detecting said inverted text matter;
a step for reversing said inverted text matter;
a step for detecting said undesired dots;
a step for deleting said undesired dots;
a step for detecting said vertical lines;
a step for eliminating said vertical lines;
a step for detecting said horizontal lines;
a step for eliminating said horizontal lines.
Dependent claims: 2, 3, 4, 5, 6, 7
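As an illustration of how the detecting and eliminating steps for horizontal lines might operate directly on a run-length coded image, the following Python sketch flags every black run whose length reaches a threshold and then drops the flagged runs. The threshold test, the (start column, length) run format, and the names `find_horizontal_lines` and `remove_runs` are assumptions for illustration only; the claim does not state the detection criterion the process actually uses.

```python
from typing import List, Tuple

Run = Tuple[int, int]        # assumed: (start_column, length) of a black run
Hit = Tuple[int, int, int]   # (row_index, start_column, length)


def find_horizontal_lines(rows: List[List[Run]], min_length: int) -> List[Hit]:
    """Detect: report every black run at least min_length pixels long."""
    hits: List[Hit] = []
    for y, runs in enumerate(rows):
        for start, length in runs:
            if length >= min_length:
                hits.append((y, start, length))
    return hits


def remove_runs(rows: List[List[Run]], hits: List[Hit]) -> List[List[Run]]:
    """Eliminate: return a copy of the image without the flagged runs."""
    doomed = set(hits)
    return [
        [(s, n) for (s, n) in runs if (y, s, n) not in doomed]
        for y, runs in enumerate(rows)
    ]


# Small example: row 1 holds a 20-pixel rule, the other rows hold short text runs.
image = [[(2, 3)], [(0, 20)], [(1, 2), (6, 2)]]
lines = find_horizontal_lines(image, min_length=10)
cleaned = remove_runs(image, lines)
```

A plain length threshold would also erase any character stroke that is merged into the same run as the line, so a practical version would need extra handling where lines intersect printed matter.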
8. A process for enhancing an image of a scanned document having boundaries, said image including a plurality of lines of textual matter, said plurality of lines potentially having at least one skew relative to the boundaries of said image, requiring registration, containing inverted text matter, undesired dots, and vertical and horizontal lines, said process comprising
a step for providing a primary scan line run-length coded image of said document;
a step for determining said skew;
a step for reducing said skew;
a step for determining said registration;
a step for correcting said registration;
a step for detecting said inverted text matter;
a step for reversing said inverted text matter;
a step for detecting and mapping said undesired dots in an information removal map;
a step for deleting said undesired dots contained in said information removal map;
a step for detecting and mapping said vertical lines in said information removal map;
a step for eliminating said vertical lines contained in said information removal map;
a step for detecting and mapping said horizontal lines in said information removal map;
a step for eliminating said horizontal lines contained in said information removal map.
Dependent claims: 9
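Claim 8 differs from claim 1 in that each detection step records its findings in a shared information removal map, and deletion then works from that map. A minimal Python sketch of the idea follows. The map layout (row index to a set of runs), the two simple detectors, and all function names are assumptions for illustration; the claim does not say how the map is stored or how dots and lines are recognized.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Run = Tuple[int, int]                  # assumed: (start_column, length)
RemovalMap = Dict[int, Set[Run]]       # assumed: row index -> runs to delete


def overlaps(a: Run, b: Run) -> bool:
    """True when two runs share at least one column."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


def map_dots(rows: List[List[Run]], removal: RemovalMap, max_size: int = 2) -> None:
    """Map short runs that touch no black run in the row above or below."""
    for y, runs in enumerate(rows):
        neighbours: List[Run] = []
        if y > 0:
            neighbours += rows[y - 1]
        if y + 1 < len(rows):
            neighbours += rows[y + 1]
        for run in runs:
            if run[1] <= max_size and not any(overlaps(run, n) for n in neighbours):
                removal[y].add(run)


def map_horizontal_lines(rows: List[List[Run]], removal: RemovalMap, min_length: int) -> None:
    """Map every run at least min_length pixels long."""
    for y, runs in enumerate(rows):
        for run in runs:
            if run[1] >= min_length:
                removal[y].add(run)


def apply_removal_map(rows: List[List[Run]], removal: RemovalMap) -> List[List[Run]]:
    """Delete everything listed in the map in a single pass over the image."""
    return [[r for r in runs if r not in removal.get(y, ())] for y, runs in enumerate(rows)]


# Row 0: isolated speck. Row 2: long rule. Rows 3 and 4: short text strokes.
image = [[(3, 1)], [], [(0, 12)], [(1, 2), (6, 2)], [(1, 2)]]
removal: RemovalMap = defaultdict(set)
map_dots(image, removal)
map_horizontal_lines(image, removal, min_length=10)
cleaned = apply_removal_map(image, removal)
```

Separating mapping from deletion means the detectors all examine the same unmodified image, and the image is rewritten once rather than after every detector.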
Specification