OCR image pre-processor for detecting and reducing skew of the image of textual matter of a scanned document
Abstract
A method is provided for identifying, correcting, modifying and reporting imperfections and features in pixel images that prevent or hinder proper OCR (Optical Character Recognition) and other document imaging processes. In one embodiment of this invention, run-length compressed images can be analyzed and corrected directly for improved performance. The major steps of this invention for enhancing images for OCR and document imaging are: the detection, correction and reporting of skew from text or graphical lines; the detection, correction and reporting of varying image registration; the detection, conversion and reporting of inverse type; and the detection, removal and reporting of dot shading and lines, with protection of characters.
9 Claims
1. A process for enhancing an image of a scanned document, said image consisting of run-length coded scan lines including white run-length values and black-run-length values, and containing a first plurality of lines of textual matter, said textual matter including characters with descenders, said process including a de-skew step for reducing skew of said textual matter, said skew including a vertical skew component and a horizontal skew component, said de-skew step comprising
a sub-step for determining a line skew in scan lines per line of textual matter for each one of a second plurality of said lines of textual matter of said image,
a sub-step for defining a determined skew value and a skew direction using the line skew values so determined for said second plurality of lines of textual matter;
a sub-step for reducing said vertical skew component of said skew using said determined skew value;
a sub-step for reducing said horizontal skew component of said skew using said determined skew value,
said sub-step for reducing said vertical skew component includes sequentially dividing each of said run-length coded scan lines into a plurality of skewed line segments of substantially equal lengths,
wherein said plurality of skewed line segments equals said determined skew value plus 1 and includes a reference skew segment, and
wherein said sub-step for reducing said vertical skew moves each of said skewed line segments proportionally to the determined skew value and relative to its distance from said reference skew segment measured in skewed line segments.
Dependent claims: 2, 3
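The vertical de-skew sub-step of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: for readability it assumes decoded pixel rows (lists of 0 for white and 1 for black) rather than run-length coded scan lines, and the names `segment_bounds`, `reduce_vertical_skew`, `direction` and `reference` are hypothetical. Each scan line is cut into `skew + 1` column segments of near-equal length, and a segment that lies `d` segments away from the reference segment is moved by `d` scan lines against the skew direction.

```python
def segment_bounds(width, skew):
    """Split `width` pixel columns into skew + 1 segments of near-equal length."""
    n = skew + 1
    return [(i * width // n, (i + 1) * width // n) for i in range(n)]


def reduce_vertical_skew(rows, skew, direction=1, reference=0):
    """Shift each column segment by its distance from the reference segment.

    rows      -- list of equal-length lists of 0 (white) / 1 (black)
    skew      -- determined skew value, in scan lines across the image width
    direction -- +1 if the text drops toward the right, -1 if it rises (assumed convention)
    reference -- index of the segment that stays in place
    """
    height, width = len(rows), len(rows[0])
    out = [[0] * width for _ in range(height)]
    for k, (lo, hi) in enumerate(segment_bounds(width, skew)):
        # Segment k moves (k - reference) scan lines against the skew direction.
        shift = -direction * (k - reference)
        for y in range(height):
            src = y - shift
            if 0 <= src < height:  # pixels shifted in from outside stay white
                out[y][lo:hi] = rows[src][lo:hi]
    return out


# A line of "text" that drops one scan line per segment (skew of 2 scan lines).
skewed = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
straight = reduce_vertical_skew(skewed, skew=2, direction=1)
```

With a skew of 2 the image is split into three segments, and the second and third segments are moved up by one and two scan lines respectively, so the black pixels end up on a single scan line.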
4. A scanned document image enhancement process for reducing skew of said document image provided in run-length representation having white run-length values and black run-length values, said document including a plurality of lines of characters including characters, said image having scan lines and pixel columns addresses, said process including the steps of
determining the existence of skew;
determining a line skew value for each of selected lines of characters included in said plurality of lines of characters for reducing the skew of said image; and
reducing the skew;
said step for determining the line skew value includes for each of said selected lines of characters a step for eliminating the descenders from said characters in each of said selected lines of characters and blurring each of said selected lines of characters by eliminating white spaces under a predetermined width in and between said characters of said lines of characters to form new black run-length values and then eliminating black run-length values under a predetermined value, thereby providing a blurred copy of said selected lines of characters and generating for each of said selected lines of characters a sequence of black reference run-length segments at the bottom of said selected line of characters,
a step for determining the begin and end of said sequence of black run-length segments thereby defining a reference line, said reference line having a begin and an end, and
a step for determining a skew number of scan lines between the begin and end of said reference line,
said step for determining the line skew value further including a step for providing an average skew number of scan lines for said selected lines of characters by averaging said skew number of scan lines of said selected lines of characters, and wherein said line skew value is said average skew number; and
said step of reducing the skew further including a step for subdividing said run length representation of said image into a multi-run-length segment representation of equal length segments, whereby the number of segments is equal to the line skew in scan lines plus one.
Dependent claims: 5, 6
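The blurring and averaging steps of claim 4 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: a scan line is represented as a list of `(colour, length)` pairs, only white runs strictly between other runs are filled, the average is rounded to a whole number of scan lines, and all names (`merge`, `blur_scan_line`, `average_skew`, `segment_count`, `min_white`, `min_black`) are hypothetical. Descender removal and the search for the reference line are not shown; each reference line is assumed to be given already as a `(begin_row, end_row)` pair.

```python
WHITE, BLACK = 0, 1


def merge(runs):
    """Coalesce adjacent runs of the same colour."""
    out = []
    for colour, length in runs:
        if out and out[-1][0] == colour:
            out[-1] = (colour, out[-1][1] + length)
        else:
            out.append((colour, length))
    return out


def blur_scan_line(runs, min_white, min_black):
    """Fill narrow interior white gaps, then drop the short black runs that remain."""
    last = len(runs) - 1
    # Step 1: interior white runs under min_white become black (new, longer black runs).
    filled = merge(
        (BLACK, n) if c == WHITE and n < min_white and 0 < i < last else (c, n)
        for i, (c, n) in enumerate(runs)
    )
    # Step 2: black runs still under min_black are turned white.
    return merge((WHITE, n) if c == BLACK and n < min_black else (c, n) for c, n in filled)


def average_skew(reference_lines):
    """Average the skew number (end row minus begin row) over the selected text lines."""
    skews = [end - begin for begin, end in reference_lines]
    return round(sum(skews) / len(skews))  # rounding is an assumption of this sketch


def segment_count(line_skew):
    """Number of equal-length segments used for de-skewing: line skew plus one."""
    return abs(line_skew) + 1


line = [(WHITE, 5), (BLACK, 3), (WHITE, 1), (BLACK, 4), (WHITE, 6), (BLACK, 1), (WHITE, 4)]
blurred = blur_scan_line(line, min_white=2, min_black=3)
skew = average_skew([(100, 103), (200, 204), (300, 302)])
```

In the example, the one-pixel white gap is absorbed into a single black run of 8 pixels and the isolated one-pixel black run is discarded, which leaves one solid run per word-like region. The three reference lines have skew numbers 3, 4 and 2, so the average line skew is 3 and the image would be subdivided into 4 segments.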
7. A scanned document image enhancement process for reducing skew of said document image provided in run-length representation having white run-length values and black run-length values, said document including a plurality of lines of characters including characters, said image having scan lines and pixel columns addresses, said process including the steps of
determining the existence of skew;
determining a line skew value for selected lines of characters included in said plurality of lines of characters for reducing the skew of said image; and
reducing the skew;
said step for determining the line skew value includes for each of said selected lines of characters a step for eliminating the descenders from said characters of said selected lines and blurring said selected lines of characters by eliminating white spaces under a predetermined width in and between said characters of said selected lines of characters and by eliminating black run-length values under a predetermined value, thereby providing a blurred copy of said selected lines of characters and generating for each of said selected lines of characters a sequence of black reference run-length segments at the bottom of said selected lines of characters,
a step for determining the begin and end of said sequence of black run-length segments thereby defining a reference line, said reference line having a begin and an end, and
a step for determining a skew number of scan lines between the begin and the end of said reference line,
said step for determining the line skew value further including a step for providing an average skew number of scan lines for said selected lines of characters by averaging said skew number of scan lines of said selected lines of characters, and wherein said line skew value is said average skew number, and
a step for determining local skew values within said selected lines of characters and a step for detecting differences between adjacent local skew values inside each of said selected lines of characters.
Dependent claims: 8, 9
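The last two steps of claim 7, local skew values and the differences between adjacent ones, can be sketched as below. The claim does not state how local skew is sampled or what the differences are used for, so everything here is an assumption for illustration: the reference line is sampled as scan-line indices at equally spaced columns, local skew is the change between neighbouring samples, and a hypothetical `tolerance` flags lines whose skew is not uniform.

```python
def local_skews(bottom_rows):
    """Local skew between neighbouring samples of a reference line.

    bottom_rows -- scan-line index of the reference line at equally spaced columns
    """
    return [b - a for a, b in zip(bottom_rows, bottom_rows[1:])]


def adjacent_differences(skews):
    """Difference between each local skew value and the one before it."""
    return [b - a for a, b in zip(skews, skews[1:])]


def has_varying_skew(bottom_rows, tolerance=1):
    """True if any two adjacent local skew values differ by more than `tolerance`."""
    diffs = adjacent_differences(local_skews(bottom_rows))
    return any(abs(d) > tolerance for d in diffs)


uniform = [10, 11, 12, 13]   # constant slope: local skews 1, 1, 1
warped = [10, 11, 12, 16]    # slope jumps near the end: local skews 1, 1, 4
```

A line with constant slope yields zero differences, while a jump in slope shows up as a large difference between adjacent local skew values.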
Specification