METHOD FOR BINARIZING SCANNED DOCUMENT IMAGES CONTAINING GRAY OR LIGHT COLORED TEXT PRINTED WITH HALFTONE PATTERN
First Claim
1. A method implemented in a data processing apparatus for binarizing a gray-scale document image which has been generated by scanning a paper-based document, the method comprising:
(a) identifying text characters in the gray-scale document image;
(b) classifying each text character identified in step (a) as either a halftone text character or a non-halftone text character based on a topological analysis of the text character; and
(c) binarizing halftone text characters using pixel value characteristics obtained from only halftone text characters classified in step (b).
Abstract
A method for binarizing scanned document images containing gray or light colored text printed with halftone patterns. The document image is initially binarized and connected image components are extracted from the initial binary image as text characters. Each text character is classified as either a halftone text character or a non-halftone text character based on an analysis of its topology features. The topology features may be the Euler number of the text character; a text character with an Euler number below −2 is classified as halftone text. The gray-scale document image is then divided into halftone text regions containing only halftone text characters and non-halftone text regions. Each region is binarized using its own pixel value statistics. This eliminates the influence of black text on the threshold values for binarizing halftone text. The binary maps of the regions are combined to generate the final binary map.
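The Euler-number test described in the abstract can be illustrated with a minimal sketch. It assumes each text character has already been extracted from the initial binary image as a small grid of 0/1 values (1 = ink), and it takes the Euler number to be the number of 8-connected ink components minus the number of enclosed 4-connected background holes. The function names (`count_components`, `euler_number`, `is_halftone`) and the connectivity convention are illustrative assumptions, not taken from the patent; only the cutoff of −2 comes from the abstract.

```python
from collections import deque

# Neighbor offsets: 4-connectivity for background, 8-connectivity for ink
# (an assumed convention; the abstract does not specify one).
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_components(grid, value, neighbors):
    """Count connected regions of cells equal to `value` using BFS flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def euler_number(glyph):
    """Euler number of a binary glyph: ink components minus enclosed holes."""
    cols = len(glyph[0])
    # Pad with a background border so all outer background forms one region.
    padded = ([[0] * (cols + 2)]
              + [[0] + list(row) + [0] for row in glyph]
              + [[0] * (cols + 2)])
    objects = count_components(padded, 1, N8)
    holes = count_components(padded, 0, N4) - 1  # subtract the outer background
    return objects - holes


def is_halftone(glyph, limit=-2):
    """Classify as halftone when the Euler number is below `limit` (per the abstract)."""
    return euler_number(glyph) < limit


# Hypothetical test glyphs: a solid stroke, a letter-like ring, a dotted halftone patch.
solid = [[1] * 3 for _ in range(3)]
ring = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
dotted = [[0 if (r % 2 and c % 2) else 1 for c in range(9)] for r in range(9)]

print(euler_number(solid), euler_number(ring), euler_number(dotted))  # 1 0 -15
print(is_halftone(ring), is_halftone(dotted))  # False True
```

Solid black characters have few holes (an "o" has one, a "B" has two), so their Euler numbers stay near 1, 0, or −1. A character printed with a halftone pattern is riddled with small gaps after the initial binarization, which drives the Euler number strongly negative.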
27 Claims
1. A method implemented in a data processing apparatus for binarizing a gray-scale document image which has been generated by scanning a paper-based document, the method comprising:
(a) identifying text characters in the gray-scale document image;
(b) classifying each text character identified in step (a) as either a halftone text character or a non-halftone text character based on a topological analysis of the text character; and
(c) binarizing halftone text characters using pixel value characteristics obtained from only halftone text characters classified in step (b).
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer program product comprising a computer usable non-transitory medium having a computer readable program code embedded therein for controlling a data processing apparatus, the computer readable program code configured to cause the data processing apparatus to execute a process for binarizing a gray-scale document image which has been generated by scanning a paper-based document, the process comprising:
(a) identifying text characters in the gray-scale document image;
(b) classifying each text character identified in step (a) as either a halftone text character or a non-halftone text character based on a topological analysis of the text character; and
(c) binarizing halftone text characters using pixel value characteristics obtained from only halftone text characters classified in step (b).
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A scanner comprising:
a scanning section for scanning a hard copy document to generate a gray-scale document image; and
a data processing apparatus for processing the gray-scale document image to generate a binary map of the gray-scale document image, wherein the processing of the gray-scale document image includes:
(a) identifying text characters in the gray-scale document image,
(b) classifying each text character identified in step (a) as either a halftone text character or a non-halftone text character based on a topological analysis of the text character, and
(c) binarizing halftone text characters using pixel value characteristics obtained from only halftone text characters classified in step (b).
Dependent claims: 22, 23, 24, 25, 26, 27
Specification