Document image binarization method
Abstract
A method for binarizing a document image using a multi-threshold process to determine an optimum global binarization threshold for the image. The optimum binarization threshold is determined by binarizing the document image multiple times using different threshold values, and calculating statistics of the useful information and noise for each threshold value to select the optimum threshold value.
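The multi-threshold idea in the abstract can be illustrated with a minimal Python sketch of the first stage only: producing one binary image per candidate global threshold. The function names `binarize` and `binarize_many`, the list-of-lists image representation, the candidate thresholds, and the convention that pixels darker than the threshold become foreground (1) are all assumptions made for illustration; the abstract does not specify them.

```python
def binarize(gray, threshold):
    """Return a binary image: 1 (foreground) where a pixel is darker than threshold.

    Assumption: dark ink on a light background, so "darker" means foreground.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def binarize_many(gray, thresholds):
    """Binarize the same image once per candidate global threshold."""
    return {t: binarize(gray, t) for t in thresholds}


# Tiny 8-bit example: dark strokes (30), light background (220), faint noise (150).
gray = [
    [220, 30, 30, 220],
    [220, 220, 150, 220],
    [30, 220, 220, 220],
]

# Hypothetical candidate thresholds; a real run would likely sweep more values.
binaries = binarize_many(gray, thresholds=[64, 128, 192])

# Foreground pixel count per threshold, a crude stand-in for the statistics
# that the method computes on each binary image.
foreground = {t: sum(map(sum, img)) for t, img in binaries.items()}
print(foreground)
```

In this toy image the faint pixel at value 150 only becomes foreground at the highest threshold, which is the kind of noise that per-threshold statistics are meant to expose.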
16 Claims
1. A method implemented in a data processing system which includes a processor and a memory, for binarizing a multi-bit document image, comprising:
(a) binarizing the document image a plurality of times, each time using one of a plurality of different binarization thresholds, to generate a plurality of corresponding binary images;
for each of the binary images,
(b) applying connected component analysis to the binary image to identify connected components in the binary image;
(c) identifying all connected components in the binary image that are larger than a threshold size and have fill rates higher than a fill rate threshold and removing all connected components contained within bounding boxes of the identified connected components; and
(d) counting a first number of connected components in the binary image that have sizes equal to or larger than a first threshold size, and counting a second number of connected components in the binary image that have sizes equal to or smaller than a second threshold size;
(e) based on the first number and the second number of each binary image, selecting one of the binary images as the optimum binary image; and
(f) outputting the optimum binary image.
9. A computer program product comprising a computer usable non-transitory medium having a computer readable program code embedded therein for controlling a computer, the computer readable program code configured to cause the computer to execute a process for binarizing a multi-bit document image, the process comprising:
(a) binarizing the document image a plurality of times, each time using one of a plurality of different binarization thresholds, to generate a plurality of corresponding binary images;
for each of the binary images,
(b) applying connected component analysis to the binary image to identify connected components in the binary image;
(c) identifying all connected components in the binary image that are larger than a threshold size and have fill rates higher than a fill rate threshold and removing all connected components contained within bounding boxes of the identified connected components; and
(d) counting a first number of connected components in the binary image that have sizes equal to or larger than a first threshold size, and counting a second number of connected components in the binary image that have sizes equal to or smaller than a second threshold size;
(e) based on the first number and the second number of each binary image, selecting one of the binary images as the optimum binary image; and
(f) outputting the optimum binary image.
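Steps (b) through (e) recited in the independent claims can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patented implementation: component "size" is taken to be the pixel count, connectivity is 4-neighbour, fill rate is pixel count divided by bounding-box area, and the selection score `n_large - n_small` is a hypothetical rule, since the claims say only that selection is "based on" the two counts. All function and parameter names are invented for the example.

```python
from collections import deque


def connected_components(binary):
    """Step (b): 4-connected components, each a list of (row, col) pixels."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:  # breadth-first flood fill
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                comps.append(pixels)
    return comps


def bbox(pixels):
    """Inclusive bounding box (top, left, bottom, right) of a component."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return min(ys), min(xs), max(ys), max(xs)


def remove_filled_regions(comps, min_size, min_fill):
    """Step (c): drop components lying inside the bounding box of any large,
    densely filled component (the identified component itself included)."""
    boxes = []
    for p in comps:
        y0, x0, y1, x1 = bbox(p)
        area = (y1 - y0 + 1) * (x1 - x0 + 1)
        if len(p) > min_size and len(p) / area > min_fill:
            boxes.append((y0, x0, y1, x1))

    def inside(p):
        y0, x0, y1, x1 = bbox(p)
        return any(b[0] <= y0 and b[1] <= x0 and y1 <= b[2] and x1 <= b[3] for b in boxes)

    return [p for p in comps if not inside(p)]


def count_large_small(comps, first_size, second_size):
    """Step (d): number of components >= first_size and number <= second_size."""
    n_large = sum(len(p) >= first_size for p in comps)
    n_small = sum(len(p) <= second_size for p in comps)
    return n_large, n_small


def select_optimum(binaries, min_size, min_fill, first_size, second_size):
    """Step (e) with an ASSUMED score: maximise n_large - n_small."""
    best, best_score = None, None
    for t, img in binaries.items():
        comps = remove_filled_regions(connected_components(img), min_size, min_fill)
        n_large, n_small = count_large_small(comps, first_size, second_size)
        score = n_large - n_small
        if best_score is None or score > best_score:
            best, best_score = t, score
    return best


# Two candidate binary images keyed by hypothetical thresholds 64 and 192.
low = [[1, 1, 1, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
high = [[1, 1, 1, 0, 1], [0, 0, 0, 0, 0], [1, 0, 0, 0, 0]]
best = select_optimum({64: low, 192: high}, min_size=6, min_fill=0.9,
                      first_size=3, second_size=1)
print(best)  # 64: the image at 192 adds two one-pixel specks and no new strokes
```

Treating components at or above the first threshold size as useful information and those at or below the second as noise follows the abstract's description; how the two counts are actually weighed against each other is left open by the claims, so the subtraction used here should be read as a placeholder.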
Specification