Range and/or polarity-based thresholding for improved data extraction
Abstract
Computerized techniques for improved binarization and extraction of information from digital image data are disclosed in accordance with various embodiments. The inventive concepts include: rendering, using a processor of the mobile device, a digital image using a plurality of binarization thresholds to generate a plurality of range-binarized digital images, wherein each rendering of the digital image is generated using a different combination of the plurality of binarization thresholds; identifying, using the processor of the mobile device, one or more range connected components within the plurality of range-binarized digital images; and identifying, using the processor of the mobile device, a plurality of text regions within the digital image based on some or all of the range connected components. Corresponding systems and computer program products are also disclosed.
20 Claims
1. A computer program product comprising a non-transitory computer readable storage medium having embodied thereon computer readable program instructions configured to cause a processor, upon execution of the computer readable program instructions, to perform operations comprising:
rendering, using the processor, a digital image using a plurality of binarization thresholds to generate a plurality of range-binarized digital images, wherein each rendering of the digital image is generated using a different combination of the plurality of binarization thresholds, and wherein each combination of the plurality of thresholds includes a unique upper threshold and a unique lower threshold;
identifying, using the processor, one or more range connected components within the plurality of range-binarized digital images; and
identifying, using the processor, a plurality of text regions within the digital image based on some or all of the range connected components.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 19
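The first two operations of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a grayscale image stored as a list of rows of integers in 0..255, reads "unique upper threshold and unique lower threshold" as consecutive, non-overlapping bands built from a sorted threshold list, and uses 4-connectivity. The function names (`threshold_bands`, `range_binarize`, `connected_components`, `range_connected_components`) are hypothetical.

```python
from collections import deque


def threshold_bands(thresholds):
    """Pair consecutive thresholds so every band has its own lower and upper bound."""
    ts = sorted(set(thresholds))
    return list(zip(ts, ts[1:]))


def range_binarize(gray, lower, upper):
    """Mask with 1 where lower <= pixel < upper, else 0 (half-open band, an assumption)."""
    return [[1 if lower <= p < upper else 0 for p in row] for row in gray]


def connected_components(mask):
    """4-connected components of a binary mask, each a list of (row, col) pixels."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def range_connected_components(gray, thresholds):
    """Map each (lower, upper) band to the components found in its rendering."""
    return {
        (lower, upper): connected_components(range_binarize(gray, lower, upper))
        for lower, upper in threshold_bands(thresholds)
    }


# Toy image: two mid-gray vertical strokes (120) on a light background (200).
gray = [
    [200, 200, 200, 200, 200],
    [200, 120, 200, 120, 200],
    [200, 120, 200, 120, 200],
]
found = range_connected_components(gray, [0, 100, 150, 256])
```

In this toy image the band (100, 150) isolates the two strokes as separate components, while the band (150, 256) captures the background as a single component.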
10. A computer-implemented method, comprising:
rendering, using a processor, a digital image using a plurality of binarization thresholds to generate a plurality of range-binarized digital images, wherein each rendering of the digital image is generated using a different combination of the plurality of binarization thresholds;
identifying, using the processor, one or more range connected components within the plurality of range-binarized digital images;
identifying, using the processor, a plurality of text regions within the digital image based on some or all of the range connected components; and
wherein at least one of the range connected components identified within the plurality of range-binarized digital images corresponds to a text character represented on a dual background within the digital image, at least some of the dual background being lighter than the text character, and at least some of the dual background being darker than the text character.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
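The dual-background condition in claim 10 can be made concrete with toy data. The sketch below is illustrative only: the intensity values, the half-open range convention, and the names `single_threshold` and `in_range` are assumptions. It places a mid-gray stroke on a background that is light on one side and dark on the other, then checks that no single threshold, in either polarity, isolates the stroke, whereas one lower/upper pair does.

```python
LIGHT, DARK, TEXT = 220, 30, 128  # hypothetical intensities for the toy image

gray = [
    [LIGHT, LIGHT, LIGHT, DARK, DARK, DARK],
    [LIGHT, TEXT, TEXT, TEXT, TEXT, DARK],
    [LIGHT, LIGHT, LIGHT, DARK, DARK, DARK],
]

# Ground-truth mask of the text stroke.
text_mask = [[1 if p == TEXT else 0 for p in row] for row in gray]


def single_threshold(gray, t, dark_foreground=True):
    """Conventional binarization: foreground is p < t, or p >= t if polarity is flipped."""
    if dark_foreground:
        return [[1 if p < t else 0 for p in row] for row in gray]
    return [[1 if p >= t else 0 for p in row] for row in gray]


def in_range(gray, lower, upper):
    """Range binarization: foreground is lower <= p < upper."""
    return [[1 if lower <= p < upper else 0 for p in row] for row in gray]


# Try every single threshold in both polarities; none reproduces the text mask.
single_hits = [
    (t, polarity)
    for t in range(257)
    for polarity in (True, False)
    if single_threshold(gray, t, polarity) == text_mask
]

# A lower/upper pair bracketing the text intensity isolates the stroke.
range_hit = in_range(gray, 100, 180) == text_mask
```

Any single threshold low enough to exclude the light background still admits the dark background, and vice versa, so the stroke always merges with one half of the background. Bounding the intensity from both sides avoids this.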
20. A computer-implemented method, comprising:
rendering, using a processor, a digital image using a plurality of binarization thresholds to generate a plurality of range-binarized digital images, wherein each rendering of the digital image is generated using a different combination of the plurality of binarization thresholds;
identifying, using the processor, one or more range connected components within the plurality of range-binarized digital images;
identifying, using the processor, a plurality of text regions within the digital image based on some or all of the range connected components, wherein identifying the plurality of text regions within the digital image comprises:
calculating one or more geometric characteristics of the one or more range connected components identified within the plurality of range-binarized digital images;
grouping adjacent of the one or more connected components that exhibit one or more common geometric characteristics;
defining each grouping of three or more adjacent range connected components that exhibit the one or more common geometric characteristics as a candidate text region; and
assembling the candidate text regions by removing overlaps to identify the plurality of text regions within the digital image.
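The text-region steps of claim 20 can be sketched as follows. The claim does not specify which geometric characteristics, adjacency test, or overlap-removal rule to use, so everything below is an assumption made for illustration: bounding-box height is the shared characteristic, adjacency means a small horizontal gap with at least one shared row, and overlapping candidates are merged into their union. All function names and tolerances are hypothetical.

```python
def bounding_box(pixels):
    """(left, top, right, bottom) of a component given as (row, col) pixels."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return (min(cols), min(rows), max(cols), max(rows))


def height(box):
    return box[3] - box[1] + 1


def are_neighbors(a, b, height_tol=0.25, gap_factor=1.0):
    """Assumed test: similar height, shared rows, small non-negative horizontal gap."""
    if a[0] > b[0]:
        a, b = b, a  # make `a` the left-hand box
    tallest = max(height(a), height(b))
    similar_height = abs(height(a) - height(b)) <= height_tol * tallest
    share_rows = min(a[3], b[3]) >= max(a[1], b[1])
    gap = b[0] - a[2] - 1
    return similar_height and share_rows and 0 <= gap <= gap_factor * tallest


def union(boxes):
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def group_components(boxes):
    """Union-find grouping of boxes linked by the neighbor test."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if are_neighbors(boxes[i], boxes[j]):
                parent[find(i)] = find(j)
    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    return list(groups.values())


def text_regions(boxes, min_members=3):
    """Candidate regions from groups of >= min_members, then merge any overlaps."""
    regions = [union(g) for g in group_components(boxes) if len(g) >= min_members]
    merged_any = True
    while merged_any:  # repeat until no two regions overlap
        merged_any = False
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                if overlaps(regions[i], regions[j]):
                    regions[i] = union([regions[i], regions[j]])
                    del regions[j]
                    merged_any = True
                    break
            if merged_any:
                break
    return sorted(regions)


# Three letter-sized boxes in a row, a two-box pair, and an isolated speck.
boxes = [(0, 0, 3, 5), (5, 0, 8, 5), (10, 0, 13, 5),
         (0, 20, 3, 25), (5, 20, 8, 25), (30, 20, 31, 21)]
regions = text_regions(boxes)
```

With these inputs only the three-box row qualifies as a candidate text region; the pair falls below the three-component minimum and the speck has no neighbor of similar height.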
Specification