Iterative recognition-guided thresholding and data extraction
Abstract
Techniques for improved binarization and extraction of information from digital image data are disclosed in accordance with various embodiments. The inventive concepts include independently binarizing portions of the image data on the basis of individual features, e.g. per connected component, and using multiple different binarization thresholds to obtain the best possible binarization result for each portion of the image data independently binarized. Determining the quality of each binarization result may be based on attempted recognition and/or extraction of information therefrom. Independently binarized portions may be assembled into a contiguous result. In one embodiment, a method includes: identifying a region of interest within a digital image; generating a plurality of binarized images based on the region of interest using different binarization thresholds; and extracting data from some or all of the plurality of binarized images. Corresponding systems and computer program products are also disclosed.
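The thresholding loop summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `binarize`, `binarize_multi`, `best_binarization`, and `toy_recognize` are hypothetical, the region is a tiny hand-made grayscale grid, and the recognizer is a stand-in stub for a real recognition engine. The sketch assumes dark foreground on a lighter background, so a pixel counts as foreground when its value is below the threshold.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Image = List[List[int]]  # rows of grayscale values in 0..255, or 0/1 after binarization


def binarize(region: Sequence[Sequence[int]], threshold: int) -> Image:
    """Return a 0/1 image: 1 where the pixel is darker than the threshold."""
    return [[1 if px < threshold else 0 for px in row] for row in region]


def binarize_multi(region: Sequence[Sequence[int]],
                   thresholds: Sequence[int]) -> Dict[int, Image]:
    """Generate one binarized image per threshold for the same region."""
    return {t: binarize(region, t) for t in thresholds}


def best_binarization(region: Sequence[Sequence[int]],
                      thresholds: Sequence[int],
                      recognize: Callable[[Image], Tuple[str, float]]
                      ) -> Tuple[int, str, float]:
    """Keep the threshold whose binarized image the recognizer scores highest."""
    best = None
    for t in thresholds:
        text, confidence = recognize(binarize(region, t))
        if best is None or confidence > best[2]:
            best = (t, text, confidence)
    return best


def toy_recognize(binary: Image) -> Tuple[str, float]:
    """Hypothetical stub: confidence is the fraction of rows with any foreground."""
    rows_hit = sum(1 for row in binary if any(row))
    confidence = rows_hit / len(binary)
    return ("I" if confidence == 1.0 else "?", confidence)


# A vertical stroke whose intensity fades from dark (40) to light (200).
region = [
    [250, 40, 250],
    [250, 120, 250],
    [250, 200, 250],
]
results = binarize_multi(region, [64, 128, 224])
choice = best_binarization(region, [64, 128, 224], toy_recognize)
```

In this toy case only the highest threshold captures the whole stroke, so the stub recognizer prefers it; a real system would substitute an actual recognition step for `toy_recognize`.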
20 Claims
1. A computer-implemented method, comprising:
identifying a region of interest within a digital image;
generating a plurality of binarized images based on the region of interest, wherein some or all of the binarized images are generated using a different one of a plurality of binarization thresholds; and
extracting data from some or all of the plurality of binarized images;
wherein extracting the data from some or all of the plurality of binarized images comprises;
generating at least one sequence of candidate extraction results for each grouping of one or more connected components depicted within the region of interest;
determining an optimal extraction result within each sequence of candidate extraction results;
assembling all of the optimal extraction results into a single string of the one or more connected components; and
wherein determining the optimal extraction result within each sequence of candidate extraction results comprises selecting one extraction result within each sequence of candidate extraction results so as to minimize intensity differences between the optimal extraction results assembled into the single string; and
wherein at least some of the connected components are text characters.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 18, 19, 20
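The selection step recited in claim 1 (one candidate per sequence, chosen so that intensity differences across the assembled string are minimized) can be sketched as follows. This is an illustration under stated assumptions, not the claimed method itself: "intensity differences" is interpreted here as the sum of absolute intensity differences between adjacent selections, the minimization is done with a simple dynamic program, and the names `Candidate` and `select_consistent` as well as all intensity values are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    text: str         # extraction result for one grouping of connected components
    intensity: float  # assumed summary intensity of that grouping's pixels


def select_consistent(sequences: List[List[Candidate]]) -> List[Candidate]:
    """Pick one candidate per sequence, minimizing the summed absolute
    intensity difference between neighbouring picks (an assumed cost)."""
    if not sequences or any(not seq for seq in sequences):
        raise ValueError("every sequence needs at least one candidate")

    # cost[j]: cheapest total cost of any path ending at candidate j
    cost = [0.0] * len(sequences[0])
    back: List[List[int]] = []  # back[k][j]: best predecessor index for step k+1

    for prev, cur in zip(sequences, sequences[1:]):
        new_cost, pointers = [], []
        for c in cur:
            step = [cost[i] + abs(p.intensity - c.intensity)
                    for i, p in enumerate(prev)]
            best_i = min(range(len(prev)), key=step.__getitem__)
            new_cost.append(step[best_i])
            pointers.append(best_i)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path back from the last sequence to the first.
    j = min(range(len(cost)), key=cost.__getitem__)
    chosen = [j]
    for pointers in reversed(back):
        j = pointers[j]
        chosen.append(j)
    chosen.reverse()
    return [seq[i] for seq, i in zip(sequences, chosen)]


# One candidate sequence per character position, e.g. one entry per threshold.
sequences = [
    [Candidate("A", 40), Candidate("4", 90)],
    [Candidate("8", 130), Candidate("B", 45)],
    [Candidate("C", 50), Candidate("G", 200)],
]
assembled = "".join(c.text for c in select_consistent(sequences))
```

Here the picks with intensities 40, 45, and 50 give the smallest total difference, so the assembled string is "ABC". Other readings of the claim language, such as minimizing the overall spread of intensities, would need a different cost function.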
15. A system, comprising:
a processor; and
logic integrated with and/or executable by the processor to cause the processor to;
identify a region of interest within a digital image, wherein the region of interest comprises a plurality of connected components;
generate a plurality of binarized images based on the region of interest, wherein some or all of the binarized images are generated using a different one of a plurality of binarization thresholds; and
extract data from some or all of the plurality of binarized images, wherein the data comprise a potential character identity of one or more of the plurality of connected components;
wherein the region of interest is characterized by a complex background overlapped by the plurality of connected components;
wherein one or more of the connected components overlap or are obscured by one or more unique background elements such that no single binarization threshold applied to a region encompassing the one or more of the plurality of connected components can identify the one or more of the connected components that overlap or are obscured by the one or more unique background elements.
16. A computer program product, comprising a non-transitory computer readable medium having embodied therewith computer readable program instructions configured to cause a processor, upon execution thereof, to:
identify, using the processor, a region of interest within a digital image;
generate, using the processor, a plurality of binarized images based on the region of interest, wherein some or all of the binarized images are generated using a different one of a plurality of binarization thresholds; and
subjecting the region of interest within a digital image to a plurality of thresholding and extraction iterations;
extract, using the processor, data from some or all of the plurality of binarized images;
wherein the extracted data comprises one or more connected components represented in the plurality of binarized images; and
wherein one or more of the connected components overlap or are obscured by one or more unique background elements such that no single binarization threshold applied to a region encompassing the one or more connected components can identify the one or more of the connected components that overlap or are obscured by the one or more unique background elements.
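The condition recited in claims 15 and 16, that no single binarization threshold can identify every connected component over a complex background, can be illustrated with a small example. The pixel values, the helper names `binarize` and `working_thresholds`, and the rule that foreground means "darker than the threshold" are all assumptions made for this sketch.

```python
from typing import List, Sequence

Image = List[List[int]]


def binarize(region: Sequence[Sequence[int]], threshold: int) -> Image:
    """Return a 0/1 image: 1 where the pixel is darker than the threshold."""
    return [[1 if px < threshold else 0 for px in row] for row in region]


def working_thresholds(region: Sequence[Sequence[int]],
                       expected: Image) -> List[int]:
    """List every threshold in 0..256 that reproduces the expected mask exactly."""
    return [t for t in range(257) if binarize(region, t) == expected]


# Two components on differently shaded backgrounds (hypothetical values):
# a dark stroke (30) on mid-gray (100), and a gray stroke (150) on light (220).
left = [[100, 30, 100]]
right = [[220, 150, 220]]
glyph = [[0, 1, 0]]  # the stroke is the middle pixel in both halves

whole = [left[0] + right[0]]
whole_expected = [glyph[0] + glyph[0]]

global_ok = working_thresholds(whole, whole_expected)  # empty: no single threshold works
left_ok = working_thresholds(left, glyph)              # thresholds 31..100
right_ok = working_thresholds(right, glyph)            # thresholds 151..220
```

Because the two workable threshold ranges do not overlap, any single threshold applied to the whole region either loses the second stroke or floods the first background, while binarizing each component independently succeeds.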
Specification