Method for processing optical character recognition (OCR) output data, wherein the output data comprises double printed character images
First Claim
1. A method for resolving contradicting output data from an Optical Character Recognition (OCR) system, wherein the output data comprises at least one suspected double printed character image, the method comprises:
- a) searching through the output data, identifying images of characters having an image quality above a predefined level, and using these character images as a set of single character template images for characters,
- b) providing a bounding box around the suspected double printed character image, and then doing a gliding single character image correlation between each respective single template image, one by one, and the suspected double printed character image, and recording the correlation values and corresponding displacement values of the respective character image bodies for each step of movement performed in the gliding single character correlation process,
- c) selecting single template images having correlation values above a predefined threshold level to create a list of candidates of combined single character template images aligned relative to each other according to their corresponding displacement values relative to the bounding box,
- d) correlating each respective candidate of combined single character template images with the suspected double printed character image, and selecting the combined single character template image having the highest correlation value as an identification of each respective character image in the suspected double printed character image.
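Steps b) to d) of the claim can be illustrated with a minimal sketch. It is not the patented implementation, and it makes several assumptions the claim does not state: images are small binary grids (ink = 1), "correlation" is taken to be normalized cross-correlation, only the best displacement per template is kept rather than every step of the glide, candidates are pairs of distinct templates, and a combined model is the union of ink. All function names and the default `threshold` are hypothetical.

```python
from itertools import combinations
from math import sqrt

def ncc(a, b):
    """Normalized correlation of two equal-length pixel lists (0.0 if either is flat)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0

def window(img, dy, dx, h, w):
    """Flatten the h-by-w window of img whose top-left corner is (dy, dx)."""
    return [img[dy + r][dx + c] for r in range(h) for c in range(w)]

def glide(template, box):
    """Step b): slide template over box; return (best correlation, (dy, dx))."""
    th, tw = len(template), len(template[0])
    bh, bw = len(box), len(box[0])
    flat = window(template, 0, 0, th, tw)
    best = (-1.0, (0, 0))
    for dy in range(bh - th + 1):
        for dx in range(bw - tw + 1):
            c = ncc(flat, window(box, dy, dx, th, tw))
            if c > best[0]:
                best = (c, (dy, dx))
    return best

def overlay(shape, placed):
    """Render (template, offset) pairs on a blank canvas as the union of ink."""
    h, w = shape
    canvas = [[0] * w for _ in range(h)]
    for template, (dy, dx) in placed:
        for r, row in enumerate(template):
            for c, v in enumerate(row):
                canvas[dy + r][dx + c] = max(canvas[dy + r][dx + c], v)
    return canvas

def resolve_double_print(box, templates, threshold=0.5):
    """Steps c) and d): shortlist templates, combine them in pairs, keep the best model."""
    h, w = len(box), len(box[0])
    hits = []
    for label, t in templates.items():
        corr, off = glide(t, box)
        if corr > threshold:  # assumed threshold, not a value from the claim
            hits.append((label, t, off))
    flat_box = window(box, 0, 0, h, w)
    best_score, best_pair = -1.0, None
    for (la, ta, oa), (lb, tb, ob) in combinations(hits, 2):
        model = overlay((h, w), [(ta, oa), (tb, ob)])
        score = ncc(window(model, 0, 0, h, w), flat_box)
        if score > best_score:
            best_score, best_pair = score, [(la, oa), (lb, ob)]
    return best_pair, best_score
```

Calling `resolve_double_print(box, templates)` with `box` as the cropped bounding-box image and `templates` as a dict from character label to template grid returns the winning pair of `(label, offset)` entries and its correlation. Because `combinations` pairs distinct templates only, a double print of the same character twice would need a separate case.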
Abstract
The present invention relates to a method of processing output data from an Optical Character Recognition (OCR) system, wherein the output data comprises images of double printed characters. The method identifies the respective members of a suspected double printed character image by first providing a set of single character template images from images of characters identified in the text being processed by the OCR system, and then combining the single character templates to provide candidate models for the suspected double printed character image. Correlation between each respective candidate model and the suspected double printed character image indicates which pair of modelled single character template images is most probably the correct identification of the respective character images in the double printed character image.
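The template-gathering step described in the abstract could be sketched as follows. This is an illustration under stated assumptions, not the method as specified: it assumes the OCR system reports a per-character quality score, and it keeps only the single highest-quality image per character, whereas the source only says that characters above a predefined quality level are used as templates. `RecognizedChar`, `build_templates`, and the default `min_quality` are hypothetical names and values.

```python
from dataclasses import dataclass

@dataclass
class RecognizedChar:
    label: str      # character reported by the OCR system
    image: object   # cropped character image (any image representation)
    quality: float  # assumed per-character quality score from the OCR system

def build_templates(chars, min_quality=0.9):
    """Keep, per label, the best image whose quality exceeds min_quality."""
    best = {}
    for ch in chars:
        if ch.quality <= min_quality:
            continue  # below the predefined level: not usable as a template
        current = best.get(ch.label)
        if current is None or ch.quality > current.quality:
            best[ch.label] = ch
    return {label: ch.image for label, ch in best.items()}
```

Drawing templates from the document being processed means they share its font and print conditions, which is why the abstract takes them from the text itself.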
9 Citations
5 Claims
Specification