Document comparison and analysis for improved OCR
First Claim
1. A computerized method comprising:
a. obtaining a first digital image of a first text;
b. applying a character recognition algorithm to said first digital image to obtain a text representation of said first digital image;
c. transforming said text representation into a second digital image;
d. comparing said first and second digital images to obtain a difference image; and
e. analyzing said difference image.
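Steps d and e of claim 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation. It assumes the two images are already the same size and aligned, represents them as nested lists of grayscale values, and uses a hypothetical fixed threshold of 64. "Analysis" is reduced to grouping difference pixels into 4-connected regions and reporting each region's bounding box. The OCR and text-rendering steps (b and c) are outside the sketch, and `difference_image` and `difference_regions` are hypothetical names.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale, 0 (black) .. 255 (white); assumed representation
Box = Tuple[int, int, int, int]  # (top, left, bottom, right)


def difference_image(first: Image, second: Image, threshold: int = 64) -> List[List[int]]:
    """Binary difference image: 1 where |first - second| exceeds a (hypothetical) threshold."""
    if len(first) != len(second) or any(len(a) != len(b) for a, b in zip(first, second)):
        raise ValueError("images must have the same dimensions")
    return [
        [1 if abs(p - q) > threshold else 0 for p, q in zip(row1, row2)]
        for row1, row2 in zip(first, second)
    ]


def difference_regions(diff: List[List[int]]) -> List[Box]:
    """Group 4-connected difference pixels and return one bounding box per group."""
    height = len(diff)
    width = len(diff[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            if diff[y][x] and not seen[y][x]:
                # Iterative flood fill avoids recursion limits on large regions.
                stack = [(y, x)]
                seen[y][x] = True
                top, left, bottom, right = y, x, y, x
                while stack:
                    cy, cx = stack.pop()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < height and 0 <= nx < width and diff[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes
```

For example, if the scanned image has a dark pixel in its bottom-right corner that the rendered OCR text lacks, `difference_regions(difference_image(scanned, rendered))` reports a single one-pixel box at that corner, flagging a glyph the OCR may have missed.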
Abstract
A method for enhancing the accuracy of Optical Character Recognition (OCR) algorithms by detecting differences between a digital image of a document and the text file that the OCR algorithm creates from that image.
The method calculates the transformations between the first digital image (the document image) and the second digital image (rendered from the OCR text), such as geometrical distortion, local brightness and contrast differences, and blurring caused by the optical imaging process. The method estimates the parameters of these transformations so that they can be applied to at least one of the images, rendering it as similar as possible to the other image. The method then compares the two images to find differences, displays the differences on a display device, and analyzes them. The analysis results are fed back to the OCR algorithm.
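The estimation step described in the abstract can be sketched under strong simplifying assumptions. The geometrical distortion is modeled as a pure integer translation, found by exhaustive search up to a hypothetical `max_shift`, and brightness/contrast is modeled as a single global gain and offset fitted by ordinary least squares. The abstract speaks of local brightness and contrast differences and of blurring; a fuller version would fit gain and offset per image tile and convolve the rendered image with an estimated blur kernel, neither of which is shown here. All function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale, 0.0 (black) .. 255.0 (white); assumed representation


def shift(image: Image, dy: int, dx: int, fill: float = 255.0) -> Image:
    """Translate the image by (dy, dx); pixels entering from outside get `fill` (white paper)."""
    h, w = len(image), len(image[0])
    return [
        [image[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w else fill
         for x in range(w)]
        for y in range(h)
    ]


def fit_gain_offset(source: Image, target: Image) -> Tuple[float, float]:
    """Least-squares gain a and offset b minimising sum((a * source + b - target) ** 2).

    Global fit only: a simplification of the abstract's *local* brightness/contrast model.
    """
    xs = [p for row in source for p in row]
    ys = [q for row in target for q in row]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 1.0, mean_y - mean_x  # flat source: only the offset is identifiable
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gain = cov / var_x
    return gain, mean_y - gain * mean_x


def apply_gain_offset(image: Image, gain: float, offset: float) -> Image:
    return [[gain * p + offset for p in row] for row in image]


def sum_squared_difference(a: Image, b: Image) -> float:
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def align(rendered: Image, scanned: Image, max_shift: int = 2) -> Tuple[int, int, float, float, Image]:
    """Try every integer shift; for each, fit gain/offset; keep the lowest-SSD result."""
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            moved = shift(rendered, dy, dx)
            gain, offset = fit_gain_offset(moved, scanned)
            candidate = apply_gain_offset(moved, gain, offset)
            cost = sum_squared_difference(candidate, scanned)
            if best is None or cost < best[0]:
                best = (cost, dy, dx, gain, offset, candidate)
    return best[1:]
```

Searching translation and fitting photometry jointly means the chosen shift is the one that best explains the scan after contrast correction; the returned `candidate` image is the rendered text made "as similar as possible" to the scan and is ready to be differenced against it.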
33 Citations
70 Claims
36. A system comprising:
a processor;
communication means connected with the processor;
display means;
means for obtaining a first digital image of a first text; and
memory means coupled to the processor, said memory means configured to store a plurality of modules for execution by the processor, the plurality of modules comprising:
logic for performing a character recognition algorithm on said first digital image to obtain a text representation of said first digital image;
logic for transforming said text representation into a second digital image;
logic for comparing said first and second digital images to obtain a difference image; and
logic for analyzing said difference image.
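The four "logic for" modules of claim 36 can be pictured as pluggable components wired in sequence. The sketch below is a hypothetical illustration, not the patented system: `OcrCheckPipeline` and every helper are invented names, the "font" renders one pixel per character, and the stub OCR deliberately drops its last glyph so that the difference analysis has something to find.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]            # grayscale, 0 (black) .. 255 (white)
Box = Tuple[int, int, int, int]    # (top, left, bottom, right)


@dataclass
class OcrCheckPipeline:
    """Hypothetical wiring of the four claim-36 modules as pluggable callables."""
    recognize: Callable[[Image], str]         # character recognition logic
    render: Callable[[str], Image]            # text-to-image logic
    compare: Callable[[Image, Image], Image]  # difference-image logic
    analyze: Callable[[Image], List[Box]]     # difference-analysis logic

    def run(self, first_image: Image) -> Tuple[str, List[Box]]:
        text = self.recognize(first_image)
        second_image = self.render(text)
        diff = self.compare(first_image, second_image)
        return text, self.analyze(diff)


# --- Toy stand-ins for illustration only ---

def toy_render(text: str) -> Image:
    """Hypothetical one-pixel-per-character 'font': 'I' is black, anything else white."""
    return [[0 if ch == "I" else 255 for ch in text]]


def faulty_ocr(image: Image) -> str:
    """Stub OCR that reads dark pixels as 'I' but deliberately drops the last glyph."""
    chars = ["I" if p < 128 else " " for p in image[0]]
    chars[-1] = " "
    return "".join(chars)


def threshold_compare(a: Image, b: Image) -> Image:
    """Binary difference image with a hypothetical threshold of 64."""
    return [[1 if abs(p - q) > 64 else 0 for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def pixel_boxes(diff: Image) -> List[Box]:
    """Simplest possible analysis: one box per differing pixel."""
    return [(y, x, y, x) for y, row in enumerate(diff) for x, v in enumerate(row) if v]


if __name__ == "__main__":
    pipeline = OcrCheckPipeline(faulty_ocr, toy_render, threshold_compare, pixel_boxes)
    print(pipeline.run([[0, 255, 0]]))  # ('I  ', [(0, 2, 0, 2)])
```

Injecting each module as a callable keeps the comparison and analysis logic independent of any particular OCR engine or renderer, so either can be swapped without touching the rest of the pipeline.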
Specification