Image quality assessment and improvement for performing optical character recognition
First Claim
1. A computer-implemented method for identifying information in an electronic document, comprising:
obtaining a reference image of the electronic document;
distorting the reference image by adjusting parameter values for a plurality of sets of parameters associated with a quality of the reference image to generate a plurality of distorted images;
for each distorted image:
analyzing the distorted image to attempt to detect a first set of parameters from the plurality of sets of parameters and corresponding parameter values used to generate the distorted image;
determining an accuracy of detection of the first set of parameters and the corresponding parameter values used to generate the distorted image, the determining including:
comparing each detected parameter determined as a result of the analyzing the distorted image with the first set of parameters used for generating the distorted image, and
determining the accuracy of the detection based on the comparison; and
training a model based at least on the plurality of distorted images and respective accuracies of the detection to generate a trained model;
obtaining a second image of the electronic document;
determining, based on the trained model, a second set of parameters to be adjusted in the second image and a value corresponding to each parameter in the second set by which the parameter is to be adjusted;
determining, based on the trained model, at least one technique for adjusting each parameter in the second set of parameters in the second image to prepare the second image for optical character recognition (OCR);
preparing the second image for the OCR by adjusting each determined parameter in the second set of parameters by a corresponding determined value based on a corresponding determined technique for the determined parameter to generate a prepared second image; and
performing OCR on the prepared second image.
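The distort, detect, and compare steps of the claim can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the two quality parameters (brightness and contrast), the toy detector based on mean and standard deviation, the tolerance values, and all function names. It shows only how training records (distorted image, parameters used, detection accuracy) might be assembled; the model training itself is not shown.

```python
from itertools import product
from statistics import mean, pstdev

def clamp(value):
    """Round to the nearest integer and keep it in the 8-bit range 0..255."""
    return max(0, min(255, round(value)))

def distort(image, brightness, contrast):
    """Scale contrast about mid-gray (128), then shift brightness."""
    return [[clamp((p - 128) * contrast + 128 + brightness) for p in row]
            for row in image]

def detect(reference, distorted):
    """Toy detector (an assumption): recover contrast from the ratio of
    standard deviations and brightness from the remaining mean shift."""
    ref = [p for row in reference for p in row]
    dis = [p for row in distorted for p in row]
    contrast = pstdev(dis) / pstdev(ref)
    brightness = mean(dis) - ((mean(ref) - 128) * contrast + 128)
    return {"brightness": brightness, "contrast": contrast}

# Hypothetical tolerances: a detected value within this distance counts as a hit.
TOLERANCE = {"brightness": 2.0, "contrast": 0.05}

def detection_accuracy(used, detected):
    """Compare each detected parameter with the value actually used and
    return the fraction detected within tolerance."""
    hits = sum(abs(detected[name] - used[name]) <= TOLERANCE[name]
               for name in used)
    return hits / len(used)

def build_training_records(reference, brightness_values, contrast_values):
    """Generate one distorted image per parameter combination and score
    how accurately the detector recovered the parameters."""
    records = []
    for b, c in product(brightness_values, contrast_values):
        used = {"brightness": b, "contrast": c}
        distorted = distort(reference, b, c)
        detected = detect(reference, distorted)
        records.append((distorted, used, detection_accuracy(used, detected)))
    return records

# A tiny grayscale "reference image" (rows of pixel values).
reference = [[40, 80, 120], [160, 200, 240]]
records = build_training_records(reference, [-20, 0, 20], [0.5, 1.0, 2.0])
```

In this sketch, strong distortions that push pixels outside 0..255 are clipped, so the toy detector misjudges them and their accuracy drops. Records of this kind, pairing each distorted image with its detection accuracy, are what the claim describes as input to model training.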
Abstract
Techniques are disclosed for performing optical character recognition (OCR) by assessing and improving quality of electronic documents to perform the OCR. For example, a method for identifying information in an electronic document includes obtaining a reference image of the electronic document, distorting the reference image by adjusting different sets of one or more parameters associated with a quality of the reference image to generate a plurality of distorted images, analyzing each distorted image to detect the adjusted set of parameters and corresponding adjusted values, determining an accuracy of detection of the set of parameters and the adjusted values, and training a model based at least on the plurality of distorted images and the accuracy of the detection, wherein the trained model determines at least a first technique for adjusting a set of parameters in a second image to prepare the second image for optical character recognition.
20 Claims
11. An apparatus for identifying information in an electronic document, comprising:
at least one processor configured to:
obtain a reference image of the electronic document;
distort the reference image by adjusting parameter values for a plurality of sets of parameters associated with a quality of the reference image to generate a plurality of distorted images;
for each distorted image:
analyze the distorted image to attempt to detect a first set of parameters from the plurality of sets of parameters and corresponding parameter values used to generate the distorted image;
determine an accuracy of detection of the first set of parameters and the corresponding parameter values used to generate the distorted image, wherein the at least one processor determines the accuracy of detection by:
comparing each detected parameter determined as a result of the analyzing the distorted image with the first set of parameters used for generating the distorted image; and
determining the accuracy of the detection based on the comparison; and
train a model based at least on the plurality of distorted images and respective accuracies of the detection to generate a trained model;
obtain a second image of the electronic document;
determine, based on the trained model, a second set of parameters to be adjusted in the second image and a value corresponding to each parameter in the second set by which the parameter is to be adjusted;
determine, based on the trained model, at least one technique for adjusting each parameter in the second set of parameters in the second image to prepare the second image for optical character recognition (OCR);
prepare the second image for the OCR by adjusting each determined parameter in the second set of parameters by a corresponding determined value based on a corresponding technique for the determined parameter to generate a prepared second image; and
perform OCR on the prepared second image; and
a memory coupled to the at least one processor.
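The second half of the claim (determine parameters, values, and techniques for a second image, adjust it, then run OCR) can be sketched in the same hedged way. The sketch below is an assumption throughout: `plan_corrections` is a hand-written rule standing in for the trained model, the technique names and thresholds are hypothetical, and the OCR engine is passed in as a plain callable rather than tied to any real library.

```python
def shift_brightness(image, amount):
    """Technique: add a constant to every pixel, clipped to 0..255."""
    return [[max(0, min(255, round(p + amount))) for p in row] for row in image]

def scale_contrast(image, factor):
    """Technique: scale pixel values about mid-gray (128), clipped to 0..255."""
    return [[max(0, min(255, round((p - 128) * factor + 128))) for p in row]
            for row in image]

# Hypothetical registry mapping technique names to adjustment functions.
TECHNIQUES = {"linear_shift": shift_brightness, "midgray_scale": scale_contrast}

def plan_corrections(image):
    """Stand-in for the trained model (NOT learned): returns a list of
    (parameter, value, technique) triples for the given image."""
    pixels = [p for row in image for p in row]
    average = sum(pixels) / len(pixels)
    plan = []
    if average < 100:  # image looks too dark: move its mean to mid-gray
        plan.append(("brightness", 128 - average, "linear_shift"))
    if max(pixels) - min(pixels) < 100:  # narrow tonal range: stretch it
        plan.append(("contrast", 2.0, "midgray_scale"))
    return plan

def prepare(image, plan):
    """Adjust each planned parameter by its value using its technique."""
    for _parameter, value, technique in plan:
        image = TECHNIQUES[technique](image, value)
    return image

def recognize(image, ocr):
    """Prepare the image, then hand it to a caller-supplied OCR callable."""
    return ocr(prepare(image, plan_corrections(image)))

# A dark, low-contrast "second image".
dark = [[10, 30], [50, 70]]
prepared = prepare(dark, plan_corrections(dark))
```

Keeping the plan as data (parameter, value, technique) separates the decision from the adjustment, which matches the claim's distinction between determining what to adjust and performing the adjustment.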
Specification