Document image assessment system and method
Abstract
A system and method in accordance with the present invention includes a scanning assembly and a storage device coupled to a programmed computer with a set of instructions for carrying out an assessment of a document image. The system and method operate by: processing the document image to obtain one or more attributes related to the geometrical integrity of the document image; selecting a threshold value from a database for each of the obtained attributes; and then comparing each of the obtained attributes against the threshold value selected for the obtained attribute to determine a difference for each and then evaluating one or more of the differences using predetermined criteria to provide evaluation results of the geometrical integrity of the document image. The system and method may also operate to: process the document image to obtain attributes related to line skew, average character confidence, expected contrast, and sharpness in the document image; select a threshold value from a database for each of the obtained attributes; and compare each of the obtained attributes against the threshold value selected for the obtained attribute to determine the difference for each and then evaluate one or more of the differences using predetermined criteria to provide evaluation results of the condition of the text of the document image and of the condition of the image with respect to a fixed reference.
13 Claims
1. A method for assessing the condition of a document as represented by a document image produced by a high speed document scanner to determine whether the document image is suitable for further processing, the method comprising the steps of:
establishing and adjusting criteria for assessing the document image;
selecting a plurality of document image attributes related to the geometrical integrity of the document image, the condition of the document image, and the condition of the text in the document image, that support the selected criteria;
selecting a plurality of threshold values corresponding to the selected attributes;
processing the document image to obtain values for the selected attributes; and
comparing the value of each of the obtained values for the selected attributes against the threshold value selected for the obtained attribute to determine a difference for each and then evaluating the differences using the predetermined criteria to provide evaluation results of the document image.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
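The comparing and evaluating steps of claim 1 can be illustrated with a minimal sketch. The function name, the attribute names, and the pass/fail rule (a document is suitable only when every difference falls on the acceptable side of its threshold) are assumptions made for illustration; the claim leaves the criteria adjustable and does not prescribe them.

```python
def evaluate_document(values, thresholds, higher_is_better):
    """Compare obtained attribute values against thresholds and evaluate.

    values, thresholds: dicts keyed by attribute name (hypothetical names).
    higher_is_better: dict of bools giving the acceptable side of each threshold.
    """
    # Difference between each obtained value and its selected threshold.
    differences = {name: values[name] - thresholds[name] for name in thresholds}

    # Assumed criterion: an attribute fails when its difference lies on the
    # wrong side of zero for that attribute's direction.
    failed = sorted(
        name
        for name, diff in differences.items()
        if (diff < 0 if higher_is_better[name] else diff > 0)
    )
    return {"differences": differences, "failed": failed, "suitable": not failed}
```

A stricter or looser criterion (for example, tolerating one failed attribute) would only change how the list of failures is turned into the final result.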
locating at least one edge of the document image;
placing three or more sample points along the edge;
locating the three or more sample points;
establishing a line between each set of two or more of the sample points, each of the lines having an angle with respect to a fixed reference;
determining the angle for each of the lines; and
determining the difference between each set of two of the angles to obtain values of attributes of tears in the document, folded corners in the document, and multiple overlapping documents in the document image.
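The edge-sampling steps above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: sample points are given as (x, y) pairs already ordered along one detected edge, lines are drawn only between consecutive points, and the fixed reference is the horizontal axis.

```python
import math


def edge_angle_differences(points):
    """Return the angle differences (degrees) between consecutive edge segments.

    points: three or more (x, y) sample points ordered along one edge.
    """
    # Angle of each line between consecutive sample points, measured
    # against the horizontal axis (the assumed fixed reference).
    angles = [
        math.degrees(math.atan2(y2 - y1, x2 - x1))
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]

    differences = []
    for first, second in zip(angles, angles[1:]):
        diff = abs(second - first) % 360.0
        # Fold into [0, 180] so wrap-around at +/-180 degrees is not
        # reported as a large deviation.
        differences.append(min(diff, 360.0 - diff))
    return differences
```

A straight edge yields differences near zero, while a tear, a folded corner, or a second overlapping sheet shows up as a noticeably larger difference between neighbouring segments.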
5. The method as set forth in claim 1 further comprising the steps of:
locating each corner of the document image;
assigning a label to each corner;
establishing a line between each set of two adjacent labels, each of the lines having an angle with respect to a fixed reference;
determining the angle for each line; and
determining the difference between each set of two of the angles to obtain a value for attributes of tears in the document, folded corners in the document, and multiple overlapping documents in the document image.
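A minimal sketch of the corner-based steps of claim 5 follows. The labels, the ordering of the corners around the document, and the use of the horizontal axis as the fixed reference are assumptions for illustration only.

```python
import math


def corner_angle_differences(corners):
    """Return angle differences (degrees) between adjacent sides.

    corners: dict mapping a label to an (x, y) corner, with labels listed
    in order around the document outline.
    """
    labels = list(corners)
    side_angles = []
    # One line between each pair of adjacent labels, closing the outline.
    for a, b in zip(labels, labels[1:] + labels[:1]):
        (x1, y1), (x2, y2) = corners[a], corners[b]
        side_angles.append(((a, b), math.degrees(math.atan2(y2 - y1, x2 - x1))))

    differences = {}
    for (side1, angle1), (side2, angle2) in zip(
        side_angles, side_angles[1:] + side_angles[:1]
    ):
        diff = abs(angle2 - angle1) % 360.0
        differences[(side1, side2)] = min(diff, 360.0 - diff)
    return differences
```

For an intact rectangular document every adjacent pair of sides differs by about 90 degrees, so a deviation from 90 is the quantity that would be compared against a threshold.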
6. The method as set forth in claim 1 further comprising the steps of:
detecting edges of the document image;
generating a bounding box around the document image;
determining coordinates of expected corners for the document image with respect to a fixed reference;
detecting coordinates of actual corners for the document image with respect to the fixed reference; and
determining the difference between the location of the expected corners and the actual corners to obtain a value for an attribute of tears in the document, folded corners in the document, and multiple overlapping documents in the document image.
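The bounding-box comparison of claim 6 can be sketched as below. The corner labels, the (left, top, right, bottom) box layout, and the use of Euclidean distance as the "difference" are assumptions; edge detection and corner detection are taken as already done.

```python
import math


def corner_displacements(bounding_box, actual_corners):
    """Distance between each expected corner and the detected corner.

    bounding_box: (left, top, right, bottom) in the fixed reference frame.
    actual_corners: dict with keys "top_left", "top_right",
    "bottom_right", "bottom_left" mapping to detected (x, y) positions.
    """
    left, top, right, bottom = bounding_box
    # Expected corners are the corners of the bounding box itself.
    expected = {
        "top_left": (left, top),
        "top_right": (right, top),
        "bottom_right": (right, bottom),
        "bottom_left": (left, bottom),
    }
    return {
        label: math.dist(expected[label], actual_corners[label])
        for label in expected
    }
```

A folded or torn corner is detected some distance inside the box, so its displacement is large while the other three stay near zero.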
7. The method as set forth in claim 1 further comprising the steps of:
detecting the location of the document image;
selecting one or more portions of the document image;
comparing each of the portions of the document image for a particular type of expected field; and
determining the number of expected fields which were identified to obtain a value for the attribute of average character confidence.
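One way to picture the expected-field check of claim 7 is the sketch below. It assumes the selected portions have already been converted to text, and the field types and patterns shown are hypothetical; the claim does not say how a field is recognized.

```python
import re

# Hypothetical field types and patterns, chosen only for illustration.
FIELD_PATTERNS = {
    "date": re.compile(r"\d{2}/\d{2}/\d{4}"),
    "amount": re.compile(r"\$\d+\.\d{2}"),
}


def count_identified_fields(portions):
    """Count portions whose text matches the field type expected there.

    portions: list of (expected_field_type, recognized_text) pairs.
    """
    return sum(
        1
        for field_type, text in portions
        if FIELD_PATTERNS[field_type].fullmatch(text.strip())
    )
```

The count of identified fields, or its ratio to the number of portions checked, would then serve as the attribute value compared against a threshold.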
8. The method as set forth in claim 1 wherein the step of processing the document image to obtain a value for the attribute of average character confidence comprises the steps of:
detecting the location of the document image;
locating and processing each character in the document image;
performing optical character recognition on each located character and obtaining an average character confidence for each character; and
averaging all of the average character confidences for each character obtained to obtain the value for the attribute of average character confidence for the document image.
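The two-level averaging in claim 8 can be sketched as follows. The sketch assumes that the recognition step has already produced one or more confidence scores per located character (for example, from several recognition passes); the input layout and function name are illustrative only.

```python
from statistics import mean


def document_character_confidence(per_character_scores):
    """Average the per-character average confidences over the document.

    per_character_scores: list with one entry per located character, each
    entry being a list of confidence scores for that character.
    """
    # First level: average confidence for each character.
    per_character = [mean(scores) for scores in per_character_scores if scores]
    # Second level: average over all characters; 0.0 if none were found.
    return mean(per_character) if per_character else 0.0
```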
9. The method as set forth in claim 1 wherein the step of processing the document image to obtain a value for the attribute of expected contrast comprises the steps of:
detecting the location of the document image; and
counting the number of black and white pixels in the document image to obtain the value for the attribute of expected contrast.
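The pixel count of claim 9 can be sketched as below, assuming a bitonal image stored as rows of 0 (black) and 1 (white). That encoding, and reporting the black fraction alongside the raw counts, are assumptions made for the example.

```python
def black_white_counts(image):
    """Return (black_count, white_count, black_fraction) for a bitonal image.

    image: list of rows, each a list of 0 (black) or 1 (white) pixels.
    """
    total = sum(len(row) for row in image)
    black = sum(row.count(0) for row in image)
    white = total - black
    black_fraction = black / total if total else 0.0
    return black, white, black_fraction
```

An image that is almost entirely black or almost entirely white gives a fraction near 1 or 0, which is the kind of deviation from the expected contrast that a threshold could flag.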
10. The method as set forth in claim 1 wherein the step of processing the document image to obtain the attribute of sharpness comprises the steps of:
detecting the location of the document image; and
determining the frequency of black to white pixels per line of the document image to obtain the value of the attribute of sharpness.
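A minimal sketch of the per-line measurement in claim 10 follows. It reads "frequency of black to white pixels per line" as the number of black-to-white transitions in each scan line, which is an interpretation rather than something the claim states, and it assumes pixels are encoded as 0 (black) and 1 (white).

```python
def black_to_white_transitions(image):
    """Count black-to-white transitions in each row of a bitonal image.

    image: list of rows, each a list of 0 (black) or 1 (white) pixels.
    """
    return [
        sum(1 for a, b in zip(row, row[1:]) if a == 0 and b == 1)
        for row in image
    ]


def mean_transitions_per_line(image):
    """Average transition count over all rows; 0.0 for an empty image."""
    counts = black_to_white_transitions(image)
    return sum(counts) / len(counts) if counts else 0.0
```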
11. A system for assessing the condition of a document as represented by a document image produced by a high speed document scanner to determine whether the document image is suitable for further processing, the system comprising:
means for establishing and adjusting criteria for assessing the document image;
means for selecting and storing in a database a plurality of document image attributes related to the geometrical integrity of the document image, the condition of the document image, and the condition of the text in the document image, that support the selected criteria;
means for adjusting and storing in the database a plurality of threshold values corresponding to the selected attributes;
means for processing the document image to obtain values for the selected attributes; and
means for comparing each of the obtained attributes against the threshold value selected for the obtained attribute to determine the difference for each and then evaluating the differences using the selected criteria to provide evaluation results of the document image.
Dependent claims: 12, 13
Specification