Method and means for enhancing optical character recognition of printed documents
Abstract
A document marker is printed in machine-readable symbology on the face of a printed version of the document. The marker contains first values that depend on the layout and contents of the document and are assigned by the generating or preprocessing software. It may include encoded document layout information and values computed over sequences of the original text, including text-dependent decimation sequences, error correction codes, or check-sums. When the document is scanned for optical character recognition, or digitized by some other reproduction method, the marker is scanned as well. The scanning computer, running corresponding software, assigns second values that depend on the layout and contents of the reproduced document. By comparing the first and second decimation sequences, line and character errors can be detected and some of them corrected, producing re-aligned candidate sequences. Optional error correction codes, applied to the re-aligned reproduced sequences, can correct further errors, and an optional check-sum comparison can verify that the reproduced sequences are correct.
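The first values carried by the marker can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented encoding: the decimation function (keeping the least significant bit of each character's code point), the use of CRC-32 from `zlib` as the check-sum, and the field names of the hypothetical `build_marker` result are all illustrative choices. Error correction codes and rendering the marker as a barcode or other machine-readable symbology are not shown.

```python
import zlib

def decimate(text: str) -> list[int]:
    # Hypothetical decimation: keep the least significant bit of each code point.
    return [ord(ch) & 1 for ch in text]

def checksum(text: str) -> int:
    # Assumed check-sum: CRC-32 of the line's UTF-8 bytes.
    return zlib.crc32(text.encode("utf-8"))

def build_marker(document: str) -> dict:
    # Collect the first values that would be printed as the document marker.
    lines = document.split("\n")
    return {
        "line_count": len(lines),                      # simple layout information
        "line_lengths": [len(line) for line in lines],
        "decimation": [decimate(line) for line in lines],
        "checksums": [checksum(line) for line in lines],
    }

marker = build_marker("hello\nworld")
print(marker["line_count"], marker["decimation"][0])  # prints: 2 [0, 1, 0, 0, 1]
```

Per-line values keep each decimation sequence tied to one line of text, which is what lets line-level and character-level errors be told apart after scanning.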
14 Claims
1. A method for permitting machine-represented characters of a machine-represented document to be more accurately recovered when scanning a printed version of said document, comprising the steps of:
assigning a first binary value to each of a plurality of original machine-represented characters in said document;
printing said first binary values in machine-readable symbology and human recognizable characters corresponding to said original machine-represented characters in a printed version of said document;
scanning said printed version to recover machine-represented characters corresponding to said printed human-recognizable characters and to recover said first binary values;
assigning a second binary value to each of said recovered machine-represented characters; and
comparing said recovered first binary values to said second binary values to identify errors in said recovered machine-represented characters.
Dependent claims: 2 to 12.
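The comparison step of claim 1 can be illustrated with a short sketch. It assumes, hypothetically, that the binary value assigned to each character is the least significant bit of its code point and that the recovered line has the same length as the original; `char_values` and `find_suspect_positions` are illustrative names, not taken from the patent.

```python
def char_values(text: str) -> list[int]:
    # Hypothetical binary value per character: least significant bit of its code point.
    return [ord(ch) & 1 for ch in text]

def find_suspect_positions(first_values: list[int], recovered_text: str) -> list[int]:
    # Compare the first values decoded from the printed marker with second values
    # computed from the OCR output; assumes no characters were inserted or dropped.
    second_values = char_values(recovered_text)
    if len(second_values) != len(first_values):
        raise ValueError("lengths differ; align the text before comparing")
    return [i for i, (a, b) in enumerate(zip(first_values, second_values)) if a != b]

# The original line is "hello"; OCR misreads the first 'l' as the digit '1'.
printed = char_values("hello")                   # first values, as printed in the marker
print(find_suspect_positions(printed, "he1lo"))  # prints [2]
```

With only one bit per character, a misread such as `c` for `e` goes unnoticed because both code points are odd; a check-sum such as the one recited in claim 13 can catch errors the per-character values miss.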
13. A method for producing more accurate optical character recognition reproductions of documents having lines of original printed text and at least one document marker including a plurality of first decimation sequences, each corresponding to a sequence of said printed text, and at least one first check-sum value calculated on said sequence of printed text, comprising the steps of:
creating an electronic document comprising a plurality of first reproduced text sequences by optically scanning said original printed text;
optically scanning said at least one document marker;
decoding said plurality of first decimation sequences from said scanned document marker;
decimating said reproduced text into a plurality of second decimation sequences;
calculating the edit distances between corresponding lines of original printed text and reproduced printed text by comparing said first and said second decimation sequences;
comparing the edit distances and identifying line insertion and deletion errors in said reproduced text when said edit distances differ by more than a predetermined amount;
correcting detected line insertion and deletion errors;
comparing each of said corresponding plurality of first and second decimation sequences;
identifying text errors in said reproduced text at the sequence location at which said decimation sequences differ;
substituting different characters in said sequence locations at which text errors have been identified to produce at least one second reproduced text sequence;
calculating a second check-sum for each of said at least one second reproduced text sequences;
comparing said second check-sum to said first check-sum; and
verifying the accuracy of said second reproduced text sequence when said first and said second check-sums are equal.
Dependent claim: 14.
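The steps of claim 13 can be sketched end to end in Python. Several details are assumptions rather than anything stated in the claim: the one-bit decimation function, CRC-32 as the check-sum, the way edit distances are compared to spot an inserted or dropped line (each original line is compared with the current reproduced line, the next reproduced line, and the next original line), a `threshold` of 2 standing in for the "predetermined amount", and the candidate `alphabet`. All function names are hypothetical, and character insertions or deletions within a line are not handled.

```python
import itertools
import zlib

def decimate(text):
    # Hypothetical decimation: least significant bit of each character's code point.
    return [ord(ch) & 1 for ch in text]

def crc(text):
    # Assumed check-sum: CRC-32 of the line's UTF-8 bytes.
    return zlib.crc32(text.encode("utf-8"))

def edit_distance(a, b):
    # Levenshtein distance between two bit sequences (single-row dynamic programming).
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def align_lines(first_seqs, reproduced, threshold=2):
    # Pair each original line with a reproduced line. A spurious reproduced line is
    # skipped; a dropped original line is recorded as None.
    aligned, j = [], 0
    for i, seq in enumerate(first_seqs):
        if j >= len(reproduced):
            aligned.append(None)
            continue
        here = edit_distance(seq, decimate(reproduced[j]))
        skip = (edit_distance(seq, decimate(reproduced[j + 1]))
                if j + 1 < len(reproduced) else here)
        nxt = (edit_distance(first_seqs[i + 1], decimate(reproduced[j]))
               if i + 1 < len(first_seqs) else here)
        if here - skip > threshold:      # reproduced[j] looks like an inserted line
            j += 1
        elif here - nxt > threshold:     # original line i looks like it was dropped
            aligned.append(None)
            continue
        aligned.append(reproduced[j])
        j += 1
    return aligned

def correct_line(first_seq, first_crc, text, alphabet):
    # Substitute characters where the decimation bits disagree and return the first
    # candidate whose check-sum equals the first check-sum, or None if none does.
    second = decimate(text)
    if len(second) != len(first_seq):
        return None  # in-line character insertions/deletions are not handled here
    suspects = [k for k, (a, b) in enumerate(zip(first_seq, second)) if a != b]
    # Only characters carrying the expected decimation bit are tried at each position.
    options = [[c for c in alphabet if (ord(c) & 1) == first_seq[k]] for k in suspects]
    for combo in itertools.product(*options):
        chars = list(text)
        for k, c in zip(suspects, combo):
            chars[k] = c
        candidate = "".join(chars)
        if crc(candidate) == first_crc:
            return candidate
    return None

original = ["hello world", "second line"]    # text as generated
first_seqs = [decimate(s) for s in original]  # would be decoded from the marker
first_crcs = [crc(s) for s in original]       # would be decoded from the marker
ocr = ["he1lo world", "~~", "second line"]    # 'l' misread as '1', plus a spurious line
alphabet = "abcdefghijklmnopqrstuvwxyz0123456789 "

aligned = align_lines(first_seqs, ocr)
fixed = [correct_line(s, c, t, alphabet) if t is not None else None
         for s, c, t in zip(first_seqs, first_crcs, aligned)]
print(fixed)  # prints ['hello world', 'second line']
```

The candidate search is brute force, so its cost grows with the number of suspect positions; it stays small here because the decimation bit already halves the alphabet at each position and the check-sum rejects every wrong candidate.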
Specification