Redigitization System and Service
Abstract
A system and method for error-correcting extant electronic documents are disclosed. An electronic document may be rasterized to obtain a pixel representation of the electronic document (e.g., a raster image). One or more optical character recognition (OCR) tasks may be performed on the raster image of the electronic document. Errors discovered by the OCR tasks may be corrected, and a customized error-corrected version of the electronic document may be created and stored. If the author of the electronic document is known, the raster image may be compared to a personalized tf*idf error dictionary associated with the author to determine known OCR errors specific to the author. The raster image may also be compared to a personalized electronic error dictionary associated with the author to determine known typographical errors specific to the author.
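The correction step described in the abstract can be illustrated with a minimal sketch. It assumes the raster image has already been passed through an OCR engine and reduced to a list of tokens; the `ErrorDictionary` class, the `correct_tokens` function, and the sample entries are hypothetical names and data chosen for illustration, not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorDictionary:
    """Maps a known erroneous token to its correction (hypothetical structure)."""
    entries: dict[str, str] = field(default_factory=dict)

    def correct(self, token: str) -> str:
        # Unknown tokens are returned unchanged.
        return self.entries.get(token, token)


def correct_tokens(tokens, dictionaries):
    """Apply each dictionary in order; return corrected tokens and a change log."""
    corrected, changes = [], []
    for index, token in enumerate(tokens):
        fixed = token
        for dictionary in dictionaries:
            fixed = dictionary.correct(fixed)
        if fixed != token:
            changes.append((index, token, fixed))
        corrected.append(fixed)
    return corrected, changes


# Stand-in for the text an OCR engine might return from a raster image.
ocr_tokens = ["Tbe", "modem", "rnethod", "teh", "document"]

# Assumed sample data: a general OCR error dictionary and an author-specific
# typographical error dictionary.
general_ocr_errors = ErrorDictionary({"Tbe": "The", "rnethod": "method"})
author_typos = ErrorDictionary({"teh": "the"})

corrected, changes = correct_tokens(ocr_tokens, [general_ocr_errors, author_typos])
print(" ".join(corrected))
```

A plain lookup like this cannot tell whether "modem" is a misread "modern", so a real system would need context or frequency information beyond what the sketch shows.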
15 Claims
1. A method comprising:
rasterizing an electronic document to obtain a raster image of the electronic document;
performing one or more optical character recognition (OCR) tasks on the raster image of the electronic document, performing the OCR tasks including identifying digitization errors in the electronic document;
correcting errors discovered by the OCR tasks; and
creating a customized error corrected version of the electronic document.
Dependent claims: 2, 3, 4, 5, 6, 7
8. An apparatus comprising:
a processor circuit;
a processing engine to access stored electronic documents;
an optical character recognition (OCR) engine under control of the processor;
a rasterizer to create raster images of the electronic documents;
an OCR error dictionary stored in a memory;
a personalized tf*idf dictionary stored in the memory; and
a personalized electronic error dictionary stored in the memory,
the OCR engine to perform OCR tasks on the raster images using the OCR error dictionary, the personalized tf*idf dictionary, and the personalized electronic error dictionary, the OCR tasks to:
determine errors in the raster images;
correct the errors in the raster images;
create a customized error-corrected version of the electronic document; and
store the customized error-corrected version of the electronic document in the memory.
9. An article of manufacture comprising a non-transitory computer-readable storage medium containing instructions that if executed enable a system to:
access an electronic document;
obtain a pixel representation of the electronic document;
perform one or more optical character recognition (OCR) tasks on the pixel representation of the electronic document;
correct errors discovered by the OCR tasks; and
create a customized error corrected version of the electronic document.
Dependent claims: 10, 11, 12, 13, 14, 15
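Claim 8 recites a personalized tf*idf dictionary, but the text above does not say how its weights are computed. As a hedged illustration only, the sketch below uses the textbook formulation, term frequency multiplied by log(N / document frequency), over a small set of documents assumed to come from one author. The function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(author_docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(author_docs)
    tokenized = [doc.lower().split() for doc in author_docs]
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Assumed sample data: three short documents by the same author.
docs = [
    "the raster image of teh document",
    "the ocr engine reads the raster image",
    "teh author stores the document",
]
weights = tf_idf(docs)
```

Terms that occur in every document, such as "the", receive a weight of zero, while terms concentrated in few documents score higher. How such weights would feed an author-specific error dictionary is not stated in the claims, so that step is left out.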
Specification