System and method for increasing the accuracy of optical character recognition (OCR)
First Claim
1. A method for increasing the accuracy of optical character recognition (OCR) for at least one item, comprising:
obtaining OCR results of OCR scanning from at least one OCR module;
creating at least one OCR seed using at least a portion of the OCR results, the at least one OCR seed comprising a plurality of imagelets corresponding to each character identified in the at least a portion of the OCR results, wherein the at least one OCR seed is cleaned by selecting imagelets similar to one another for each character identified in the at least a portion of the OCR results;
creating at least one OCR learn set using at least a portion of the OCR seed;
comparing the at least one OCR learn set to each imagelet to create at least one mismatch distribution of the at least one OCR learn set compared to each imagelet, the at least one mismatch distribution comprising at least one confidence rating including a confidence score for the imagelet compared to at least one possible character; and
applying the OCR learn set and the at least one mismatch distribution to the at least one item to obtain additional OCR results such that only possible characters having a confidence score higher than a threshold are considered when applying the at least one mismatch distribution to obtain the additional OCR results.
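The seed, cleaning, and learn-set steps recited above can be illustrated with a minimal sketch. The claim does not specify a similarity measure or how the learn set is derived, so everything here is an assumption made for illustration: imagelets are modeled as flattened binary tuples, similarity is Hamming distance to a per-character medoid, and the learn set is a per-pixel majority vote. The function names (`build_seed`, `clean_seed`, `build_learn_set`) and the `max_distance` parameter are hypothetical.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of differing pixels between two equally sized binary imagelets."""
    return sum(x != y for x, y in zip(a, b))


def build_seed(ocr_results):
    """Group imagelets by the character the OCR module assigned to them."""
    seed = defaultdict(list)
    for char, imagelet in ocr_results:
        seed[char].append(imagelet)
    return dict(seed)


def clean_seed(seed, max_distance):
    """Keep, per character, only imagelets similar to that character's medoid.

    Assumption: "similar" means a Hamming distance of at most max_distance.
    """
    cleaned = {}
    for char, imagelets in seed.items():
        # The medoid is the imagelet with the smallest total distance to the rest.
        medoid = min(imagelets, key=lambda m: sum(hamming(m, o) for o in imagelets))
        cleaned[char] = [m for m in imagelets if hamming(m, medoid) <= max_distance]
    return cleaned


def build_learn_set(cleaned_seed):
    """Majority-vote each pixel across a character's imagelets into one template."""
    learn_set = {}
    for char, imagelets in cleaned_seed.items():
        n = len(imagelets)
        learn_set[char] = tuple(int(2 * sum(col) > n) for col in zip(*imagelets))
    return learn_set


# Toy 3x3 imagelets, flattened row by row (illustrative data only).
I_GLYPH = (0, 1, 0, 0, 1, 0, 0, 1, 0)
I_NOISY = (0, 1, 0, 0, 1, 0, 0, 1, 1)
O_GLYPH = (1, 1, 1, 1, 0, 1, 1, 1, 1)

ocr_results = [
    ("I", I_GLYPH), ("I", I_GLYPH), ("I", I_NOISY),
    ("I", O_GLYPH),  # an "O" that the OCR module mislabeled as "I"
    ("O", O_GLYPH), ("O", O_GLYPH),
]

seed = build_seed(ocr_results)
cleaned = clean_seed(seed, max_distance=2)
learn_set = build_learn_set(cleaned)
```

In this toy data the mislabeled "O" is dropped from the "I" seed during cleaning, so it does not distort the "I" template in the learn set.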
Abstract
A system and/or method for increasing the accuracy of optical character recognition (OCR) for at least one item, comprising: obtaining OCR results of OCR scanning from at least one OCR module; creating at least one OCR seed using at least a portion of the OCR results; creating at least one OCR learn set using at least a portion of the OCR seed; and applying the OCR learn set to the at least one item to obtain additional optical character recognition (OCR) results.
32 Claims
1. A method for increasing the accuracy of optical character recognition (OCR) for at least one item, comprising:
obtaining OCR results of OCR scanning from at least one OCR module;
creating at least one OCR seed using at least a portion of the OCR results, the at least one OCR seed comprising a plurality of imagelets corresponding to each character identified in the at least a portion of the OCR results, wherein the at least one OCR seed is cleaned by selecting imagelets similar to one another for each character identified in the at least a portion of the OCR results;
creating at least one OCR learn set using at least a portion of the OCR seed;
comparing the at least one OCR learn set to each imagelet to create at least one mismatch distribution of the at least one OCR learn set compared to each imagelet, the at least one mismatch distribution comprising at least one confidence rating including a confidence score for the imagelet compared to at least one possible character; and
applying the OCR learn set and the at least one mismatch distribution to the at least one item to obtain additional OCR results such that only possible characters having a confidence score higher than a threshold are considered when applying the at least one mismatch distribution to obtain the additional OCR results.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
17. A system for increasing the accuracy of optical character recognition (OCR) for at least one item, comprising:
at least one processor, wherein the at least one processor is configured to perform:
obtaining OCR results of OCR scanning from at least one OCR module;
creating at least one OCR seed using at least a portion of the OCR results, the at least one OCR seed comprising a plurality of imagelets corresponding to each character identified in the at least a portion of the OCR results, wherein the at least one OCR seed is cleaned by selecting imagelets similar to one another for each character identified in the at least a portion of the OCR results;
creating at least one OCR learn set using at least a portion of the OCR seed;
comparing the at least one OCR learn set to each imagelet to create at least one mismatch distribution of the at least one OCR learn set compared to each imagelet, the at least one mismatch distribution comprising at least one confidence rating including a confidence score for the imagelet compared to at least one possible character; and
applying the OCR learn set and the at least one mismatch distribution to the at least one item to obtain additional OCR results such that only possible characters having a confidence score higher than a threshold are considered when applying the at least one mismatch distribution to obtain the additional OCR results.
Dependent claims: 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32
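The comparing and applying steps can be sketched in the same spirit. The claim does not define how the confidence score is computed, so this is only one possible reading: the score for each candidate character is assumed to be the fraction of pixels on which the imagelet agrees with that character's learn-set template, and a candidate is considered only if its score exceeds the threshold. The names `mismatch_distribution` and `classify`, the toy templates, and the threshold value are all hypothetical.

```python
def mismatch_distribution(imagelet, learn_set):
    """Map each candidate character to a confidence score in [0, 1].

    Assumption: confidence = 1 - (mismatching pixels / total pixels).
    """
    size = len(imagelet)
    scores = {}
    for char, template in learn_set.items():
        mismatches = sum(p != q for p, q in zip(imagelet, template))
        scores[char] = 1.0 - mismatches / size
    return scores


def classify(imagelet, learn_set, threshold):
    """Return the best candidate whose score exceeds the threshold, else None."""
    scores = mismatch_distribution(imagelet, learn_set)
    # Only characters scoring above the threshold are considered at all.
    candidates = {c: s for c, s in scores.items() if s > threshold}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Toy 3x3 templates, flattened row by row (illustrative data only).
toy_learn_set = {
    "I": (0, 1, 0, 0, 1, 0, 0, 1, 0),
    "O": (1, 1, 1, 1, 0, 1, 1, 1, 1),
}

noisy_i = (0, 1, 0, 0, 1, 0, 0, 1, 1)  # "I" with one flipped pixel
blank = (0, 0, 0, 0, 0, 0, 0, 0, 0)    # matches neither template well

result_noisy = classify(noisy_i, toy_learn_set, threshold=0.8)
result_blank = classify(blank, toy_learn_set, threshold=0.8)
```

Returning `None` when no candidate clears the threshold leaves the imagelet unresolved rather than forcing a low-confidence guess; how such cases are handled in practice is not stated in the claim.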
Specification