Verification of optical character recognition results
Abstract
A method of verifying optical character recognition (OCR) results may involve: performing OCR on one or more initial images of a document and displaying initial OCR results of the document to a user; receiving a feedback from the user regarding an error location in the initial OCR results, the error location being a location of a misspelled character sequence; receiving an additional image of the document, which corresponds to the error location, and performing OCR of the additional image to produce additional OCR results; identifying a cluster of character sequences, which correspond to the error location, using the initial OCR results and the additional OCR results; identifying an order of character sequences in the cluster of character sequences based on their respective probability values; and displaying to the user modified optical character recognition results, which contain in the error location a corrected character sequence.
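The ordering step in the abstract can be illustrated with a minimal sketch. The abstract does not say how the probability values are computed, so the sketch assumes one simple stand-in: the relative frequency with which each candidate sequence appears across the OCR passes. The names `rank_cluster` and `pick_correction` are hypothetical and are not taken from the patent.

```python
from collections import Counter


def rank_cluster(cluster):
    """Order the distinct character sequences in a cluster by an estimated
    probability, highest first.

    Assumption: the probability of a sequence is its relative frequency
    among the OCR readings collected for the error location.
    """
    counts = Counter(cluster)
    total = sum(counts.values())
    ranked = [(seq, n / total) for seq, n in counts.items()]
    # Sort by descending probability; break ties alphabetically so the
    # order is deterministic.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


def pick_correction(cluster, misspelled):
    """Return the most probable sequence that differs from the misspelled
    one, or None if the cluster offers no alternative."""
    for seq, _prob in rank_cluster(cluster):
        if seq != misspelled:
            return seq
    return None


# One initial reading plus three readings from additional images.
readings = ["rnodern", "modern", "modern", "modem"]
ranking = rank_cluster(readings)
corrected = pick_correction(readings, misspelled="rnodern")
```

With these readings, "modern" accounts for two of the four and is therefore ranked first and returned as the correction. A real system would likely weight candidates by recognizer confidence or dictionary evidence rather than raw counts.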
19 Claims
1. A method comprising:
performing optical character recognition on one or more initial images of a document to produce initial optical character recognition results and displaying the initial optical character recognition results of the document to a user;
receiving a feedback from the user regarding an error location in the initial optical character recognition results, wherein the error location is a location of a misspelled character sequence in the initial optical character recognition results;
receiving an additional image of the document, wherein the additional image contains a portion of the document, which corresponds to the error location;
performing optical character recognition of the additional image to produce additional optical character recognition results;
identifying a cluster of character sequences corresponding to the error location by matching the initial optical character recognition results and the additional optical character recognition results;
performing, for each of the cluster of characters, a probability evaluation to determine a plurality of probability values for the cluster of characters;
identifying, based on the probability values, a character sequence of the cluster of character sequences as a corrected character sequence; and
displaying to the user modified optical character recognition results, which contain in the error location the corrected character sequence, wherein the corrected character sequence is different from the misspelled character sequence.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A system comprising:
a memory;
a display; and
a processing device, which is coupled to the memory and the display, the processing device is configured to:
perform optical character recognition on one or more initial images of a document to produce initial optical character recognition results and display the initial optical character recognition results to a user;
receive a feedback from the user regarding an error location in the initial optical character recognition results, wherein the error location is a location of a misspelled character sequence in the initial optical character recognition results;
receive an additional image of the document, wherein the additional image contains a portion of the document, which corresponds to the error location;
perform optical character recognition on the additional image to produce additional optical character recognition results;
identify a cluster of character sequences corresponding to the error location by matching the initial optical character recognition results and the additional optical character recognition results;
perform, for each of the cluster of characters, a probability evaluation to determine a plurality of probability values for the cluster of characters;
identify, based on the probability values, a character sequence of the cluster of character sequences as a corrected character sequence; and
display to the user modified optical character recognition results, which contain in the error location the corrected character sequence, wherein the corrected character sequence is different from the misspelled character sequence.
Dependent claims: 15, 16, 17, 18
19. A computer-readable non-transitory storage medium comprising executable instructions that, when executed by a processing device, cause the processing device to:
perform optical character recognition on one or more initial images of a document to produce initial optical character recognition results and display the initial optical character recognition results of the document to a user;
receive a feedback from the user regarding an error location in the initial optical character recognition results, wherein the error location is a location of a misspelled character sequence in the initial optical character recognition results;
receive an additional image of the document, wherein the additional image contains a portion of the document, which corresponds to the error location;
perform optical character recognition on the additional image to produce additional optical character recognition results;
identify a cluster of character sequences corresponding to the error location by matching the initial optical character recognition results and the additional optical character recognition results;
perform, for each of the cluster of characters, a probability evaluation to determine a plurality of probability values for the cluster of characters;
identify, based on the probability values, a character sequence of the cluster of character sequences as a corrected character sequence; and
display to the user modified optical character recognition results, which contain in the error location the corrected character sequence, wherein the corrected character sequence is different from the misspelled character sequence.
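The independent claims recite identifying the cluster "by matching" the initial and additional OCR results without fixing a matching technique. As an illustration only, the sketch below assumes word-level tokens and uses Python's standard `difflib.SequenceMatcher` to align each additional pass with the initial results, then collects the token that lines up with the user-reported error location. The function name `build_cluster` and the token-index representation of the error location are assumptions, not details from the claims.

```python
from difflib import SequenceMatcher


def build_cluster(initial_tokens, error_index, additional_passes):
    """Collect the character sequences that correspond to the error location.

    initial_tokens    -- tokens from the initial OCR results
    error_index       -- index of the misspelled token reported by the user
    additional_passes -- token lists from OCR of the additional images, each
                         covering a portion of the document
    """
    cluster = [initial_tokens[error_index]]
    for tokens in additional_passes:
        matcher = SequenceMatcher(a=initial_tokens, b=tokens, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Only aligned blocks of equal length give a one-to-one mapping
            # between initial tokens and tokens of the additional pass.
            one_to_one = tag in ("equal", "replace") and (i2 - i1) == (j2 - j1)
            if one_to_one and i1 <= error_index < i2:
                cluster.append(tokens[j1 + (error_index - i1)])
    return cluster


initial = ["a", "rnodern", "method", "of", "recognition"]
passes = [
    ["a", "modern", "method"],    # re-shot fragment, read correctly
    ["rnodern", "method", "of"],  # re-shot fragment, same misreading
]
cluster = build_cluster(initial, error_index=1, additional_passes=passes)
```

Here the first fragment contributes "modern" and the second repeats the misreading, so the cluster holds three readings for the same location. Passes whose alignment does not map the error location one-to-one are skipped in this sketch rather than guessed at.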
Specification