Shape clustering and cluster-level manual identification in post optical character recognition processing
Abstract
Techniques for shape clustering and applications in processing various documents, including an output of an optical character recognition (OCR) process.
24 Claims
1. A computer-implemented method for processing output from an optical character recognition (OCR) process, comprising:
classifying separated images in an output of the OCR process generated from processing an original image of a document into a plurality of clusters of separated images, each cluster comprising separated images of similar image sizes and shapes that are assigned the same one or more particular characters by the OCR process;
using a cluster image to represent separated images in a respective cluster;
selecting a cluster which has a low level of confidence to obtain a manual assignment of one or more characters with the cluster image of the selected cluster; and
using the one or more characters obtained by the manual assignment to verify or replace respective one or more particular characters previously assigned by the OCR process in the output of the OCR process,
wherein the method is performed by one or more computer processors.
Dependent claims: 2-10.
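The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Clip` and `Cluster` types, the pixel-mismatch distance, the greedy grouping, the use of a mean OCR score as the cluster confidence, and both threshold values are assumptions chosen for illustration. In particular, "similar image sizes" is simplified here to identical dimensions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Image = List[List[int]]  # rows of 0/1 pixels (hypothetical representation)


@dataclass
class Clip:
    pixels: Image       # separated image cut from the page
    text: str           # character(s) assigned by the OCR process
    confidence: float   # assumed per-clip OCR score in [0, 1]


@dataclass
class Cluster:
    text: str
    members: List[Clip] = field(default_factory=list)

    def image(self) -> List[List[float]]:
        """Cluster image: pixel-wise mean of all member clips."""
        n = len(self.members)
        first = self.members[0].pixels
        return [[sum(m.pixels[r][c] for m in self.members) / n
                 for c in range(len(first[0]))]
                for r in range(len(first))]

    def confidence(self) -> float:
        """Assumed cluster confidence: mean of member OCR scores."""
        return sum(m.confidence for m in self.members) / len(self.members)


def same_size(a: Image, b: Image) -> bool:
    return len(a) == len(b) and len(a[0]) == len(b[0])


def distance(a: Image, b: Image) -> float:
    """Fraction of pixels that differ between two equally sized images."""
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / (len(a) * len(a[0]))


def cluster_clips(clips: List[Clip], max_distance: float = 0.2) -> List[Cluster]:
    """Greedily group clips with the same OCR text, size, and similar shape."""
    clusters: List[Cluster] = []
    for clip in clips:
        for cl in clusters:
            rep = cl.members[0].pixels  # first member acts as representative
            if (cl.text == clip.text and same_size(rep, clip.pixels)
                    and distance(rep, clip.pixels) <= max_distance):
                cl.members.append(clip)
                break
        else:
            clusters.append(Cluster(clip.text, [clip]))
    return clusters


def review_lowest_confidence(clusters: List[Cluster],
                             ask_human: Callable[[List[List[float]]], str],
                             threshold: float = 0.6) -> Optional[Cluster]:
    """Show the least confident cluster image to a person and apply the answer."""
    candidates = [c for c in clusters if c.confidence() < threshold]
    if not candidates:
        return None
    target = min(candidates, key=Cluster.confidence)
    answer = ask_human(target.image())
    target.text = answer
    for m in target.members:  # one manual answer updates every member clip
        m.text = answer
    return target


# Usage: two look-alike clips read as "m" with low scores, one confident "a".
clips = [
    Clip([[1, 0], [1, 1]], "m", 0.4),
    Clip([[1, 0], [1, 1]], "m", 0.5),
    Clip([[0, 1], [1, 0]], "a", 0.9),
]
clusters = cluster_clips(clips)
review_lowest_confidence(clusters, ask_human=lambda img: "rn")
```

The point of working at the cluster level is visible in the last loop: a single manual identification verifies or replaces the OCR assignment for every separated image in the cluster, rather than one image at a time.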
11. A system for optical character recognition (OCR), comprising:
an OCR engine operable to process an original image of a document to produce separated images extracted from the original image and assign one or more characters to each separated image; and
a post-OCR engine operable to classify separated images in the OCR output into a plurality of clusters of separated images, each cluster comprising separated images of similar image sizes and shapes that are assigned the same one or more particular characters by the OCR engine,
wherein the post-OCR engine is operable to generate a cluster image to represent separated images in a respective cluster, select a cluster which has a low level of confidence to obtain a manual assignment of one or more characters with the cluster image of the selected cluster, and use the one or more characters obtained by the manual assignment to verify or replace respective one or more particular characters previously assigned by the OCR engine; and
one or more server computers that comprise the OCR engine and the post-OCR engine.
Dependent claims: 12-16.
17. A computer program product, encoded on a computer-readable medium, operable to cause data processing apparatus to perform operations comprising:
classifying separated images in an output of an optical character recognition (OCR) process generated from processing an original image of a document into a plurality of clusters of separated images, each cluster comprising separated images of similar image sizes and shapes that are assigned the same one or more particular characters by the OCR process;
using a cluster image to represent separated images in a respective cluster;
selecting a cluster which has a low level of confidence to obtain a manual assignment of one or more characters with the cluster image of the selected cluster; and
using the one or more characters obtained by the manual assignment to verify or replace respective one or more particular characters previously assigned by the OCR process in the output of the OCR process.
Dependent claims: 18.
19. A computer-implemented method, comprising:
classifying clip images defined in a received OCR output of a document processed by an optical character recognition (OCR) process into a plurality of clusters of clip images, each cluster comprising clip images of similar image sizes and shapes that are assigned the same one or more particular characters by the OCR process;
generating a cluster image to represent clip images in each cluster;
selecting a cluster image of a particular cluster as part of an on-line challenge-response test to solicit a user identification of the cluster image of the particular cluster; and
using the user identification received from the on-line challenge-response test to verify or correct one or more particular characters assigned to the particular cluster by the OCR process,
wherein the method is performed by one or more computer processors.
Dependent claims: 20-22.
23. A computer-implemented method, comprising:
classifying clip images defined in a received OCR output of a document processed by an optical character recognition (OCR) process into a plurality of clusters of clip images, each cluster comprising clip images of similar image sizes and shapes that are assigned the same one or more particular characters by the OCR process;
using a cluster image to represent clip images in each cluster;
using an on-line game to supply a cluster image of a particular cluster to one or more users of the on-line game for a user response as part of the on-line game; and
using the user response received from the on-line game to verify or correct one or more particular characters assigned to the particular cluster by the OCR process,
wherein the method is performed by one or more computer processors.
Dependent claims: 24.
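Claims 19 and 23 both end by using responses from on-line users (a challenge-response test or a game) to verify or correct the OCR assignment of a cluster. The claims do not say how several responses are combined, so the following Python sketch assumes a simple majority vote; the function name `resolve_cluster_text` and the `min_votes` and `min_share` parameters are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def resolve_cluster_text(ocr_text: str,
                         responses: List[str],
                         min_votes: int = 3,
                         min_share: float = 0.6) -> Tuple[str, str]:
    """Combine user identifications of one cluster image.

    Returns (text, status), where status is "verified", "corrected",
    or "pending" when the responses are too few or too divided.
    """
    if len(responses) < min_votes:
        return ocr_text, "pending"
    answer, count = Counter(responses).most_common(1)[0]
    if count / len(responses) < min_share:
        return ocr_text, "pending"   # no clear agreement among users yet
    if answer == ocr_text:
        return ocr_text, "verified"  # users agree with the OCR process
    return answer, "corrected"       # users agree on a different reading


# Usage: three of four users read the cluster image as "rn", not "m".
result = resolve_cluster_text("m", ["rn", "rn", "m", "rn"])
```

Requiring agreement before accepting a response is a design choice of this sketch; it guards against a single mistaken or malicious user overwriting a correct OCR assignment.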
Specification