Optical character recognition based on shape clustering and multiple optical character recognition processes

US 7,650,035 B2
Filed: 09/11/2006
Issued: 01/19/2010
Est. Priority Date: 09/11/2006
Status: Expired due to Fees

Abstract

Techniques for shape clustering and applications in processing various documents, including an output of an optical character recognition (OCR) process.

39 Claims

1. A system for optical character recognition (OCR), comprising:
- a plurality of OCR engines each operable to process an original image of a document and to produce a respective OCR output;
  
  a plurality of post-OCR processing engines each operable to receive an OCR output from a respective OCR engine and operable to produce a respective modified OCR output of the document; and
  
  a vote processing engine operable to select portions from the plurality of modified OCR outputs and to assemble the selected portions into a final OCR output for the document;
  
  wherein each post-OCR processing engine is operable to:
  
  classify clip images defined in a received OCR output for the document into a plurality of clusters of clip images, each cluster comprising clip images of similar image sizes and shapes that are assigned the same one or more particular characters by the corresponding OCR engine; and
  
  generate a cluster image to represent clip images in each cluster;
  
  and wherein the vote processing engine is operable to:
  
  use shape differences between a cluster image of each cluster and cluster images of other clusters to detect whether an error exists in the one or more particular characters assigned to each cluster by the corresponding OCR engine;
  
  correct each detected error in a particular cluster by newly assigning one or more particular characters to the particular cluster; and
  
  use the newly assigned one or more particular characters for the particular cluster to replace respective one or more particular characters previously assigned by the corresponding OCR engine in a corresponding modified OCR output.
- View Dependent Claims (2, 3, 4, 5, 6, 7)
- - 2. The system of claim 1, wherein:
    - the clusters include (1) clusters in which each clip image is associated with a single bounding box produced by a respective OCR engine and (2) clusters in which each clip image is associated with two or more adjacent bounding boxes produced by a respective OCR engine.
  - 3. The system of claim 1, wherein:
    - the plurality of OCR engines are operable to process the original image in parallel; and
      
      the plurality of post-OCR processing engines are operable to receive OCR outputs in parallel.
  - 4. The system of claim 1, wherein:
    - the plurality of OCR engines are operable to process the original image serially.
  - 5. The system of claim 1, further comprising:
    - one or more server computers that comprise the OCR engines, the post-OCR engines and the vote processing engine; and
      
      a communication network with which the one or more computer servers are in communication, the communication network operable to direct the original image of the document from a client computer to the OCR engines and to direct the final OCR output from the vote processing engine to the client computer.
  - 6. The system of claim 5, wherein:
    - the OCR engines, the post-OCR engines and the vote processing engine are on different server computers, respectively.
  - 7. The system of claim 1, further comprising:
    - one or more server computers that comprise the OCR engines, the post-OCR engines and the vote processing engine;
      
      a communication network with which the one or more computer servers are in communication; and
      
      one or more OCR storage server computers that are in communication with the communication network and store modified OCR outputs for images of selected documents produced by the OCR engines, the post-OCR engines and the vote processing engine,
      
      wherein the communication network provides communications between a client computer and the one or more OCR storage server computers to allow the client computer to retrieve from the one or more OCR storage server computers an existing modified OCR output.
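
As an illustration of the clustering step recited in claim 1 (grouping clip images that share an assigned character and have similar size and shape, then generating a cluster image), the following is a minimal sketch and not the patented implementation. The names `Clip`, `Cluster`, `shape_distance`, `cluster_clips` and the `max_distance` threshold are hypothetical, "similar size" is simplified to "identical dimensions", and the cluster image is assumed to be a running pixel-wise mean.

```python
from dataclasses import dataclass, field

Image = list[list[float]]  # rows of pixel intensities in [0, 1]


@dataclass
class Clip:
    label: str    # character(s) assigned by the OCR engine
    pixels: Image


@dataclass
class Cluster:
    label: str
    clips: list = field(default_factory=list)
    image: Image = field(default_factory=list)  # running mean of member clips


def same_size(a: Image, b: Image) -> bool:
    # Assumption: "similar size" is simplified to identical dimensions.
    return len(a) == len(b) and len(a[0]) == len(b[0])


def shape_distance(a: Image, b: Image) -> float:
    """Mean absolute pixel difference between two equally sized images."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def add_to_cluster(cluster: Cluster, clip: Clip) -> None:
    """Append the clip and update the cluster image as a running mean."""
    n = len(cluster.clips)
    if n == 0:
        cluster.image = [row[:] for row in clip.pixels]
    else:
        cluster.image = [
            [(m * n + p) / (n + 1) for m, p in zip(mean_row, clip_row)]
            for mean_row, clip_row in zip(cluster.image, clip.pixels)
        ]
    cluster.clips.append(clip)


def cluster_clips(clips: list, max_distance: float = 0.2) -> list:
    """Greedy clustering: join the first cluster with the same label,
    the same size, and a cluster image within max_distance."""
    clusters = []
    for clip in clips:
        for cluster in clusters:
            if (cluster.label == clip.label
                    and same_size(cluster.image, clip.pixels)
                    and shape_distance(cluster.image, clip.pixels) <= max_distance):
                add_to_cluster(cluster, clip)
                break
        else:
            new_cluster = Cluster(clip.label)
            add_to_cluster(new_cluster, clip)
            clusters.append(new_cluster)
    return clusters
```

A greedy single pass is used here only for brevity; any clustering method that respects the label, size, and shape constraints would fit the same outline.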

8. A method for optical character recognition (OCR), comprising:
- using a plurality of OCR engines to process an original image of a document and to produce a plurality of OCR outputs, respectively;
  
  processing each of the OCR outputs separately from processing other OCR output to produce a respective modified OCR output of the document, the processing including:
  
  classifying clip images defined in a received OCR output for the document into a plurality of clusters of clip images, each cluster comprising clip images of similar image sizes and shapes that are assigned the same one or more particular characters by the corresponding OCR engine,
  
  generating a cluster image to represent clip images in each cluster,
  
  using shape differences between a cluster image of each cluster and cluster images of other clusters to detect whether an error exists in the one or more particular characters assigned to each cluster by the corresponding OCR engine,
  
  correcting each detected error in a particular cluster by newly assigning one or more particular characters to the particular cluster, and
  
  using the newly assigned one or more particular characters for the particular cluster to replace respective one or more particular characters previously assigned by the corresponding OCR engine in a corresponding modified OCR output; and
  
  selecting portions from the plurality of modified OCR outputs and to assemble the selected portions into a final OCR output for the document.
- View Dependent Claims (9, 10, 11)
- - 9. The method of claim 8, further comprising:
    - using confidence scores of the plurality of modified OCR outputs to select the portions from the plurality of modified OCR outputs.
  - 10. The method of claim 8, wherein:
    - the processing of each of the OCR outputs comprises using a manual identification of a cluster image to verify or correct an assignment of one or more characters for the cluster image.
  - 11. The method of claim 8, wherein:
    - the processing of each of the OCR outputs comprises using gray scale or color data from the original image in each clip image in each cluster; and
      
      averaging clip images in each cluster to produce an averaged clip image as the cluster image.
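
Claim 8 recites using shape differences between cluster images to detect and correct a mislabeled cluster, but the claim text does not fix a particular rule. The sketch below shows one plausible heuristic under stated assumptions: a cluster is treated as erroneous when its image is nearly identical to the image of a much larger cluster carrying a different label, and it then inherits that label. The function names, the dictionary layout, and the `max_distance` and `min_size_ratio` parameters are all hypothetical.

```python
def mean_abs_diff(a: list, b: list) -> float:
    """Mean absolute difference between two equally long flat pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def relabel_suspect_clusters(clusters: list,
                             max_distance: float = 0.1,
                             min_size_ratio: int = 5) -> list:
    """clusters: dicts with 'label', 'image' (flat pixel list) and 'size'.

    Assumed heuristic: a cluster whose image lies within max_distance of a
    differently labeled cluster that is at least min_size_ratio times larger
    is considered mislabeled and takes the larger cluster's label.
    """
    corrected = []
    for cluster in clusters:
        best = None  # (distance, label) of the closest qualifying cluster
        for other in clusters:
            if other is cluster or other["label"] == cluster["label"]:
                continue
            if other["size"] < min_size_ratio * cluster["size"]:
                continue
            distance = mean_abs_diff(cluster["image"], other["image"])
            if distance <= max_distance and (best is None or distance < best[0]):
                best = (distance, other["label"])
        corrected.append({
            **cluster,
            "label": best[1] if best else cluster["label"],
            "corrected": best is not None,
        })
    return corrected
```

The corrected labels would then replace the characters previously assigned by the OCR engine for every clip image in the affected cluster.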

12. A computer program product, encoded on a computer-readable medium, operable to cause data processing apparatus to perform operations comprising:
- using a plurality of optical character recognition (OCR) engines to process an original image of a document and to produce a plurality of OCR outputs, respectively;
  
  processing each of the OCR outputs separately from processing other OCR output to produce a respective modified OCR output of the document, the processing including:
  
  classifying clip images defined in a received OCR output for the document into a plurality of clusters of clip images, each cluster comprising clip images of similar image sizes and shapes that are assigned the same one or more particular characters by the corresponding OCR engine,
  
  generating a cluster image to represent clip images in each cluster,
  
  using shape differences between a cluster image of each cluster and cluster images of other clusters to detect whether an error exists in the one or more particular characters assigned to each cluster by the corresponding OCR engine,
  
  correcting each detected error in a particular cluster by newly assigning one or more particular characters to the particular cluster, and
  
  using the newly assigned one or more particular characters for the particular cluster to replace respective one or more particular characters previously assigned by the corresponding OCR engine in a corresponding modified OCR output; and
  
  selecting portions from the plurality of modified OCR outputs and to assemble the selected portions into a final OCR output for the document.

13. A method, comprising:
- processing a document image with a first optical character recognition (OCR) engine to generate first OCR output, the first OCR output comprising first bounding boxes identifying first clip images located in the document image and respective one or more characters assigned to each first clip image;
  
  processing the document image with a second OCR engine to generate second OCR output, the second OCR output comprising second bounding boxes identifying second clip images located in the document image and respective one or more characters assigned to each second clip image;
  
  applying shape clustering to the first OCR output to produce first clusters with first clip images and a respective confidence score for each assignment of one or more characters to a first clip image;
  
  applying shape clustering to the second OCR output to produce second clusters with second clip images and a respective confidence score for each assignment of one or more characters to a second clip image; and
  
  generating a final OCR output from the first OCR output and the second OCR output, the final OCR output comprising bounding boxes and using the confidence scores for assignments of the one or more characters to the first clip images and the second clip images to select and assign respective one or more characters to each of the bounding boxes.
- View Dependent Claims (14, 15, 16, 17)
- - 14. The method of claim 13, wherein:
    - the clusters include (1) clusters in which each clip image is associated with a single bounding box produced by a respective OCR engine and (2) clusters in which each clip image is associated with two or more adjacent bounding boxes produced by a respective OCR engine.
  - 15. The method of claim 13, further comprising:
    - processing the document image with at least a third OCR engine to generate a third OCR output, the third OCR output comprising third bounding boxes identifying third clip images located in the document image and respective one or more characters assigned to each third clip image, and
      
      wherein generating the final OCR output comprises using the first, the second and the third OCR outputs and using the confidence scores of assignments of characters to the first, the second and the third clip images to select and assign respective one or more characters to each of the bounding boxes in the final OCR output.
  - 16. The method of claim 13, wherein:
    - shape clustering assigns respective one or more characters to each cluster of a plurality of clusters, each cluster including one or more clip images; and
      
      applying shape clustering comprises accessing an original document image and retrieving gray scale or color data to confirm or modify the assignment of characters to clusters.
  - 17. The method of claim 13, further comprising:
    - prior to generating the final OCR output, processing the first clusters to modify or verify assignment of characters to the first clip images in the first OCR output; and
      
      prior to generating the final OCR output, processing the second clusters to modify or verify assignment of characters to the second clip images in the second OCR output.
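
The confidence-based selection in claims 13 and 15 can be pictured with a short sketch. It assumes, for simplicity, that the engines report identical bounding boxes for the same glyph, so that a box tuple can serve as a dictionary key; real engines would rarely agree that exactly. `Assignment` and `vote` are hypothetical names.

```python
from typing import NamedTuple


class Assignment(NamedTuple):
    box: tuple         # (x0, y0, x1, y1) in document-image coordinates
    text: str          # one or more characters assigned to the clip image
    confidence: float  # score produced by the shape-clustering step


def vote(*outputs: list) -> list:
    """Keep, for every bounding box, the assignment with the highest
    confidence across any number of OCR outputs."""
    best = {}
    for output in outputs:
        for assignment in output:
            current = best.get(assignment.box)
            if current is None or assignment.confidence > current.confidence:
                best[assignment.box] = assignment
    # Sorting by box coordinates gives a stable order for this sketch only.
    return [best[box] for box in sorted(best)]
```

Because `vote` accepts any number of outputs, the same function covers the two-engine case of claim 13 and the three-engine case of claim 15.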

18. A computer program product, encoded on a computer-readable medium, operable to cause data processing apparatus to perform operations comprising:
- processing a document image with a first optical character recognition (OCR) engine to generate first OCR output, the first OCR output comprising first bounding boxes identifying first clip images located in the document image and respective one or more characters assigned to each first clip image;
  
  processing the document image with a second OCR engine to generate second OCR output, the second OCR output comprising second bounding boxes identifying second clip images located in the document image and respective one or more characters assigned to each second clip image;
  
  applying shape clustering to the first OCR output to produce first clusters with first clip images and a respective confidence score for each assignment of one or more characters to a first clip image;
  
  applying shape clustering to the second OCR output to produce second clusters with second clip images and a respective confidence score for each assignment of one or more characters to a second clip image; and
  
  generating a final OCR output from the first OCR output and the second OCR output, the final OCR output comprising bounding boxes and using the confidence scores for assignments of the one or more characters to the first clip images and the second clip images to select and assign respective one or more characters to each of the bounding boxes.

19. A system for optical character recognition (OCR), comprising:
- a first OCR engine operable to process a document image to generate first OCR output, the first OCR output comprising first bounding boxes identifying first clip images located in the document image and respective one or more characters assigned to each first clip image;
  
  a first post-OCR engine operable to apply shape clustering to the first OCR output to produce first clusters with first clip images and a respective confidence score for each assignment of one or more characters to a first clip image;
  
  a second OCR engine operable to process the document image to generate second OCR output, the second OCR output comprising second bounding boxes identifying second clip images located in the document image and respective one or more characters assigned to each second clip image;
  
  a second post-OCR engine operable to apply shape clustering to the second OCR output to produce second clusters with second clip images and a respective confidence score for each assignment of one or more characters to a second clip image; and
  
  a vote processing engine to receive and process the first OCR output and the second OCR output and to produce a final OCR output from the first and second clusters in based on confidence scores.
- View Dependent Claims (20, 21, 22, 23)
- - 20. The system of claim 19, wherein:
    - the first post-OCR engine is operable to obtain a manual identification of a cluster image to verify or correct an assignment of one or more characters for the cluster image.
  - 21. The system of claim 19, further comprising:
    - one or more server computers that comprise the first and second OCR engines, the first and second post-OCR engines and the vote processing engine; and
      
      a communication network with which the one or more computer servers are in communication, the communication network operable to direct the original image of the document from a client computer to the first and second OCR engines and to direct the final OCR output from vote processing engine to the client computer.
  - 22. The system of claim 21, wherein:
    - the first and second OCR engines, the first and second post-OCR engines and the vote processing engine are on different server computers, respectively.
  - 23. The system of claim 19, further comprising:
    - one or more server computers that comprise the first and second OCR engines, the first and second post-OCR engines and the vote processing engine;
      
      a communication network with which the one or more computer servers are in communication; and
      
      one or more OCR storage server computers that are in communication with the communication network and store final OCR outputs for images of selected documents produced by the first and second OCR engines, the first and second post-OCR engines and the vote processing engine,
      
      wherein the communication network provides communications between a client computer and the one or more OCR storage server computers to allow the client computer to retrieve from the one or more OCR storage server computers an existing final OCR output.
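
The arrangement in claim 19 (two OCR engines, each followed by its own post-OCR engine, feeding one vote processing engine) can be wired together as below. This is only a sketch of the data flow: the engines, the post-processing step, and the voting function are passed in as plain callables, all of which are hypothetical stand-ins, and a thread pool is used to mimic the parallel operation mentioned in claim 3.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(image, ocr_engines, post_process, vote):
    """Run every OCR engine on the same image, post-process each output
    separately, then hand all modified outputs to the voting step."""

    def recognize_and_refine(engine):
        return post_process(engine(image))

    # pool.map returns results in the same order as ocr_engines.
    with ThreadPoolExecutor(max_workers=len(ocr_engines)) as pool:
        modified_outputs = list(pool.map(recognize_and_refine, ocr_engines))
    return vote(modified_outputs)


def pick_highest_confidence(outputs):
    """Toy voting rule: per character position, keep the most confident text.
    Assumes every output lists the same positions in the same order."""
    return "".join(max(candidates, key=lambda c: c[1])[0]
                   for candidates in zip(*outputs))
```

Running the engines one after another, as in claim 4, would only require replacing the pool with an ordinary loop.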

24. A method, comprising:
- processing a document image with a first optical character recognition (OCR) engine to generate first OCR output, the first OCR output comprising first bounding boxes identifying first clip images located in the document image, the first OCR output further comprising a respective one or more characters assigned to each first clip image;
  
  processing the document image with a second OCR engine to generate second OCR output, the second OCR output comprising second bounding boxes identifying second clip images located in the document image, the second OCR output further comprising a respective one or more characters assigned to each second clip image;
  
  classifying the first clip images and the second clip images into clusters, each cluster including only clip images having the same one or more characters assigned to the clip image;
  
  generating a cluster image for each cluster;
  
  using the cluster images to verify or correct the assignment of characters to clip images and determine a confidence score for each assignment of one or more characters to a clip image; and
  
  using the assignments of characters to the cluster images to generate a final OCR output.
- View Dependent Claims (25, 26, 27)
- - 25. The method of claim 24, wherein:
    - the cluster image for a cluster is generated by averaging clip images in the cluster.
  - 26. The method of claim 24, further comprising:
    - in generating the final OCR output, determining if any one of the first clip images shares a location in the document image with any one of the second clip images and the one or more characters assigned to the one first clip image are different from the one or more characters assigned to the one second clip image, and if so, using the respective confidence scores for the one first clip image and the one second clip image to select one or more characters for the location.
  - 27. The method of claim 24, wherein:
    - the clusters include (1) clusters in which each clip image is associated with a single bounding box produced by a respective OCR engine and (2) clusters in which each clip image is associated with two or more adjacent bounding boxes produced by a respective OCR engine.
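
Claim 26 resolves disagreements where a first clip image and a second clip image share a location. A minimal sketch follows, assuming that "shares a location" means the two bounding boxes overlap by at least a chosen intersection-over-union ratio; that reading, the `min_overlap` value, and the function names are assumptions rather than anything stated in the claim.

```python
def area(box: tuple) -> float:
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])


def overlap_ratio(a: tuple, b: tuple) -> float:
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    width = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    height = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = width * height
    union = area(a) + area(b) - intersection
    return intersection / union if union else 0.0


def resolve_conflicts(first: list, second: list, min_overlap: float = 0.5) -> list:
    """first, second: lists of (box, text, confidence) tuples.

    Where boxes from the two outputs overlap and the texts differ, keep the
    more confident assignment; unmatched second-engine items are appended.
    """
    result = []
    matched = set()
    for box1, text1, conf1 in first:
        chosen = (box1, text1, conf1)
        for j, (box2, text2, conf2) in enumerate(second):
            if j in matched or overlap_ratio(box1, box2) < min_overlap:
                continue
            matched.add(j)
            if text2 != text1 and conf2 > conf1:
                chosen = (box2, text2, conf2)
            break
        result.append(chosen)
    result.extend(item for j, item in enumerate(second) if j not in matched)
    return result
```

Matching on overlap rather than exact equality also lets a single box from one engine (for example "m") compete with a box that the other engine labeled with two characters (for example "rn").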

28. A computer program product, encoded on a computer-readable medium, operable to cause data processing apparatus to perform operations comprising:
- processing a document image with a first optical character recognition (OCR) engine to generate first OCR output, the first OCR output comprising first bounding boxes identifying first clip images located in the document image, the first OCR output further comprising a respective one or more characters assigned to each first clip image;
  
  processing the document image with a second OCR engine to generate second OCR output, the second OCR output comprising second bounding boxes identifying second clip images located in the document image, the second OCR output further comprising a respective one or more characters assigned to each second clip image;
  
  classifying the first clip images and the second clip images into clusters, each cluster including only clip images having the same one or more characters assigned to the clip image;
  
  generating a cluster image for each cluster;
  
  using the cluster images to verify or correct the assignment of characters to clip images and determine a confidence score for each assignment of one or more characters to a clip image; and
  
  using the assignments of characters to the cluster images to generate a final OCR output.

29. A system for optical character recognition (OCR), comprising:
- a first OCR engine operable to process a document image to generate first OCR output, the first OCR output comprising first bounding boxes identifying first clip images located in the document image, the first OCR output further comprising a respective one or more characters assigned to each first clip image;
  
  a second OCR engine operable to process the document image to generate second OCR output, the second OCR output comprising second bounding boxes identifying second clip images located in the document image, the second OCR output further comprising a respective one or more characters assigned to each second clip image;
  
  a post-OCR engine to receive the first and second OCR outputs and to classify the first clip images and the second clip images into clusters, each cluster including only clip images having the same one or more characters assigned to the clip image and a cluster image representing clip images for each cluster; and
  
  a vote processing engine operable to generate a final OCR output based on assignments of characters to the cluster images from the post-OCR engine.
- View Dependent Claims (30, 31, 32)
- - 30. The system of claim 29, further comprising:
    - one or more server computers that comprise the first and second OCR engines, the post-OCR engine and the vote processing engine; and
      
      a communication network with which the one or more computer servers are in communication, the communication network operable to direct the original image of the document from a client computer to the first and second OCR engines and to direct the final OCR output from vote processing engine to the client computer.
  - 31. The system of claim 30, wherein:
    - the first and second OCR engines, the post-OCR engine and the vote processing engine are on different server computers, respectively.
  - 32. The system of claim 29, further comprising:
    - one or more server computers that comprise the first and second OCR engines, the post-OCR engine and the vote processing engine;
      
      a communication network with which the one or more computer servers are in communication; and
      
      one or more OCR storage server computers that are in communication with the communication network and store final OCR outputs for images of selected documents produced by the first and second OCR engines, the post-OCR engine and the vote processing engine,
      
      wherein the communication network provides communications between a client computer and the one or more OCR storage server computers to allow the client computer to retrieve from the one or more OCR storage server computers an existing final OCR output.

33. A method, comprising:
- processing a document image with a first optical character recognition (OCR) engine to generate first OCR output, the first OCR output comprising bounding boxes identifying clip images located in the document image and a character assignment assigning one or more characters to each clip image;
  
  applying shape clustering to the first OCR output to produce a first modified OCR output, the first modified OCR output comprising a modification of the assignment of characters to clip images, the first modified OCR output further comprising words recognized in the document image;
  
  identifying a suspect word in the first modified OCR output, the suspect word being a word having a character identified as a suspect character; and
  
  processing the suspect word with a second OCR engine to recognize the suspect word.
- View Dependent Claims (34, 35, 36)
- - 34. The method of claim 33, further comprising:
    - selecting the first modified OCR output or the output of the second OCR engine as correctly recognizing the suspect word.
  - 35. The method of claim 33, further comprising:
    - applying shape clustering to produce a respective confidence score for each assignment of one or more characters to a clip image in the first modified OCR output;
      
      processing the document image with the second OCR engine to generate second OCR output;
      
      applying shape clustering to the second OCR output to produce a second modified OCR output, the second modified OCR output comprising a modification of the assignment of characters to clip images by the second OCR engine, the second modified OCR output comprising a respective confidence score for each assignment of one or more characters to a clip image, the second modified OCR output further comprising words recognized in the document image; and
      
      using the confidence scores of the first modified OCR output and the confidence scores of the second modified OCR output to select the first modified OCR output or the output of the second OCR engine as correctly recognizing the suspect word.
  - 36. The method of claim 35, wherein:
    - applying shape clustering to the second OCR output comprises:
      
      classifying the clip images located by the second OCR engine into clusters, each cluster including only clip images having the same one or more characters assigned by the second OCR engine;
      
      generating a cluster image for each cluster, the cluster image for a cluster being generated by averaging clip images in the cluster;
      
      using the cluster images to generate a corrected assignment of characters to the clip images; and
      
      using the corrected assigned characters of the corrected assignment to recognize words.
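
Claims 33 and 34 describe sending only suspect words to a second OCR engine and then choosing between the two readings. The sketch below assumes that a suspect character is one whose confidence falls below a threshold and that the second reading wins when its confidence exceeds the weakest character of the first reading; neither rule comes from the claims. `second_engine` is a hypothetical callable that takes a word index and returns a `(text, confidence)` pair.

```python
def find_suspect_words(words: list, threshold: float = 0.5) -> list:
    """words: one list of (character, confidence) pairs per word.
    Returns the indices of words containing a low-confidence character."""
    return [i for i, word in enumerate(words)
            if any(conf < threshold for _, conf in word)]


def recheck_suspect_words(words: list, second_engine, threshold: float = 0.5) -> list:
    """Re-recognize only the suspect words and return the final word texts."""
    texts = ["".join(ch for ch, _ in word) for word in words]
    for i in find_suspect_words(words, threshold):
        weakest = min(conf for _, conf in words[i])
        alt_text, alt_conf = second_engine(i)
        if alt_conf > weakest:
            texts[i] = alt_text
    return texts
```

Restricting the second engine to suspect words keeps the extra recognition cost proportional to the number of doubtful words rather than to the size of the document.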

37. A computer program product, encoded on a computer-readable medium, operable to cause data processing apparatus to perform operations comprising:
- processing a document image with a first optical character recognition (OCR) engine to generate first OCR output, the first OCR output comprising bounding boxes identifying clip images located in the document image and a character assignment assigning one or more characters to each clip image;
  
  applying shape clustering to the first OCR output to produce a first modified OCR output, the first modified OCR output comprising a modification of the assignment of characters to clip images, the first modified OCR output further comprising words recognized in the document image;
  
  identifying a suspect word in the first modified OCR output, the suspect word being a word having a character identified as a suspect character; and
  
  processing the suspect word with a second OCR engine to recognize the suspect word.

38. A system for optical character recognition (OCR), comprising:
- a first OCR engine operable to process a document image to generate first OCR output, the first OCR output comprising bounding boxes identifying clip images located in the document image and a character assignment assigning one or more characters to each clip image;
  
  a first post-OCR engine operable to apply shape clustering to the first OCR output to produce a first modified OCR output, the first modified OCR output comprising a modification of the assignment of characters to clip images, the first modified OCR output further comprising words recognized in the document image, wherein the first post-OCR engine is operable to identify a suspect word in the first modified OCR output, the suspect word being a word having a character identified as a suspect character; and
  
  a second OCR engine operable to receive and process the suspect word to recognize the suspect word.
- View Dependent Claims (39)
- - 39. The system of claim 38, further comprising:
    - one or more server computers that comprise the first and second OCR engines and the first post-OCR engine; and
      
      a communication network with which the one or more computer servers are in communication, the communication network operable to direct the original image of the document from a client computer to the first OCR engine and to direct an OCR output from the second OCR engine to the client computer.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Smith, Raymond W., Vincent, Luc
Primary Examiner(s): Mariam; Daniel G
Application Number: US11/519,376
Publication Number: US 20080063279A1
Time in Patent Office: 1,226 Days
Field of Search: 382/209, 382/218, 382/224, 382/225, 382/321, 382/203
US Class Current: 382/225
CPC Class Codes:

G06F 18/254   of classification results, ...

G06V 30/10   Character recognition

G06V 30/127   with the intervention of an...

G06V 30/1918   Fusion techniques, i.e. com...

G06V 30/414   Extracting the geometrical ...
