Optical character recognition based on shape clustering and multiple optical character recognition processes

  • US 20080063279A1
  • Filed: 09/11/2006
  • Published: 03/13/2008
  • Est. Priority Date: 09/11/2006
  • Status: Active Grant
First Claim

1. A system for optical character recognition (OCR), comprising:

    a plurality of OCR engines each operable to process an original image of a document and to produce a respective OCR output;

    a plurality of post-OCR processing engines each operable to receive an OCR output from a respective OCR engine and operable to produce a respective modified OCR output of the document; and

    a vote processing engine operable to select portions from the plurality of modified OCR outputs and to assemble the selected portions into a final OCR output for the document;

    wherein each post-OCR processing engine is operable to:

    classify clip images defined in a received OCR output for the document into a plurality of clusters of clip images, each cluster comprising clip images of similar image sizes and shapes that are assigned the same one or more particular characters by the corresponding OCR engine; and

    generate a cluster image to represent clip images in each cluster;

    and wherein the vote processing engine is operable to:

    use shape differences between a cluster image of each cluster and cluster images of other clusters to detect whether an error exists in the one or more particular characters assigned to each cluster by the corresponding OCR engine;

    correct each detected error in a particular cluster by newly assigning one or more particular characters to the particular cluster; and

    use the newly assigned one or more particular characters for the particular cluster to replace respective one or more particular characters previously assigned by the corresponding OCR engine in a corresponding modified OCR output.
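
The clustering and correction steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the greedy clustering, the pixel-wise mean used as the cluster image, the absolute-difference shape distance, the rule that a smaller cluster takes the label of a larger, nearly identical one, and all names and thresholds (`Cluster`, `cluster_clips`, `correct_labels`, `max_distance`, `threshold`) are assumptions chosen for illustration. Clip images are assumed to be small binary bitmaps stored as tuples of rows.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    label: str                                  # character(s) assigned by the OCR engine
    clips: list = field(default_factory=list)   # member clip bitmaps

    def image(self):
        """Cluster image: the pixel-wise mean of all member clips."""
        n = len(self.clips)
        rows, cols = len(self.clips[0]), len(self.clips[0][0])
        return [[sum(c[r][k] for c in self.clips) / n for k in range(cols)]
                for r in range(rows)]


def shape_distance(a, b):
    """Sum of absolute pixel differences; infinite if the sizes differ."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return float("inf")
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def cluster_clips(labelled_clips, max_distance=1):
    """Greedily group clips that share a label, a size, and a similar shape."""
    clusters = []
    for label, clip in labelled_clips:
        for cl in clusters:
            # Compare against the first member of each same-label cluster.
            if cl.label == label and shape_distance(cl.clips[0], clip) <= max_distance:
                cl.clips.append(clip)
                break
        else:
            clusters.append(Cluster(label, [clip]))
    return clusters


def correct_labels(clusters, threshold=1.5):
    """Return one label per cluster, relabelling a cluster when a larger,
    differently labelled cluster has a nearly identical cluster image
    (an assumed error criterion, not the one in the patent)."""
    images = [cl.image() for cl in clusters]
    new_labels = [cl.label for cl in clusters]
    for i, cl in enumerate(clusters):
        best = None
        for j, other in enumerate(clusters):
            if i == j or other.label == cl.label or len(other.clips) <= len(cl.clips):
                continue
            d = shape_distance(images[i], images[j])
            if d <= threshold and (best is None or d < best[0]):
                best = (d, other.label)
        if best is not None:
            new_labels[i] = best[1]
    return new_labels
```

In this sketch, a single clip that one engine labelled "0" but whose shape matches a well-populated "o" cluster would be relabelled "o", while a cluster with a clearly different shape keeps its label. The vote across several engines' modified outputs is not modelled here.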

  • 2 Assignments