Shape Clustering in Post Optical Character Recognition Processing

  • US 20100232719A1
  • Filed: 05/20/2010
  • Published: 09/16/2010
  • Est. Priority Date: 09/08/2006
  • Status: Active Grant
First Claim

1. A method for post optical character recognition (OCR) processing, comprising:

    classifying clip images defined in a received OCR output of a document processed by an optical character recognition (OCR) process into a plurality of clusters of clip images, each cluster including clip images that are identical or similar in size and are assigned the same one or more character codes by the OCR process;

    processing clip images in each of the plurality of clusters to generate a cluster image for each cluster;

    for a first cluster assigned one or more first OCR character codes, identifying (1) a second cluster assigned one or more second OCR character codes different from the one or more first OCR character codes, where the cluster image of the second cluster is closer in shape to a cluster image of the first cluster than to cluster images of other clusters assigned one or more OCR character codes different from the one or more first OCR character codes, and (2) a third cluster assigned the same one or more first OCR character codes as the first cluster, where the cluster image of the third cluster is closer in shape to the cluster image of the first cluster than to the cluster images of other clusters assigned the one or more first OCR character codes; and

    using at least shape differences between the cluster images of the first cluster and the second cluster and between the cluster images of the first cluster and the third cluster to determine a level of confidence in the one or more first OCR character codes assigned to the first cluster.
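The claimed steps can be illustrated with a minimal Python sketch. It is not the patented implementation. The cluster key (exact OCR code plus exact pixel size), the pixel-wise mean used as the cluster image, the nearest-neighbour resampling to a 16x16 grid with mean absolute pixel difference as the shape distance, and the d2 / (d2 + d3) confidence score are all illustrative assumptions. Every function name and the toy bitmaps are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Bitmap = List[List[float]]  # rows of pixel intensities in [0, 1]
Key = Tuple[str, int, int]  # hypothetical cluster key: (OCR code(s), height, width)


def cluster_clips(clips: List[Tuple[str, Bitmap]]) -> Dict[Key, List[Bitmap]]:
    """Step 1: group clips that share OCR code(s) and size.
    Assumption: exact size equality stands in for 'identical or similar in size'."""
    clusters: Dict[Key, List[Bitmap]] = defaultdict(list)
    for code, bmp in clips:
        clusters[(code, len(bmp), len(bmp[0]))].append(bmp)
    return dict(clusters)


def cluster_image(bitmaps: List[Bitmap]) -> Bitmap:
    """Step 2 (assumed method): pixel-wise mean of the equally sized clips."""
    h, w, n = len(bitmaps[0]), len(bitmaps[0][0]), len(bitmaps)
    return [[sum(b[y][x] for b in bitmaps) / n for x in range(w)] for y in range(h)]


def resample(img: Bitmap, size: int = 16) -> Bitmap:
    """Nearest-neighbour resize so clusters of different sizes can be compared."""
    h, w = len(img), len(img[0])
    return [[img[y * h // size][x * w // size] for x in range(size)] for y in range(size)]


def shape_distance(a: Bitmap, b: Bitmap) -> float:
    """Assumed metric: mean absolute pixel difference after resampling (0 = same shape)."""
    ra, rb = resample(a), resample(b)
    diffs = [abs(p - q) for row_a, row_b in zip(ra, rb) for p, q in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def confidence(first: Key, images: Dict[Key, Bitmap]) -> Optional[float]:
    """Steps 3-4: compare the nearest differently coded cluster (second) with the
    nearest other cluster carrying the same code (third).
    Assumed score d2 / (d2 + d3): near 1 when the same-code neighbour is much
    closer, near 0 when a differently coded cluster has the same shape."""
    code, img = first[0], images[first]
    d_other = [shape_distance(img, v) for k, v in images.items() if k[0] != code]
    d_same = [shape_distance(img, v) for k, v in images.items() if k != first and k[0] == code]
    if not d_other or not d_same:
        return None  # no second or no third cluster, so the comparison is undefined
    d2, d3 = min(d_other), min(d_same)
    return 0.5 if d2 + d3 == 0 else d2 / (d2 + d3)


# Hypothetical toy data: an 'l'-like stroke at two sizes, a flagged '1',
# and the same stroke mislabelled as '1' by the OCR process.
BAR = [[0, 1, 0, 0] for _ in range(4)]
BAR_BIG = [[0, 0, 1, 1, 0, 0, 0, 0] for _ in range(8)]
ONE = [[1, 1, 0, 0]] + [[0, 1, 0, 0] for _ in range(3)]
clips = [("l", BAR), ("l", BAR), ("l", BAR_BIG), ("1", ONE), ("1", ONE), ("1", BAR_BIG)]

if __name__ == "__main__":
    images = {k: cluster_image(v) for k, v in cluster_clips(clips).items()}
    for key in images:
        print(key, confidence(key, images))
```

Run as a script, it prints one score per cluster. The stroke labelled "1" (key ("1", 8, 8)) scores 0.0 because its shape matches the "l" clusters exactly while its nearest "1" neighbour differs, which is the kind of suspect code assignment a low confidence level would point to.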

  • 2 Assignments