
Shape clustering in post optical character recognition processing

  • US 8,170,351 B2
  • Filed: 07/21/2011
  • Issued: 05/01/2012
  • Est. Priority Date: 09/08/2006
  • Status: Expired due to Fees
First Claim

1. A system for processing an optical character recognition (OCR) output including separated images produced by an OCR process in processing an original image of a document and one or more characters assigned to each separated image by the OCR process, comprising:

  • one or more computers, the one or more computers implementing:

    a cluster generation engine operable to classify the separated images in the OCR output into a plurality of clusters of separated images that are of a particular image size and are assigned the same one or more OCR character codes by the OCR process; and

    a cluster processing engine operable to determine shape metric distances between a cluster image of a cluster and cluster images of other clusters, wherein each cluster image is representative of separated images in each cluster, and to detect whether an error exists in assignment of one or more OCR character codes assigned to each cluster by the OCR process based on the shape metric distances, wherein the cluster processing engine is further operable to correct one or more erroneously assigned OCR character codes for a first cluster including replacing one or more first OCR character codes assigned to the first cluster with one or more second OCR character codes assigned to a second cluster as new one or more OCR character codes for the first cluster when the second cluster has a shortest shape metric distance from the first cluster among all other clusters and the second cluster has a higher level of confidence than the first cluster.
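The cluster generation engine in claim 1 essentially groups separated images under a key made of the image size and the OCR codes assigned to them. A minimal Python sketch, assuming each separated image is a binary bitmap given as rows of 0/1 pixels. The names `Cluster`, `generate_clusters` and `cluster_image` are hypothetical, and the pixel-wise mean used as the cluster image is only one plausible reading of an image "representative of separated images in each cluster":

```python
from dataclasses import dataclass, field

# Hypothetical representation: a separated image is a binary bitmap (rows of 0/1).
Image = list[list[int]]


@dataclass
class Cluster:
    size: tuple[int, int]  # (height, width) shared by every member image
    codes: str             # OCR character code(s) the OCR process assigned
    members: list[Image] = field(default_factory=list)

    def cluster_image(self) -> list[list[float]]:
        """Pixel-wise mean of the member bitmaps (one possible representative image)."""
        h, w = self.size
        n = len(self.members)
        return [[sum(img[r][c] for img in self.members) / n for c in range(w)]
                for r in range(h)]


def generate_clusters(ocr_output: list[tuple[Image, str]]) -> list[Cluster]:
    """Group separated images that have the same size and the same assigned codes."""
    clusters: dict[tuple[int, int, str], Cluster] = {}
    for image, codes in ocr_output:
        height, width = len(image), (len(image[0]) if image else 0)
        key = (height, width, codes)
        if key not in clusters:
            clusters[key] = Cluster(size=(height, width), codes=codes)
        clusters[key].members.append(image)
    return list(clusters.values())  # ordered by first appearance in the OCR output


# Usage: two identical "i" images share a cluster; a different-size "i" does not.
i_img = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
l_img = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]
clusters = generate_clusters([(i_img, "i"), (i_img, "i"), (l_img, "l"), ([[1]], "i")])
```

Keying on size as well as codes mirrors the claim's "of a particular image size", so the same character rendered at two sizes ends up in two clusters.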
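The cluster processing engine can likewise be sketched under stated assumptions. The claim does not fix the shape metric, the confidence measure, or the detection criterion, so here the distance is a hypothetical mean absolute pixel difference (infinite between clusters of different sizes), each cluster carries a hypothetical `confidence` score (for instance its population), and `max_distance` is an assumed threshold for flagging a code assignment as erroneous. As the claim recites, a cluster takes over the codes of its nearest neighbor only when that neighbor has a higher confidence:

```python
from dataclasses import dataclass

Pixels = list[list[float]]


@dataclass
class ScoredCluster:
    codes: str          # OCR character code(s) currently assigned to the cluster
    image: Pixels       # representative cluster image
    confidence: float   # hypothetical score, e.g. number of member images


def shape_distance(a: Pixels, b: Pixels) -> float:
    """Hypothetical shape metric: mean absolute pixel difference; inf if sizes differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return float("inf")
    n = sum(len(row) for row in a)
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / max(n, 1)


def correct_codes(clusters: list[ScoredCluster], max_distance: float = 0.1) -> list[str]:
    """Return one (possibly corrected) code string per cluster, in input order."""
    corrected = [c.codes for c in clusters]
    for i, cluster in enumerate(clusters):
        candidates = [(shape_distance(cluster.image, other.image), j)
                      for j, other in enumerate(clusters) if j != i]
        if not candidates:
            continue
        dist, j = min(candidates)  # nearest cluster among all other clusters
        nearest = clusters[j]
        # Error detection: a near-identical shape carries different codes.
        # max_distance is an assumed threshold, not taken from the patent.
        suspicious = dist <= max_distance and nearest.codes != cluster.codes
        # Correction: adopt the nearest cluster's codes only if it is more confident.
        if suspicious and nearest.confidence > cluster.confidence:
            corrected[i] = nearest.codes
    return corrected


# Usage: a sparse "0" cluster lies close to a well-populated "O" cluster.
letter_o = ScoredCluster("O", [[1, 1], [1, 1]], confidence=40)
digit_0 = ScoredCluster("0", [[1, 1], [1, 0.9]], confidence=3)
period = ScoredCluster(".", [[0, 0], [0, 1]], confidence=25)
result = correct_codes([letter_o, digit_0, period])  # ['O', 'O', '.']
```

The low-confidence "0" cluster is relabeled "O", while the "." cluster is too far from every other cluster to be flagged. All corrections are computed from the original assignments, so one relabeling cannot trigger another within the same pass; this is a choice made for the sketch, not something the claim specifies.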
