Shape clustering in post optical character recognition processing

  • US 20080063276A1
  • Filed: 09/08/2006
  • Published: 03/13/2008
  • Est. Priority Date: 09/08/2006
  • Status: Active Grant
First Claim

1. A method, comprising:

    classifying clip images defined in a received OCR output of a document processed by an optical character recognition (OCR) process into a plurality of clusters of clip images, each cluster including clip images that are assigned the same one or more character codes by the OCR process;

    processing clip images in each of the plurality of clusters to generate a cluster image for each cluster;

    comparing the cluster images to detect clusters to which one or more OCR character codes were erroneously assigned by the OCR process;

    assigning one or more new OCR character codes to a first cluster that is detected to have an erroneously assigned one or more OCR character codes in the OCR output; and

    using the one or more new OCR character codes to replace the erroneously assigned OCR character code at each occurrence of one of the clip images of the first cluster in the OCR output to produce a modified OCR output.
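The claim names the steps but not how a cluster image is formed or how clusters are compared. Below is a minimal Python sketch of one possible reading, assuming: (1) the OCR output is a hypothetical list of `(code, clip)` pairs, where `code` is a string of one or more characters and `clip` is a grayscale image already normalized to a common size; (2) the cluster image is the pixel-wise mean of its clips; and (3) a cluster whose image is within a hypothetical `threshold` (mean absolute pixel difference) of a larger cluster's image is treated as mislabeled and takes the larger cluster's code. The averaging, the distance measure, and the "larger cluster wins" rule are illustrative assumptions, not the method the patent specifies.

```python
from collections import defaultdict


def cluster_by_code(ocr_output):
    """Group clip indices by the OCR code assigned to them (the classifying step)."""
    clusters = defaultdict(list)
    for index, (code, _clip) in enumerate(ocr_output):
        clusters[code].append(index)
    return dict(clusters)


def mean_image(clips):
    """Pixel-wise average of equally sized clips (assumed cluster-image rule)."""
    height, width, n = len(clips[0]), len(clips[0][0]), len(clips)
    return [
        [sum(clip[row][col] for clip in clips) / n for col in range(width)]
        for row in range(height)
    ]


def distance(a, b):
    """Mean absolute pixel difference between two equally sized images."""
    total = sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    return total / (len(a) * len(a[0]))


def correct_ocr(ocr_output, threshold=0.1):
    """Return a modified OCR output with suspected mislabeled clusters relabeled.

    `threshold` is a hypothetical tuning parameter, not taken from the patent.
    """
    clusters = cluster_by_code(ocr_output)
    cluster_images = {
        code: mean_image([ocr_output[i][1] for i in indices])
        for code, indices in clusters.items()
    }
    # Visit clusters from largest to smallest; larger clusters are assumed
    # to carry the correct code.
    codes = sorted(clusters, key=lambda c: len(clusters[c]), reverse=True)
    remap = {}
    for position, code in enumerate(codes):
        for larger in codes[:position]:
            if larger in remap:
                continue  # never map onto a cluster that was itself relabeled
            if distance(cluster_images[code], cluster_images[larger]) < threshold:
                remap[code] = larger  # the new code for the erroneous cluster
                break
    # Replace the code at every occurrence of a clip from a relabeled cluster.
    return [(remap.get(code, code), clip) for code, clip in ocr_output]


if __name__ == "__main__":
    stroke = [[1, 0], [1, 0]]  # toy 2x2 clip: a thin vertical stroke
    ring = [[1, 1], [1, 1]]    # toy 2x2 clip: a filled glyph
    page = [("l", stroke), ("1", stroke), ("o", ring),
            ("l", stroke), ("o", ring), ("l", stroke)]
    print([code for code, _ in correct_ocr(page)])  # ['l', 'l', 'o', 'l', 'o', 'l']
```

In the toy page, the single clip labeled `"1"` produces a cluster image identical to the `"l"` cluster, so it is relabeled `"l"`, while the visibly different `"o"` cluster keeps its code. A real system would also need to handle clips of differing sizes and choose the threshold empirically.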

  • 2 Assignments