Shape clustering in post optical character recognition processing

  • US 8,175,394 B2
  • Filed: 09/08/2006
  • Issued: 05/08/2012
  • Est. Priority Date: 09/08/2006
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method, comprising:

    classifying clip images defined in a received OCR output of a document processed by an optical character recognition (OCR) process into a plurality of clusters of clip images, each cluster including clip images that are assigned the same one or more character codes by the OCR process, wherein the OCR process performed on one or more computers generates the received OCR output;

    processing clip images in each of the plurality of clusters to generate exactly one cluster image for each cluster;

    classifying a first cluster in the plurality of clusters as a suspect cluster;

    identifying a nearest cluster to the suspect cluster, the nearest cluster being nearest based on a shape distance between the cluster image for the nearest cluster and the cluster image for the suspect cluster;

    replacing the one or more character codes assigned to the suspect cluster with character codes assigned to the nearest cluster at each occurrence of one of the clip images of the suspect cluster in the OCR output to produce a modified OCR output;

    directing a cluster image of a selected cluster to an on-line server which is operable to direct the cluster image to one or more users for manual identification of the cluster image, including using an on-line game provided by the on-line server to supply the cluster image of the selected cluster to the one or more users for a user response as part of the on-line game; and

    using manual identification of the cluster image returned from the on-line server to verify one or more OCR character codes assigned to the selected cluster or assign new one or more OCR character codes to the selected cluster.
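
The automated steps of claim 1 (clustering by character code, one cluster image per cluster, suspect detection, nearest-cluster lookup, and code replacement) can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not say how a cluster image is formed, how a cluster is judged suspect, or how shape distance is measured, so the pixel-wise average, the `min_size` threshold, and the sum-of-squared-differences distance below are assumptions chosen for illustration, and every function name is hypothetical. The sketch also restricts the nearest-cluster search to non-suspect clusters, which the claim does not require, and it omits the on-line game and manual identification steps.

```python
from collections import defaultdict


def cluster_by_code(clips):
    """Group clip images by the character code the OCR process assigned.

    `clips` is a list of (code, pixels) pairs, where `pixels` is a flat,
    fixed-length list of numbers (an assumed representation).
    """
    clusters = defaultdict(list)
    for code, pixels in clips:
        clusters[code].append(pixels)
    return dict(clusters)


def cluster_image(images):
    """Assumption: the single cluster image is the pixel-wise mean."""
    n = len(images)
    return [sum(column) / n for column in zip(*images)]


def shape_distance(a, b):
    """Assumption: shape distance is the sum of squared pixel differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def correct_suspects(clips, min_size=2):
    """Return the OCR output with suspect clusters relabeled.

    Assumption: a cluster is suspect when it holds fewer than `min_size`
    clip images. Each suspect cluster takes the code of the nearest
    non-suspect cluster, at every occurrence of its clips.
    """
    clusters = cluster_by_code(clips)
    images = {code: cluster_image(imgs) for code, imgs in clusters.items()}

    suspects = {code for code, imgs in clusters.items() if len(imgs) < min_size}
    trusted = [code for code in clusters if code not in suspects]

    relabel = {}
    for suspect in suspects:
        if trusted:  # leave the code unchanged if nothing can be compared
            relabel[suspect] = min(
                trusted,
                key=lambda code: shape_distance(images[suspect], images[code]),
            )

    return [(relabel.get(code, code), pixels) for code, pixels in clips]
```

For instance, given three clips labeled "e" with pixels `[1, 1, 1, 0]`, two labeled "o" with pixels `[1, 0, 0, 1]`, and a single clip labeled "c" with pixels `[1, 1, 0, 0]`, the "c" cluster is suspect under the assumed size threshold and lies closer to "e" (distance 1) than to "o" (distance 2), so its clip is relabeled "e" in the modified output.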
