
System and Method for Correcting Low Confidence Characters From an OCR Engine With an HTML Web Form

  • US 20080212901A1
  • Filed: 03/03/2008
  • Published: 09/04/2008
  • Est. Priority Date: 03/01/2007
  • Status: Abandoned Application
First Claim

1. A computer implemented method for correcting low confidence characters from an optical character recognition (“OCR”) system, the method comprising:

    receiving from an OCR system an image of a source document and corresponding text data generated by the OCR system as a result of an OCR analysis;

    parsing the text data to identify a plurality of fields of text data, each field of text data comprising one or more characters of text data;

    parsing the text data to identify a confidence value for each character of text data;

    parsing the text data to identify an X-Y coordinate value for each field of text data and for each character of text data;

    populating a data structure with each field of text data, the X-Y coordinate value for each field of text data, the characters of text data corresponding to each field, the X-Y coordinate value for each character of text data and the confidence value for each character of text data;

    determining a low confidence character threshold;

    creating a hypertext markup language (“HTML”) form comprising a plurality of individual field objects, wherein each individual field object includes one or more characters and wherein each character having a confidence value below the low confidence character threshold is identified as a stop position in a field object;

    displaying to an operator the HTML form;

    simultaneously displaying to the operator an image of a portion of the source document image;

    moving an input focus on the HTML form to a first stop position in a field object and visually emphasizing in the displayed HTML form the low confidence character corresponding to the first stop position;

    zooming the display of the source document image to the X-Y coordinate value associated with the low confidence character at the first stop position;

    receiving an input from the operator to move to another object;

    moving the input focus on the HTML form to a second stop position in a field object and visually emphasizing in the displayed HTML form the low confidence character corresponding to the second stop position; and

    zooming the display of the source document image to the X-Y coordinates associated with the second low-confidence object.
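
The data-handling steps of the claim can be illustrated with a minimal sketch. It is not taken from the application: the class names `Char` and `Field`, the helpers `stop_positions`, `render_form` and `zoom_target`, the `data-stops` attribute, the 0.0 to 1.0 confidence scale and the sample values are all assumptions made for illustration. It assumes the OCR output has already been parsed into per-character records.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Char:
    text: str          # recognized character
    confidence: float  # OCR confidence; a 0.0-1.0 scale is assumed here
    x: int             # X coordinate of the character in the source image
    y: int             # Y coordinate of the character in the source image


@dataclass
class Field:
    name: str          # hypothetical field label
    x: int             # X coordinate of the field in the source image
    y: int             # Y coordinate of the field in the source image
    chars: list[Char]  # characters belonging to this field

    @property
    def value(self) -> str:
        return "".join(c.text for c in self.chars)


def stop_positions(fields, threshold):
    """Return (field_index, char_index) for every character whose
    confidence is below the threshold, in document order."""
    return [
        (fi, ci)
        for fi, field in enumerate(fields)
        for ci, ch in enumerate(field.chars)
        if ch.confidence < threshold
    ]


def render_form(fields, threshold):
    """Build a plain HTML form with one text input per field. The
    character offsets of the stop positions are recorded in a
    data-stops attribute (an assumed convention)."""
    rows = []
    for field in fields:
        stops = ",".join(
            str(ci)
            for ci, ch in enumerate(field.chars)
            if ch.confidence < threshold
        )
        rows.append(
            f'<input type="text" name="{escape(field.name)}" '
            f'value="{escape(field.value)}" data-stops="{stops}">'
        )
    return "<form>\n" + "\n".join(rows) + "\n</form>"


def zoom_target(fields, stop):
    """Return the X-Y coordinate the image view would zoom to for a stop."""
    fi, ci = stop
    ch = fields[fi].chars[ci]
    return ch.x, ch.y


# Made-up sample data: two fields, each with one low confidence character.
fields = [
    Field("invoice_no", 120, 40, [
        Char("1", 0.99, 120, 40),
        Char("O", 0.42, 132, 40),
        Char("7", 0.97, 144, 40),
    ]),
    Field("total", 120, 300, [
        Char("8", 0.55, 120, 300),
        Char("5", 0.98, 132, 300),
    ]),
]
THRESHOLD = 0.80  # assumed low confidence character threshold

stops = stop_positions(fields, THRESHOLD)
form_html = render_form(fields, THRESHOLD)
first_zoom = zoom_target(fields, stops[0])
```

In this sketch, moving to "another object" would amount to advancing an index into `stops` and calling `zoom_target` again. Moving the input focus and visually emphasizing the character would be handled by client-side script in the browser, which is not shown here.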
