
OCR error correction methods and apparatus utilizing contextual comparison

  • US 5,850,480 A
  • Filed: 05/30/1996
  • Issued: 12/15/1998
  • Est. Priority Date: 05/30/1996
  • Status: Expired due to Fees
First Claim

1. For use with a document processing system having an optical character recognition device for scanning documents with one or more discrete alphanumeric characters collectively forming an alphanumeric character string contained in a field having a number of character positions, the document processing system also having a memory with a lexicon of character strings wherein at least a portion of all of the possible alphanumeric character strings are listed in the lexicon as lexicon strings, the document processing system also having a recognition engine for generating at least one phantom character data table consisting of a set of cognate pairs of phantom characters and associated confidence values for each position of the field, a method of selecting the lexicon string which most accurately represents an alphanumeric character string contained within the field, said method comprising the steps of:

  • receiving at least one phantom character data table from the recognition engine;

    generating a numeric value for each of at least some of the lexicon strings, wherein each numeric value relates to the probability that its associated lexicon string accurately represents the alphanumeric character string contained within the field, and wherein each numeric value results from mathematical combination of the confidence values associated with each phantom character which matches a lexicon character within a predetermined number of positions of the corresponding position of the lexicon string, if none of the phantom characters received for a given position of the alphanumeric character string matches a lexicon character within the predetermined number of positions of the corresponding position of the lexicon string, a predetermined default confidence value is substituted for the phantom character confidence value in the mathematical combinations;

    comparing the resulting numeric values generated for each lexicon string; and

    selecting the lexicon string having a resulting associated numeric value indicating that the selected lexicon string most accurately represents the alphanumeric character string contained within the field.
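The steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation; it only mirrors the claim language under several assumptions. The names `score`, `select_lexicon_string`, `window`, and `default_conf` are hypothetical. The "mathematical combination" is assumed to be a product. When several phantom characters at one position match, the highest confidence is assumed to be used. The phantom character data table is assumed to be a list with one entry per field position, each entry holding (character, confidence) pairs.

```python
from math import prod

def score(lexicon_string, table, window=0, default_conf=0.01):
    """Combine confidences for one lexicon string (product is an assumption)."""
    factors = []
    for pos, candidates in enumerate(table):
        # Lexicon characters within `window` positions of the field position.
        lo = max(0, pos - window)
        hi = min(len(lexicon_string), pos + window + 1)
        nearby = lexicon_string[lo:hi]
        # Confidences of phantom characters that match a nearby lexicon character.
        matches = [conf for ch, conf in candidates if ch in nearby]
        # Assumption: take the best match; fall back to the default confidence
        # when no phantom character matches at this position.
        factors.append(max(matches) if matches else default_conf)
    return prod(factors)

def select_lexicon_string(lexicon, table, window=0, default_conf=0.01):
    """Compare the numeric values and return the highest-scoring lexicon string."""
    return max(lexicon, key=lambda s: score(s, table, window, default_conf))

# Hypothetical table for a three-position field: one list of
# (phantom character, confidence) pairs per position.
table = [
    [("C", 0.9), ("G", 0.1)],
    [("A", 0.6), ("R", 0.3)],
    [("T", 0.8), ("I", 0.2)],
]
lexicon = ["CAT", "CAR", "GAT", "DOG"]
best = select_lexicon_string(lexicon, table)
```

With `window=0`, each phantom character is compared only against the lexicon character at the same position; a larger `window` tolerates characters that the recognition engine placed a few positions off, which corresponds to the "predetermined number of positions" in the claim.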

  • 7 Assignments