Confusion set based method and system for correcting misrecognized words appearing in documents generated by an optical character recognition technique

  • US 6,205,261 B1
  • Filed: 02/05/1998
  • Issued: 03/20/2001
  • Est. Priority Date: 02/05/1998
  • Status: Expired due to Term
First Claim

1. A method of recognizing at least one word in a document, the word including at least one predetermined character member, the method comprising the steps of:

    a) providing a recognized word based on the word in the document;

    b) determining whether the recognized word has been misrecognized;

    c) providing, if the recognized word has been misrecognized, a set of reference words, each reference word comprising a different set of predetermined character members;

    d) providing a plurality of confusion sets, each confusion set grouping together a different plurality of character members, a content of each confusion set being independent of the recognized word;

    e) comparing at least one character sequence of the misrecognized word with a corresponding character sequence of a current one of the set of reference words to determine which corresponding character sequences do not include the same character members;

    f) eliminating the current reference word if the character member of any character sequence of the misrecognized word does not correspond to the character member of the corresponding character sequence in the current reference word and if the character members from the corresponding character sequences in the misrecognized word and the current reference word are not from the same confusion set;

    g) repeating steps e) and f) for each reference word of the set of reference words, the remaining non-eliminated reference words comprising a set of candidate reference words; and

    h) selecting one of the set of candidate reference words in accordance with a set of predetermined criteria, the selected candidate reference word comprising a replacement word for the misrecognized word.
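Steps b) through h) can be illustrated with a minimal sketch, which is not the patented implementation. Several simplifications are assumed here and are not taken from the claim: each "character sequence" is treated as a single character, only reference words of the same length as the recognized word are considered, a word is deemed misrecognized when it is absent from the lexicon, and the "predetermined criteria" of step h) are stood in for by "fewest differing positions, first match wins on ties". The names `CONFUSION_SETS`, `is_candidate`, and `correct_word`, as well as the contents of the confusion sets, are hypothetical.

```python
from typing import Optional, Sequence

# Hypothetical confusion sets (step d). They are fixed in advance and do not
# depend on the recognized word; the groupings below are illustrative only.
CONFUSION_SETS = [
    frozenset("il1"),
    frozenset("o0"),
    frozenset("ce"),
    frozenset("un"),
]


def same_confusion_set(a: str, b: str, confusion_sets) -> bool:
    """Return True if characters a and b share at least one confusion set."""
    return any(a in s and b in s for s in confusion_sets)


def is_candidate(misrecognized: str, reference: str, confusion_sets) -> bool:
    """Steps e) and f): keep the reference word only if every mismatched
    position pairs two characters drawn from the same confusion set."""
    if len(misrecognized) != len(reference):
        return False  # simplification assumed for this sketch
    for m, r in zip(misrecognized, reference):
        if m != r and not same_confusion_set(m, r, confusion_sets):
            return False  # mismatch not explained by any confusion set
    return True


def correct_word(
    recognized: str,
    lexicon: Sequence[str],
    confusion_sets=CONFUSION_SETS,
) -> Optional[str]:
    """Return a replacement word, the word itself if it is already in the
    lexicon, or None if no reference word survives elimination."""
    # Step b) (assumed test): a word is misrecognized if not in the lexicon.
    if recognized in lexicon:
        return recognized
    # Step g): apply the comparison and elimination to every reference word.
    candidates = [w for w in lexicon if is_candidate(recognized, w, confusion_sets)]
    if not candidates:
        return None
    # Step h) (assumed criterion): fewest differing character positions.
    return min(candidates, key=lambda w: sum(m != r for m, r in zip(recognized, w)))
```

For example, with the lexicon `["clock", "click", "cloak", "block"]`, the recognized word `"c1ock"` keeps only `"clock"`: the `1`/`l` mismatch is covered by a shared confusion set, while every other reference word has at least one mismatch that no confusion set explains.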
