Regional context maximum likelihood error correction for OCR, keyboard, and the like

  • US 3,969,700 A
  • Filed: 07/30/1975
  • Issued: 07/13/1976
  • Est. Priority Date: 04/10/1974
  • Status: Expired due to Term
First Claim

1. A data processing system for selecting the correct form of an input error word garbled by an OCR splitting error, the correct form of the error word being a member of a predetermined class of reference words, each comprising a plurality of characters, comprising:

  • a storage means for storing said predetermined class of reference words, selected characters composing the reference words having stored in said storage means an error propensity indicium for indicating the propensity of the character to being misread through a splitting error, said storage means storing a first type conditional probability that a first character can be output by said OCR through character substitution, given that a second character was actually scanned, and a second type conditional probability that a pair of adjacent characters can be output by said OCR through character splitting, given that a third character was actually scanned;

    a first register means connected to an input line for storing the characters of said error word arranged in the sequence of receipt from said OCR, with a first character at a given end of said error word defining a first position for an error word origin;

    a second register means connected to said storage means for storing the characters of a first reference word from said predetermined class in said storage means, arranged in a sequence to correspond with said sequence of characters in said first register means, with a first character in said reference word corresponding to said first character in said error word, defining a first position for a reference word origin;

    decoding means connected to said second register for decoding the error propensity indicium corresponding to the character located at said reference word origin in said reference word;

    accessing means connected to said storage means for accessing from said storage means, when said decoded indicium indicates a character splitting propensity, a first one of said first type conditional probability that given the character located at said reference word origin in said reference word was scanned, that the OCR substituted the character located at said error word origin in said error word;

    said accessing means accessing from said storage means when said decoded indicium indicates an error splitting propensity, a second one of said first type conditional probability that given the character next to the character at said reference word origin in said reference word was scanned, that the OCR substituted the character next to the character located at said error word origin in said error word;

    multiplying means connected to said storage means for multiplying said first one and said second one of said first conditional probabilities, as a first product;

    said accessing means accessing from said storage means when said decoded indicium indicates a character splitting propensity, a first one of said second type conditional probability that given the character located at said reference word origin in said reference word was scanned, that the OCR split it into the character located at said error word origin and the character next to the character located at said error word origin in said error word;

    said accessing means accessing from said storage means when said decoded indicium indicates a character splitting propensity, a third one of said first type conditional probabilities that given the character next to the character located at said reference word origin in said reference word was scanned, that the OCR substituted the second next character to the character located at said error word origin in said error word;

    said multiplying means multiplying said first one of said second type probability and said third one of said first type conditional probability as a second product;

    comparison means connected to said multiplying means for comparing the relative magnitudes of said first and said second product;

    a running product calculating means connected to said storage means for multiplying a running product times said first one of said first type conditional probabilities if said first product is greater than said second product or said first one of said second type conditional probabilities if said second product is greater than said first product;

    a shifting means connected to said comparison means for shifting the location of both said error word origin and said reference word origin by one character position when said first probability product is greater than said second probability product;

    said shifting means shifting said error word origin by two character positions and shifting said reference word origin by one character position when said second probability product is greater than said first probability product;

    whereby the reference word stored in said storage means having the highest conditional probability of having been misread as the error word stored in said first register, can be determined.
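
The procedure recited in claim 1 can be illustrated in software. The following Python sketch is not the patented hardware and is not taken from the patent; it only mirrors the claimed steps: compare a substitution hypothesis against a splitting hypothesis at the current origins, fold the winning probability into a running product, and shift the origins by one or two positions. The names `score_reference`, `best_match`, `p_sub`, `p_split`, `split_prone`, and `floor` are hypothetical. Behavior the claim does not recite (characters with no splitting propensity, ties between the two products, word boundaries, and missing table entries) is filled in by assumption and marked as such in the comments.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical table layouts (the claim only says the probabilities are stored):
#   p_sub[(scanned, output)]        -> P(OCR outputs `output` | `scanned` was scanned)
#   p_split[(scanned, output_pair)] -> P(OCR outputs the 2-char `output_pair` | `scanned`)
ProbTable = Dict[Tuple[str, str], float]


def score_reference(error: str, ref: str, p_sub: ProbTable, p_split: ProbTable,
                    split_prone: Set[str], floor: float = 1e-6) -> float:
    """Running product for the hypothesis that `ref` was misread as `error`."""

    def sub(scanned: str, out: str) -> float:
        # Assumption: table entries that are missing get a small floor value.
        return p_sub.get((scanned, out), floor)

    def lookahead(ref_idx: int, err_idx: int) -> float:
        # Assumption (boundary handling is not recited in the claim):
        # both words exhausted -> consistent; only one exhausted -> impossible.
        ref_done = ref_idx >= len(ref)
        err_done = err_idx >= len(error)
        if ref_done and err_done:
            return 1.0
        if ref_done or err_done:
            return 0.0
        return sub(ref[ref_idx], error[err_idx])

    i = j = 0          # i: error word origin, j: reference word origin
    running = 1.0
    while i < len(error) and j < len(ref):
        r = ref[j]
        p_single = sub(r, error[i])
        p_pair = 0.0
        take_split = False
        # The set `split_prone` plays the role of the error propensity indicium.
        if r in split_prone and i + 1 < len(error):
            p_pair = p_split.get((r, error[i:i + 2]), floor)
            first = p_single * lookahead(j + 1, i + 1)   # substitution hypothesis
            second = p_pair * lookahead(j + 1, i + 2)    # splitting hypothesis
            take_split = second > first                  # assumption: ties favor substitution
        if take_split:
            running *= p_pair
            i += 2     # error word origin shifts by two positions
        else:
            running *= p_single
            i += 1     # error word origin shifts by one position
        j += 1         # reference word origin always shifts by one position
    # Assumption: a hypothesis that leaves characters unconsumed scores zero.
    return running if i == len(error) and j == len(ref) else 0.0


def best_match(error: str, references: Iterable[str], p_sub: ProbTable,
               p_split: ProbTable, split_prone: Set[str]) -> str:
    """Reference word with the highest running product for `error`."""
    return max(references,
               key=lambda ref: score_reference(error, ref, p_sub, p_split, split_prone))
```

For example, with a made-up table in which "m" is marked split-prone and can be output as the pair "rn", calling `best_match("carnp", ["carol", "camp"], ...)` would weigh "camp" under the splitting hypothesis at the "m" position and "carol" under substitution only.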
