Regional context maximum likelihood error correction for OCR, keyboard, and the like
First Claim
1. A data processing system for selecting the correct form of an input error word garbled by an OCR splitting error, the correct form of the error word being a member of a predetermined class of reference words, each comprising a plurality of characters, comprising:
a storage means for storing said predetermined class of reference words, selected characters composing the reference words having stored in said storage means an error propensity indicium for indicating the propensity of the character to being misread through a splitting error, said storage means storing a first type conditional probability that a first character can be output by said OCR through character substitution, given that a second character was actually scanned, and a second type conditional probability that a pair of adjacent characters can be output by said OCR through character splitting, given that a third character was actually scanned;
a first register means connected to an input line for storing the characters of said error word arranged in the sequence of receipt from said OCR, with a first character at a given end of said error word defining a first position for an error word origin;
a second register means connected to said storage means for storing the characters of a first reference word from said predetermined class in said storage means, arranged in a sequence to correspond with said sequence of characters in said first register means, with a first character in said reference word corresponding to said first character in said error word, defining a first position for a reference word origin;
decoding means connected to said second register for decoding the error propensity indicium corresponding to the character located at said reference word origin in said reference word;
accessing means connected to said storage means for accessing from said storage means, when said decoded indicium indicates a character splitting propensity, a first one of said first type conditional probability that given the character located at said reference word origin in said reference word was scanned, that the OCR substituted the character located at said error word origin in said error word;
said accessing means accessing from said storage means when said decoded indicium indicates an error splitting propensity, a second one of said first type conditional probability that given the character next to the character at said reference word origin in said reference word was scanned, that the OCR substituted the character next to the character located at said error word origin in said error word;
multiplying means connected to said storage means for multiplying said first one and said second one of said first conditional probabilities, as a first product;
said accessing means accessing from said storage means when said decoded indicium indicates a character splitting propensity, a first one of said second type conditional probability that given the character located at said reference word origin in said reference word was scanned, that the OCR split it into the character located at said error word origin and the character next to the character located at said error word origin in said error word;
said accessing means accessing from said storage means when said decoded indicium indicates a character splitting propensity, a third one of said first type conditional probabilities that given the character next to the character located at said reference word origin in said reference word was scanned, that the OCR substituted the second next character to the character located at said error word origin in said error word;
said multiplying means multiplying said first one of said second type probability and said third one of said first type conditional probability as a second product;
comparison means connected to said multiplying means for comparing the relative magnitudes of said first and said second product;
a running product calculating means connected to said storage means for multiplying a running product times said first one of said first type conditional probabilities if said first product is greater than said second product or said first one of said second type conditional probabilities if said second product is greater than said first product;
a shifting means connected to said comparison means for shifting the location of both said error word origin and said reference word origin by one character position when said first probability product is greater than said second probability product;
said shifting means shifting said error word origin by two character positions and shifting said reference word origin by one character position when said second probability product is greater than said first probability product;
whereby the reference word stored in said storage means having the highest conditional probability of having been misread as the error word stored in said first register, can be determined.
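The comparison and shifting elements of claim 1 can be illustrated with a small Python sketch of a single decision at the current origins. It is not the claimed hardware: the function name `split_step`, the dictionary-based probability tables, and all numeric values are hypothetical. Two further assumptions are made because the claim does not address them: missing table entries count as probability 0, and ties favor substitution.

```python
def split_step(k, l, sub_p, split_p):
    """One substitution-versus-splitting decision (illustrative only).

    k: OCR characters (K1, K2, K3) starting at the error word origin.
    l: reference characters (L1, L2) starting at the reference word origin.
    sub_p[(ocr_char, true_char)]: first type probability P(ocr_char | true_char).
    split_p[(ocr_pair, true_char)]: second type probability P(ocr_pair | true_char).
    Returns (factor for the running product, error word shift, reference word shift).
    """
    k1, k2, k3 = k
    l1, l2 = l
    p_sub = sub_p.get((k1, l1), 0.0)            # assumption: unknown entry -> 0
    first_product = p_sub * sub_p.get((k2, l2), 0.0)
    p_split = split_p.get((k1 + k2, l1), 0.0)
    second_product = p_split * sub_p.get((k3, l2), 0.0)
    if second_product > first_product:
        return p_split, 2, 1                    # splitting: skip two OCR characters
    return p_sub, 1, 1                          # substitution (also used on ties)


# Made-up probabilities: "m" is sometimes split into "rn".
SUB_P = {("r", "m"): 0.01, ("n", "e"): 0.01, ("e", "e"): 0.95, ("m", "m"): 0.9}
SPLIT_P = {("rn", "m"): 0.2}

print(split_step(("r", "n", "e"), ("m", "e"), SUB_P, SPLIT_P))  # splitting wins
print(split_step(("m", "e", "n"), ("m", "e"), SUB_P, SPLIT_P))  # substitution wins
```

Only the leading probability, not the whole product, is returned for the running product, which mirrors the running product calculating means of the claim.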
Abstract
A data processing system is disclosed for selecting the correct form of a garbled input word misread by an optical character reader so as to change the number of characters in the word by character splitting or concatenation. Dictionary words are stored in the system, having characters which are flagged for segmentation or concatenation OCR misread propensity. The OCR word and a dictionary word are loaded into a pair of associated shift registers, aligning their letters on one end. The dictionary word characters are inspected for error propensity flags. When a splitting propensity, for example, is found for a character, special conditional probability values are accessed from a storage and a calculation is performed of the probability that the first character of the dictionary word was split by the OCR into the first and second characters of the OCR word. This regional context probability is compared with the probability of a simple substitution error for the characters. If the probability of segmentation is larger, the OCR characters in the first shift register are shifted one space with respect to the dictionary word characters in the second shift register so that subsequent character pairs to be compared are properly matched. The greater calculated probability is combined in a running product. The dictionary word with the largest running product is output by the system as the most likely correct form of the garbled OCR input word.
In addition to optical character recognition, the system disclosed may be applied to correcting segmentation errors in phoneme-characters output from a speech analyzer.
In addition to optical character recognition, the system disclosed may be applied to correcting character substitutions, transpositions, additions, and omissions inadvertently typed on a keyboard.
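The splitting decision described in the abstract can be written compactly using the cell notation of the claims, where K1, K2, K3 are adjacent OCR word characters and L1, L2 are adjacent dictionary word characters. The symbols for the two products and for the running product R are introduced here for illustration only, and the update of R by the leading factor follows the wording of the claims rather than the abstract.

```latex
\begin{align*}
\Pi_{\mathrm{sub}}   &= P(K_1 \mid L_1)\, P(K_2 \mid L_2) \\
\Pi_{\mathrm{split}} &= P(K_1 K_2 \mid L_1)\, P(K_3 \mid L_2) \\
R &\leftarrow R \cdot
\begin{cases}
P(K_1 K_2 \mid L_1) & \text{if } \Pi_{\mathrm{split}} > \Pi_{\mathrm{sub}} \\
P(K_1 \mid L_1)     & \text{if } \Pi_{\mathrm{sub}} > \Pi_{\mathrm{split}}
\end{cases}
\end{align*}
```

When the splitting product is larger, the OCR word advances by two characters and the dictionary word by one; otherwise both advance by one. The dictionary word with the largest final R is selected.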
39 Claims
4. A data processing system for selecting the correct form of an input error word garbled by an OCR concatenation error, the correct form of the error word being a member of the predetermined class of reference words, each comprising a plurality of characters, comprising:
-
a storage means for storing said predetermined class of reference words in a storage means, selected characters composing the words in said class having stored in said storage means an error propensity indicium for indicating the propensity of the character to be misread through a concatenation error; said storage means storing a first type conditional probability that a first character can be output by said OCR through character substitution, given that a second character was actually scanned, and a second type conditional probability that a first character can be output by said OCR through character concatenation, given that a pair of adjacent characters were actually scanned; a first register means connected to an input line for storing the characters of said error word in a first register, arranged in a sequence of receipt from said OCR, with a first character at a given end of said error word defining a first position for an error word origin; a second register means connected to said storage means for storing the characters of a first reference word from said predetermined class in said storage means, arranged in a sequence to correspond with said sequence of characters in said first register means, with a first character in said reference word corresponding to said first character in said error word, defining a first position for a reference word origin; decoding means connected to said second register for decoding the error propensity indicium corresponding to the character located at said reference word origin in said first reference word; accessing means connected to said storage means for accessing from said storage means when said decoded indicium indicates a character concatenation propensity, a first one of said first type conditional probability that given the character located at said first reference word origin of said reference word was scanned, that the OCR substituted the character located at said error word origin in said error word; said 
accessing means accessing from said storage means when said decoded indicium indicates a character concatenation propensity, a second one of said first type conditional probability that given the character next to the character located at said reference word origin in said reference word was scanned, that the OCR substituted the character next to the character located at said error word origin in said error word; multiplying means connected to said storage means for multiplying said first one and said second one of said first type conditional probabilities as a first product; said accessing means accessing from said storage means when said decoded indicium indicates a character concatenation propensity, a first one of said second type conditional probability that given the character located at said reference word origin and the character next to the character located at said reference word origin in said reference word were scanned, that the OCR concatenated them into the character located at said error word origin in said error word; said accessing means accessing from said storage means when said decoded indicium indicates a character concatenation propensity, a third one of said first type conditional probabilities that given the character second next to the character located at said reference word origin and said reference word was scanned, that the OCR substituted the character next to the character located at said error word origin in said error word; said multiplying means multiplying said first one of said second type conditional probability and said third one of said first type conditional probability as a second product; comparison means connected to said multiplying means for comparing the relative magnitude of said first and said second product; a running product calculating means connected to said storage means for multiplying a running product times said first one of said first type conditional probabilities if said first product is greater than said 
second product or said first one of said second type conditional probabilities if said second product is greater than said first product; a shifting means connected to said comparison means for shifting said error word origin and said reference word origin by one character position when said first probability product is greater than said second probability product; said shifting means shifting the error word origin by one character position and the reference word origin by two character positions when said second probability product is greater than said first probability product; whereby the reference word having the greatest total conditional probability that the error word was output by the OCR given that the reference word was scanned, can be determined. - View Dependent Claims (5, 6, 18, 20, 21, 23, 24, 26, 27, 29, 30, 32, 33, 35, 36)
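The concatenation case of claim 4 mirrors the splitting case with the roles of the two words exchanged: two reference characters may have been merged into one OCR character. A minimal Python sketch of one decision follows. The function name `concat_step`, the table layout, and the numbers are hypothetical; missing entries are assumed to have probability 0 and ties are assumed to favor substitution, since the claim is silent on both.

```python
def concat_step(k, l, sub_p, concat_p):
    """One substitution-versus-concatenation decision (illustrative only).

    k: OCR characters (K1, K2) starting at the error word origin.
    l: reference characters (L1, L2, L3) starting at the reference word origin.
    sub_p[(ocr_char, true_char)]: first type probability P(ocr_char | true_char).
    concat_p[(ocr_char, true_pair)]: second type probability P(ocr_char | true_pair).
    Returns (factor for the running product, error word shift, reference word shift).
    """
    k1, k2 = k
    l1, l2, l3 = l
    p_sub = sub_p.get((k1, l1), 0.0)              # assumption: unknown entry -> 0
    first_product = p_sub * sub_p.get((k2, l2), 0.0)
    p_concat = concat_p.get((k1, l1 + l2), 0.0)
    second_product = p_concat * sub_p.get((k2, l3), 0.0)
    if second_product > first_product:
        return p_concat, 1, 2                     # concatenation: skip two reference characters
    return p_sub, 1, 1                            # substitution (also used on ties)


# Made-up probabilities: "rn" is sometimes read as "m".
SUB_P = {("m", "r"): 0.01, ("s", "n"): 0.01, ("s", "s"): 0.95}
CONCAT_P = {("m", "rn"): 0.15}

print(concat_step(("m", "s"), ("r", "n", "s"), SUB_P, CONCAT_P))
```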
-
-
7. A data processing system for selecting the correct form of an input error word garbled by an OCR crowding error, the correct form of the error word being a member of a predetermined class of reference words, each comprising a plurality of characters, comprising:
-
a storage means for storing said predetermined class of reference words in a storage means, selected characters composing the reference word having stored in said storage means an error propensity indicium for indicating the propensity of the character to being misread through a crowding error; said storage means storing a first type conditional probability that a first character can be output by said OCR through character substitution, given that a second character was actually scanned, and a second type conditional probability that a first pair of adjacent characters can be output by said OCR through character crowding, given that a second pair of adjacent characters was actually scanned; a first register means connected to an input line for storing the characters of said error word arranged in the sequence of receipt from said OCR, with a first character at a given end of said error word defining a first position for an error word origin; a second register means connected to said storage means for storing the characters and error propensity indicium of a first reference word from said predetermined class in said storage means, arranged in a sequence to correspond to said sequence of characters in said first register means, with a first character in said reference word corresponding to said first character in said error word, defining a first position for a reference word origin; decoding means connected to said second register for decoding the error propensity indicium corresponding to the character stored at said reference word origin in said reference word; accessing means connected to said storage means for accessing from said storage means when said decoded indicium indicates a character pair crowding propensity, a first one of said first type conditional probability that given the character located at said reference word origin in said reference word was scanned, that the OCR substituted the character located at said error word origin in said error word; 
said accessing means accessing from said storage means when said decoded indicium indicates a character pair crowding propensity, a second one of said first type conditional probability that given the character next to the character at said reference word origin and said reference word was scanned, that the OCR substituted the character next to the character located at said error word origin in said error word; multiplying means connected to said storage means for multiplying said first one and said second one of said first type conditional probabilities as a first product; said accessing means accessing from said storage means when said decoded indicium indicates a character pair crowding propensity, a first one of said second type conditional probability that the character located at said reference word origin and the character located next to the character located at said reference word origin in said reference word was scanned, that the OCR executed a crowding error and output the character located at said error word origin and the character next to the character located at said error word origin in said error word; comparison means connected to said multiplying means for comparing the relative magnitudes of said first product and said second type conditional probability accessed from said storage means; a running product calculating means connected to said storage means, for multiplying a running product times said first one of said first type conditional probabilities if said first product is greater than said second type conditional probability accessed or said first one of said second type conditional probability if said second type conditional probability is greater than said first product; a shifting means connected to said comparison means for shifting the location of both said error word origin and said reference word origin by one character position when said first probability product is greater than said second type conditional probability; a shifting 
means shifting both said error word origin and said reference word origin by two character positions when said second type conditional probability is greater than said first probability product; whereby the reference word stored in said storage means having the highest conditional probability of having been misread as the error word stored in said first register, can be determined. - View Dependent Claims (8, 9)
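In the crowding case of claim 7 a pair of reference characters is misread as a different pair of OCR characters, so the substitution product is compared directly against a single pair-to-pair probability rather than against a second product. The Python sketch below illustrates one such decision. The name `crowd_step`, the table layout, and every numeric value are hypothetical, and treating missing entries as probability 0 and ties as substitution are assumptions.

```python
def crowd_step(k, l, sub_p, crowd_p):
    """One substitution-versus-crowding decision (illustrative only).

    k: OCR characters (K1, K2) starting at the error word origin.
    l: reference characters (L1, L2) starting at the reference word origin.
    sub_p[(ocr_char, true_char)]: first type probability P(ocr_char | true_char).
    crowd_p[(ocr_pair, true_pair)]: second type probability P(ocr_pair | true_pair).
    Returns (factor for the running product, error word shift, reference word shift).
    """
    k1, k2 = k
    l1, l2 = l
    p_sub = sub_p.get((k1, l1), 0.0)              # assumption: unknown entry -> 0
    first_product = p_sub * sub_p.get((k2, l2), 0.0)
    p_crowd = crowd_p.get((k1 + k2, l1 + l2), 0.0)
    if p_crowd > first_product:
        return p_crowd, 2, 2                      # crowding: both origins move two positions
    return p_sub, 1, 1                            # substitution (also used on ties)


# Made-up probabilities for a pair "ol" crowded into "cd".
SUB_P = {("c", "o"): 0.1, ("d", "l"): 0.01, ("o", "o"): 0.9, ("l", "l"): 0.9}
CROWD_P = {("cd", "ol"): 0.05}

print(crowd_step(("c", "d"), ("o", "l"), SUB_P, CROWD_P))
print(crowd_step(("o", "l"), ("o", "l"), SUB_P, CROWD_P))
```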
-
-
10. An information processing system for selecting the correct form of a garbled input word misread by an optical character reader so as to incorrectly segment the characters in the word, the correct form of the input word being a member of a predetermined class of words, each comprising a plurality of characters, comprising:
-
a first shift register having an input connected to the output of said optical character reader, for storing the characters of said input OCR word in an arrangement ordered in the sequence in which the characters are received, said first shift register having three adjacent storage cells K1, K2, and K3, with the end character of the input word initially stored in cell K1 ; a first bulk storage means for storing said predetermined class of words as a dictionary, selected characters composing selected ones of said dictionary words having stored in association therewith an error propensity indicium for indicating the propensity of the character to being misread by said OCR through character splitting error mode; a second shift register having an input connected to the output of said first storage means, for storing characters of a dictionary word input from said first storage means, in an arrangement ordered in the sequence in which the characters are received, said second shift register having three adjacent storage cells L1, L2, and L3, with the end character of the dictionary word initially stored in cell L1 ; said first storage means sequentially loading words from said predetermined class of dictionary words, into said second register; a switching means having a first input connected to the output of cells K1, K2, and K3 of said first shift register, for selectively switching OCR word characters stored therein to the output of said switching means, a second input connected to the output of cells L1, L2, and L3 of said second shift register, for selectively switching the dictionary word characters stored therein to the output of said switching means, and a third control input connected to said second shift register for controlling said selective switching of said OCR word characters and said dictionary word characters by means of the error indicium associated with the dictionary word characters stored in cell L1 ; a second bulk storage means with an input 
connected to the output of said switching means, for storing a first type conditional probability P(Kn |Lm) that the OCR word character stored in cell Kn of said first shift register was misread by character substitution given that the dictionary word character stored in cell Lm of said second shift register was actually scanned, for n=1, m=1, n=2, m=2, and for n=3, m=2 and a second type conditional probability P(K1 K2 |L1) that the OCR word characters stored in cells K1 and K2 of said first shift register were misread by character splitting, given that the dictionary word character stored in cell L1 of said second shift register was actually scanned, said probabilities being accessed by said component OCR word characters and dictionary word characters which are selectively switched to the output of said switching means under the control of said error indicium associated with the dictionary word character stored in cell L1 of said second shift register; a first multiplier means having an input connected to the output of said second storage means, for multiplying a first received conditional probability by a second received conditional probability accessed from said second storage means and outputting a first probability product, and for multiplying a third received conditional probability by a fourth received conditional probability accessed from said second storage means and outputting a second probability product; said switching means operating when the error indicium stored in the L1 cell of said second shift register indicates a propensity to character splitting for the characters stored in the L1 cell, to access the first type conditional probability P(K1 |L1) and P(K2 |L2) from said second storage means for transmission to said first multiplier means as said first received conditional probabilities, for calculation of said first probability product, and to access the second type conditional probability P(K1 K2 |L1) and the first type conditional probability P(K3 
|L2) from said second storage means for transmission to said first multiplier means as said third received and said fourth received conditional probabilities for the calculation of said second probability product; a first comparator having an input connected to said first multiplier means for comparing the magnitude of said first probability product with that for said second probability product; a shift control means having a first input connected to said first comparator and a second input connected to said L1 cell of said second shift register, a first output connected to a shift input on said first shift register and a second output connected to a shift input on said second shift register, for shifting the contents of said first and said second shift registers in accordance with the relative magnitudes of said first and said second probability product and the value of the error propensity indicium for the character stored in the L1 cell; a second multiplier means having an input connected to said second storage means and a second control input connected to said first comparator, for accepting said first received conditional probability if said first product is larger or said third received conditional probability if said second product is larger as determined by said first comparator, and multiplying by the running product of all of said conditional probabilities calculated for the dictionary word presently stored in said second shift register; said shift control means shifting the contents of both said first and said second shift registers by one cell when said first probability product is greater than said second probability product; said shift control means shifting the contents of said first shift register by two cells and the contents of said second shift register by one cell when said second probability product is greater than said first probability product and the error indicium associated with the character in the L1 cell of said second shift register 
indicates a character splitting propensity; a second comparator means having an input connected to said second multiplier means and an output connected to said first storage means, for selecting the dictionary word stored in said first storage means having the largest running product when matched with the OCR word stored in said first shift register; said first storage means outputting on an output line said dictionary word indicated by said second comparator means, as the most likely correct form for said garbled input OCR word.
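The complete flow of claim 10, from loading each dictionary word to the second comparator choosing the largest running product, can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed circuit: the names `lookahead`, `running_product`, and `best_word` are hypothetical, the error propensity indicium is simplified to a set of split-prone characters, missing table entries count as probability 0, and the handling of word ends (a look-ahead past both word ends contributes 1, past only one end contributes 0, and leftover characters give a score of 0) is an assumption the claim does not spell out.

```python
def lookahead(ocr, word, i, j, sub_p):
    """Substitution probability at positions (i, j), with assumed end-of-word rules."""
    if i >= len(ocr) and j >= len(word):
        return 1.0                      # both words exhausted: consistent alignment
    if i >= len(ocr) or j >= len(word):
        return 0.0                      # only one word exhausted: inconsistent
    return sub_p.get((ocr[i], word[j]), 0.0)


def running_product(ocr, word, split_prone, sub_p, split_p):
    """Running product for one dictionary word matched against the OCR word."""
    i = j = 0                           # origins: K1 is ocr[i], L1 is word[j]
    product = 1.0
    while i < len(ocr) and j < len(word):
        p_sub = sub_p.get((ocr[i], word[j]), 0.0)
        factor, shift = p_sub, 1
        if word[j] in split_prone and i + 1 < len(ocr):
            p_split = split_p.get((ocr[i:i + 2], word[j]), 0.0)
            first = p_sub * lookahead(ocr, word, i + 1, j + 1, sub_p)
            second = p_split * lookahead(ocr, word, i + 2, j + 1, sub_p)
            if second > first:          # splitting is the more likely explanation
                factor, shift = p_split, 2
        product *= factor
        i += shift                      # OCR register shifts by one or two cells
        j += 1                          # dictionary register shifts by one cell
    # Assumption: unmatched leftover characters make the word impossible.
    return product if i == len(ocr) and j == len(word) else 0.0


def best_word(ocr, dictionary, split_prone, sub_p, split_p):
    """Dictionary word with the largest running product (second comparator)."""
    return max(dictionary,
               key=lambda w: running_product(ocr, w, split_prone, sub_p, split_p))


# Made-up probabilities: "m" is sometimes split into "rn".
SUB_P = {("r", "m"): 0.01, ("n", "e"): 0.01, ("e", "e"): 0.95,
         ("n", "n"): 0.9, ("m", "m"): 0.9}
SPLIT_P = {("rn", "m"): 0.2}

print(best_word("rnen", ["then", "men"], {"m"}, SUB_P, SPLIT_P))
```

With these made-up tables the three-letter word "men" outscores the four-letter word "then" for the OCR string "rnen", because the split of "m" into "rn" is far more probable than the substitutions "then" would require.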
-
-
11. An information processing system for selecting the correct form of a garbled input word misread by an optical character reader so as to incorrectly segment the characters in the word, the correct form of the input word being a member of a predetermined class of words, each comprising a plurality of characters, comprising:
-
a first shift register having an input connected to the output of said optical character reader, for storing the characters of said input OCR word in an arrangement ordered in the sequence in which the characters are received, said first shift register having three adjacent storage cells K1, K2 and K3, with the end character of the input word initially stored in cell K1 ; a first bulk storage means for storing said predetermined class of words as a dictionary, selected characters composing selected ones of said dictionary words having stored in association therewith an error propensity indicium for indicating the propensity of the character to being misread by said OCR through a character pair concatenation error mode; a second shift register having an input connected to the output of said first storage means, for storing characters of a dictionary word input from said first storage means, in an arrangement ordered in the sequence in which the characters are received, said second shift register having three adjacent storage cells L1, L2, and L3, with the end character of the dictionary word initially stored in cell L1 ; said first storage means sequentially loading words from said predetermined class of dictionary words, into said second register; a switching means having a first input connected to the output of cells K1, K2 and K3 of said first shift register, for selectively switching OCR word characters stored therein to the output of said switching means, a second input connected to the output of cells L1, L2, and L3 of said second shift register, for selectively switching the dictionary word characters stored therein to the output of said switching means, and a third control input connected to said second shift register for controlling said selective switching of said OCR word characters and said dictionary word characters by means of the error indicium associated with the dictionary word characters stored in cells L1 and L2 ; a second bulk storage means with 
an input connected to the output of said switching means, for storing a first type conditional probability P(Kn |Lm) that the OCR word character stored in cell Kn of said first shift register was misread by character substitution given that the dictionary word character stored in cell Lm of said second shift register was actually scanned, for n=1, m=1, n=2, m=2, and for n=2, m=3, and a second type of conditional probability P(K1 |L1 L2) that the OCR word character stored in cell K1 of said first shift register was misread by character concatenation, given that the dictionary word characters stored in cells L1 and L2 of said second shift register was actually scanned, said probabilities being accessed by said component OCR word characters and dictionary word characters which are selectively switched to the output of said switching means under the control of said error indicium associated with the dictionary word characters stored in cells L1 and L2 of said second shift register; a first multiplier means having an input connected to the output of said second storage means, for multiplying a first received conditional probability by a second received conditional probability accessed from said second storage means and outputting a first probability product, and for multiplying a third received conditional probability by a fourth received conditional probability accessed from said second storage means and outputting a second probability product; said switching means operating when the error indicium stored in the L1 cell of said second shift register indicates a propensity to character concatenation for the characters stored in the L1 and L2 cells, to access the first type conditional probabilities P(K1 |L1) and P(K2 |L2) from said second storage means for transmission to said first multiplier means as said first received and said second received conditional probabilities for calculating said first probability product, and to access the second type conditional 
probability P(K1 |L1 L2) and the first type conditional probability P(K2 |L3) from said second storage means for transmission to said first multiplier means as said third received and fourth received conditional probability for the calculation of said second probability product; a first comparator having an input connected to said first multiplier means for comparing the magnitude of said first probability product with that for said second probability product; a shift control means having a first input connected to said first comparator and a second input connected to said L1 cell of said second shift register, a first output connected to a shift input on said first shift register and a second output connected to a shift input on said second shift register, for shifting the contents of said first and said second shift registers in accordance with the relative magnitudes of said first and said second probability product and the value of the error propensity indicium for the character stored in the L1 cell; A second multiplier means having an input connected to said second storage means and a second control input connected to said first comparator, for accepting said first received conditional probability if said first product is larger or said third received conditional probability if said second product is larger as determined by said first comparator, and multiplying by the running product of all of said conditional probabilities calculated for the dictionary word presently stored in said second shift register; said shift control means shifting the contents of both said first and said second shift registers by one cell when said first probability product is greater than said second probability product; said shift control means shifting the contents of said second shift register by two cells and the contents of said first shift register by one cell when said second probability is greater than said first probability product and the error indicium associated with 
characters in the L1 and L2 cells of said second shift register indicates a character concatenation propensity; a second comparator means having an input connected to said second multiplier means and an output connected to said first storage means, for selecting the dictionary word stored in said first storage means having the largest running product when matched with the OCR word stored in said first shift register; said first storage means outputting on an output line said dictionary word indicated by said second comparator means, as the most likely correct form for said garbled input OCR word.
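The decision made jointly by the switching means, first multiplier, first comparator and shift control means of this claim can be sketched in software. The Python sketch below is illustrative only: the function name, the use of dictionaries as probability tables, the zero default for missing entries, the handling of ties, the behaviour when no concatenation propensity is flagged, and all numeric values are assumptions, not taken from the claim.

```python
def concatenation_step(k1, k2, l1, l2, l3, p_sub, p_cat, concat_prone):
    """Return (factor, ocr_shift, dict_shift) for one alignment step.

    k1, k2     : OCR word characters in cells K1, K2
    l1, l2, l3 : dictionary word characters in cells L1, L2, L3
    p_sub      : hypothetical table, p_sub[(k, l)] = P(k | l)
    p_cat      : hypothetical table, p_cat[(k, l1 + l2)] = P(k | l1 l2)
    """
    sub1 = p_sub.get((k1, l1), 0.0)
    if not concat_prone:
        # Assumption: without a concatenation indicium, only substitution is tried.
        return sub1, 1, 1
    first = sub1 * p_sub.get((k2, l2), 0.0)    # P(K1|L1) * P(K2|L2)
    cat = p_cat.get((k1, l1 + l2), 0.0)
    second = cat * p_sub.get((k2, l3), 0.0)    # P(K1|L1 L2) * P(K2|L3)
    if second > first:
        # Concatenation wins: OCR register shifts one cell, dictionary register two.
        return cat, 1, 2
    # Substitution wins (assumption: ties also fall here): both shift one cell.
    return sub1, 1, 1


# Hypothetical values for "burns" scanned and "bums" output by the OCR.
p_sub = {("m", "r"): 0.01, ("s", "n"): 0.01, ("s", "s"): 0.95}
p_cat = {("m", "rn"): 0.2}
step = concatenation_step("m", "s", "r", "n", "s", p_sub, p_cat, True)
```

The returned factor is the value that the second multiplier would fold into the running product for the dictionary word currently under test.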
-
-
12. An information processing system for selecting the correct form of a garbled input word misread by an optical character reader so as to incorrectly segment the characters in the word, the correct form of the input word being a member of a predetermined class of words, each comprising a plurality of characters, comprising:
-
a first shift register having an input connected to the output of said optical character reader, for storing the characters of said input OCR word in an arrangement ordered in the sequence in which the characters are received, said first shift register having three adjacent storage cells K1, K2 and K3, with the end character of the input word initially stored in cell K1 ; a first bulk storage means for storing said predetermined class of words as a dictionary, selected characters composing selected ones of said dictionary words having stored in association therewith an error propensity indicium for indicating the propensity of the character to being misread by said OCR through a character pair crowding error mode; a second shift register having an input connected to the output of said first storage means, for storing characters of a dictionary word input from said first storage means, in an arrangement ordered in the sequence in which the characters are received, said second shift register having three adjacent storage cells L1, L2, and L3, with the end character of the dictionary word initially stored in cell L1 ; said first storage means initiating the sequential loading of words from said predetermined class of dictionary words, into said second register, upon receipt of a signal over said reset control input indicating the receipt of a new OCR word from said OCR output; a switching means having a first input connected to the output of cells K1, K2 and K3 of said first shift register, for selectively switching OCR word characters stored therein to the output of said switching means, a second input connected to the output of cells L1, L2 and L3 of said second shift register, for selectively switching the dictionary word characters stored therein to the output of said switching means, and a third control input connected to said second shift register for controlling said selective switching of said OCR word characters and said dictionary word characters by means of 
the error indicium associated with the dictionary word characters stored in cell L1 ; a second bulk storage means with an input connected to the output of said switching means, for storing a first type conditional probability P(Kn |Lm) that the OCR word character stored in cell Kn of said first shift register was misread by character substitution given that the dictionary word character stored in cell Lm of said second shift register was actually scanned, for n=1, m=1 and n=2, m=2, and a second type conditional probability P(K1 K2 |L1 L2) that the OCR word characters stored in cells K1 and K2 of said first shift register were misread by character crowding, given that the dictionary word characters stored in cell L1 and L2 of said second shift register were actually scanned, said probabilities being accessed by said component OCR word characters and dictionary word characters which are selectively switched to the output of said switching means under the control of said error indicium associated with the dictionary word characters stored in the cells L1 and L2 of said second shift register; a first multiplier means having an input connected to the output of said second storage means, for multiplying a first received conditional probability by a second received conditional probability accessed from said second storage means and outputting a probability product; said switching means operating when the error indicium associated with characters stored in the L1 and L2 cells of said second shift register indicates a propensity to character crowding for the characters stored in the L1 and L2 cells, to access the first type conditional probability P(K1 |L1) and P(K2 |L2) from said second storage means for transmission to said first multiplier means as said first received and second received conditional probabilities, for calculation of said probability product, and to access the second type conditional probability P(K1 K2 |L1 L2) from said second storage means; a first 
comparator having an input connected to said first multiplier means and said second storage means for comparing the magnitude of said probability product with that for said accessed second type conditional probability; a shift control means having a first input connected to said first comparator and a second input connected to said L1 cell of said second shift register, a first output connected to a shift input on said first shift register and a second output connected to a shift input on said second shift register, for shifting the contents of said first and said second shift registers in accordance with the relative magnitudes of said first and said second probability product and the value of the error propensity indicium for the character stored in the L1 cell; a second multiplier means having an input connected to said second storage means and a second control input connected to said first comparator, for accepting said first received conditional probability if said first product is larger or said second type conditional probability if said second product is larger, as determined by said first comparator, and multiplying by the running product of all of said conditional probabilities calculated for the dictionary word presently stored in said second shift register; said shift control means shifting the contents of both said first and second shift registers by one cell when said first probability product is greater than said second probability product; said shift control means shifting the contents of said first shift register by two cells and the contents of said second shift register by two cells when said second probability product is greater than said first probability product and the error indicium stored in the L1 cell of said second shift register indicates a character crowding propensity; a second comparator means having an input connected to said second multiplier means and an output connected to said first storage means, for selecting the dictionary 
word stored in said first storage means having the largest running product when matched with the OCR word stored in said first shift register; said first storage means outputting on an output line said dictionary word indicated by said second comparator means, as the most likely correct form for said garbled input OCR word.
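For the crowding error mode of this claim, the comparison is between a product of two substitution probabilities and a single pair-to-pair probability. The Python sketch below is a minimal illustration under stated assumptions: the function name, the dictionary-based tables, the zero default for missing entries, the tie handling, and the numeric values are hypothetical and do not come from the claim.

```python
def crowding_step(k1, k2, l1, l2, p_sub, p_crowd, crowd_prone):
    """Return (factor, ocr_shift, dict_shift) for one alignment step.

    p_sub   : hypothetical table, p_sub[(k, l)] = P(k | l)
    p_crowd : hypothetical table, p_crowd[(k1 + k2, l1 + l2)] = P(k1 k2 | l1 l2)
    """
    sub1 = p_sub.get((k1, l1), 0.0)
    if not crowd_prone:
        # Assumption: without a crowding indicium, only substitution is tried.
        return sub1, 1, 1
    product = sub1 * p_sub.get((k2, l2), 0.0)        # P(K1|L1) * P(K2|L2)
    crowd = p_crowd.get((k1 + k2, l1 + l2), 0.0)     # P(K1 K2|L1 L2)
    if crowd > product:
        # Crowding wins: both registers shift by two cells.
        return crowd, 2, 2
    # Substitution wins (assumption: ties also fall here): both shift one cell.
    return sub1, 1, 1


# Hypothetical values: the pair "ol" was scanned and the OCR output "cd".
p_sub = {("c", "o"): 0.05, ("d", "l"): 0.01}
p_crowd = {("cd", "ol"): 0.02}
step = crowding_step("c", "d", "o", "l", p_sub, p_crowd, True)
```

Because crowding maps two characters to two characters, the word length is unchanged and both registers advance by the same amount in either branch.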
-
-
13. An information processing system for selecting the correct form of a garbled input word misread by an optical character reader so as to incorrectly segment the characters in the word, the correct form of the input word being a member of a predetermined class of words, each comprising a plurality of characters, comprising:
-
a first shift register having an input connected to the output of said optical character reader, for storing the characters of said input OCR word in an arrangement ordered in the sequence in which the characters are received, said first shift register having three adjacent storage cells K1, K2 and K3, with the end character of the input word initially stored in the cell K1 ; a first bulk storage means having a reset control input connected to said output of the OCR, for storing said predetermined class of words as a dictionary, selected characters composing selected ones of said dictionary words having stored in association therewith an error propensity indicium for indicating the propensity of the character to being misread by said OCR through an error mode which changes the number of characters in the misread word; a second shift register having an input connected to the output of said first storage means, for storing characters of a dictionary word input from said first storage means, in an arrangement ordered in the sequence in which the characters are received, said second shift register having three adjacent storage cells L1, L2 and L3, with the end character of the dictionary word initially stored in cell L1 ; said first storage means initiating the sequential loading of words from said predetermined class of dictionary words, into said second register, upon receipt of a signal over said reset control input indicating the receipt of a new OCR word from said OCR output; a switching means having a first input connected to the output of cells K1, K2 and K3 of said first shift register, for selectively switching OCR word characters stored therein to the output of said switching means, a second input connected to the output of cells L1, L2 and L3 of said second shift register, for selectively switching the dictionary word characters stored therein to the output of said switching means, and a third control input connected to said second shift register for 
controlling said selective switching of said OCR word characters and said dictionary word characters by means of the error indicium associated with the dictionary word characters stored in cell L1 ; a second bulk storage means with an input connected to the output of said switching means, for storing a first type conditional probability P(Kn |Lm) that the OCR word character stored in cell Kn of said first shift register was misread by character substitution given that the dictionary word character stored in cell Lm of said second shift register was actually scanned, for n=1, m=1, n=2, m=2 and for n=2, m=3, and for n=3, m=2, a second type conditional probability P(K1 K2 |L1) that the OCR word characters stored in cells K1 and K2 of said first shift register were misread by character splitting, given that the dictionary word character stored in cell L1 of said second shift register was actually scanned, and a third type of conditional probability P(K1 |L1 L2) that the OCR word character stored in cell K1 of said first shift register was misread by character concatenation, given that the dictionary word character stored in cells L1, and L2 of said second shift register were actually scanned, said probabilities being accessed by said component OCR word characters and dictionary word characters which are selectively switched to the output of said switching means under the control of said error indicium associated with the dictionary word character stored in cell L1 of said second shift register; a first multiplier means having an input connected to the output of said second storage means, for multiplying a first received conditional probability by a second received conditional probability accessed from said second storage means and outputting a first probability product, and for multiplying a third received conditional probability by a fourth received conditional probability accessed from said second storage means and outputting a second probability product; said 
switching means operating when the error indicium stored in the L1 cell of said second shift register indicates a propensity to character splitting for the characters stored in the L1 cell, to access the first type conditional probability P(K1 |L1) and P(K2 |L2) from said second storage means for transmission to said first multiplier means as said first received and second received conditional probabilities, for calculation of said first probability product, and to access the second type conditional probability P(K1 K2 |L1) and the first type conditional probability P(K3 |L2) from said second storage means for transmission to said first multiplier means as said third received and said fourth received conditional probabilities for the calculation of said second probability product; said switching means operating when the error indicium stored in the L1 cell of said second shift register indicates a propensity to character concatenation for the characters stored in the L1 and L2 cells, to access the first type conditional probabilities P(K1 |L1) and P(K2 |L2) from said second storage means for transmission to said first multiplier means as said first received and said second received conditional probabilities for calculating said first probability product, and to access the third type conditional probability P(K1 |L1 L2) and the first type conditional probability P(K2 |L3) from said second storage means for transmission to said first multiplier means as said third received and fourth received conditional probability for the calculation of said second probability product; a first comparator having an input connected to said first multiplier means for comparing the magnitude of said first probability product with that for said second probability product; a shift control means having a first input connected to said first comparator and a second input connected to said L1 cell of said second shift register, a first output connected to a shift input on said first shift 
register and a second output connected to a shift input on said second shift register, for shifting the contents of said first and said second shift registers in accordance with the relative magnitudes of said first and said second probability product and the value of the error propensity indicium for the character stored in the L1 cell; a second multiplier means having an input connected to said second storage means and a second control input connected to said first comparator, for accepting said first received conditional probability if said first product is larger or said third received conditional probability if said second product is larger as determined by said first comparator, and multiplying by the running product of all of said conditional probabilities calculated for the dictionary word presently stored in said second shift register; said shift control means shifting the contents of both said first and said second shift registers by one cell when said first probability product is greater than said second probability product; said shift control means shifting the contents of said first shift register by two cells and the contents of said second shift register by one cell when said second probability product is greater than said first probability product and the error indicium stored in the L1 cell of said second shift register indicates a character splitting propensity; said shift control means shifting the contents of said second shift register by two cells and the contents of said first shift register by one cell when said second probability product is greater than said first probability product and the error indicium stored in the L1 cell of said second shift register indicates a character concatenation propensity; a second comparator means having an input connected to said second multiplier means and an output connected to said first storage means, for selecting the dictionary word stored in said first storage means having the largest running product 
when matched with the OCR word stored in said first shift register; said first storage means outputting on an output line said dictionary word indicated by said second comparator means, as the most likely correct form for said garbled input OCR word.
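The full matching loop of this claim, covering both splitting and concatenation and ending with selection of the dictionary word that has the largest running product, can be sketched as follows. This Python sketch rests on several assumptions that are not in the claim: the function and table names, keying the error indicia by character (`split_prone`) and by character pair (`concat_prone`) rather than storing them with each dictionary word, the `floor` probability for unlisted substitutions, the treatment of positions past the end of a word, the tie handling, and every numeric value.

```python
def _sub(p_sub, k, l, floor):
    # Assumption: past the end of both words the factor is 1; past one only, 0.
    if k == "" and l == "":
        return 1.0
    if k == "" or l == "":
        return 0.0
    return p_sub.get((k, l), floor)


def score(ocr, ref, p_sub, p_split, p_cat, split_prone, concat_prone, floor=1e-4):
    """Running product of conditional probabilities for one dictionary word."""
    def at(word, n):
        return word[n] if n < len(word) else ""

    i = j = 0          # i: OCR word origin (cell K1), j: dictionary word origin (cell L1)
    running = 1.0
    while i < len(ocr) and j < len(ref):
        k1, k2, k3 = at(ocr, i), at(ocr, i + 1), at(ocr, i + 2)
        l1, l2, l3 = at(ref, j), at(ref, j + 1), at(ref, j + 2)
        sub1 = _sub(p_sub, k1, l1, floor)
        first = sub1 * _sub(p_sub, k2, l2, floor)          # P(K1|L1) * P(K2|L2)
        factor, di, dj = sub1, 1, 1                        # default: substitution
        if l1 in split_prone:
            split = p_split.get((k1 + k2, l1), 0.0)        # P(K1 K2|L1)
            if split * _sub(p_sub, k3, l2, floor) > first:
                factor, di, dj = split, 2, 1
        elif l1 + l2 in concat_prone:
            cat = p_cat.get((k1, l1 + l2), 0.0)            # P(K1|L1 L2)
            if cat * _sub(p_sub, k2, l3, floor) > first:
                factor, di, dj = cat, 1, 2
        running *= factor
        i, j = i + di, j + dj
    # Assumption: a word with unmatched leftover characters scores zero.
    return running if i == len(ocr) and j == len(ref) else 0.0


def best_match(ocr, dictionary, *tables):
    """Second-comparator analogue: dictionary word with the largest running product."""
    return max(dictionary, key=lambda ref: score(ocr, ref, *tables))


# Hypothetical tables: identity substitutions at 0.9, "m" splits to "rn", "rn" merges to "m".
p_sub = {(c, c): 0.9 for c in "abcdefghijklmnopqrstuvwxyz"}
tables = (p_sub, {("rn", "m"): 0.1}, {("m", "rn"): 0.2}, {"m"}, {"rn"})
split_fix = best_match("carne", ["came", "carve", "cane"], *tables)
concat_fix = best_match("bum", ["burn", "barn", "bun"], *tables)
```

With these made-up tables, the garbled "carne" resolves to "came" through the splitting branch and "bum" resolves to "burn" through the concatenation branch. A practical implementation would likely sum log-probabilities instead of multiplying, but the sketch keeps the running product named in the claim.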
-
-
14. An information processing system for selecting the correct form of a garbled input word misread by an optical character reader so as to incorrectly segment the characters in the word, the correct form of the input word being a member of a predetermined class of words, each comprising a plurality of characters, comprising:
-
a first shift register having an input connected to the output of said optical character reader, for storing the characters of said input OCR word in an arrangement ordered in the sequence in which the characters are received, said first shift register having three adjacent storage cells K1, K2 and K3, with the end character of the input word initially stored in cell K1 ; a first bulk storage means for storing said predetermined class of words as a dictionary; a second shift register having an input connected to the output of said first storage means, for storing characters of a dictionary word input from said first storage means, in an arrangement ordered in the sequence in which the characters are received, said second shift register having three adjacent storage cells L1, L2 and L3, with the end character of the dictionary word initially stored in cell L1 ; a second bulk storage means with inputs connected to the L1 and L2 cells for storing OCR error propensity indicia for selected characters composing said dictionary words for indicating the propensity of said selected characters to being misread by said OCR through an error mode which changes the number of characters in the misread word; a switching means having a first input connected to the output of cells K1, K2 and K3 of said first shift register, for selectively switching OCR word characters stored therein to the output of said switching means, a second input connected to the output of cells L1, L2 and L3 of said second shift register, for selectively switching the dictionary word characters stored therein to the output of said switching means, and a third control input connected to the output of said second storage means for controlling said selective switching of said OCR word characters and said dictionary word characters by means of the error indicium accessed from said second storage means for the dictionary word characters stored in cells L1 and L2 ; a third bulk storage means with an input 
connected to the output of said switching means, for storing a first type conditional probability P(Kn |Lm) that the OCR word character stored in cell Kn of said first shift register was misread by character substitution given that the dictionary word character stored in cell Lm of said second shift register was actually scanned, for n=1, m=1, n=2, m=2 and for n=2, m=3 and for n=3, m=2, a second type conditional probability P(K1 K2 |L1) that the OCR word characters stored in cells K1 and K2 of said first shift register was misread by character splitting, given that the dictionary word character stored in cell L1 of said second shift register was actually scanned, and a third type of conditional probability P(K1 |L1 L2) that the OCR word character stored in cell K1 of said first shift register was misread by character concatenation, given that the dictionary word character stored in cells L1 and L2 of said second shift register were actually scanned, said probabilities being accessed by said component OCR word characters and dictionary word characters which are selectively switched to the output of said switching means under the control of said error indicium accessed from said second storage for the dictionary word character stored in cell L1 of said second shift register; a first multiplier means having an input connected to the output of said third storage means, for multiplying a first received conditional probability by a second received conditional probability accessed from said second storage means and outputting a first probability product, and for multiplying a third received conditional probability by a fourth received conditional probability accessed from said second storage means and outputting a second probability product; said switching means operating when the error indicium for the character stored in the L1 cell of said second shift register indicates a propensity to character splitting for the characters stored in the L1 cell, to access the first 
type conditional probability P(K1 |L1) and P(K2 |L2) from said third storage means for transmission to said first multiplier means as said first received and second received conditional probabilities, for calculation of said first probability product, and to access the second type conditional probability P(K1 K2 |L1) and the first type conditional probability P(K3 |L2) from said third storage means for transmission to said first multiplier means as said third received and said fourth received conditional probabilities for the calculation of said second probability product; said switching means operating when the error indicium for the characters stored in the L1 and L2 cells of said second shift register indicates a propensity to character concatenation for the characters stored in the L1 and L2 cells, to access the first type conditional probabilities P(K1 |L1) and P(K2 |L2) from said third storage means for transmission to said first multiplier means as said first received and said second received conditional probabilities for calculating said first probability product, and to access the third type conditional probability P(K1 |L1 L2) and the first type conditional probability P(K2 |L3) from said third storage means for transmission to said first multiplier means as said third received and fourth received conditional probability for the calculation of said second probability product; a first comparator having an input connected to said first multiplier means for comparing the magnitude of said first probability product with that for said second probability product; a shift control means having a first input connected to said first comparator and a second input connected to said output of said second storage means, a first output connected to a shift input on said first shift register and a second output connected to a shift input on said second shift register, for shifting the contents of said first and said second shift registers in accordance with the 
relative magnitudes of said first and said second probability product and the value of the error propensity indicium for the character stored in the L1 cell; a second multiplier means having an input connected to said third storage means and a second control input connected to said first comparator, for accepting said first received conditional probability when said first product is larger or said third received conditional probability when said second product is larger, as determined by said first comparator and multiplying by the running product of all of said probability products calculated for the dictionary word presently stored in said second shift register; said shift control means shifting the contents of both said first and said second shift registers by one cell when said first probability product is greater than said second probability product; said shift control means shifting the contents of said first shift register by two cells and the contents of said second shift register by one cell when said second probability product is greater than said first probability product and the error indicium for the character stored in the L1 cell of said second shift register indicates a character splitting propensity; said shift control means shifting the contents of said second shift register by two cells and the contents of said first shift register by one cell when said second probability product is greater than said first probability product and the error indicium for the character stored in the L1 and L2 cells of said second shift register indicates a character concatenation propensity; a second comparator means having an input connected to said second multiplier means and an output connected to said first storage means, for selecting the dictionary word stored in said first storage means having the largest running product when matched with the OCR word stored in said first shift register; said first storage means outputting on an output line said dictionary 
word indicated by said second comparator means, as the most likely correct form for said garbled input OCR word.
-
-
15. In a system for recognizing speech, a data processing system for selecting the correct form of an input error word garbled by a speech analyzer splitting error, the correct form of the error word being a member of a predetermined class of reference words, each comprising a plurality of phoneme-characters, comprising:
-
a storage means for storing said predetermined class of reference words, selected phoneme-characters composing the reference words having stored in said storage means an error propensity indicium for indicating the propensity of the phoneme-character to being misread through a splitting error; said storage means storing a first type conditional probability that a first phoneme-character can be output by said speech analyzer through phoneme-character substitution, given that a second phoneme-character was actually spoken and a second type conditional probability that a pair of adjacent phoneme-characters can be output by said speech analyzer through phoneme-character splitting, given that a third phoneme-character was actually spoken; a first register means connected to an input line from said speech analyzer for storing the phoneme-characters of said error word arranged in the sequence of receipt from said speech analyzer, with a first phoneme-character at a given end of said error word defining a first position for an error word origin; a second register means connected to said storage means for storing the phoneme-characters of a first reference word from said predetermined class in said storage means, arranged in a sequence to correspond with said sequence of phoneme-characters in said first register means, with a first phoneme-character in said reference word corresponding to said first phoneme-character in said error word, defining a first position for a reference word origin; decoding means connected to said second register for decoding the error propensity indicium corresponding to the phoneme-character located at said reference word origin in said reference word;
-
-
16. accessing means connected to said storage means for accessing from said storage means, when said decoded indicium indicates a phoneme-character splitting propensity, a first one of said first type conditional probability that given the phoneme-character located at said reference word origin in said reference word was spoken, that the speech analyzer substituted the phoneme-character located at said error word origin in said error word;
said accessing means accessing from said storage means when said decoding indicium indicates an error splitting propensity, a second one of said first type conditional probability that given the phoneme-character next to the phoneme-character at said reference word origin in said reference word was spoken, that the speech analyzer substituted the phoneme-character next to the phoneme-character located at said error word origin in said error word; multiplying means connected to said storage means for multiplying said first one and said second one of said first conditional probabilities, as a first product; said accessing means accessing from said storage means when said decoding indicium indicates a phoneme-character splitting propensity, a first one of said second type conditional probability that given the phoneme-character located at said reference word origin in said reference word was spoken, that the speech analyzer split it into the phoneme-character located at said error word origin and the phoneme-character next to the phoneme-character located at said error word origin in said error word; said accessing means accessing from said storage means when said decoding indicium indicates a phoneme-character splitting propensity, a third one of said first type conditional probabilities that given the phoneme-character next to the phoneme-character located at said reference word origin in said reference word was spoken, that the speech analyzer substituted the second next phoneme-character to the phoneme-character located at said error word origin in said error word; said multiplying means multiplying said first one of said second type conditional probability and said third one of said first type conditional probability as a second product; comparison means connected to said multiplying means for comparing the relative magnitudes of said first and said second product; a running product calculating means connected to said storage means for multiplying a running 
product times said first one of said first type conditional probabilities if said first product is greater than said second product or said first one of said second type conditional probabilities if said second product is greater than said first product; a shift means connected to said comparison means for shifting the location of both said error word origin and said reference word origin by one phoneme-character position when said first probability product is greater than said second probability product; said shifting means shifting said error word origin by two phoneme-character positions and shifting said reference word origin by one phoneme-character position when said second probability product is greater than said first probability product; whereby the reference word stored in said storage means having the highest conditional probability of having been misread as the error word stored in said first register, can be determined.
-
-
19. In a system for recognizing speech, a data processing system for selecting the correct form of an input error word garbled by a speech analyzer concatenation error, the correct form of the error word being a member of the predetermined class of reference words, each comprising a plurality of phoneme-characters, comprising:
-
a storage means for storing said predetermined class of reference words in a storage means, selected phoneme-characters composing the words in said class having stored in said storage means an error propensity indicium for indicating the propensity of the phoneme-character to be misread through a concatenation error;
said storage means storing a first type conditional probability that a first phoneme-character can be output by said speech analyzer through phoneme-character substitution, given that a second phoneme-character was actually spoken, and a second type conditional probability that a first phoneme-character can be output by said speech analyzer through phoneme-character concatenation, given that a pair of adjacent phoneme-characters were actually spoken;
a first register means connected to an input line from said speech analyzer for storing the phoneme-characters of said error word in a first register, arranged in a sequence of receipt from said speech analyzer, with a first phoneme-character at a given end of said error word defining a first position for an error word origin;
a second register means connected to said storage means for storing the phoneme-characters of a first reference word from said predetermined class in said storage means, arranged in a sequence to correspond with said sequence of phoneme-characters in said first register means, with a first phoneme-character in said reference word corresponding to said first phoneme-character in said error word, defining a first position for a reference word origin;
decoding means connected to said second register for decoding the error propensity indicium corresponding to the phoneme-character located at said reference word origin in said first reference word;
accessing means connected to said storage means for accessing from said storage means when said decoded indicium indicates a phoneme-character concatenation propensity, a first one of said first type conditional probability that given the phoneme-character located at said first reference word origin of said reference word was spoken, that the speech analyzer substituted the phoneme-character located at said error word origin in said error word;
said accessing means accessing from said storage means when said decoded indicium indicates a phoneme-character concatenation propensity, a second one of said first type conditional probability that given the phoneme-character next to the phoneme-character located at said reference word origin in said reference word was spoken, that the speech analyzer substituted the phoneme-character next to the phoneme-character located at said error word origin in said error word;
multiplying means connected to said storage means for multiplying said first one and said second one of said first type conditional probabilities as a first product;
said accessing means accessing from said storage means when said decoded indicium indicates a phoneme-character concatenation propensity, a first one of said second type conditional probability that given the phoneme-character located at said reference word origin and the phoneme-character next to the phoneme-character located at said reference word origin in said reference word were spoken, that the speech analyzer concatenated them into the phoneme-character located at said error word origin in said error word;
said accessing means accessing from said storage means when said decoded indicium indicates a phoneme-character concatenation propensity, a third one of said first type conditional probabilities that given the phoneme-character second next to the phoneme-character located at said reference word origin in said reference word was spoken, that the speech analyzer substituted the phoneme-character next to the phoneme-character located at said error word origin in said error word;
said multiplying means multiplying said first one of said second type conditional probability and said third one of said first type conditional probability as a second product;
comparison means connected to said multiplying means for comparing the relative magnitude of said first and said second product;
a running product calculating means connected to said storage means for multiplying a running product times said first one of said first type conditional probabilities if said first product is greater than said second product or said first one of said second type conditional probabilities if said second product is greater than said first product;
a shifting means connected to said comparison means for shifting said error word origin and said reference word origin by one phoneme-character position when said first probability product is greater than said second probability product;
said shifting means shifting the error word origin by one phoneme-character position and the reference word origin by two phoneme-character positions when said second probability product is greater than said first probability product;
whereby the reference word having the greatest total conditional probability that the error word was output by the speech analyzer given that the reference word was spoken, can be determined.
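The elements of claim 19 can be read together as a loop that walks both words and accumulates the running product. The sketch below is illustrative only and rests on stated assumptions: `concat_likelihood`, `p_sub`, `p_cat`, and the set `prone` (standing in for the error propensity indicium) are hypothetical names; symbols without a concatenation propensity are scored by plain substitution; and a word that is not fully consumed scores zero.

```python
def concat_likelihood(err, ref, p_sub, p_cat, prone):
    """Running product for one reference word under concatenation errors.

    p_sub[(e, r)]        -- assumed table: P(e output | r spoken)
    p_cat[(e, (r1, r2))] -- assumed table: P(e output | r1 r2 spoken)
    prone                -- reference symbols flagged as concatenation-prone
    """
    def sub(a, b):
        # Assumption: both words exhausted counts as 1, only one exhausted as 0.
        if a >= len(err) and b >= len(ref):
            return 1.0
        if a >= len(err) or b >= len(ref):
            return 0.0
        return p_sub.get((err[a], ref[b]), 0.0)

    def cat(a, b):
        if a >= len(err) or b + 1 >= len(ref):
            return 0.0
        return p_cat.get((err[a], (ref[b], ref[b + 1])), 0.0)

    running, i, j = 1.0, 0, 0
    while i < len(err) and j < len(ref):
        s = sub(i, j)
        if ref[j] in prone:                    # decoded indicium
            first = s * sub(i + 1, j + 1)      # two substitutions
            c = cat(i, j)
            second = c * sub(i + 1, j + 2)     # concatenation, then substitution
            if second > first:
                running *= c
                i, j = i + 1, j + 2            # error origin one, reference origin two
                continue
        running *= s
        i, j = i + 1, j + 1
    return running if (i == len(err) and j == len(ref)) else 0.0
```

With a table in which the pair "r", "n" is likely to be output as "m", the reference word "burn" receives a non-zero score against the error word "bum" even though the two differ in length.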
-
-
22. In a system for recognizing speech, a data processing system for selecting the correct form of an input error word garbled by a speech analyzer crowding error, the correct form of the error word being a member of a predetermined class of reference words, each comprising a plurality of phoneme-characters, comprising:
-
a storage means for storing said predetermined class of reference words in a storage means, selected phoneme-characters composing the reference word having stored in said storage means an error propensity indicium for indicating the propensity of the phoneme-character to being misread through a crowding error;
said storage means storing a first type conditional probability that a first phoneme-character can be output by said speech analyzer through phoneme-character substitution, given that a second phoneme-character was actually spoken, and a second type conditional probability that a first pair of adjacent phoneme-characters can be output by said speech analyzer through phoneme-character crowding, given that a second pair of adjacent phoneme-characters was actually spoken;
a first register means connected to an input line from said speech analyzer for storing the phoneme-characters of said error word arranged in the sequence of receipt from said speech analyzer, with a first phoneme-character at a given end of said error word defining a first position for an error word origin;
a second register means connected to said storage means for storing the phoneme-characters of a first reference word from said predetermined class in said storage means, arranged in a sequence to correspond to said sequence of phoneme-characters in said first register means, with a first phoneme-character in said reference word corresponding to said first phoneme-character in said error word, defining a first position for a reference word origin;
decoding means connected to said second register for decoding the error propensity indicium corresponding to the phoneme-character stored at said reference word origin in said reference word;
accessing means connected to said storage means for accessing from said storage means when said decoded indicium indicates a phoneme-character pair crowding propensity, a first one of said first type conditional probability that given the phoneme-character located at said reference word origin in said reference word was spoken, that the speech analyzer substituted the phoneme-character located at said error word origin in said error word;
said accessing means accessing from said storage means when said decoded indicium indicates a phoneme-character pair crowding propensity, a second one of said first type conditional probability that given the phoneme-character next to the phoneme-character at said reference word origin in said reference word was spoken, that the speech analyzer substituted the phoneme-character next to the phoneme-character located at said error word origin in said error word;
multiplying means connected to said storage means for multiplying said first one and said second one of said first type conditional probabilities as a first product;
said accessing means accessing from said storage means when said decoded indicium indicates a phoneme-character pair crowding propensity, a first one of said second type conditional probability that given the phoneme-character located at said reference word origin and the phoneme-character located next to the phoneme-character located at said reference word origin in said reference word were spoken, that the speech analyzer executed a crowding error and output the phoneme-character located at said error word origin and the phoneme-character next to the phoneme-character located at said error word origin in said error word;
comparison means connected to said multiplying means for comparing the relative magnitudes of said first product and said second type conditional probability accessed from said storage means;
a running product calculating means connected to said storage means, for multiplying a running product times said first one of said first type conditional probabilities if said first product is greater than said second type conditional probability accessed or said first one of said second type conditional probability if said second type conditional probability is greater than said first product;
a shifting means connected to said comparison means for shifting the location of both said error word origin and said reference word origin by one phoneme-character position when said first probability product is greater than said second type conditional probability;
said shifting means shifting both said error word origin and said reference word origin by two phoneme-character positions when said second type conditional probability is greater than said first probability product;
whereby the reference word stored in said storage means having the highest conditional probability of having been misread as the error word stored in said first register, can be determined.
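Unlike the splitting and concatenation cases, the crowding comparison sets the first product against a single pair-to-pair probability, and a winning pair moves both origins by two positions. A minimal sketch of that one step follows; `crowding_step`, `p_sub`, and `p_pair` are hypothetical names, and falling back to substitution when fewer than two symbols remain is an assumption not stated in the claim.

```python
def crowding_step(err, ref, i, j, p_sub, p_pair):
    """One compare-and-shift step for a pair-crowding-prone reference symbol.

    p_sub[(e, r)]                -- assumed table: P(e output | r spoken)
    p_pair[((e1, e2), (r1, r2))] -- assumed table: P(e1 e2 output | r1 r2 spoken)
    Returns (factor for the running product, new i, new j).
    """
    s1 = p_sub.get((err[i], ref[j]), 0.0)
    if i + 1 >= len(err) or j + 1 >= len(ref):
        # Assumption: with no second symbol left, only substitution applies.
        return s1, i + 1, j + 1
    s2 = p_sub.get((err[i + 1], ref[j + 1]), 0.0)
    first = s1 * s2                                  # first product
    pair = p_pair.get(((err[i], err[i + 1]), (ref[j], ref[j + 1])), 0.0)
    if pair > first:
        return pair, i + 2, j + 2                    # both origins move two
    return s1, i + 1, j + 1                          # both origins move one
```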
-
-
25. A data processing system for selecting the correct form of an input error word mistyped on a keyboard by an operator as a character addition error, the correct form of the error word being a member of a predetermined class of reference words, each comprising a plurality of characters, comprising:
-
a storage means for storing said predetermined class of reference words, selected characters composing the reference words having stored in said storage means an error propensity indicium for indicating the propensity of the character to being mistyped by an operator through a character addition error, said storage means storing a first type conditional probability that a first character can be output by operator mis-stroke on a keyboard through character substitution, given that a second character was to be typed, and a second type conditional probability that a pair of adjacent characters can be output by the operator on a keyboard through character additions, given that a third character was intended to be typed;
a first register means connected to an input line for storing the characters of said error word arranged in the sequence of receipt from said keyboard, with a first character at a given end of said error word defining a first position for an error word origin;
a second register means connected to said storage means for storing the characters of a first reference word from said predetermined class in said storage means, arranged in a sequence to correspond with said sequence of characters in said first register means, with a first character in said reference word corresponding to said first character in said error word, defining a first position for a reference word origin;
decoding means connected to said second register for decoding the error propensity indicium corresponding to the character located at said reference word origin in said reference word;
accessing means connected to said storage means for accessing from said storage means, when said decoded indicium indicates a character addition propensity, a first one of said first type conditional probability that given the character located at said reference word origin in said reference word was to be typed, that the keyboard operator substituted the character located at said error word origin in said error word;
said accessing means accessing from said storage means when said decoded indicium indicates a character addition propensity, a second one of said first type conditional probability that given the character next to the character at said reference word origin in said reference word was to be typed, that the keyboard operator substituted the character next to the character located at said error word origin in said error word;
multiplying means connected to said storage means for multiplying said first one and said second one of said first type conditional probabilities, as a first product;
said accessing means accessing from said storage means when said decoded indicium indicates a character addition propensity, a first one of said second type conditional probability that given the character located at said reference word origin in said reference word was to be typed, that the keyboard operator augmented it into the character located at said error word origin and the character next to the character located at said error word origin in said error word;
said accessing means accessing from said storage means when said decoded indicium indicates a character addition propensity, a third one of said first type conditional probabilities that given the character next to the character located at said reference word origin in said reference word was to be typed, that the keyboard operator substituted the second next character to the character located at said error word origin in said error word;
said multiplying means multiplying said first one of said second type conditional probability and said third one of said first type conditional probability as a second product;
comparison means connected to said multiplying means for comparing the relative magnitudes of said first and said second product;
a running product calculating means connected to said storage means for multiplying a running product times said first one of said first type conditional probabilities if said first product is greater than said second product or said first one of said second type conditional probabilities if said second product is greater than said first product;
a shifting means connected to said comparison means for shifting the location of both said error word origin and said reference word origin by one character position when said first probability product is greater than said second probability product;
said shifting means shifting said error word origin by two character positions and shifting said reference word origin by one character position when said second probability product is greater than said first probability product;
whereby the reference word stored in said storage means having the highest conditional probability of having been mistyped as the error word stored in said first register, can be determined.
-
-
28. A data processing system for selecting the correct form of an input error word operator mistyped on a keyboard as a character omission error, the correct form of the error word being a member of the predetermined class of reference words, each comprising a plurality of characters, comprising:
-
a storage means for storing said predetermined class of reference words in a storage means, selected characters composing the words in said class having stored in said storage means an error propensity indicium for indicating the propensity of the character to be operator mistyped through a character omission error;
said storage means storing a first type conditional probability that a first character can be output by said keyboard through character substitution, given that a second character was to be typed, and a second type conditional probability that a first character can be output by said keyboard operator through character omission, given that a pair of adjacent characters were to be typed;
a first register means connected to an input line for storing the characters of said error word in a first register, arranged in a sequence of receipt from said keyboard, with a first character at a given end of said error word defining a first position for an error word origin;
a second register means connected to said storage means for storing the characters of a first reference word from said predetermined class in said storage means, arranged in a sequence to correspond with said sequence of characters in said first register means, with a first character in said reference word corresponding to said first character in said error word, defining a first position for a reference word origin;
decoding means connected to said second register for decoding the error propensity indicium corresponding to the character located at said reference word origin in said first reference word;
accessing means connected to said storage means for accessing from said storage means when said decoded indicium indicates a character omission propensity, a first one of said first type conditional probability that given the character located at said first reference word origin of said reference word was to be typed, that the keyboard operator substituted the character located at said error word origin in said error word;
said accessing means accessing from said storage means when said decoded indicium indicates a character omission propensity, a second one of said first type conditional probability that given the character next to the character located at said reference word origin in said reference word was to be typed, that the keyboard operator substituted the character next to the character located at said error word origin in said error word;
multiplying means connected to said storage means for multiplying said first one and said second one of said first type conditional probabilities as a first product;
said accessing means accessing from said storage means when said decoded indicium indicates a character omission propensity, a first one of said second type conditional probability that given the character located at said reference word origin and the character next to the character located at said reference word origin in said reference word were to be typed, that the keyboard operator truncated them into the character located at said error word origin in said error word;
said accessing means accessing from said storage means when said decoded indicium indicates a character omission propensity, a third one of said first type conditional probabilities that given the character second next to the character located at said reference word origin in said reference word was to be typed, that the keyboard operator substituted the character next to the character located at said error word origin in said error word;
said multiplying means multiplying said first one of said second type conditional probability and said third one of said first type conditional probability as a second product;
comparison means connected to said multiplying means for comparing the relative magnitude of said first and said second product;
a running product calculating means connected to said storage means for multiplying a running product times said first one of said first type conditional probabilities if said first product is greater than said second product or said first one of said second type conditional probabilities if said second product is greater than said first product;
a shifting means connected to said comparison means for shifting said error word origin and said reference word origin by one character position when said first probability product is greater than said second probability product;
said shifting means shifting the error word origin by one character position and the reference word origin by two character positions when said second probability product is greater than said first probability product;
whereby the reference word having the greatest total conditional probability that the error word was output by the keyboard operator given that the reference word was to be typed, can be determined.
-
-
31. A data processing system for selecting the correct form of an input error word mistyped on a keyboard as a character transposition error, the correct form of the error word being a member of a predetermined class of reference words, each comprising a plurality of characters, comprising:
-
a storage means for storing said predetermined class of reference words in a storage means, selected characters composing the reference word having stored in said storage means an error propensity indicium for indicating the propensity of the character to being mistyped through a character transposition error;
said storage means storing a first type conditional probability that a first character can be output by said keyboard through character substitution, given that a second character was to be typed, and a second type conditional probability that a first pair of adjacent characters can be output by said keyboard through character transposition, given that a second pair of adjacent characters was to be typed;
a first register means connected to an input line for storing the characters of said error word arranged in the sequence of receipt from said keyboard, with a first character at a given end of said error word defining a first position for an error word origin;
a second register means connected to said storage means for storing the characters and error propensity indicium of a first reference word from said predetermined class in said storage means, arranged in a sequence to correspond to said sequence of characters in said first register means, with a first character in said reference word corresponding to said first character in said error word, defining a first position for a reference word origin;
decoding means connected to said second register for decoding the error propensity indicium corresponding to the character stored at said reference word origin in said reference word;
accessing means connected to said storage means for accessing from said storage means when said decoded indicium indicates a character transposition propensity, a first one of said first type conditional probability that given the character located at said reference word origin in said reference word was to be typed, that the keyboard substituted the character located at said error word origin in said error word;
said accessing means accessing from said storage means when said decoded indicium indicates a character transposition propensity, a second one of said first type conditional probability that given the character next to the character at said reference word origin in said reference word was to be typed, that the keyboard operator substituted the character next to the character located at said error word origin in said error word;
multiplying means connected to said storage means for multiplying said first one and said second one of said first type conditional probabilities as a first product;
said accessing means accessing from said storage means when said decoded indicium indicates a character transposition propensity, a first one of said second type conditional probability that given the character located at said reference word origin and the character located next to the character located at said reference word origin in said reference word were to be typed, that the keyboard operator transposed the characters and output the character located at said error word origin and the character next to the character located at said error word origin in said error word;
comparison means connected to said multiplying means for comparing the relative magnitudes of said first product and said second type conditional probability accessed from said storage means;
a running product calculating means connected to said storage means, for multiplying a running product times said first one of said first type conditional probabilities if said first product is greater than said second type conditional probability accessed or said first one of said second type conditional probability if said second type conditional probability is greater than said first product;
a shifting means connected to said comparison means for shifting the location of both said error word origin and said reference word origin by one character position when said first probability product is greater than said second type conditional probability;
said shifting means shifting both said error word origin and said reference word origin by two character positions when said second type conditional probability is greater than said first probability product;
whereby the reference word stored in said storage means having the highest conditional probability of having been mistyped as the error word stored in said first register, can be determined.
-
-
34. A data processing system for selecting the correct form of an input error word mistyped on a keyboard by an operator as a character substitution error, the correct form of the error word being a member of a predetermined class of reference words, each comprising a plurality of characters, comprising:
-
a storage means for storing said predetermined class of reference words, selected characters composing the reference words having stored in said storage means an error propensity indicium for indicating the propensity of the character to being misread through a character substitution error, said storage means storing a first type conditional probability that a first character can be output by the operator misstroking said keyboard through character substitution, given that a second character was to be typed;
a first register means connected to an input line for storing the characters of said error word arranged in the sequence of receipt from said keyboard with a first character at a given end of said error word defining a first position for an error word origin;
a second register means connected to said storage means for storing the characters of a first reference word from said predetermined class in said storage means, arranged in a sequence to correspond with said sequence of characters in said first register means, with a first character in said reference word corresponding to said first character in said error word, defining a first position for a reference word origin;
decoding means connected to said second register for decoding the error propensity indicium corresponding to the character located at said reference word origin in said reference word;
accessing means connected to said storage means for accessing from said storage means, when said decoded indicium indicates an operator miskeying character substitution propensity, a first one of said first type conditional probability that given the character located at said reference word origin in said reference word was to be typed, that the keyboard operator substituted by miskeying the character located at said error word origin in said error word;
a running product calculating means connected to said storage means for multiplying a running product times said first one of said first type conditional probabilities;
a shifting means connected to said comparison means for shifting the location of both said error word origin and said reference word origin by one character position;
whereby the reference word stored in said storage means having the highest conditional probability of having been mistyped by the operator as the error word stored in said first register, can be determined.
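In the substitution-only case the running product reduces to a position-by-position product of first type conditional probabilities. The sketch below is a hedged illustration: `substitution_likelihood` and `p_sub` are hypothetical names, and assigning zero to words of unequal length is an assumption, since substitution alone cannot change the length of a word.

```python
from math import prod


def substitution_likelihood(err, ref, p_sub):
    """P(err typed | ref intended) when only substitution errors are considered.

    p_sub[(e, r)] -- assumed table: P(e typed | r was to be typed)
    """
    if len(err) != len(ref):
        # Assumption: a pure substitution model cannot explain a length change.
        return 0.0
    # Both origins advance by one position per factor of the running product.
    return prod(p_sub.get((e, r), 0.0) for e, r in zip(err, ref))
```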
-
-
37. An information processing system for selecting the correct form of an input word mistyped by an operator on a keyboard as a character addition error, the correct form of the input word being a member of a predetermined class of words, each comprising a plurality of characters, comprising:
-
a first shift register having an input connected to the output of said keyboard, for storing the characters of said input word in an arrangement ordered in the sequence in which the characters are received, said first shift register having three adjacent storage cells K1, K2, and K3, with the end character of the input word initially stored in cell K1 ; a first bulk storage means for storing said predetermined class of words as a dictionary, selected characters composing selected ones of said dictionary words having stored in association therewith an error propensity indicium for indicating the propensity of the character to being mistyped by an operator on said keyboard through character addition error mode; a second shift register having an input connected to the output of said first storage means, for storing characters of a dictionary word input from said first storage means, in an arrangement ordered in the sequence in which the characters are received, said second shift register having three adjacent storage cells L1, L2, and L3, with the end character of the dictionary word initially stored in cell L1 ; said first storage means sequentially loading words from said predetermined class of dictionary words, into said second register; a switching means having a first input connected to the output of cells K1, K2, and K3 of said first shift register, for selectively switching input word characters stored therein to the output of said switching means, a second input connected to the output of cells L1, L2, and L3 of said second shift register, for selectively switching the dictionary word characters stored therein to the output of said switching means, and a third control input connected to said second shift register for controlling said selective switching of said input word characters and said dictionary word characters by means of the error indicium associated with the dictionary word characters stored in cell L1 ; a second bulk storage means with an input 
connected to the output of said switching means, for storing a first type conditional probability P(Kn |Lm) that the input word character stored in cell Kn of said first shift register was operator mistyped by character substitution given that the dictionary word character stored in cell Lm of said second shift register was to be typed, for n=1, m=1, n=2, m=2, and for n=3, m=2 and a second type conditional probability P(K1 K2 |L1) that the input word characters stored in cells K1 and K2 of said first shift register were operator mistyped by a character addition error, given that the dictionary word character stored in cell L1 of said second shift register was to be typed, said probabilities being accessed by said component input word characters and dictionary word characters which are selectively switched to the output of said switching means under the control of said error indicium associated with the dictionary word character stored in cell L1 of said second shift register; a first multiplier means having an input connected to the output of said second storage means, for multiplying a first received conditional probability by a second received conditional probability accessed from said second storage means and outputting a first probability product, and for multiplying a third received conditional probability by a fourth received conditional probability accessed from said second storage means and outputting a second probability product; said switching means operating when the error indicium stored in the L1 cell of said second shift register indicates a propensity to a character addition error for the characters stored in the L1 cell, to access the first type conditional probability P(K1 |L1) and P(K2 |L2) from said second storage means for transmission to said first multiplier means as said first received conditional probabilities, for calculation of said first probability product, and to access the second type conditional probability P(K1 |L1) and the first 
type conditional probability P(K3 |L2) from said second storage means for transmission to said first multiplier means as said third received and said fourth received conditional probabilities for the calculation of said second probability product; a first comparator having an input connected to said first multiplier means for comparing the magnitude of said first probability product with that for said second probability product; a shift control means having a first input connected to said first comparator and a second input connected to said L1 cell of said second shift register, a first output connected to a shift input on said first shift register and a second output connected to a shift input on said second shift register, for shifting the contents of said first and said second shift registers in accordance with the relative magnitudes of said first and said second probability product and the value of the error propensity indicium for the character stored in the L1 cell; a second multiplier means having an input connected to said second storage means and a second control input connected to said first comparator, for accepting said first received conditional probability if said first product is larger or said third received conditional probability if said second product is larger as determined by said first comparator, and multiplying by the running product of all of said conditional probabilities calculated for the dictionary word presently stored in said second shift register; said shift control means shifting the contents of both said first and said second shift registers by one cell when said first probability product is greater than said second probability product; said shift control means shifting the contents of said first shift register by two cells and the contents of said second shift register by one cell when said second probability product is greater than said first probability product and the error indicium associated with the character in the L1 
cell of said second shift register indicates a character operator addition error propensity; a second comparator means having an input connected to said second multiplier means and an output connected to said first storage means, for selecting the dictionary word stored in said first storage means having the largest running product when matched with the input word stored in said first shift register; said first storage means outputting on an output line said dictionary word indicated by said second comparator means, as the most likely correct form for said operator mistyped input word.
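The register walk recited in claim 37 can be approximated in software. The following is a minimal Python sketch, not the claimed hardware: the names `score_addition`, `best_word`, `p_sub`, and `cell` are hypothetical, every probability value is a made-up toy number, the error propensity indicium is simplified to a set of characters, and the handling of ties and of unequal word lengths (score 0.0) is an assumption the claim does not state.

```python
# Toy values only: none of these numbers come from the patent.
MATCH, MISMATCH = 0.9, 0.001

def p_sub(sub, k, l):
    """First type probability P(k | l). None stands for an empty register cell."""
    if k is None and l is None:
        return 1.0          # assumption: both words ended together
    if k is None or l is None:
        return 0.0          # assumption: one word ended before the other
    return sub.get((k, l), MATCH if k == l else MISMATCH)

def cell(word, idx):
    """Contents of a register cell, or None past the end of the word."""
    return word[idx] if idx < len(word) else None

def score_addition(typed, ref, prone, sub, add):
    """Running product for one dictionary word under the addition error mode."""
    i = j = 0               # positions of cells K1 and L1
    running = 1.0
    while i < len(typed) and j < len(ref):
        k1, k2, k3 = cell(typed, i), cell(typed, i + 1), cell(typed, i + 2)
        l1, l2 = cell(ref, j), cell(ref, j + 1)
        first = p_sub(sub, k1, l1)
        if l1 in prone and k2 is not None:
            pair = add.get((k1, k2, l1), 0.0)        # P(K1 K2 | L1)
            product1 = first * p_sub(sub, k2, l2)    # substitution path
            product2 = pair * p_sub(sub, k3, l2)     # addition path
            if product2 > product1:
                running *= pair
                i, j = i + 2, j + 1                  # input by two, dictionary by one
                continue
        running *= first                             # ties fall here (assumption)
        i, j = i + 1, j + 1                          # both registers by one
    return running if (i == len(typed) and j == len(ref)) else 0.0

def best_word(typed, dictionary, prone, sub, add):
    """Role of the second comparator: keep the word with the largest running product."""
    return max(dictionary, key=lambda ref: score_addition(typed, ref, prone, sub, add))

PRONE = {"h"}                        # hypothetical indicium: "h" attracts an extra key
ADD = {("j", "h", "h"): 0.02}        # hypothetical P("jh" typed | "h" intended)
print(best_word("tjhe", ["tie", "the", "toe"], PRONE, {}, ADD))
```

Only the winning single-step probability enters the running product, mirroring the second multiplier means; the lookahead probabilities serve solely to choose between the two shift patterns.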
38. An information processing system for selecting the correct form of an input word operator mistyped on a keyboard as a character omission error, the correct form of the input word being of a predetermined class of words, each comprising a plurality of characters, comprising:
a first shift register having an input connected to the output of said keyboard, for storing the characters of said input word in an arrangement ordered in the sequence in which the characters are received, said first shift register having three adjacent storage cells K1, K2 and K3, with the end character of the input word initially stored in cell K1 ; a first bulk storage means for storing said predetermined class of words as a dictionary, selected characters composing selected ones of said dictionary words having stored in association therewith an error propensity indicium for indicating the propensity of the character to being operator mistyped on said keyboard through a character omission error mode; a second shift register having an input connected to the output of said first storage means, for storing characters of a dictionary word input from said first storage means, in an arrangement ordered in the sequence in which the characters are received, said second shift register having three adjacent storage cells L1, L2, and L3, with the end character of the dictionary word initially stored in cell L1 ; said first storage means sequentially loading words from said predetermined class of dictionary words, into said second register; a switching means having a first input connected to the output of cells K1, K2 and K3 of said first shift register, for selectively switching input word characters stored therein to the output of said switching means, a second input connected to the output of cells L1, L2, and L3 of said second shift register, for selectively switching the dictionary word characters stored therein to the output of said switching means, and a third control input connected to said second shift register for controlling said selective switching of said input word characters and said dictionary word characters by means of the error indicium associated with the dictionary word characters stored in cells L1 and L2 ; a second bulk storage means with an input 
connected to the output of said switching means, for storing a first type conditional probability P(Kn |Lm) that the input word character stored in cell Kn of said first shift register was operator mistyped by character substitution given that the dictionary word character stored in cell Lm of said second shift register was to be typed, for n=1, m=1, n=2, m=2, and for n=2, m=3, and a second type of conditional probability P(K1 |L1 L2) that the input word character stored in cell K1 of said first shift register was mistyped by an operator character omission error, given that the dictionary word characters stored in cells L1 and L2 of said second shift register were to be typed, said probabilities being accessed by said component input word characters and dictionary word characters which are selectively switched to the output of said switching means under the control of said error indicium associated with the dictionary word characters stored in cells L1 and L2 of said second shift register; a first multiplier means having an input connected to the output of said second storage means, for multiplying a first received conditional probability by a second received conditional probability accessed from said second storage means and outputting a first probability product, and for multiplying a third received conditional probability by a fourth received conditional probability accessed from said second storage means and outputting a second probability product, said switching means operating when the error indicium stored in the L1 cell of said second shift register indicates a propensity for operator character omission error for the characters stored in the L1 and L2 cells, to access the first type conditional probabilities P(K1 |L1) and P(K2 |L2) from said second storage means for transmission to said first multiplier means as said first received and said second received conditional probabilities for calculating said first probability product, and to access the second type 
conditional probability P(K1 |L1 L2) and the first type conditional probability P(K2 |L3) from said second storage means for transmission to said first multiplier means as said third received and fourth received conditional probability for the calculation of said second probability product; a first comparator having an input connected to said first multiplier means for comparing the magnitude of said first probability product with that for said second probability product; a shift control means having a first input connected to said first comparator and a second input connected to said L1 cell of said second shift register, a first output connected to a shift input on said first shift register and a second output connected to a shift input on said second shift register, for shifting the contents of said first and said second shift registers in accordance with the relative magnitudes of said first and said second probability product and the value of the error propensity indicium for the character stored in the L1 cell; a second multiplier means having an input connected to said second storage means and a second control input connected to said first comparator, for accepting said first received conditional probability if said first product is larger or said third received conditional probability if said second product is larger as determined by said first comparator, and multiplying by the running product of all of said conditional probabilities calculated for the dictionary word presently stored in said second shift register; said shift control means shifting the contents of both said first and said second shift registers by one cell when said first probability product is greater than said second probability product; said shift control means shifting the contents of said second shift register by two cells and the contents of said first shift register by one cell when said second probability product is greater than said first probability product and the error indicium 
associated with characters in the L1 and L2 cells of said second shift register indicates a character operator miskeying omission error propensity; a second comparator means having an input connected to said second multiplier means and an output connected to said first storage means, for selecting the dictionary word stored in said first storage means having the largest running product when matched with the input word stored in said first shift register; said first storage means outputting on an output line said dictionary word indicated by said second comparator means, as the most likely correct form for said mistyped input word.
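The comparison recited in claim 38 can be restated compactly. In the LaTeX sketch below, the symbols \Pi_1, \Pi_2, and R (the running product) are shorthand introduced here for illustration and do not appear in the claim; the numerical values in the worked case are hypothetical toy numbers.

```latex
\begin{align*}
\Pi_1 &= P(K_1 \mid L_1)\, P(K_2 \mid L_2)
      && \text{(substitution path)} \\
\Pi_2 &= P(K_1 \mid L_1 L_2)\, P(K_2 \mid L_3)
      && \text{(omission path)} \\
R &\leftarrow
\begin{cases}
R \cdot P(K_1 \mid L_1),     & \Pi_1 > \Pi_2 \quad \text{(shift both registers by one cell)} \\
R \cdot P(K_1 \mid L_1 L_2), & \Pi_2 > \Pi_1 \quad \text{(shift second register by two cells, first by one)}
\end{cases}
\end{align*}

% Worked case with assumed values: input word "te", dictionary word "the",
% so K_1 = t, K_2 = e, L_1 = t, L_2 = h, L_3 = e.
\begin{align*}
\Pi_1 &= P(t \mid t)\, P(e \mid h) = 0.9 \times 0.001 = 0.0009 \\
\Pi_2 &= P(t \mid th)\, P(e \mid e) = 0.01 \times 0.9 = 0.009
\end{align*}
```

With these assumed values \Pi_2 exceeds \Pi_1, so, provided the indicium for the characters in L1 and L2 marks an omission propensity, the running product is multiplied by P(t | th) and the dictionary word register advances by two cells while the input word register advances by one.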
39. An information processing system for selecting the correct form of an input word operator mistyped on a keyboard as a character transposition error, the correct form of the input word being a member of a predetermined class of words, each comprising a plurality of characters, comprising:
a first shift register having an input connected to the output of said keyboard, for storing the characters of said input word in an arrangement ordered in the sequence in which the characters are received, said first shift register having three adjacent storage cells K1, K2 and K3, with the end character of the input word initially stored in cell K1 ; a first bulk storage means for storing said predetermined class of words as a dictionary, selected characters composing selected ones of said dictionary words having stored in association therewith an error propensity indicium for indicating the propensity of the character to being operator mistyped on said keyboard through a character transposition error mode; a second shift register having an input connected to the output of said first storage means, for storing characters of a dictionary word input from said first storage means, in an arrangement ordered in the sequence in which the characters are received, said second shift register having three adjacent storage cells L1, L2, and L3, with the end character of the dictionary word initially stored in cell L1 ; said first storage means initiating the sequential loading of words from said predetermined class of dictionary words, into said second register, upon receipt of a signal over said reset control input indicating the receipt of a new input word from said keyboard output; a switching means having a first input connected to the output of cells K1, K2 and K3 of said first shift register, for selectively switching input word characters stored therein to the output of said switching means, a second input connected to the output of cells L1, L2 and L3 of said second shift register, for selectively switching the dictionary word characters stored therein to the output of said switching means, and a third control input connected to said second shift register for controlling said selective switching of said input word characters and said dictionary word characters by 
means of the error indicium associated with the dictionary word characters stored in cell L1 ; a second bulk storage means with an input connected to the output of said switching means, for storing a first type conditional probability P(Kn |Lm) that the input word character stored in cell Kn of said first shift register was operator mistyped by character substitution given that the dictionary word character stored in cell Lm of said second shift register was to be typed, for n=1, m=1 and n=2, m=2, and a second type conditional probability P(K1 K2 |L1 L2) that the input word characters stored in cells K1 and K2 of said first shift register were operator mistyped by a character transposition error, given that the dictionary word characters stored in cell L1 and L2 of said second shift register were to be typed, said probabilities being accessed by said component input word characters and dictionary word characters which are selectively switched to the output of said switching means under the control of said error indicium associated with the dictionary word characters stored in the cells L1 and L2 of said second shift register; a first multiplier means having an input connected to the output of said second storage means, for multiplying a first received conditional probability by a second received conditional probability accessed from said second storage means and outputting a probability product; said switching means operating when the error indicium associated with characters stored in the L1 and L2 cells of said second shift register indicates a propensity for operator transposition error for the characters stored in the L1 and L2 cells, to access the first type conditional probability P(K1 |L1) and P(K2 |L2) from said second storage means for transmission to said first multiplier means as said first received and second received conditional probabilities, for calculation of said probability product, and to access the second type conditional probability P(K1 K2 |L1 
L2) from said second storage means; a first comparator having an input connected to said first multiplier means and said second storage means for comparing the magnitude of said probability product with that for said accessed second type conditional probability; a shift control means having a first input connected to said first comparator and a second input connected to said L1 cell of said second shift register, a first output connected to a shift input on said first shift register and a second output connected to a shift input on said second shift register, for shifting the contents of said first and said second shift registers in accordance with the relative magnitudes of said first and said second probability product and the value of the error propensity indicium for the character stored in the L1 cell; a second multiplier means having an input connected to said second storage means and a second control input connected to said first comparator, for accepting said first received conditional probability if said first product is larger or said second type conditional probability if said second product is larger, as determined by said first comparator, and multiplying by the running product of all of said conditional probabilities calculated for the dictionary word presently stored in said second shift register; said shift control means shifting the contents of both said first and second shift registers by one cell when said first probability product is greater than said second probability product; said shift control means shifting the contents of said first shift register by two cells and the contents of said second shift register by two cells when said second probability product is greater than said first probability product and the error indicium stored in the L1 cell of said second shift register indicates an operator character transposition error propensity; a second comparator means having an input connected to said second multiplier means and an output 
connected to said first storage means, for selecting the dictionary word stored in said first storage means having the largest running product when matched with the input word stored in said first shift register; said first storage means outputting on an output line said dictionary word indicated by said second comparator means, as the most likely correct form for said operator mistyped input word.
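The transposition case of claim 39 differs from the other two modes in that the first comparator weighs a single probability product against the second type probability itself, and both registers advance by two cells when the transposition is accepted. A minimal Python sketch follows, under stated assumptions: `score_transposition` and `p_sub` are hypothetical names, the probability values are toy numbers, the indicium is simplified to a set of two-character strings, and ties and unequal word lengths are handled by assumption.

```python
# Toy values only: none of these numbers come from the patent.
MATCH, MISMATCH = 0.9, 0.001

def p_sub(sub, k, l):
    """First type probability P(k | l), with toy defaults for unlisted pairs."""
    return sub.get((k, l), MATCH if k == l else MISMATCH)

def score_transposition(typed, ref, prone, sub, swap):
    """Running product for one dictionary word under the transposition error mode."""
    i = j = 0               # positions of cells K1 and L1
    running = 1.0
    while i < len(typed) and j < len(ref):
        k1, l1 = typed[i], ref[j]
        first = p_sub(sub, k1, l1)
        has_pair = i + 1 < len(typed) and j + 1 < len(ref)
        if has_pair and ref[j:j + 2] in prone:
            k2, l2 = typed[i + 1], ref[j + 1]
            product = first * p_sub(sub, k2, l2)            # P(K1|L1) * P(K2|L2)
            pair = swap.get((k1 + k2, l1 + l2), 0.0)        # P(K1 K2 | L1 L2)
            if pair > product:
                running *= pair
                i, j = i + 2, j + 2                         # both registers by two
                continue
        running *= first                                    # ties fall here (assumption)
        i, j = i + 1, j + 1                                 # both registers by one
    # Assumption: words that do not end together cannot match.
    return running if (i == len(typed) and j == len(ref)) else 0.0

PRONE = {"he"}                         # hypothetical indicium on the pair "he"
SWAP = {("eh", "he"): 0.03}            # hypothetical P("eh" typed | "he" intended)
scores = {w: score_transposition("teh", w, PRONE, {}, SWAP) for w in ["ten", "the", "tea"]}
print(max(scores, key=scores.get))
```

Because a transposition consumes two characters from each word, no lookahead beyond the second cell is needed, which is why the claim stores first type probabilities only for n=1, m=1 and n=2, m=2.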
Specification