Continuous parameter hidden Markov model approach to automatic handwriting recognition
First Claim
1. A computer-based method for recognizing handwriting, wherein the computer comprises an input device, a memory module, a preprocessor unit, a front end unit and a modeling component, the method comprising the steps of:
(1) receiving character signals into the preprocessor unit from the input device, said character signals representing training observation sequences of sample characters;
(2) sorting said character signals in the preprocessor unit according to lexemes which represent different writing styles for a given character, by mapping said character signals in lexographic space, said lexographic space being a location in the memory module which contains one or more character-level feature vectors, to find high-level variations in said character signals;
(3) selecting one of said lexemes;
(4) generating sequences of feature vector signals in the front end unit representing feature vectors for said character signals associated with said selected lexeme by mapping in chirographic space, said chirographic space being a location in the memory module which contains one or more frame-level feature vectors; and
(5) generating a Markov model signal in the modeling component representing a hidden Markov model for said selected lexeme, said hidden Markov model having model parameter signals and one or more states, each of said states having emission transitions and non-emission transitions, wherein said step (5) comprises the steps of:
(i) initializing said model parameter signals comprising the steps of:
(a) setting a length for said hidden Markov model;
(b) initializing state transition probabilities of said hidden Markov model to be uniform;
(c) for each of said states, tying one or more output probability distributions for said emission transitions;
(d) for each of said states, assigning a Gaussian density distribution for each of one or more codebooks; and
(e) alternatively initializing one or more mixture coefficients to be values obtained from a statistical mixture model; and
(ii) updating said model parameter signals.
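Initialization steps (a) through (e) can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Hmm`, `State`, `init_hmm`, the three transition labels, and the representation of a codebook as a list of (mean, variance) pairs are all assumptions made for this example, and the update step (ii) is omitted.

```python
from dataclasses import dataclass

# Hypothetical labels: two emission transitions (self-loop and forward) and
# one non-emission (null) forward transition out of every state.
TRANSITION_KINDS = ("emit_self", "emit_next", "null_next")


@dataclass
class State:
    transitions: dict      # transition kind -> probability
    mixture_weights: list  # one weight per codebook Gaussian, shared (tied)
                           # by all emission transitions leaving this state


@dataclass
class Hmm:
    states: list    # one State per position in the model
    codebook: list  # (mean, variance) Gaussian prototypes, assumed given


def init_hmm(length, codebook, prior_weights):
    """Build an initial model along the lines of steps (a) through (e).

    prior_weights stands in for mixture coefficients taken from a previously
    fitted statistical mixture model; how that model is fitted is not shown.
    """
    if len(codebook) != len(prior_weights):
        raise ValueError("need one mixture coefficient per codebook Gaussian")
    uniform = 1.0 / len(TRANSITION_KINDS)
    states = []
    for _ in range(length):  # (a) the model length fixes the number of states
        states.append(State(
            # (b) uniform state transition probabilities
            transitions={kind: uniform for kind in TRANSITION_KINDS},
            # (c), (e) a single weight vector per state, copied from the prior
            mixture_weights=list(prior_weights),
        ))
    # (d) every state scores frames against the same Gaussian codebook
    return Hmm(states=states, codebook=list(codebook))
```

For example, `init_hmm(3, [(0.0, 1.0), (2.0, 0.5)], [0.6, 0.4])` yields a three-state model in which every transition has probability 1/3 and every state starts from the same mixture coefficients.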
Abstract
A computer-based system and method for recognizing handwriting. The present invention includes a preprocessor, a front end, and a modeling component. The present invention operates as follows. First, the present invention identifies the lexemes for all characters of interest. Second, the present invention performs a training phase in order to generate a hidden Markov model for each of the lexemes. Third, the present invention performs a decoding phase to recognize handwritten text. Hidden Markov models for lexemes are produced during the training phase. The present invention performs the decoding phase as follows. The present invention receives test characters to be decoded (that is, to be recognized). The present invention generates sequences of feature vectors for the test characters by mapping in chirographic space. For each of the test characters, the present invention computes probabilities that the test character can be generated by the hidden Markov models. The present invention decodes the test character as the recognized character associated with the hidden Markov model having the greatest probability.
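The decoding phase described above (score the test character against every lexeme model, then pick the most probable one) can be sketched as follows. This is a simplified illustration under several assumptions that are not in the source: each frame is a single number rather than a feature vector, each state has one Gaussian output density, the models are left-to-right with only a self-loop and a forward emission transition (no null transitions), and the function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def forward_likelihood(observations, model):
    """Likelihood of a non-empty frame sequence under a left-to-right HMM.

    model is a list of (mean, var, p_stay) tuples, one per state. The path
    must start in the first state and end in the last one.
    """
    n = len(model)
    # alpha[j]: probability of the frames seen so far, ending in state j
    alpha = [0.0] * n
    alpha[0] = gaussian_pdf(observations[0], model[0][0], model[0][1])
    for x in observations[1:]:
        new_alpha = [0.0] * n
        for j, (mean, var, p_stay) in enumerate(model):
            stay = alpha[j] * p_stay
            advance = alpha[j - 1] * (1.0 - model[j - 1][2]) if j > 0 else 0.0
            new_alpha[j] = (stay + advance) * gaussian_pdf(x, mean, var)
        alpha = new_alpha
    return alpha[-1]


def decode(observations, lexeme_models):
    """Return the character of the lexeme model with the highest likelihood.

    lexeme_models maps a lexeme id to a (character, model) pair.
    """
    best = max(
        lexeme_models,
        key=lambda lex: forward_likelihood(observations, lexeme_models[lex][1]),
    )
    return lexeme_models[best][0]
```

Working directly with probabilities keeps the sketch short, but long frame sequences would underflow; a practical version would accumulate log-probabilities instead.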
26 Claims
1. A computer-based method for recognizing handwriting, wherein the computer comprises an input device, a memory module, a preprocessor unit, a front end unit and a modeling component, the method comprising the steps of:
(1) receiving character signals into the preprocessor unit from the input device, said character signals representing training observation sequences of sample characters;
(2) sorting said character signals in the preprocessor unit according to lexemes which represent different writing styles for a given character, by mapping said character signals in lexographic space, said lexographic space being a location in the memory module which contains one or more character-level feature vectors, to find high-level variations in said character signals;
(3) selecting one of said lexemes;
(4) generating sequences of feature vector signals in the front end unit representing feature vectors for said character signals associated with said selected lexeme by mapping in chirographic space, said chirographic space being a location in the memory module which contains one or more frame-level feature vectors; and
(5) generating a Markov model signal in the modeling component representing a hidden Markov model for said selected lexeme, said hidden Markov model having model parameter signals and one or more states, each of said states having emission transitions and non-emission transitions, wherein said step (5) comprises the steps of:
(i) initializing said model parameter signals comprising the steps of:
(a) setting a length for said hidden Markov model;
(b) initializing state transition probabilities of said hidden Markov model to be uniform;
(c) for each of said states, tying one or more output probability distributions for said emission transitions;
(d) for each of said states, assigning a Gaussian density distribution for each of one or more codebooks; and
(e) alternatively initializing one or more mixture coefficients to be values obtained from a statistical mixture model; and
(ii) updating said model parameter signals.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
17. A computer-based system for recognizing handwriting, the computer comprising an input device, a memory module, a preprocessor unit, a front end unit and a modeling component, the system comprising:
(a) means for receiving character signals into the preprocessor unit from the input device representing training observation sequences of sample characters;
(b) means for sorting said character signals in the preprocessor unit according to lexemes which represent different writing styles for a given character, by mapping said sample characters in lexographic space, said lexographic space being a location in the memory module which contains one or more character-level feature vectors, to find high-level variations in said character signals;
(c) means for generating sequences of feature vector signals in the front end unit representing feature vectors for said character signals by mapping in chirographic space, said chirographic space being a location in the memory module which contains one or more frame-level feature vectors;
(d) means for generating Markov model signals in the modeling component representing hidden Markov models for said lexemes, each of said hidden Markov models having model parameter signals and one or more states, each of said states having emission transitions and non-emission transitions, wherein said generating means comprises:
(i) means for initializing said model parameter signals in each of said hidden Markov models, said initializing means comprising:
means for setting a length for said hidden Markov model;
means for initializing state transition probabilities of said hidden Markov model to be uniform;
means for tying one or more output probability distributions for said emission transitions for each of said states;
means for assigning a Gaussian density distribution for each of one or more codebooks for each of said states; and
means for alternatively initializing one or more mixture coefficients to be values obtained from a statistical mixture model; and
(ii) means for updating said model parameter signals in each of said hidden Markov models.
Dependent claims: 18, 19, 20, 21, 22, 23, 24, 25, 26
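The sorting of samples by lexeme can be pictured as a nearest-centroid assignment in lexographic space. The sketch below is only one way to do this and is not taken from the claims: it assumes that each sample is already summarized by one character-level feature vector, that one centroid per lexeme is already known, and that squared Euclidean distance is an adequate similarity measure. The function names and lexeme ids are hypothetical.

```python
from collections import defaultdict


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def sort_by_lexeme(samples, centroids):
    """Group sample ids under the lexeme whose centroid is nearest.

    samples:   dict of sample id -> character-level feature vector
    centroids: dict of lexeme id -> centroid vector (assumed known here)
    """
    groups = defaultdict(list)
    for sample_id, vector in samples.items():
        nearest = min(
            centroids,
            key=lambda lex: squared_distance(vector, centroids[lex]),
        )
        groups[nearest].append(sample_id)
    return dict(groups)
```

Each resulting group would then supply the training observation sequences for the hidden Markov model of that lexeme.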
Specification