Message recognition employing integrated speech and handwriting information
Abstract
A method of, and apparatus for, operating an automatic message recognition system. In accordance with the method the following steps are executed: a user's speech is converted to a first signal; a user's handwriting is converted to a second signal; and the first signal and the second signal are processed to decode a consistent message, conveyed separately by the first signal and by the second signal, or conveyed jointly by the first signal and the second signal. The step of processing includes the steps of converting the first signal into a plurality of first multi-dimensional vectors and converting the second signal into a plurality of second multi-dimensional vectors. For a system employing a combined use of speech and handwriting the step of processing includes a further step of combining individual ones of the plurality of first multi-dimensional vectors and individual ones of the plurality of second multi-dimensional vectors to form a plurality of third multi-dimensional vectors. The multi-dimensional vectors are employed to train a single set of word models, for joint use of speech and handwriting, or two sets of word models, for sequentially employed or merged speech and handwriting.
246 Citations
46 Claims
-
1. A message recognition system comprising:
-
a first transducer for converting a user's speech to a first signal;
a second transducer for converting the user's handwriting to a second signal; and
a data processor, having a first input coupled to the first signal and a second input coupled to the second signal, for processing the first signal and the second signal to identify an informational content of the first signal and the second signal, said data processor including,
a first likelihood estimator for generating a first list comprised of one or more probable messages conveyed by the informational content of the first signal;
a second likelihood estimator for generating a second list comprised of one or more probable messages conveyed by the information content of the second signal;
wherein a probable message is comprised of at least one word;
a likelihood merger for selectively merging the first list and the second list to form a third list;
a decoder for selecting from the third list a most probable one of the probable messages to be an output message; and
means for outputting the output message.
means, responsive to the plurality of first multi-dimensional vectors that result from the operation of the first transforming means, for training a first set of word models to generate one or more candidate words in response to a spoken word; and
means, responsive to the plurality of second multi-dimensional vectors that result from the operation of the second transforming means, for training a second set of word models to generate one or more candidate words in response to a written word.
-
-
6. A system as set forth in claim 5 wherein the first set of word models is comprised of hidden Markov word models.
-
7. A system as set forth in claim 5 wherein the first set of word models is comprised of neural network means.
-
8. A system as set forth in claim 5 wherein the second set of word models includes character prototypes comprised of stroke templates generated by an elastic stroke matching means.
-
9. A system as set forth in claim 6 wherein said likelihood merger includes means, responsive to a first list of candidate words obtained by computation on the first set of word models in response to the first signal, and responsive to a second list of candidate words obtained by computation on the second set of word models in response to the second signal, for generating a third, merged list of candidate words.
-
10. A system as set forth in claim 9 wherein said decoder includes means, responsive to the generated third list, for selecting from the third list a candidate word that has a highest probability of representing all or a part of the probable message, and for outputting the selected candidate word as an output word.
-
11. A system as set forth in claim 9 wherein the generating means includes means for scaling, if required, probabilities associated with candidate words from the first list relative to probabilities associated with candidate words from the second list.
-
12. A system as set forth in claim 1 wherein said likelihood merger is responsive to predetermined weighting factors for merging the first list of probable messages and the second list of probable messages.
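As an illustration of the list merging recited in claims 9 through 12, the following minimal Python sketch scales two candidate lists to a common range, merges them under predetermined weights, and selects the most probable candidate. It is not taken from the patent: the function names (`normalize`, `merge_lists`, `decode`), the choice of a weighted linear combination, and the example scores are all assumptions made for illustration.

```python
from typing import Dict, Tuple


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale one list's scores so they sum to 1 (one possible scaling step)."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return {word: score / total for word, score in scores.items()}


def merge_lists(
    speech: Dict[str, float],
    handwriting: Dict[str, float],
    w_speech: float = 0.5,
    w_hand: float = 0.5,
) -> Dict[str, float]:
    """Form a third list as a weighted sum of the two scaled lists.

    The weights stand in for the 'predetermined weighting factors';
    a linear combination is an assumption, not the patent's formula.
    """
    speech_n = normalize(speech)
    hand_n = normalize(handwriting)
    merged = {}
    for word in set(speech_n) | set(hand_n):
        # A word missing from one list contributes zero from that list.
        merged[word] = (
            w_speech * speech_n.get(word, 0.0) + w_hand * hand_n.get(word, 0.0)
        )
    return merged


def decode(merged: Dict[str, float]) -> Tuple[str, float]:
    """Select the candidate with the highest merged score."""
    return max(merged.items(), key=lambda item: item[1])


# Hypothetical scores: speech confuses homophones, handwriting confuses shapes.
speech_list = {"meat": 0.5, "meet": 0.4, "mete": 0.1}
handwriting_list = {"meet": 0.6, "neat": 0.3, "melt": 0.1}
third_list = merge_lists(speech_list, handwriting_list)
best_word, best_score = decode(third_list)
```

In this example the speech list alone would favor "meat", but the handwriting evidence shifts the merged result to "meet", which is the kind of complementary behavior the merged list is meant to exploit.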
-
13. A message recognition system comprising:
-
a first transducer for converting a user's speech to a first signal;
a second transducer operating in parallel with said first transducer for converting the user's handwriting to a second signal;
a signal combiner, having a first input coupled to the first signal and a second input coupled to the second signal, for combining the first signal and the second signal to generate a third signal;
a likelihood estimator having an input coupled to said third signal for generating a list comprised of one or more probable messages conveyed by the informational content of the third signal, wherein a probable message is comprised of at least one word; and
a decoder for selecting from the list a most probable one of the probable messages to be an output message.
first means for transforming the first signal into a plurality of first multi-dimensional vectors;
second means for transforming the second signal into a plurality of second multi-dimensional vectors; and
means for combining individual ones of the plurality of first multi-dimensional vectors and individual ones of the plurality of second multi-dimensional vectors to form a plurality of third multi-dimensional vectors.
-
-
15. A system as set forth in claim 14 wherein the signal combiner includes means, responsive to the plurality of third multi-dimensional vectors that result from the operation of the combining means, for training a single set of word models to generate one or more candidate words in response to a written word and also in response to a spoken word that corresponds to the written word.
-
16. A system as set forth in claim 15 wherein the single set of word models is comprised of hidden Markov word models.
-
17. A system as set forth in claim 15 wherein the single set of word models is comprised of a neural network means.
-
18. A system as set forth in claim 15 wherein said decoder is responsive to a list of candidate words obtained by computation on the single set of word models, for selecting a candidate word from the list that has a highest probability of representing all or a part of the probable message, and for outputting the selected candidate word.
-
19. A system as set forth in claim 13 wherein said signal combiner includes means for normalizing, to a common time base, the first signal with respect to the second signal.
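To make the signal combining of claims 14 and 19 concrete, the sketch below resamples a speech feature sequence and a handwriting feature sequence onto a shared time grid and concatenates the aligned vectors into combined vectors. The patent does not specify this procedure: linear interpolation, the fixed grid step, the function names (`resample`, `combine`), and the sample data are assumptions chosen for illustration.

```python
from bisect import bisect_right
from typing import List, Sequence

Vector = List[float]


def resample(
    times: Sequence[float], frames: Sequence[Vector], grid: Sequence[float]
) -> List[Vector]:
    """Linearly interpolate frames, sampled at increasing times, onto grid."""
    if not times or len(times) != len(frames):
        raise ValueError("times and frames must be non-empty and equal length")
    out = []
    for t in grid:
        if t <= times[0]:
            out.append(list(frames[0]))
        elif t >= times[-1]:
            out.append(list(frames[-1]))
        else:
            i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
            t0, t1 = times[i - 1], times[i]
            a = (t - t0) / (t1 - t0)
            out.append(
                [(1 - a) * x0 + a * x1 for x0, x1 in zip(frames[i - 1], frames[i])]
            )
    return out


def combine(
    speech_t: Sequence[float],
    speech_f: Sequence[Vector],
    hand_t: Sequence[float],
    hand_f: Sequence[Vector],
    step: float,
) -> List[Vector]:
    """Align both streams on a common time base and concatenate per frame."""
    start = max(speech_t[0], hand_t[0])
    end = min(speech_t[-1], hand_t[-1])
    count = int((end - start) / step) + 1
    grid = [start + k * step for k in range(count)]
    speech_r = resample(speech_t, speech_f, grid)
    hand_r = resample(hand_t, hand_f, grid)
    # Each combined vector is the speech vector followed by the handwriting vector.
    return [s + h for s, h in zip(speech_r, hand_r)]


# Hypothetical data: 1-D speech features at 1 s spacing, 2-D pen features at 2 s.
combined = combine(
    [0.0, 1.0, 2.0], [[0.0], [10.0], [20.0]],
    [0.0, 2.0], [[0.0, 0.0], [4.0, 8.0]],
    step=0.5,
)
```

Restricting the grid to the interval where both streams overlap avoids extrapolating either signal; a real system might instead pad or warp the shorter stream.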
-
20. A method for operating a message recognition system, comprising the steps of:
-
operating a first transducer for converting a user's speech to a first signal;
operating a second transducer for converting the user's handwriting to a second signal; and
operating a digital data processor for processing the first signal and the second signal to identify an informational content of a consistent message that is conveyed separately by the first signal and by the second signal or that is conveyed jointly by the first signal and by the second signal, the step of processing including the steps of,
operating at least one likelihood estimator for generating one or more probable messages conveyed by the informational content of both the first signal and the second signal, wherein a probable message is comprised of at least one word; and
operating a decoder for selecting a most probable one of the probable messages to be an output message, wherein the step of operating a decoder includes a step of considering a first weighting value, that is associated with the first signal, and a second weighting value, that is associated with the second signal.
-
-
31. A method of operating a message recognition system, comprising the steps of:
-
operating a first transducer for converting a user's speech to a first signal;
operating a second transducer for converting, in parallel with the step of converting a user's speech, the user's handwriting to a second signal;
combining with a digital data processor the first signal and the second signal to generate a third signal;
operating a likelihood estimator for generating a list comprised of one or more probable messages conveyed by the informational content of the third signal, wherein a probable message is comprised of at least one word; and
operating a decoder for selecting from the list a most probable one of the probable messages to be an output message.
-
-
42. A message recognition system that includes a digital data processor, said digital data processor comprising:
-
a user interface having a first input coupled to an output of a speech transducer means and a second input coupled to an output of a handwriting transducer means, for receiving signals therefrom and for converting the signals to a first multi-dimensional representation of a speech signal and to a second multi-dimensional representation of a handwriting signal;
a first likelihood estimator, having an input coupled to said first multi-dimensional representation of the speech signal, for generating, in accordance with an associated first word model and in response to the first multi-dimensional representation, a first list comprised of one or more probable words that the first multi-dimensional representation may represent;
a second likelihood estimator, having an input coupled to said second multi-dimensional representation of the handwriting signal, for generating, in accordance with an associated second word model and in response to the second multi-dimensional representation, a second list comprised of one or more probable words that the second multi-dimensional representation may represent;
a likelihood merger, having an input coupled to an output of said first generating means and to an output of said second generating means, for selectively merging said first list and said second list into a third list comprised of probable words; and
a decoder that is responsive to the probable words of said third list, for selecting a most probable word as an output word.
-
-
44. A message recognition system that includes a digital data processor, said digital data processor comprising:
-
a user interface having a first input coupled to an output of a speech transducer means and a second input coupled to an output of a handwriting transducer means, for simultaneously receiving a speech signal from the speech transducer and a handwriting signal from the handwriting transducer and for converting the speech signal to a first multi-dimensional representation and for converting the handwriting signal to a second multi-dimensional representation;
a combiner for combining the first and the second multi-dimensional representations into a third multi-dimensional representation that is a combination of both the speech signal and the handwriting signal;
a likelihood estimator having an input coupled to said third multi-dimensional representation, for generating, in accordance with an associated word model and in response to the third multi-dimensional representation, a list comprised of one or more probable words that the third multi-dimensional representation may represent; and
a decoder that is responsive to the probable words of said list, for selecting a most probable word as an output word.
-
-
46. A message recognition system that includes a digital data processor, said digital data processor comprising:
-
a first user interface having an input coupled to an output of a speech transducer means, for converting an output thereof to a multi-dimensional representation of a speech signal;
a second user interface having an input coupled to an output of a handwriting transducer means, for converting an output thereof to a multi-dimensional representation of a handwriting signal;
a first likelihood estimator that is responsive to said multi-dimensional representation of the speech signal, for generating, in accordance with an associated first word model, a first list comprised of one or more probable words that the multi-dimensional representation of the speech signal may represent, said first likelihood estimator having an input coupled to an output of a first language model, said first likelihood estimator being responsive to said first language model for eliminating probable words from said first list that are incompatible with said first language model;
a second likelihood estimator that is responsive to said multi-dimensional representation of the handwriting signal, for generating, in accordance with an associated second word model, a second list comprised of one or more probable words that the multi-dimensional representation of the handwriting signal may represent, said second likelihood estimator having an input coupled to an output of a second language model, said second likelihood estimator being responsive to said second language model for eliminating probable words from said second list that are incompatible with said second language model;
a likelihood combiner having an input coupled to an output of said first likelihood estimator and to an output of said second likelihood estimator for selectively merging said first list and said second list into a third list comprised of probable words, said likelihood estimator being responsive to a set of predetermined weights; and
a decoder that is responsive to said third list, for selecting a most probable word as an output word.
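The language-model step of claim 46, in which each likelihood estimator eliminates probable words that are incompatible with its language model, can be pictured with the small sketch below. It assumes, purely for illustration, that a language model is a set of permitted word pairs and that compatibility is judged against the previously decoded word; the patent prescribes neither, and the names `Bigrams` and `prune` are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical language model: the set of (previous word, next word) pairs it permits.
Bigrams = Set[Tuple[str, str]]


def prune(
    candidates: Dict[str, float], previous_word: str, allowed: Bigrams
) -> Dict[str, float]:
    """Drop candidates that the language model does not permit after previous_word."""
    return {
        word: score
        for word, score in candidates.items()
        if (previous_word, word) in allowed
    }


# Each modality is pruned against its own language model before any merging.
speech_lm: Bigrams = {("red", "rose"), ("red", "road")}
handwriting_lm: Bigrams = {("red", "rose"), ("red", "nose")}

speech_pruned = prune({"rose": 0.6, "rows": 0.4}, "red", speech_lm)
hand_pruned = prune({"rose": 0.5, "nose": 0.3, "rase": 0.2}, "red", handwriting_lm)
```

The pruned lists would then feed the weighted merging step, so that implausible candidates never reach the decoder.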
-
Specification