Systems and methods for extracting meaning from multimodal inputs using finite-state devices
First Claim
1. A multimodal recognition system that inputs an utterance comprising a plurality of associated modes, the system comprising:
a plurality of mode recognition systems, each mode recognition system usable to recognize ones of the associated modes, each mode recognition system outputting a recognition result for each associated mode; and
a multimodal recognition system that inputs recognition results from at least a first one of the plurality of mode recognition systems, that generates, for at least a second one of the plurality of mode recognition systems distinct from the at least first one of the plurality of mode recognition systems, at least one recognition model based on the recognition results from the at least first one of the plurality of mode recognition systems, and that outputs each generated recognition model to a corresponding one of the at least second one of the plurality of mode recognition systems;
wherein each corresponding mode recognition system of the at least second one of the plurality of mode recognition systems generates the recognition result for the associated mode for that corresponding mode recognition system based on the corresponding generated recognition model.
17 Assignments
0 Petitions
Abstract
Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more other ones of these modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
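The data flow in the abstract's example embodiment (gesture lattice to multimodal parser to language model to speech recognizer) can be sketched with plain dictionaries standing in for lattices and models. This is a minimal illustration under stated assumptions, not the patent's implementation: costs are assumed to be negative log probabilities, the gesture lattice is assumed to be a map from gesture symbol to cost, the generated language model is a map from phrase to cost, and the speech recognizer simply rescores an n-best list. `GRAMMAR`, `gesture_lattice`, `multimodal_parser`, and `speech_recognizer` are hypothetical names.

```python
import math

# Hypothetical grammar pairing each gesture symbol with the spoken phrases it can
# accompany, plus an extra cost per phrase. Illustrative only, not from the patent.
GRAMMAR = {
    "G_area": {("zoom", "in", "here"): 0.0, ("show", "these", "restaurants"): 0.1},
    "G_point": {("show", "this", "restaurant"): 0.0, ("call", "him"): 0.3},
}


def gesture_lattice(probabilities):
    """Stand-in gesture recognizer output: map each gesture symbol to a cost (-log p)."""
    return {g: -math.log(p) for g, p in probabilities.items() if p > 0}


def multimodal_parser(lattice, grammar):
    """Generate a speech language model {phrase: cost} licensed by the gesture lattice."""
    model = {}
    for gesture, g_cost in lattice.items():
        for phrase, p_cost in grammar.get(gesture, {}).items():
            cost = g_cost + p_cost
            # Keep the cheapest way each phrase is licensed.
            if cost < model.get(phrase, math.inf):
                model[phrase] = cost
    return model


def speech_recognizer(nbest, language_model):
    """Choose the hypothesis with the lowest acoustic + language-model cost.

    Hypotheses that the generated model does not contain are rejected.
    """
    best = None
    for phrase, acoustic_cost in nbest:
        if phrase in language_model:
            total = acoustic_cost + language_model[phrase]
            if best is None or total < best[0]:
                best = (total, phrase)
    return best[1] if best else None


lattice = gesture_lattice({"G_area": 0.8, "G_point": 0.2})
model = multimodal_parser(lattice, GRAMMAR)
nbest = [
    (("show", "three", "restaurants"), 0.9),  # acoustically best, but not licensed
    (("show", "this", "restaurant"), 1.0),
    (("show", "these", "restaurants"), 1.2),
]
print(speech_recognizer(nbest, model))  # ('show', 'these', 'restaurants')
```

The acoustically best hypothesis is rejected because no reading of the gesture licenses it, and the more likely "area" reading makes "show these restaurants" cheaper overall than "show this restaurant".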
31 Claims
1. A multimodal recognition system that inputs an utterance comprising a plurality of associated modes, the system comprising:
a plurality of mode recognition systems, each mode recognition system usable to recognize ones of the associated modes, each mode recognition system outputting a recognition result for each associated mode; and
a multimodal recognition system that inputs recognition results from at least a first one of the plurality of mode recognition systems, that generates, for at least a second one of the plurality of mode recognition systems distinct from the at least first one of the plurality of mode recognition systems, at least one recognition model based on the recognition results from the at least first one of the plurality of mode recognition systems, and that outputs each generated recognition model to a corresponding one of the at least second one of the plurality of mode recognition systems;
wherein each corresponding mode recognition system of the at least second one of the plurality of mode recognition systems generates the recognition result for the associated mode for that corresponding mode recognition system based on the corresponding generated recognition model. (Dependent claims: 2-18)
19. A method for recognizing a multimodal utterance comprising a plurality of associated modes, the method comprising:
inputting at least a first mode and a second mode that is different from the first mode;
generating a first mode recognition lattice based on the first mode;
composing the first mode recognition lattice with a first finite-state transducer that relates the first mode to the second mode to generate a second finite-state transducer;
generating a projection of the second finite-state transducer; and
recognizing the second mode using the projection as a recognition model usable in recognizing the second mode. (Dependent claims: 20-31)
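The four steps of claim 19 (generate a first-mode lattice, compose it with a transducer relating the two modes, project the result, and recognize the second mode with the projection) can be sketched in Python over the tropical semiring, where costs add along a path and the cheapest path wins. This is a minimal sketch under assumptions that are not in the claim: the gesture lattice has no epsilon arcs, and the gesture-to-speech transducer may have epsilon only on its input side and never outputs epsilon, which keeps composition simple without an epsilon filter. The `FSM` class, `compose`, `project_output`, `path_cost`, the gesture symbols, words, and costs are all hypothetical. Here the projected acceptor rescores an n-best list; a production system would more likely search with the model directly and rely on a finite-state library such as OpenFst, which provides composition and projection.

```python
from collections import defaultdict, deque

EPS = "<eps>"


class FSM:
    """Weighted FSM in the tropical semiring: costs add along a path, cheapest path wins."""

    def __init__(self, start):
        self.start = start
        self.finals = {}               # state -> final cost
        self.arcs = defaultdict(list)  # state -> [(ilabel, olabel, cost, next_state)]

    def add_arc(self, src, ilabel, olabel, cost, dst):
        self.arcs[src].append((ilabel, olabel, cost, dst))


def compose(lattice, transducer):
    """Compose an epsilon-free acceptor with a transducer that may have EPS only
    on its input side. Product states are (lattice_state, transducer_state)."""
    result = FSM(start=(lattice.start, transducer.start))
    queue, seen = deque([result.start]), {result.start}
    while queue:
        state = queue.popleft()
        a, t = state
        if a in lattice.finals and t in transducer.finals:
            result.finals[state] = lattice.finals[a] + transducer.finals[t]
        for ilabel, olabel, t_cost, t_next in transducer.arcs.get(t, []):
            if ilabel == EPS:
                # The transducer emits a word without consuming a gesture.
                moves = [(t_cost, (a, t_next))]
            else:
                # Match the transducer's input label against lattice arcs.
                moves = [(l_cost + t_cost, (a_next, t_next))
                         for l_label, _, l_cost, a_next in lattice.arcs.get(a, [])
                         if l_label == ilabel]
            for cost, nxt in moves:
                result.add_arc(state, ilabel, olabel, cost, nxt)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return result


def project_output(fst):
    """Keep only the output (speech) labels, turning the transducer into an acceptor."""
    acceptor = FSM(start=fst.start)
    acceptor.finals = dict(fst.finals)
    for src, arcs in fst.arcs.items():
        for _, olabel, cost, dst in arcs:
            acceptor.add_arc(src, olabel, olabel, cost, dst)
    return acceptor


def path_cost(acceptor, words):
    """Cheapest cost of accepting `words`, or None if rejected (assumes no EPS arcs)."""
    frontier = {acceptor.start: 0.0}
    for word in words:
        nxt = {}
        for state, cost in frontier.items():
            for label, _, arc_cost, dst in acceptor.arcs.get(state, []):
                if label == word and cost + arc_cost < nxt.get(dst, float("inf")):
                    nxt[dst] = cost + arc_cost
        frontier = nxt
    finals = [c + acceptor.finals[s] for s, c in frontier.items() if s in acceptor.finals]
    return min(finals) if finals else None


# Gesture lattice: one ambiguous gesture, "area" (cost 0.2) or "point" (cost 0.7).
lattice = FSM(start=0)
lattice.add_arc(0, "G_area", "G_area", 0.2, 1)
lattice.add_arc(0, "G_point", "G_point", 0.7, 1)
lattice.finals[1] = 0.0

# Hypothetical gesture-to-speech transducer: "show" plus a referring expression.
g2s = FSM(start=0)
g2s.add_arc(0, EPS, "show", 0.0, 1)
g2s.add_arc(1, "G_area", "these", 0.0, 2)
g2s.add_arc(2, EPS, "restaurants", 0.0, 3)
g2s.add_arc(1, "G_point", "this", 0.0, 4)
g2s.add_arc(4, EPS, "restaurant", 0.0, 3)
g2s.finals[3] = 0.0

speech_model = project_output(compose(lattice, g2s))
nbest = [(["show", "three", "restaurants"], 1.0),
         (["show", "this", "restaurant"], 1.3),
         (["show", "these", "restaurants"], 1.4)]
scored = []
for words, acoustic_cost in nbest:
    lm_cost = path_cost(speech_model, words)
    if lm_cost is not None:  # hypotheses outside the projected model are rejected
        scored.append((acoustic_cost + lm_cost, words))
print(min(scored)[1])  # ['show', 'these', 'restaurants']
```

Although "show three restaurants" has the lowest acoustic cost, the projected model rejects it because no path through the composed machine produces it, and the cheaper "area" reading of the gesture favors "show these restaurants" over "show this restaurant".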
Specification