Systems and methods for extracting meaning from multimodal inputs using finite-state devices
Abstract
Multimodal utterances contain a number of different modes. These modes can include speech, gesture, pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more other ones of these modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer receives a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
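The abstract's pipeline (gesture lattice → multimodal parser → language model → speech recognizer) can be sketched in miniature. This is an illustrative assumption, not the patent's implementation: the gesture lattice, the gesture-to-phrase table, and all names (`GESTURE_LATTICE`, `PHRASES_FOR_GESTURE`, `language_model_from_gestures`, `recognize_speech`) are hypothetical, and the "language model" is reduced to a weighted phrase list.

```python
# Illustrative sketch only: a gesture recognition lattice constrains the
# speech recognizer, as in the abstract's gesture -> parser -> ASR pipeline.
# All names and data here are assumptions for the sake of the example.

# A gesture recognition lattice as weighted hypotheses (label, cost).
GESTURE_LATTICE = [("circle_area", 0.2), ("point_at_restaurant", 0.9)]

# Hypothetical mapping from gesture hypotheses to speech phrases they license.
PHRASES_FOR_GESTURE = {
    "circle_area": ["show restaurants here", "zoom in here"],
    "point_at_restaurant": ["phone number", "show review"],
}

def language_model_from_gestures(lattice):
    """Build a weighted phrase list (a crude stand-in for a language model)
    from the gesture lattice: each licensed phrase inherits its gesture's
    cost, keeping the best (lowest) cost when phrases repeat."""
    model = {}
    for gesture, cost in lattice:
        for phrase in PHRASES_FOR_GESTURE.get(gesture, []):
            model[phrase] = min(cost, model.get(phrase, float("inf")))
    return model

def recognize_speech(candidates, model):
    """Pick the candidate transcription the gesture-derived model favors."""
    scored = [(model.get(c, float("inf")), c) for c in candidates]
    return min(scored)[1]

lm = language_model_from_gestures(GESTURE_LATTICE)
print(recognize_speech(["show restaurants here", "phone number"], lm))
# prints "show restaurants here" (favored by the cheaper gesture hypothesis)
```

The point of the sketch is the direction of information flow: the gesture mode is recognized first, and its lattice biases which speech hypotheses the recognizer will accept.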
18 Claims
1. A method for recognizing an utterance comprising:
- receiving an utterance comprising a first portion having a first mode and a second portion having a second mode;
- generating a first mode recognition lattice and a first finite-state transducer, based on the first mode;
- relating the first portion of the utterance to the second portion of the utterance based on the first finite-state transducer;
- generating a second finite-state transducer, comprising a gesture and speech recognition model finite-state transducer, based on the first mode recognition lattice and the first finite-state transducer; and
- outputting a recognition result based on the utterance and the second finite-state transducer.

Dependent claims: 2, 3, 4, 5
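The claim's finite-state steps can be sketched with a toy composition: a first-mode recognition lattice (represented as an identity transducer) is composed with a first finite-state transducer to yield a second, combined transducer, whose arcs carry the recognition result. This is a minimal sketch under stated assumptions, not the patent's method: the arc format, the symbols (`G_area`, `restaurants_here`), and the integer weights are all hypothetical.

```python
# Illustrative sketch only: finite-state composition over toy transducers.
# Arc format (an assumption): (src_state, in_label, out_label, dst_state, weight).

def compose(fst_a, fst_b):
    """Compose two transducers: an arc of A pairs with an arc of B when A's
    output label matches B's input label. Result states are pairs of states,
    and weights add (tropical-semiring style)."""
    result = []
    for (s1, i1, o1, d1, w1) in fst_a:
        for (s2, i2, o2, d2, w2) in fst_b:
            if o1 == i2:
                result.append(((s1, s2), i1, o2, (d1, d2), w1 + w2))
    return result

# First-mode recognition lattice as an identity transducer over a gesture symbol.
lattice = [(0, "G_area", "G_area", 1, 3)]
# First FST relates the gesture symbol to a speech-side symbol.
first_fst = [(0, "G_area", "restaurants_here", 1, 1)]

# The "second finite-state transducer" combining both sources of evidence.
second_fst = compose(lattice, first_fst)
print(second_fst)  # [((0, 0), 'G_area', 'restaurants_here', (1, 1), 4)]
```

In practice such composition is done with an optimized weighted-FST library rather than the nested loops above; the sketch only shows how composing the lattice with the first transducer relates the two modes in a single machine.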
6. A method for recognizing an utterance comprising:
- receiving an utterance comprising a plurality of modes;
- relating a first portion of the utterance comprising a first mode of the plurality of modes to a second portion of the utterance comprising a second mode of the plurality of modes, based on a first finite-state transducer;
- generating a second finite-state transducer, comprising a gesture and speech recognition model finite-state transducer, based on a first mode recognition lattice and the first finite-state transducer; and
- outputting a recognition result based on the utterance and the second finite-state transducer.

Dependent claims: 7, 8, 9, 10, 11, 12
13. A method for extracting meaning from multimodal inputs comprising:
- receiving a multimodal input;
- relating a first portion of the multimodal input comprising a first mode to a second portion of the multimodal input comprising a second mode, based on a first finite-state transducer;
- generating a second finite-state transducer, comprising a gesture and speech recognition model finite-state transducer, based on a first mode recognition lattice and the first finite-state transducer; and
- outputting a recognition result based on the multimodal input and the second finite-state transducer.

Dependent claims: 14, 15, 16, 17, 18