Systems and methods for extracting meaning from multimodal inputs using finite-state devices
Abstract
Finite-state systems and methods allow multiple input streams to be parsed and integrated by a single finite-state device. These systems and methods not only address multimodal recognition, but are also able to encode semantics and syntax into a single finite-state device. The finite-state device provides models for recognizing multimodal inputs, such as speech and gesture, and composes the meaning content from the various input streams into a single semantic representation. Compared to conventional multimodal recognition systems, finite-state systems and methods allow for compensation among the various input streams. Finite-state systems and methods allow one input stream to dynamically alter a recognition model used for another input stream, and can reduce the computational complexity of multidimensional multimodal parsing. Finite-state devices provide a well-understood probabilistic framework for combining the probability distributions associated with the various input streams and for selecting among competing multimodal interpretations.
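As an illustration of how a single finite-state device can read several input streams and write a meaning, the following is a minimal sketch of a three-tape device with a gesture tape, a speech tape, and a meaning tape. It is not taken from the specification: the arc table, the symbols `Gp` and `Go`, the meaning strings, and the function name `parse` are all hypothetical, and each input lattice is simplified to a single symbol sequence.

```python
from typing import Dict, List, Optional, Set, Tuple

EPS = None  # epsilon: nothing is read from or written to that tape

# One arc: (gesture symbol, speech symbol, meaning symbol, next state).
Arc = Tuple[Optional[str], Optional[str], Optional[str], int]

# Hypothetical toy device: two input tapes (gesture, speech), one output tape (meaning).
ARCS: Dict[int, List[Arc]] = {
    0: [(EPS, "email", "email(", 1)],
    1: [(EPS, "this", EPS, 2)],
    2: [
        ("Gp", "person", "person(selected)", 3),      # pointing at a person
        ("Go", "organization", "org(selected)", 3),   # pointing at an organization
    ],
    3: [(EPS, EPS, ")", 4)],
}
FINAL: Set[int] = {4}


def parse(gesture: List[str], speech: List[str]) -> List[str]:
    """Return every meaning string the device emits for the two inputs."""
    results: List[str] = []
    # Each stack entry: (state, gesture position, speech position, meaning so far).
    stack = [(0, 0, 0, [])]
    while stack:
        state, gi, si, out = stack.pop()
        if state in FINAL and gi == len(gesture) and si == len(speech):
            results.append("".join(out))
        for g, s, m, nxt in ARCS.get(state, []):
            # An arc is usable only if each non-epsilon input label matches its tape.
            if g is not EPS and (gi >= len(gesture) or gesture[gi] != g):
                continue
            if s is not EPS and (si >= len(speech) or speech[si] != s):
                continue
            new_out = out + [m] if m is not EPS else out
            stack.append((nxt, gi + int(g is not EPS), si + int(s is not EPS), new_out))
    return results


print(parse(["Gp"], ["email", "this", "person"]))
```

Because the gesture and speech labels sit on the same arc, a spoken "person" is accepted only together with a person-pointing gesture; a mismatched pair such as `Go` with "person" yields no meaning at all.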
53 Claims
1. A finite-state multimodal recognition system that generates a multimodal meaning based on an utterance comprising a plurality of associated modes, the system comprising:
means for receiving said utterance;
a plurality of finite-state mode recognition systems, each finite-state mode recognition system usable to recognize ones of the associated modes, each finite-state mode recognition system outputting at least one recognition lattice for each associated mode; and
an n-tape finite-state device that inputs n−1 recognition lattices from the plurality of finite-state mode recognition subsystems and outputs the multimodal meaning based on the n−1 recognition lattices.
2. A finite-state multimodal recognition system that generates a multimodal meaning based on an utterance comprising a pair of associated modes, the system comprising:
means for receiving said utterance;
a pair of finite-state mode recognition systems, each finite-state mode recognition system usable to recognize one of the associated modes, each finite-state mode recognition system outputting at least one recognition lattice for each associated mode; and
a multimodal recognition system that inputs a recognition lattice from each of the pair of mode recognition systems and outputs the multimodal meaning for the pair of associated modes based on the plurality of recognition results, comprising:
a first system that inputs the pair of recognition lattices and outputs a combined recognition finite-state transducer;
a second system that inputs the combined recognition finite-state transducer and outputs a combined recognition finite-state machine; and
a third system that inputs the combined recognition finite-state machine and a multimodal meaning grammar and outputs the multimodal meaning.
3. A multimodal recognition system that generates a multimodal recognition based on an utterance comprising a plurality of associated modes, the system comprising:
means for receiving said utterance;
a plurality of mode recognition subsystems, each mode recognition subsystem usable to recognize ones of the associated modes, each mode recognition subsystem outputting at least one recognition result for each associated mode; and
a multimodal recognition subsystem that inputs recognition results from each of the plurality of mode recognition subsystems and outputs the multimodal recognition for the plurality of associated modes based on the plurality of recognition results;
wherein each of the plurality of mode recognition subsystems and the multimodal recognition subsystem includes at least one finite-state machine having at least one tape.
Dependent claims: 4-44.
45. A method for recognizing a multimodal utterance comprising a plurality of different modes, the method comprising:
receiving said multimodal utterance;
inputting at least a first mode of the multimodal utterance and a second mode of the multimodal utterance that is different than the first mode;
generating a first mode recognition lattice from the first mode;
generating a second mode recognition lattice from the second mode;
generating a first finite-state transducer based on the first mode recognition lattice and a second finite-state transducer;
generating a third finite-state transducer based on the first finite-state transducer and the second mode recognition lattice;
converting the third finite-state transducer to a first finite-state machine; and
generating a multimodal recognition based on the first finite-state machine and a fourth finite-state transducer.
Dependent claims: 46-53.
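The data flow of the method in claim 45 can be sketched as below. This is only an illustration under strong simplifying assumptions, not the claimed implementation: each recognition lattice is modeled as a finite set of paths with a cost (lower is better, as with negative log probabilities), and each finite-state transducer is modeled as a finite relation between paths rather than as a real state machine. The gesture and speech symbols, the costs, the names `gesture_to_speech` and `meaning_of`, and the function `recognize` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Seq = Tuple[str, ...]
Pair = Tuple[Seq, Seq]

# Mode recognition lattices as {path: cost}; lower cost means more likely.
gesture_lattice: Dict[Seq, float] = {("Gp",): 0.5, ("Go",): 1.5}
speech_lattice: Dict[Seq, float] = {
    ("email", "this", "person"): 0.75,
    ("email", "this", "organization"): 0.25,
}

# Stand-in for the "second finite-state transducer": which speech goes with which gesture.
gesture_to_speech: Set[Pair] = {
    (("Gp",), ("email", "this", "person")),
    (("Go",), ("email", "this", "organization")),
}

# Stand-in for the "fourth finite-state transducer": combined gesture/speech to meaning.
meaning_of: Dict[Pair, str] = {
    (("Gp",), ("email", "this", "person")): "email(person(selected))",
    (("Go",), ("email", "this", "organization")): "email(org(selected))",
}


def recognize(gestures: Dict[Seq, float], speech: Dict[Seq, float]) -> List[Tuple[float, str]]:
    """Return (cost, meaning) candidates, best first."""
    # First transducer: the gesture lattice combined with the gesture-to-speech relation.
    first = {(g, w): gestures[g] for (g, w) in gesture_to_speech if g in gestures}
    # Third transducer: the first transducer restricted by the speech lattice,
    # with the costs of the two modes added together.
    third = {(g, w): cost + speech[w] for (g, w), cost in first.items() if w in speech}
    # Finite-state machine: each (gesture, speech) pair is now treated as one combined symbol.
    machine = dict(third)
    # Multimodal recognition: map each combined symbol to its meaning and rank by cost.
    return sorted((cost, meaning_of[pair]) for pair, cost in machine.items() if pair in meaning_of)


for cost, meaning in recognize(gesture_lattice, speech_lattice):
    print(cost, meaning)
```

In this toy data the speech lattice alone prefers "organization", but the gesture lattice strongly prefers the person-pointing gesture, so the combined ranking puts `email(person(selected))` first. That is one way to picture how the streams can compensate for each other.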
Specification