Multimodal speech recognition system
First Claim
1. A multimodal system for receiving inputs via more than one mode from a user and for interpretation and display of text based upon the inputs received via the more than one modes, the system comprising:
a) a user input device having a plurality of modes, one mode accepting speech input and the remaining modes accepting entry of non-speech input;
b) a memory containing a plurality of acoustic networks, each of the plurality of acoustic networks being associated with at least one mode; and
c) a processor to:
i) process the speech input and at least one non-speech input accepted by at least one of the remaining modes;
ii) dynamically adapt an acoustic network based on the speech input and the at least one non-speech input;
iii) perform automatic speech recognition using the dynamically adapted acoustic network;
iv) determine an output based on the automatic speech recognition; and
v) return the output to aid in a determination of a subsequent user-action.
Abstract
The disclosure describes a system and method for text input using a multimodal interface with speech recognition. Specifically, a plurality of modes interacts with the main speech mode to provide the speech recognition system with partial knowledge of the text corresponding to the spoken utterance that forms its input. The knowledge from the other modes is used to dynamically change the ASR system's active vocabulary, thereby significantly increasing recognition accuracy and significantly reducing processing requirements. Additionally, the speech recognition system can be set up in three different system configurations (always listening, partially listening, and push-to-speak), and for each of these, three different user interfaces are proposed (speak-and-type, type-and-speak, and speak-while-typing). Finally, the overall user interface of the proposed system is designed to enhance existing standard text-input methods, thereby minimizing the behavior change for mobile users.
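The idea of using non-speech input to narrow the active vocabulary can be illustrated with a minimal sketch. It assumes the simplest possible form of partial knowledge, namely the first letters the user has typed, and it stands in for the acoustic networks of the claims with a plain list of scored word hypotheses. All names (`Hypothesis`, `restrict_vocabulary`, `recognize`) and all scores are hypothetical and chosen for illustration only; this is not the disclosed implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    word: str
    acoustic_score: float  # higher is better; values below are invented


def restrict_vocabulary(vocabulary, typed_prefix):
    """Keep only the words consistent with the letters typed so far."""
    prefix = typed_prefix.lower()
    return [w for w in vocabulary if w.lower().startswith(prefix)]


def recognize(hypotheses, active_vocabulary):
    """Return the best-scoring word that is in the active vocabulary."""
    active = set(active_vocabulary)
    candidates = [h for h in hypotheses if h.word in active]
    if not candidates:
        return None  # nothing in the active vocabulary matched
    return max(candidates, key=lambda h: h.acoustic_score).word


# Assumed toy data: acoustically confusable words and made-up scores.
vocabulary = ["meeting", "meaning", "eating", "seating", "beating"]
hypotheses = [
    Hypothesis("eating", 0.41),
    Hypothesis("meeting", 0.38),
    Hypothesis("seating", 0.12),
    Hypothesis("meaning", 0.09),
]

# Speech alone: the full vocabulary is active.
speech_only = recognize(hypotheses, vocabulary)

# Speech plus one typed letter: the active vocabulary shrinks first.
active = restrict_vocabulary(vocabulary, "m")
speech_and_typing = recognize(hypotheses, active)
```

In this toy data, the typed letter removes the acoustically stronger but wrong candidate before scoring, and the recognizer has two words to consider instead of five. That mirrors, in miniature, the accuracy and processing benefits the abstract attributes to a dynamically reduced vocabulary.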
45 Citations
19 Claims
Claim 1 is independent (see First Claim above); claims 2 through 19 depend on it.
Specification