Multimodal speech recognition system
First Claim
1. A multimodal system for receiving inputs via more than one mode from a user and for interpretation and display of the text intended by the inputs received through those multiple modes, the system comprising:
a) a user input device having a plurality of modes with one mode for accessing speech and each of the remaining modes being associated with an input method for entering letters or characters or words or phrases or sentences or graphics or image or any combination;
b) a memory containing a plurality of acoustic networks, each of the plurality of acoustic networks associated with one or more inputs from the one or more modes;
c) a mechanism to feed back the system output to a user of the input device, in the form of a visual display or an audible speaker; and
d) a processor to:
i) process the inputs received via the multiple modes and select and/or build an appropriate acoustic network;
ii) perform automatic speech recognition using the selected/built acoustic network; and
iii) return the recognition outputs to the user for subsequent user-actions.
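Steps i) to iii) of the processor can be sketched as follows. This is a minimal Python sketch built on explicit assumptions: an "acoustic network" is reduced to the list of words it may output, the non-speech mode supplies a typed word prefix, and the acoustic scores are made-up numbers standing in for a real recognizer. The names `AcousticNetwork`, `build_network_store`, `select_or_build`, `recognize`, `LEXICON`, and `FAKE_SCORES` are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-in for an "acoustic network": only the word list the
# recognizer may output. A real system would compile these words into a
# decoding graph; that step is out of scope here.
@dataclass(frozen=True)
class AcousticNetwork:
    key: str                 # typed letters this network is associated with
    words: Tuple[str, ...]   # active vocabulary


def build_network_store(lexicon: List[str], max_prefix: int = 1) -> Dict[str, AcousticNetwork]:
    """Element b): precompute one network per typed prefix of up to max_prefix letters."""
    buckets: Dict[str, List[str]] = {}
    for word in lexicon:
        for n in range(1, min(max_prefix, len(word)) + 1):
            buckets.setdefault(word[:n].lower(), []).append(word)
    return {key: AcousticNetwork(key, tuple(words)) for key, words in buckets.items()}


def select_or_build(store: Dict[str, AcousticNetwork], lexicon: List[str], typed: str) -> AcousticNetwork:
    """Step i): reuse a stored network for the typed letters, or build one on the fly."""
    key = typed.lower()
    if key in store:
        return store[key]
    return AcousticNetwork(key, tuple(w for w in lexicon if w.lower().startswith(key)))


def recognize(network: AcousticNetwork, score: Callable[[str], float], n_best: int = 3) -> List[str]:
    """Steps ii) and iii): rank only the active vocabulary and return an
    n-best list the user can choose from."""
    return sorted(network.words, key=score, reverse=True)[:n_best]


# Illustrative data only.
LEXICON = ["call", "cat", "catalog", "dog", "door", "kettle"]
# Made-up acoustic scores for one utterance of "call"; a real recognizer would
# derive these from audio. Here the competitor "kettle" narrowly wins.
FAKE_SCORES = {"call": 0.90, "cat": 0.40, "catalog": 0.20,
               "dog": 0.10, "door": 0.30, "kettle": 0.95}

store = build_network_store(LEXICON)
full = AcousticNetwork("", tuple(LEXICON))
print(recognize(full, FAKE_SCORES.__getitem__))                                 # ['kettle', 'call', 'cat']
print(recognize(select_or_build(store, LEXICON, "c"), FAKE_SCORES.__getitem__))  # ['call', 'cat', 'catalog']
```

With the full vocabulary the competing word "kettle" ranks first; once the user types "c", the selected network excludes it and "call" ranks first. Precomputing networks for short prefixes and building longer ones on demand is one possible memory/latency trade-off, assumed here rather than taken from the claim.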
Abstract
The disclosure describes an overall system and method for text input using a multimodal interface with speech recognition. Specifically, a plurality of modes interact with the main speech mode to provide the speech recognition system with partial knowledge of the text corresponding to the spoken utterance that forms its input. The knowledge from the other modes is used to dynamically change the ASR system's active vocabulary, thereby significantly increasing recognition accuracy and significantly reducing processing requirements. Additionally, the speech recognition system is configured using three different system configurations (always listening, partially listening, and push-to-speak), and for each of those configurations three different user interfaces are proposed (speak-and-type, type-and-speak, and speak-while-typing). Finally, the overall user interface of the proposed system is designed to enhance existing standard text-input methods, thereby minimizing the behavior change required of mobile users.
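The abstract pairs each of three listening configurations with each of three user interfaces. A minimal Python sketch of that design space follows; the enum and variable names are hypothetical, and the event order attached to each interface is an assumption read from the interface names, not a definition given in the disclosure.

```python
from enum import Enum
from itertools import product
from typing import Dict, List, Tuple


class ListeningConfig(Enum):
    ALWAYS_LISTENING = "always listening"
    PARTIALLY_LISTENING = "partially listening"
    PUSH_TO_SPEAK = "push-to-speak"


class InputInterface(Enum):
    SPEAK_AND_TYPE = "speak-and-type"
    TYPE_AND_SPEAK = "type-and-speak"
    SPEAK_WHILE_TYPING = "speak-while-typing"


# Assumed reading of each interface name: the order in which the spoken
# utterance and the typed letters reach the system.
EVENT_ORDER: Dict[InputInterface, Tuple[str, ...]] = {
    InputInterface.SPEAK_AND_TYPE: ("speech", "letters"),
    InputInterface.TYPE_AND_SPEAK: ("letters", "speech"),
    InputInterface.SPEAK_WHILE_TYPING: ("speech and letters together",),
}

# Every configuration is paired with every interface: 3 x 3 = 9 variants.
DESIGN_SPACE: List[Tuple[ListeningConfig, InputInterface]] = list(
    product(ListeningConfig, InputInterface)
)

for config, ui in DESIGN_SPACE:
    print(f"{config.value:20} | {ui.value:18} | {' -> '.join(EVENT_ORDER[ui])}")
```

Keeping the configuration and the interface as independent enums makes explicit that the abstract treats them as orthogonal choices, so each of the nine variants can be evaluated separately.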
9 Claims
Specification