Techniques for disambiguating speech input using multimodal interfaces
Abstract
A technique is disclosed for disambiguating speech input for multimodal systems by using a combination of speech and visual I/O interfaces. When the user's speech input is not recognized with sufficiently high confidence, the user is presented with a set of possible matches using a visual display and/or speech output. The user then selects the intended input from the list of matches via one or more available input mechanisms (e.g., stylus, buttons, keyboard, mouse, or speech input). These techniques involve the combined use of speech and visual interfaces to correctly identify the user's speech input. The techniques disclosed herein may be utilized in computer devices such as PDAs, cellphones, desktop and laptop computers, tablet PCs, etc.
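The abstract leaves the selection step open to several input mechanisms. As a minimal sketch, assuming the list of possible matches has already been displayed, the Python below maps either a pointer action (stylus tap, button press, or mouse click, identified by list position) or a spoken reply (an ordinal such as "2", or a repetition of the intended word) onto exactly one alternative. The event classes and `resolve_selection` are hypothetical names chosen for illustration, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass
class PointerSelection:
    """Hypothetical event: stylus tap, button press, or mouse click on an item."""
    index: int  # 0-based position in the displayed list


@dataclass
class SpokenSelection:
    """Hypothetical event: recognized text of the user's spoken choice."""
    text: str


SelectionEvent = Union[PointerSelection, SpokenSelection]


def resolve_selection(
    alternatives: Sequence[str], event: SelectionEvent
) -> Optional[str]:
    """Map a selection event onto exactly one displayed alternative,
    or return None if the event does not identify one."""
    if isinstance(event, PointerSelection):
        if 0 <= event.index < len(alternatives):
            return alternatives[event.index]
        return None
    spoken = event.text.strip().lower()
    # A spoken ordinal such as "2" picks the second item in the list.
    if spoken.isdecimal():
        position = int(spoken) - 1
        if 0 <= position < len(alternatives):
            return alternatives[position]
        return None
    # Otherwise the user may repeat the intended alternative itself.
    matches = [alt for alt in alternatives if alt.lower() == spoken]
    return matches[0] if len(matches) == 1 else None
```

For example, with `options = ["Boston", "Austin", "Houston"]`, `resolve_selection(options, PointerSelection(1))` returns `"Austin"` and `resolve_selection(options, SpokenSelection("houston"))` returns `"Houston"`. An unmatched reply returns `None`, which a caller could treat as a cue to re-prompt.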
14 Claims
1. A system for disambiguating speech input comprising:
a speech recognition component that receives recorded audio or speech input and generates:
one or more tokens corresponding to the speech input; and
for each of the one or more tokens, a confidence value indicative of the likelihood that a given token correctly represents the speech input;
a selection component that identifies, according to a selection algorithm, which two or more tokens are to be presented to a user as alternatives;
one or more disambiguation components that perform an interaction with the user to present the alternatives and to receive a selection of alternatives from the user, the interaction taking place in at least a visual mode; and
an output interface that presents the selected alternative to an application as input.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
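Claim 1 leaves the selection algorithm unspecified. One plausible choice, sketched here under stated assumptions, keeps every token whose confidence falls within a fixed margin of the best-scoring token, so that only competitive hypotheses are shown and at least two are required before the user is asked to choose. `Token`, `select_alternatives`, and the default `margin` and `max_alternatives` values are hypothetical and illustrative, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Token:
    text: str
    confidence: float  # assumed to lie in [0, 1]


def select_alternatives(
    tokens: Sequence[Token], margin: float = 0.15, max_alternatives: int = 4
) -> List[Token]:
    """Hypothetical selection algorithm: keep tokens whose confidence is within
    `margin` of the best one, best first, capped at `max_alternatives`.
    Returns [] when fewer than two tokens qualify (nothing to disambiguate)."""
    if max_alternatives < 2:
        raise ValueError("at least two alternatives are needed for a choice")
    if not tokens:
        return []
    # sorted() is stable, so equally scored tokens keep the recognizer's order.
    ranked = sorted(tokens, key=lambda t: t.confidence, reverse=True)
    best = ranked[0].confidence
    close = [t for t in ranked if best - t.confidence <= margin]
    close = close[:max_alternatives]
    return close if len(close) >= 2 else []
```

With scores of 0.62 ("Austin"), 0.55 ("Boston"), and 0.20 ("Houston"), the default margin keeps Austin and Boston for display and drops Houston; a single confident token yields an empty list, so no interaction is needed.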
11. A method of processing speech input comprising:
receiving a speech input from a user;
determining whether the speech input is ambiguous;
if the speech input is not ambiguous, then communicating a token representative of the speech input to an application as input to the application; and
if the speech input is ambiguous:
performing an interaction with the user whereby the user is presented with plural alternatives and selects an alternative from among the plural alternatives, the interaction being performed in at least a visual mode; and
communicating the selected alternative to the application as input to the application.
Dependent claims: 12, 13, 14.
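The two branches of claim 11 can be sketched as a single routing function. The claim does not say how ambiguity is determined; this sketch assumes, hypothetically, that the input is ambiguous when the recognizer returns at least two hypotheses and none reaches a confidence threshold. The visual interaction and the application are passed in as callbacks (`ask_user`, `send_to_application`), which are placeholders for whatever UI and application interfaces a real system would expose.

```python
from typing import Callable, List, Sequence, Tuple

# A recognizer hypothesis: (text, confidence), with confidence assumed in [0, 1].
Hypothesis = Tuple[str, float]


def is_ambiguous(hypotheses: Sequence[Hypothesis], threshold: float = 0.8) -> bool:
    """Hypothetical test: ambiguous if there are at least two hypotheses
    and even the best one falls below `threshold`."""
    if len(hypotheses) < 2:
        return False
    return max(conf for _, conf in hypotheses) < threshold


def process_speech(
    hypotheses: Sequence[Hypothesis],
    ask_user: Callable[[List[str]], str],
    send_to_application: Callable[[str], None],
    threshold: float = 0.8,
) -> str:
    """Route recognized speech to the application, asking the user to
    choose among alternatives only when the input is ambiguous."""
    if not hypotheses:
        raise ValueError("recognizer returned no hypotheses")
    # Best-scoring hypothesis first; this order is also the display order.
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    if is_ambiguous(ranked, threshold):
        # Ambiguous branch: present the alternatives (e.g. as an on-screen
        # list) and let the user pick one.
        choice = ask_user([text for text, _ in ranked])
    else:
        # Unambiguous branch: pass the top token straight through.
        choice = ranked[0][0]
    send_to_application(choice)
    return choice
```

A console prototype could pass `lambda options: options[int(input()) - 1]` as `ask_user`, while a PDA or tablet build would instead render the list and wait for a stylus tap.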
Specification