Techniques for disambiguating speech input using multimodal interfaces

US 20040172258A1
Filed: 12/10/2003
Published: 09/02/2004
Est. Priority Date: 12/10/2002
Status: Active Grant


4 Assignments

0 Petitions

Accused Products

Abstract

A technique is disclosed for disambiguating speech input in multimodal systems by combining speech and visual I/O interfaces. When the user's speech input is not recognized with sufficiently high confidence, the user is presented with a set of possible matches using a visual display and/or speech output. The user then selects the intended input from the list of matches via one or more available input mechanisms (e.g., stylus, buttons, keyboard, mouse, or speech input). These techniques combine speech and visual interfaces to correctly identify the user's speech input. The techniques disclosed herein may be used in computing devices such as PDAs, cellphones, desktop and laptop computers, tablet PCs, etc.
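
The abstract's flow (accept a confident recognition result, otherwise show a list of likely matches and let the user pick one) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the `Token` type, the 0.8 confidence threshold, the five-item cap, and the `choose` callback (standing in for the visual display plus stylus, button, keyboard, mouse, or speech selection) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


# Hypothetical recognizer output: one candidate transcription and its confidence.
@dataclass(frozen=True)
class Token:
    text: str
    confidence: float  # assumed to lie in [0.0, 1.0]


# Assumed tuning values; the abstract only says "sufficiently high confidence".
CONFIDENCE_THRESHOLD = 0.8
MAX_ALTERNATIVES = 5


def disambiguate(tokens: Sequence[Token],
                 choose: Callable[[List[str]], str]) -> str:
    """Return the text to hand to the application.

    If the best token clears the threshold it is used directly; otherwise the
    top candidates are passed to `choose`, a hypothetical stand-in for showing
    the list to the user and receiving their selection.
    """
    if not tokens:
        raise ValueError("recognizer returned no tokens")
    ranked = sorted(tokens, key=lambda t: t.confidence, reverse=True)
    best = ranked[0]
    if best.confidence >= CONFIDENCE_THRESHOLD:
        return best.text  # unambiguous: no interaction needed
    alternatives = [t.text for t in ranked[:MAX_ALTERNATIVES]]
    selected = choose(alternatives)
    if selected not in alternatives:
        raise ValueError(f"selection {selected!r} was not one of the alternatives")
    return selected


# Usage: simulate a user tapping the second entry in the on-screen list.
hyps = [Token("Austin", 0.55), Token("Boston", 0.52), Token("Houston", 0.20)]
print(disambiguate(hyps, lambda alts: alts[1]))  # prints "Boston"
```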

Citations

14 Claims

1. A system for disambiguating speech input comprising:
- a speech recognition component that receives recorded audio or speech input and generates:
  - one or more tokens corresponding to the speech input; and
  - for each of the one or more tokens, a confidence value indicative of the likelihood that a given token correctly represents the speech input;
- a selection component that identifies, according to a selection algorithm, which two or more tokens are to be presented to a user as alternatives;
- one or more disambiguation components that perform an interaction with the user to present the alternatives and to receive a selection of alternatives from the user, the interaction taking place in at least a visual mode; and
- an output interface that presents the selected alternative to an application as input.

2. The system of claim 1, wherein the disambiguation components and the application reside on a single computing device.
3. The system of claim 1, wherein the disambiguation components and the application reside on separate computing devices.
4. The system of claim 1, wherein the one or more disambiguation components perform said interaction by presenting the user with alternatives in a visual mode, and by receiving the user's selection in a visual mode.
5. The system of claim 4, wherein the disambiguation components present the alternatives to the user in a visual form and allow the user to select from among the alternatives using a voice input.
6. The system of claim 1, wherein the one or more disambiguation components perform said interaction by presenting the user with alternatives in a visual mode, and by receiving the user's selection in either a visual mode, a voice mode, or a combination of visual mode and voice mode.
7. The system of claim 1, wherein the selection component filters the one or more tokens according to a set of parameters.
8. The system of claim 7, wherein the set of parameters is user specified.
9. The system of claim 1, wherein the one or more disambiguation components disambiguate the alternatives in plural iterative stages, whereby the first stage narrows the alternatives to a number of alternatives that is smaller than that initially generated by the selection component, but greater than one, and whereby the one or more disambiguation components are operative iteratively to narrow the alternatives in subsequent iterative stages.
10. The system of claim 9, whereby the number of iterative stages is limited to a specified number.
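
Claims 7 to 10 describe a selection component that filters tokens by a (possibly user-specified) set of parameters, and a disambiguation step that narrows the alternatives over a limited number of stages. The sketch below assumes, hypothetically, that the parameters are a confidence floor and a maximum list length, and that the `pick_subset` callback represents the user choosing a smaller group of alternatives at each stage; the claims specify neither. It also does not enforce claim 9's requirement that the first stage leave more than one alternative.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


# Hypothetical user-specified filter parameters (claims 7 and 8 leave them open).
@dataclass(frozen=True)
class SelectionParams:
    min_confidence: float = 0.1
    max_alternatives: int = 8


def select_alternatives(tokens: Sequence[Tuple[str, float]],
                        params: SelectionParams) -> List[str]:
    """Drop tokens below the confidence floor, rank the rest, and cap the list."""
    kept = [(text, conf) for text, conf in tokens if conf >= params.min_confidence]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in kept[: params.max_alternatives]]


def narrow_iteratively(alternatives: List[str],
                       pick_subset: Callable[[List[str]], List[str]],
                       max_stages: int) -> List[str]:
    """Narrow the list stage by stage until one item remains or the stage limit is hit."""
    current = list(alternatives)
    for _ in range(max_stages):
        if len(current) <= 1:
            break  # already disambiguated
        subset = [a for a in pick_subset(current) if a in current]
        if not subset:
            break  # no usable choice this stage; keep the current list
        current = subset
    return current


# Usage with a scripted "user" who keeps the top half of the list at each stage.
def keep_top_half(xs: List[str]) -> List[str]:
    return xs[: max(1, len(xs) // 2)]


tokens = [("Austin", 0.50), ("Boston", 0.45), ("Bolton", 0.40),
          ("Houston", 0.30), ("Dust", 0.05)]
alternatives = select_alternatives(tokens, SelectionParams(min_confidence=0.1))
print(narrow_iteratively(alternatives, keep_top_half, max_stages=1))  # ['Austin', 'Boston']
```

The stage limit in `narrow_iteratively` plays the role of claim 10; if the user has not converged on a single alternative when the limit is reached, the remaining list is returned for the caller to handle.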

11. A method of processing speech input comprising:
- receiving a speech input from a user;
- determining whether the speech input is ambiguous;
- if the speech input is not ambiguous, then communicating a token representative of the speech input to an application as input to the application; and
- if the speech input is ambiguous:
  - performing an interaction with the user whereby the user is presented with plural alternatives and selects an alternative from among the plural alternatives, the interaction being performed in at least a visual mode; and
  - communicating the selected alternative to the application as input to the application.

12. The method of claim 11, wherein the interaction comprises the concurrent use of said visual mode and said voice mode.
13. The method of claim 12, wherein the interaction comprises the user selecting from among the plural alternatives using a combination of speech and visual-based input.
14. The method of claim 11, wherein the interaction comprises the user selecting from among the plural alternatives using visual input.
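
Claims 12 to 14 distinguish selecting an alternative with visual input, with speech, or with a combination of the two. One way to picture this, assuming hypothetical `TapEvent` and `SpokenEvent` types, is a resolver that accepts a tap on a list position, a spoken ordinal such as "second" that refers to the on-screen list (speech combined with the visual display), or a spoken item that is fuzzily matched against the displayed alternatives with Python's standard `difflib.get_close_matches`. The ordinal vocabulary and the 0.6 match cutoff are illustrative choices, not taken from the patent.

```python
import difflib
from dataclasses import dataclass
from typing import List, Optional, Union


# Hypothetical selection events from the two modalities named in claims 12-14.
@dataclass(frozen=True)
class TapEvent:
    index: int  # list position chosen with a stylus, mouse, or button


@dataclass(frozen=True)
class SpokenEvent:
    utterance: str  # recognizer text for what the user said


SelectionEvent = Union[TapEvent, SpokenEvent]

# Assumed ordinal vocabulary for referring to positions in the displayed list.
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "fifth": 4}


def resolve_selection(alternatives: List[str],
                      event: SelectionEvent) -> Optional[str]:
    """Map a visual or spoken selection onto one displayed alternative, or None."""
    if isinstance(event, TapEvent):
        if 0 <= event.index < len(alternatives):
            return alternatives[event.index]
        return None
    said = event.utterance.strip().lower()
    # Speech plus visual context: an ordinal names a position in the on-screen list.
    if said in ORDINALS and ORDINALS[said] < len(alternatives):
        return alternatives[ORDINALS[said]]
    # Pure speech: match the utterance against the displayed items themselves.
    lowered = [a.lower() for a in alternatives]
    match = difflib.get_close_matches(said, lowered, n=1, cutoff=0.6)
    return alternatives[lowered.index(match[0])] if match else None


alts = ["Austin", "Boston", "Houston"]
print(resolve_selection(alts, TapEvent(2)))            # Houston
print(resolve_selection(alts, SpokenEvent("second")))  # Boston
print(resolve_selection(alts, SpokenEvent("Boston")))  # Boston
```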

Specification

Current Assignee
Waloomba Tech Limited LLC (Intellectual Ventures LLC)
Original Assignee
Kirusa Inc.
Inventors
Dominach, Richard F., Sibal, Sandeep, Vaidya, Shirish, Isukapalli, Sastry

Granted Patent

US 7,684,985 B2
US Class Current

704/277
CPC Class Codes

G10L 15/22 Procedures used during a sp...
