MULTIMODAL DISAMBIGUATION OF SPEECH RECOGNITION

US 20140081641A1
Filed: 11/14/2013
Published: 03/20/2014
Est. Priority Date: 06/02/2004
Status: Active Grant

Abstract

The present invention provides a speech recognition system combined with one or more alternate input modalities to ensure efficient and accurate text input. The speech recognition system achieves less than perfect accuracy due to limited processing power, environmental noise, and/or natural variations in speaking style. The alternate input modalities use disambiguation or recognition engines to compensate for reduced keyboards, sloppy input, and/or natural variations in writing style. The ambiguity remaining in the speech recognition process is mostly orthogonal to the ambiguity inherent in the alternate input modality, such that the combination of the two modalities resolves the recognition errors efficiently and accurately. The invention is especially well suited for mobile devices with limited space for keyboards or touch-screen input.
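The orthogonality argument in the abstract can be illustrated with a small sketch. Words that sound alike (such as "four", "for", "fore") usually map to different key sequences on a reduced keyboard, while words that share a key sequence usually sound different, so each modality prunes the other's ambiguity. The 12-key layout, the `key_sequence` and `disambiguate` helpers, and the scores below are assumptions chosen for illustration; they are not taken from the patent.

```python
# Hypothetical illustration: filter a speech recognizer's N-best list with
# an ambiguous key sequence from a 12-key phone pad. Layout and scores are
# assumptions, not the patent's implementation.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def key_sequence(word):
    """Map a word (letters a-z only) to the digits that would type it."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def disambiguate(nbest, keys):
    """Keep speech candidates consistent with a possibly partial key sequence.

    nbest: list of (word, score) pairs from a recognizer, higher is better.
    keys:  digits entered so far on the reduced keyboard.
    """
    matches = [(w, s) for w, s in nbest if key_sequence(w).startswith(keys)]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Acoustically confusable candidates with made-up scores.
nbest = [("four", 0.41), ("for", 0.38), ("fore", 0.12), ("door", 0.09)]
print(disambiguate(nbest, "3687"))  # only "four" matches keys 3-6-8-7
```

Conversely, "good" and "home" share the key sequence 4663 on this layout but are unlikely to be confused acoustically, so a speech score would separate them where the keypad alone cannot.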

18 Claims

1. A computer implemented method comprising:
- receiving, by the mobile device, a voice input;
  
  displaying, by the mobile device, at a text insertion point of a touch screen display, a most likely interpretation of word candidates of the voice input, the most likely interpretation resulting from a speech recognition process;
  
  receiving, by the mobile device, on the touch screen display, a first non-voice input that selects said displayed most likely interpretation of said word candidates;
  
  responsive to the first non-voice input, displaying for selection, by the mobile device, two or more word candidates on the touch screen display; and
  
  receiving, by the mobile device, at said non-voice input field, a second non-voice input that selects an intended word candidate from among said two or more word candidates.
- Dependent claims (2, 3, 4, 5, 6, 7, 8)
- - 2. The method of claim 1, further comprising:
    - for each of a series of additional non-voice inputs, determining additional word candidates that were determined from the first input.
  - 3. The method of claim 1, further comprising:
    - receiving the voice input and the first non-voice input substantially simultaneously; and
      
      automatically interpreting both inputs and mutually disambiguating both inputs to produce a best interpretation of both.
  - 4. The method of claim 1 wherein the two or more word candidates include the most likely interpretation and an alternative to the most likely interpretation.
  - 5. The method of claim 1 wherein the speech recognition process is performed by the mobile device.
  - 6. The method of claim 1 wherein the speech recognition process is performed by a server.
  - 7. The method of claim 1 wherein receiving the voice input is preceded by receiving activation of a push-to-talk button on the mobile device.
  - 8. The method of claim 1 wherein the mobile device has limited space for a keyboard or touch-screen input.
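The sequence of steps recited in claim 1 can be sketched as a small state model. This is a minimal sketch under stated assumptions: the `VoiceEntry` and `TextField` classes and their method names are hypothetical, the recognizer is replaced by a ready-made ranked list, and touch events are reduced to method calls.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceEntry:
    """Hypothetical record of one recognized word and its alternatives."""
    candidates: list   # ranked word candidates, best first
    chosen: int = 0    # index of the candidate currently displayed

    @property
    def displayed(self):
        return self.candidates[self.chosen]


@dataclass
class TextField:
    """Hypothetical text field; the insertion point is the end of the text."""
    entries: list = field(default_factory=list)

    def receive_voice(self, ranked_candidates):
        # Receive the recognizer output and display the most likely
        # interpretation at the insertion point.
        self.entries.append(VoiceEntry(list(ranked_candidates)))
        return len(self.entries) - 1

    def tap_word(self, index):
        # First non-voice input: selecting the displayed word returns the
        # word candidates to offer for selection.
        return list(self.entries[index].candidates)

    def pick(self, index, word):
        # Second non-voice input: the user selects the intended candidate.
        entry = self.entries[index]
        entry.chosen = entry.candidates.index(word)

    def text(self):
        return " ".join(e.displayed for e in self.entries)


box = TextField()
i = box.receive_voice(["for", "four", "fore"])  # stand-in recognizer output
shown_first = box.text()
offered = box.tap_word(i)
box.pick(i, "four")
print(shown_first, offered, box.text())
```

In this sketch the offered list contains both the most likely interpretation and its alternatives, which matches the arrangement described in claim 4.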

9. A computer program product, tangibly embodied in a non-transitory computer-readable storage medium, the computer program product including instructions operable to cause a data processing apparatus to:
- receive a voice input;
  
  display, at a text insertion point of a touch screen display, a most likely interpretation of word candidates of the voice input, the most likely interpretation resulting from a speech recognition process;
  
  receive, on the touch screen display, a first non-voice input that selects said displayed most likely interpretation of said word candidates;
  
  responsive to the first non-voice input, display for selection two or more word candidates on the touch screen display; and
  
  receive, at said non-voice input field, a second non-voice input that selects an intended word candidate from among said two or more word candidates.
- Dependent claims (10, 11, 12)
- - 10. The computer program product of claim 9 wherein the two or more word candidates include the most likely interpretation and an alternative to the most likely interpretation.
  - 11. The computer program product of claim 9 wherein the speech recognition process is performed by the mobile device.
  - 12. The computer program product of claim 9 wherein the speech recognition process is performed by a server.

13. A mobile device including a processor configured to:
- receive a voice input;
  
  display, at a text insertion point of a touch screen display, a most likely interpretation of word candidates of the voice input, the most likely interpretation resulting from a speech recognition process;
  
  receive, on the touch screen display, a first non-voice input that selects said displayed most likely interpretation of said word candidates;
  
  responsive to the first non-voice input, display for selection two or more word candidates on the touch screen display; and
  
  receive, at said non-voice input field, a second non-voice input that selects an intended word candidate from among said two or more word candidates.
- Dependent claims (14, 15, 16, 17, 18)
- - 14. The mobile device of claim 13 wherein the two or more word candidates include the most likely interpretation and an alternative to the most likely interpretation.
  - 15. The mobile device of claim 13 wherein the processor is further configured to perform the speech recognition process.
  - 16. The mobile device of claim 13 wherein the processor is further configured to receive from a server results of the speech recognition process.
  - 17. The mobile device of claim 13 further comprising a push-to-talk button for activating a microphone to receive the voice input.
  - 18. The mobile device of claim 13 wherein the touch screen display has limited space for a keyboard or touch-screen input.

Current Assignee
Cerence Operating Company (Cerence Inc.)
Original Assignee
Tegic Communications, Inc. (Microsoft Corporation)
Inventors
LONG, Michael R., HULLFISH, Keith C., EYRAUD, Richard

Granted Patent

US 9,786,273 B2
Field of Search
US Class Current

704/257
CPC Class Codes

G10L 15/18   using natural language mode...

G10L 15/24   Speech recognition using no...

G10L 15/32   Multiple recognisers used i...
