Multimodal disambiguation of speech recognition

US 20050283364A1
Filed: 06/01/2005
Published: 12/22/2005
Est. Priority Date: 12/04/1998
Status: Active Grant


11 Assignments

0 Petitions

Abstract

The present invention provides a speech recognition system combined with one or more alternate input modalities to ensure efficient and accurate text input. The speech recognition system achieves less than perfect accuracy due to limited processing power, environmental noise, and/or natural variations in speaking style. The alternate input modalities use disambiguation or recognition engines to compensate for reduced keyboards, sloppy input, and/or natural variations in writing style. The ambiguity remaining in the speech recognition process is mostly orthogonal to the ambiguity inherent in the alternate input modality, such that the combination of the two modalities resolves the recognition errors efficiently and accurately. The invention is especially well suited for mobile devices with limited space for keyboards or touch-screen input.
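
The abstract's point that the two ambiguities are "mostly orthogonal" can be made concrete with a toy example. The sketch below is illustrative only, not the patented implementation. It assumes a standard telephone keypad with letters on keys 2 through 9 (as in ITU-T E.161) and a recognizer that returns an N-best word list. `key_sequence` and `combine` are hypothetical helpers. The recognizer confuses words that sound alike, the keypad confuses words that share keys, and only a word satisfying both survives.

```python
# Illustrative sketch only; the keypad layout and helper names are assumptions.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
# Reverse map: letter -> the key that carries it.
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def key_sequence(word: str) -> str:
    """Digits a user would press to type `word` (letters a-z only)."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def combine(speech_nbest: list[str], keys: str) -> list[str]:
    """Keep speech hypotheses, in recognizer order, that match the key sequence."""
    return [w for w in speech_nbest if key_sequence(w) == keys]


# Acoustically confusable hypotheses from a (hypothetical) recognizer ...
speech_nbest = ["pat", "bat", "bad"]
# ... and key sequence 228, which alone could mean "bat", "cat" or "act".
print(combine(speech_nbest, "228"))  # ['bat']
```

Neither input alone picks out "bat": speech leaves "pat" and "bad" in play, and the keys leave "cat" and "act". Intersecting the two candidate sets does.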

20 Claims

1. A method for processing language input in a data processing system, comprising the steps of:
- receiving a first input comprising a voice input;
- determining one or more word candidates according to the first input;
- receiving a second input comprising a non-voice input; and
- determining one or more word candidates according to the first input and the second input.

2. The method of claim 1, wherein the one or more word candidates are determined based on the second input under constraint of the first input.

3. The method of claim 2, wherein the one or more word candidates are determined based on the first input in view of word context.

4. The method of claim 3, wherein the word context is based on any of:
- an N-gram language model; and
- a language model of a speech recognition engine.

5. The method of claim 1, wherein the step of determining the one or more word candidates comprises the step of correcting or filtering the first plurality of word candidates based on the second input.

6. The method of claim 1, wherein the second input is received on a mobile device; and wherein speech recognition on the voice input is partially performed on the mobile device and partially performed on a server coupled to the mobile device through a wireless communication connection.

7. The method of claim 6, wherein the speech recognition is activated by a push-to-talk button on the mobile device.

8. The method of claim 1, wherein the second input is received while one of the one or more word candidates is presented for selection or editing.

9. The method of claim 8, wherein the second input comprises any of:
- a touch screen keyboard;
- handwriting gesture recognition; and
- a keypad input.

10. The method of claim 1, wherein the first input is interpreted as punctuation or one or more other symbols when the second input is associated with punctuation or symbols.
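
Claim 10 covers reading the voice input as punctuation or symbols when the second input is associated with them. A minimal sketch follows, assuming a phone keypad on which one key (hypothetically "1") is the punctuation key, and a small hypothetical table of spoken symbol names. The claim specifies neither.

```python
# Hypothetical table of spoken symbol names; the claim does not define one.
SPOKEN_SYMBOLS = {
    "period": ".",
    "comma": ",",
    "question mark": "?",
    "exclamation point": "!",
    "at sign": "@",
}
# Assumed punctuation key, as on many phone keypads; not specified by the claim.
PUNCTUATION_KEY = "1"


def interpret(speech_nbest: list[str], second_input: str) -> list[str]:
    """Read the voice input as symbols when the second input selects punctuation."""
    if second_input == PUNCTUATION_KEY:
        # Keep only hypotheses that name a known symbol, mapped to that symbol.
        return [SPOKEN_SYMBOLS[h] for h in speech_nbest if h in SPOKEN_SYMBOLS]
    # Otherwise the hypotheses are left as ordinary words.
    return list(speech_nbest)


hypotheses = ["karma", "comma", "calm a"]
print(interpret(hypotheses, "1"))  # [',']
print(interpret(hypotheses, "2"))  # ['karma', 'comma', 'calm a']
```

Pressing the punctuation key collapses the acoustically similar "karma", "comma" and "calm a" into a single symbol.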

11. A machine readable medium having instructions stored therein which, when executed on a data processing system, cause the data processing system to perform a method for processing language input, the method comprising the steps of:
- receiving a first input comprising a voice input;
- determining one or more word candidates according to the first input;
- receiving a second input comprising a non-voice input; and
- determining one or more word candidates according to the first input and the second input.

12. The medium of claim 11, wherein the one or more word candidates are determined based on the second input under constraint of the first input and in view of word context; and the word context is based on any of:
- an N-gram language model; and
- a language model of a speech recognition engine.

13. The medium of claim 11, wherein the step of determining the one or more word candidates comprises the step of correcting a list of the first plurality of word candidates.

14. The medium of claim 11, wherein the second input is received on a client computing device; wherein speech recognition on the voice input is partially performed on the device and partially performed on a server coupled to the device through a data connection; and wherein the speech recognition is activated by a push-to-talk button on the device.

15. The medium of claim 11, wherein the second input is received either while one of the one or more word candidates is presented for editing or while the first plurality of the word candidates is presented for selection; and the second input comprises any of:
- a touch screen keyboard;
- handwriting gesture recognition; and
- a keypad input.
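
Claims 4, 12 and 17 bring in word context from an N-gram language model. The sketch below shows one plausible way to do this, under stated assumptions. It keeps only the recognizer hypotheses consistent with an ambiguous keypad sequence, then ranks them by recognizer log score plus a weighted bigram log probability. The keypad layout, the bigram table, the recognizer scores, the floor probability and the weight `lm_weight` are all hypothetical. The claims do not say how the scores are combined.

```python
import math

# Assumed telephone keypad layout (letters on keys 2-9).
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {c: k for k, letters in KEYPAD.items() for c in letters}

# Hypothetical toy bigram model P(word | previous word); unseen pairs get FLOOR.
BIGRAM = {("going", "home"): 0.05, ("going", "hone"): 0.0001}
FLOOR = 1e-6


def matches_keys(word: str, keys: str) -> bool:
    """True if typing `word` on the keypad produces exactly `keys`."""
    return "".join(LETTER_TO_KEY.get(c, "?") for c in word.lower()) == keys


def rank(speech_scores: dict[str, float], keys: str, prev_word: str,
         lm_weight: float = 1.0) -> list[tuple[str, float]]:
    """Rank keypad-consistent speech hypotheses by acoustic + weighted LM log score."""
    ranked = []
    for word, acoustic in speech_scores.items():
        if not matches_keys(word, keys):
            continue  # the keypad input rules this hypothesis out
        lm = math.log(BIGRAM.get((prev_word, word), FLOOR))
        ranked.append((word, acoustic + lm_weight * lm))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical recognizer log-likelihoods: "hone" sounds slightly better than "home".
speech_scores = {"hone": -1.5, "home": -2.0, "foam": -1.7}
best = rank(speech_scores, "4663", prev_word="going")
print([w for w, _ in best])  # ['home', 'hone']
```

Both "hone" and "home" are typed as 4663, and "foam" (3626) is excluded by the keys. Acoustically "hone" scores slightly better, but the bigram context after "going" reverses the order.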

16. A mobile device for processing language input, comprising:
- a speech recognition module for processing a first input comprising a voice input;
- one or more second input modules for processing a second input comprising a non-voice input; and
- a processing module coupled to the one or more second input modules and the speech recognition module, the processing module determining a first plurality of word candidates according to the first input and subsequently determining one or more word candidates according to the first input and the second input.

17. The device of claim 16, wherein the one or more word candidates are determined based on said second input under constraint of the first input and in view of word context; and the word context is based on any of:
- an N-gram language model; and
- a language model of a speech recognition engine.

18. The device of claim 16, wherein the one or more word candidates are determined by correcting a list of the first plurality of word candidates.

19. The device of claim 16, wherein speech recognition of the voice input is partially performed on the mobile device and partially performed on a server coupled to the mobile device through a wireless communication connection; and wherein the speech recognition is activated by a push-to-talk button on the mobile device.

20. The device of claim 16, wherein the second input is received either while one of the first plurality of the word candidates is presented for editing or while the one or more of the word candidates is presented for selection; and the second input comprises any of:
- a touch screen keyboard;
- handwriting gesture recognition; and
- a keypad input.
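
Claims 16, 18 and 20 describe a device built from three parts: a speech recognition module, one or more second input modules, and a processing module that first produces a list of candidates and then corrects it. The sketch below shows one possible wiring in Python. Every class name is hypothetical, and the recognizer is a stub returning a fixed N-best list. Treating touch-screen letters as a prefix filter is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Protocol


class SpeechRecognitionModule(Protocol):
    """Produces a first plurality of word candidates from a voice input."""

    def recognize(self, audio: bytes) -> list[str]: ...


class SecondInputModule(Protocol):
    """Decides whether a candidate is consistent with a non-voice input."""

    def accepts(self, word: str, second_input: str) -> bool: ...


class PrefixKeyboardModule:
    """Hypothetical touch-screen keyboard module: typed letters act as a word prefix."""

    def accepts(self, word: str, second_input: str) -> bool:
        return word.lower().startswith(second_input.lower())


class FakeRecognizer:
    """Stand-in recognizer that returns a fixed N-best list (for illustration only)."""

    def __init__(self, nbest: list[str]) -> None:
        self.nbest = nbest

    def recognize(self, audio: bytes) -> list[str]:
        return list(self.nbest)


@dataclass
class ProcessingModule:
    """Couples the speech module to one or more second input modules."""

    speech: SpeechRecognitionModule
    second_inputs: list[SecondInputModule]
    candidates: list[str] = field(default_factory=list)

    def on_voice(self, audio: bytes) -> list[str]:
        # First plurality of word candidates, from the voice input alone.
        self.candidates = self.speech.recognize(audio)
        return self.candidates

    def on_second_input(self, module_index: int, second_input: str) -> list[str]:
        # Correct the displayed list using the first and second inputs together.
        module = self.second_inputs[module_index]
        self.candidates = [w for w in self.candidates if module.accepts(w, second_input)]
        return self.candidates


pm = ProcessingModule(FakeRecognizer(["their", "there", "they're"]), [PrefixKeyboardModule()])
pm.on_voice(b"")
print(pm.on_second_input(0, "they"))  # ["they're"]
```

Typing "they" while the homophones are displayed leaves only the matching candidate. Other second input modules (keypad, handwriting) would plug in through the same `accepts` method.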

Current Assignee
Cerence Operating Company (Cerence Inc.)
Original Assignee
Tegic Communications, Inc. (Microsoft Corporation)
Inventors
Eyraud, Richard; Hullfish, Keith C.; Longe, Michael

Granted Patent

US 7,881,936 B2
US Class Current

704/257
CPC Class Codes

G06V 30/10   Character recognition

G06V 30/268   Lexical context

G06V 30/36   Matching; Classification

G10L 15/24   Speech recognition using no...

G10L 15/32   Multiple recognisers used i...

G10L 2015/228   of application context
