Multimodal disambiguation of speech recognition

US 8,606,582 B2
Filed: 10/12/2012
Issued: 12/10/2013
Est. Priority Date: 06/02/2004
Status: Active Grant

Abstract

The present invention provides a speech recognition system combined with one or more alternate input modalities to ensure efficient and accurate text input. The speech recognition system achieves less than perfect accuracy due to limited processing power, environmental noise, and/or natural variations in speaking style. The alternate input modalities use disambiguation or recognition engines to compensate for reduced keyboards, sloppy input, and/or natural variations in writing style. The ambiguity remaining in the speech recognition process is mostly orthogonal to the ambiguity inherent in the alternate input modality, such that the combination of the two modalities resolves the recognition errors efficiently and accurately. The invention is especially well suited for mobile devices with limited space for keyboards or touch-screen input.
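The "mostly orthogonal" ambiguity described above can be illustrated with a minimal sketch. It assumes the alternate modality is a standard 12-key phone keypad, which is only one of the reduced keyboards the abstract alludes to; the function names and the candidate words are hypothetical and not taken from the patent. Words that sound alike usually begin on different keys, so even a single ambiguous key press can separate them.

```python
# Hypothetical illustration: letter groups of a standard 12-key phone keypad.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def key_sequence(word):
    """Return the digits pressed to type `word` (letters a-z only)."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def filter_by_keys(n_best, keys):
    """Keep the speech candidates whose key sequence starts with `keys`."""
    return [w for w in n_best if key_sequence(w).startswith(keys)]


# Acoustically confusable candidates (invented example data).
n_best = ["bear", "pear", "dare", "hair"]

# One press of the "7" key (p, q, r or s) is ambiguous on its own,
# yet it leaves a single speech candidate.
print(filter_by_keys(n_best, "7"))  # ['pear']
```

Each key press is ambiguous among three or four letters, and each utterance is ambiguous among several similar-sounding words, but the two sets of candidates rarely overlap in more than one word.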

248 Citations

14 Claims

1. A computer-implemented method for processing language input in a system that includes a mobile computer, the mobile computer including a microphone and a display and a non-voice input device operable by a user, the method comprising operations of:
- responsive to the mobile computing device receiving via the microphone voice input comprising multiple discrete utterances from a user;
  
  the mobile computing device displaying an initial N-best list of words corresponding to each of the utterances recognized by speech recognition operations, the operation of displaying each initial N-best list of words further considering context of the corresponding utterance with respect to words of N-best lists corresponding to others of the received utterances;
  
  for each of said utterances, the mobile computing device visually displaying a best word from the initial N-best list of words corresponding to said utterance;
  
  responsive to implied or explicit user selection of one of the displayed best words, said selected word being from a given N-best list of words corresponding to a given utterance, the mobile computing device displaying additional words from the given initial N-best list of words;
  
  during said presentation of the additional words, the mobile computing device receiving via the non-voice input device an input from a user, and responsive to said user input, said presentation of the additional words is constrained to exclude words of the given initial N-best list that are inconsistent with the non-voice input; and
  
  responsive to said presentation of the additional words being constrained to a resultant word, displaying the resultant word instead of the selected word.
- Dependent claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1, wherein said context includes subject-verb agreement, proper case, proper gender, and numerical agreement;
    - said method further comprising operations of updating the initial N-best lists of others of the utterances besides the given utterance to provide subject-verb agreement, employ proper case, use proper gender, and exhibit numerical agreement when considered in context of the resultant word.
  - 3. The method of claim 1, said method further comprising operations of:
    - for each of the utterances having an updated N-best list, the mobile computing device displaying a best word of the updated N-best list of words for that utterance; and
      
      for each of the utterances without an updated N-best list, the mobile computing device displaying a best word of the initial N-best list of words for said utterance.
  - 4. The method of claim 1, said speech recognition operations further comprising operations of:
    - converting the voice input into a digital sequence of vectors; and
      
      matching the vectors to potential phonemes and matching the phonemes against a lexicon model and a language model.
  - 5. The method of claim 1, further comprising operations of:
    - interpreting the non-voice input as user entry of a new word for entry immediately after the selected word; and
      
      responsive to the user completing entry of the new word before the presentation of additional words is constrained to a resultant word, causing the display to present the new word following the selected word.
  - 6. The method of claim 1, further comprising operations of:
    - responsive to the non-voice input starting with a letter or letters that conflict with all of the additional words, expanding the additional words to include words that phonetically resemble the best word of the initial N-best list but begin with said starting letter or letters.
  - 7. The method of claim 1, where the operation of constraining said presentation of the additional words to exclude words of the given initial N-best list that are inconsistent with the non-voice input comprises:
    - excluding words of the given initial N-best list that are not partially or completely spelled-out by the non-voice input.
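The selection and constraint steps of claims 1 and 7 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: "spelled-out by the non-voice input" is read here as a case-insensitive prefix match, each N-best list is assumed to be ordered best-first, and the names `constrain` and `correct_word` as well as the sample data are hypothetical.

```python
def constrain(n_best, typed):
    """Exclude candidates that the typed letters do not spell out.

    Assumption: "partially or completely spelled-out" is modeled as a
    case-insensitive prefix match.
    """
    typed = typed.lower()
    return [w for w in n_best if w.lower().startswith(typed)]


def correct_word(displayed, index, n_best_lists, typed):
    """Apply typed input to the word the user selected at `index`.

    Returns the (possibly updated) displayed words and the remaining
    candidates. The selected word is replaced only when the candidates
    have been narrowed to a single resultant word.
    """
    remaining = constrain(n_best_lists[index], typed)
    updated = list(displayed)
    if len(remaining) == 1:
        updated[index] = remaining[0]
    return updated, remaining


# One N-best list per utterance, best candidate first (invented data).
n_best_lists = [
    ["I", "eye", "aye"],
    ["want", "wand", "won't"],
    ["to", "two", "too"],
    ["right", "write", "rite"],
]
displayed = [candidates[0] for candidates in n_best_lists]

# The user selects the fourth word and types "r": two candidates remain,
# so the displayed word is left unchanged.
print(correct_word(displayed, 3, n_best_lists, "r"))

# Typing "w" instead leaves a single candidate, which replaces "right".
print(correct_word(displayed, 3, n_best_lists, "w"))
```

The sketch omits the context-sensitive re-ranking of the other utterances described in claims 2 and 3; it only shows how non-voice input narrows one list until a resultant word remains.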

8. A system for processing language input, comprising:
- a mobile computer including a microphone and a display and a non-voice input device operable by a user;
  
  wherein the mobile computer is programmed to perform computer-implemented operations comprising;
  
  responsive to the mobile computing device receiving via the microphone voice input comprising multiple discrete utterances from a user, displaying an initial N-best list of words corresponding to each of the utterances recognized by speech recognition operations, the operation of displaying each initial N-best list of words further considering context of the corresponding utterance with respect to words of N-best lists corresponding to others of the received utterances;
  
  for each of said utterances, the mobile computing device visually displaying a best word from the initial N-best list of words corresponding to said utterance;
  
  responsive to implied or explicit user selection of one of the displayed best words, said selected word being from a given N-best list of words corresponding to a given utterance, displaying additional words from the given initial N-best list of words;
  
  during said presentation of the additional words, the mobile computing device receiving via the non-voice input device an input from a user, and responsive to said user input, said presentation of the additional words is constrained to exclude words of the given initial N-best list that are inconsistent with the non-voice input;
  
  responsive to said presentation of the additional words being constrained to a resultant word, displaying the resultant word instead of the selected word.
- Dependent claims (9, 10, 11, 12, 13, 14)
  - 9. The system of claim 8, wherein said context includes subject-verb agreement, proper case, proper gender, and numerical agreement;
    - and wherein the system is further programmed to perform computer-implemented operations comprising updating the initial N-best lists of others of the utterances besides the given utterance to provide subject-verb agreement, employ proper case, use proper gender, and exhibit numerical agreement when considered in context of the resultant word.
  - 10. The system of claim 8, wherein:
    - for each of the utterances having an updated N-best list, the mobile computing device displaying a best word of the updated N-best list of words for that utterance; and
      
      for each of the utterances without an updated N-best list, the mobile computing device displaying a best word of the initial N-best list of words for said utterance.
  - 11. The system of claim 8, wherein the system is further programmed to perform computer-implemented operations comprising:
    - converting the voice input into a digital sequence of vectors; and
      
      wherein said speech recognition operations include matching the vectors to potential phonemes and matching the phonemes against a lexicon model and a language model.
  - 12. The system of claim 8, wherein the system is further programmed to perform computer-implemented operations that further comprise:
    - interpreting the non-voice input as user entry of a new word for entry immediately after the selected word; and
      
      responsive to the user completing entry of the new word before the presentation of additional words is constrained to a resultant word, causing the display to present the new word following the selected word.
  - 13. The system of claim 8, wherein the system is further programmed to perform computer-implemented operations that further comprise:
    - responsive to the non-voice input starting with a letter or letters that conflict with all of the additional words, expanding the additional words to include words that phonetically resemble the best word of the initial N-best list but begin with said starting letter or letters.
  - 14. The system of claim 8, wherein the system is further programmed to perform a computer-implemented operation of constraining said presentation of the additional words to exclude words of the given initial N-best list that are inconsistent with the textual input that comprises:
    - excluding words of the given initial N-best list that are not partially or completely spelled-out by the non-voice input.
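The fallback described in claims 6 and 13, where the typed letters conflict with every candidate, might look like the sketch below. The claims do not say how phonetic resemblance is measured; as a loudly labeled stand-in, this example uses spelling similarity from Python's `difflib.SequenceMatcher`, which is not a phonetic measure. The lexicon, the threshold, and the function names are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Spelling similarity in [0, 1]; a crude stand-in for phonetic resemblance."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def expand_candidates(n_best, typed, lexicon, threshold=0.6):
    """Return candidates consistent with the typed starting letters.

    If some N-best words start with `typed`, they are returned as-is.
    Otherwise the list is expanded with lexicon words that start with
    `typed` and resemble the best word (n_best[0]), most similar first.
    """
    typed = typed.lower()
    matching = [w for w in n_best if w.lower().startswith(typed)]
    if matching:
        return matching
    best = n_best[0]
    expanded = [
        w for w in lexicon
        if w.lower().startswith(typed) and similarity(w, best) >= threshold
    ]
    return sorted(expanded, key=lambda w: similarity(w, best), reverse=True)


n_best = ["bat", "pat", "bad"]                      # invented recognizer output
lexicon = ["cat", "cot", "car", "dog", "catalog"]   # hypothetical word list

# "c" conflicts with every N-best word, so similar words starting
# with "c" are pulled in from the lexicon.
print(expand_candidates(n_best, "c", lexicon))  # ['cat']
```

A real system would more plausibly compare phoneme sequences from the recognizer's lexicon model rather than spellings; the structure of the fallback stays the same.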


Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: Tegic Communications, Inc. (Microsoft Corporation)
Inventors: Longe, Michael; Eyraud, Richard; Hullfish, Keith C.
Primary Examiner(s): Armstrong, Angela A
Application Number: US13/651,258
Publication Number: US 20130041667A1
Time in Patent Office: 424 Days
Field of Search: 704/235, 704/251, 704/257, 704/270
US Class Current: 704/257
CPC Class Codes:
G10L 15/18   using natural language mode...
G10L 15/24   Speech recognition using no...
G10L 15/32   Multiple recognisers used i...
