Multimodal disambiguation of speech recognition

US 7,881,936 B2
Filed: 06/01/2005
Issued: 02/01/2011
Est. Priority Date: 12/04/1998
Status: Expired due to Term

11 Assignments

0 Petitions

Abstract

The present invention provides a speech recognition system combined with one or more alternate input modalities to ensure efficient and accurate text input. The speech recognition system achieves less than perfect accuracy due to limited processing power, environmental noise, and/or natural variations in speaking style. The alternate input modalities use disambiguation or recognition engines to compensate for reduced keyboards, sloppy input, and/or natural variations in writing style. The ambiguity remaining in the speech recognition process is mostly orthogonal to the ambiguity inherent in the alternate input modality, such that the combination of the two modalities resolves the recognition errors efficiently and accurately. The invention is especially well suited for mobile devices with limited space for keyboards or touch-screen input.
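
The abstract's central point is that the ambiguity left over from speech recognition is largely orthogonal to the ambiguity of a reduced keypad. A minimal Python sketch can illustrate this. It assumes the common telephone keypad layout (2 = abc through 9 = wxyz). The names `KEYPAD`, `key_sequence` and `combine`, and the example scores, are hypothetical and do not come from the patent. Each speech hypothesis is kept only if it spells the pressed key sequence, and the survivors keep the recognizer's ranking.

```python
# Hypothetical sketch: assumes the common telephone keypad letter layout.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def key_sequence(word: str) -> str:
    """Return the digit sequence a user would press to spell `word`."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def combine(nbest: list[tuple[str, float]], keys: str) -> list[tuple[str, float]]:
    """Keep only speech hypotheses whose spelling matches the ambiguous key
    presses, ordered by the recognizer's score (highest first)."""
    matches = [(word, score) for word, score in nbest if key_sequence(word) == keys]
    return sorted(matches, key=lambda item: item[1], reverse=True)


# Hypothetical recognizer output for one utterance of "good".
nbest = [("could", 0.45), ("good", 0.30), ("wood", 0.15), ("home", 0.10)]
print(combine(nbest, "4663"))  # [('good', 0.3), ('home', 0.1)]
```

With the key sequence 4663, the keypad alone cannot tell "good" from "home". The recognizer alone confuses "good" with acoustically similar words such as "could" and "wood". Intersecting the two sources leaves only hypotheses that satisfy both constraints.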

779 Citations

32 Claims

1. A computer-implemented method for processing a user's speech using a mobile computer that includes a microphone, a display, and a reduced-character keypad, the method comprising operations of:
- the computer receiving user speech via the microphone, the speech comprising a series of spoken words;
  
  the computer performing speech recognition upon the speech to compute an original N-best list of words for each discrete utterance of the speech;
  
  the computer operating the display to present a proposed sequence of multiple words, each word comprising;
  
  for each given one of the discrete utterances, a best word of the N-best list for said discrete utterance;
  
  the computer receiving and processing user entered correction to at least a given one of the displayed best words of the proposed sequence of words, comprising operations of;
  
  in response to the computer receiving user selection of the given word from the proposed sequence of words, the computer presenting a list of alternate hypothesis including others of the N-best list of words for the selected word;
  
  the computer receiving user input from the keypad spelling a desired word, where said user input is inherently ambiguous because the keypad includes multiple letters on some or all keys;
  
  responsive to receiving the user input, preparing a revised N-best list by limiting entries of the N-best list of words to words that are spelled by the user input from the keyboard;
  
  where the revised N-best list is further computed considering context and grammar of the selected word in conjunction with any words of the proposed sequence of words that the user has previously accepted or corrected;
  
  receiving user choice of a word from the revised N-best list in correction of the selected word;
  
  the computer updating the proposed sequence of words to incorporate the user entered correction; and
  
  the computer operating the display to present the updated proposed sequence of words.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The method of claim 1, the operations of the computer receiving and processing user entered correction further comprising:
    - based on the user entered correction to the given one of the displayed best words, reinterpreting other words in the proposed sequence including any of (1) reinterpreting a boundary between the words of the proposed sequence, and (2) reinterpreting multiple words in the proposed sequence as being one word.
  - 3. The method of claim 1, where operation of computing any of the original and revised N-best lists of words considers contextual aspects of user's actions, including any of user location and time of day.
  - 4. The method of claim 1, where the operation of the computer performing speech recognition upon the speech utilizes context based upon any of:
    - a N-gram language model; and
      
      a language model of a speech recognition engine.
  - 5. The method of claim 1, further comprising:
    - after user selection of the given word, responsive to receiving user input from the keypad associated with punctuation or symbols, the computer computing and operating the display to present a revised-N-best list of words for the selected word limited to punctuation or one or more symbols.
  - 6. The method of claim 1, where the words comprise alphabetically formed words, and the keys of the keypad correspond to alphabetic letters.
  - 7. The method of claim 1, where the words comprise logographic characters formed by strokes, and the keys of the keypad correspond to said strokes or categories of said strokes.
  - 8. The method of claim 1, the preparing operation including refining the revised N-best list substantially in real time given each keypress of user input received.
  - 9. The method of claim 1, further comprising:
    - responsive to the user accepting or correcting one or more words of the proposed sequence of words, the computer automatically preparing a revised N-best list for one or more other words in the sequence based on context of said one or more words relative to the accepted or corrected one or more words.
  - 10. The method of claim 1, where the revised N-best list is further computed considering context and grammar of the selected word in conjunction with words of the proposed sequence of words, said context and grammar including subject-verb agreement, case, gender, and number agreements.
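
Claim 8 recites refining the revised N-best list substantially in real time as each keypress is received. The following minimal Python sketch shows one way to do this. It assumes prefix matching on the keys typed so far, which the claims do not specify. The keypad layout, function names and N-best list are all hypothetical.

```python
from collections.abc import Iterator

# Assumed telephone keypad layout (hypothetical; not specified by the claims).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def key_sequence(word: str) -> str:
    """Return the digit sequence a user would press to spell `word`."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def refine_per_keypress(nbest: list[str], keys: str) -> Iterator[tuple[str, list[str]]]:
    """After each keypress, yield the keys typed so far and the N-best entries
    whose key sequence begins with them (N-best order is preserved)."""
    for i in range(1, len(keys) + 1):
        typed = keys[:i]
        yield typed, [w for w in nbest if key_sequence(w).startswith(typed)]


# Hypothetical N-best list for the selected word, best hypothesis first.
nbest = ["could", "good", "gas", "hood", "gold", "wood"]
for typed, survivors in refine_per_keypress(nbest, "466"):
    print(typed, survivors)
# 4 ['good', 'gas', 'hood', 'gold']
# 46 ['good', 'hood', 'gold']
# 466 ['good', 'hood']
```

Prefix matching lets the list shrink before the word is fully spelled. This sketch returns an empty list when no N-best entry matches. The claims do not say how a real system handles that case.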

11. A computer-readable medium encoded with a program of machine-readable instructions executable by a mobile computer to perform operations to process a user's speech, where the mobile computing system includes a microphone, a display, and a reduced-character keypad, the operations comprising:
- a computer receiving user speech via the microphone, the speech comprising a series of spoken words;
  
  the computer performing speech recognition upon the speech to compute an original N-best list of words for each discrete utterance of the speech;
  
  the computer operating the display to present a proposed sequence of multiple words, each word comprising;
  
  for each given one of the discrete utterances, a best word of the N-best list for said discrete utterance;
  
  the computer receiving and processing user entered correction to at least a given one of the displayed best words of the proposed sequence of words, comprising operations of;
  
  in response to the computer receiving user identification of the given word from the proposed sequence of words, the computer presenting a list of alternate hypothesis including others of the N-best list of words for the selected word;
  
  the computer receiving user input from the keypad spelling a desired word, where said user input is inherently ambiguous because the keypad includes multiple letters on some or all keys;
  
  responsive to receiving the user input, preparing a revised N-best list by limiting entries of the N-best list of words to words that are spelled by the user input from the keyboard;
  
  where the revised N-best list is further computed considering context and grammar of the selected word in conjunction with any words of the proposed sequence of words that the user has previously accepted or corrected;
  
  receiving user choice of a word from the revised N-best list in correction of the selected word;
  
  the computer updating the proposed sequence of words to incorporate the user entered correction; and
  
  the computer operating the display to present the updated proposed sequence of words.
- Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20, 21)
- - 12. The medium of claim 11, the operations of the computer receiving and processing user entered correction further comprising:
    - based on the user entered correction to the given one of the displayed best words, reinterpreting other words in the proposed sequence including any of (1) reinterpreting a boundary between the words of the proposed sequence, and (2) reinterpreting multiple words in the proposed sequence as being one word.
  - 13. The medium of claim 11, where operation of computing any of the original and revised N-best lists of words considers contextual aspects of user's actions, including any of user location and time of day.
  - 14. The medium of claim 11, where the operation of the computer performing speech recognition upon the speech utilizes context based upon any of:
    - a N-gram language model; and
      
      a language model of a speech recognition engine.
  - 15. The medium of claim 11, further comprising:
    - after user selection of the given word, responsive to receiving user input from the keypad associated with punctuation or symbols, the computer computing and operating the display to present a revised-N-best list of words for the selected word limited to punctuation or one or more symbols.
  - 16. The medium of claim 11, where the words comprise alphabetically formed words, and the keys of the keypad correspond to alphabetic letters.
  - 17. The medium of claim 11, where the words comprise logographic characters formed by strokes, and the keys of the keypad correspond to said strokes or categories of said strokes.
  - 18. The medium of claim 11, the preparing operation comprising refining the revised N-best list substantially in real time given each keypress of user input received.
  - 19. The medium of claim 11, where the operation of further computing the revised N-best list considering context and grammar comprises:
    - if the user has previously accepted or corrected multiple words of the proposed sequence of words, the revised N-best list is computed considering context and grammar of the selected word in relation to the multiple words.
  - 20. The medium of claim 11, further comprising:
    - responsive to the user accepting or correcting one or more words of the proposed sequence of words, the computer automatically preparing a revised N-best list for one or more other words in the sequence based on context of said one or more words relative to the accepted or corrected one or more words.
  - 21. The medium of claim 11, where the revised N-best list is further computed considering context and grammar of the selected word in conjunction with words of the proposed sequence of words, said context and grammar including subject-verb agreement, case, gender, and number agreements.
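
Claims 14 and 19 recite using an N-gram language model and the context of words the user has already accepted or corrected. The following minimal Python sketch re-ranks the candidates that survive keypad filtering against the previously accepted word. It assumes a bigram model and a simple log-linear combination with the recognizer's score. Both are illustrative choices, not taken from the patent. The probabilities in `BIGRAM`, the floor `UNSEEN` and the weight `lm_weight` are hypothetical.

```python
import math

# Hypothetical bigram probabilities P(word | previous word).
BIGRAM = {
    ("very", "good"): 0.20,
    ("very", "hood"): 0.001,
}
UNSEEN = 1e-4  # hypothetical floor for bigrams missing from the table


def rescore(candidates: list[tuple[str, float]], previous: str,
            lm_weight: float = 1.0) -> list[str]:
    """Rank candidates by log(recognizer score) + lm_weight * log P(word | previous)."""
    def combined(item: tuple[str, float]) -> float:
        word, acoustic = item
        lm_prob = BIGRAM.get((previous, word), UNSEEN)
        return math.log(acoustic) + lm_weight * math.log(lm_prob)

    return [word for word, _ in sorted(candidates, key=combined, reverse=True)]


# Both candidates spell the key sequence 4663; the user already accepted "very".
print(rescore([("hood", 0.5), ("good", 0.3)], previous="very"))  # ['good', 'hood']
```

The grammatical checks named in claim 21 include subject-verb, case, gender and number agreement. These go beyond what a bigram table captures, so the bigram here stands in for "context" only in its simplest form.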

22. A computer-driven apparatus for processing a user's speech, comprising:
- a microphone;
  
  a display;
  
  a reduced-character keypad;
  
  coupled to the microphone, the display, and the keypad, a processor programmed to perform operations comprising;
  
  a computer receiving user speech via the microphone, the speech comprising a series of spoken words;
  
  the computer performing speech recognition upon the speech to compute an original N-best list of words for each discrete utterance of the speech;
  
  the computer operating the display to present a proposed sequence of multiple words, each word comprising;
  
  for each given one of the discrete utterances, a best word of the N-best list for said discrete utterance;
  
  the computer receiving and processing user entered correction to at least a given one of the displayed best words of the proposed sequence of words, comprising operations of;
  
  in response to the computer receiving user selection of the given word from the proposed sequence of words, the computer presenting a list of alternate hypothesis including others of the N-best list of words for the selected word;
  
  the computer receiving user input from the keypad spelling a desired word, where said user input is inherently ambiguous because the keypad includes multiple letters on some or all keys;
  
  responsive to receiving the user input, preparing a revised N-best list by limiting entries of the N-best list of words to words that are spelled by the user input from the keyboard;
  
  where the revised N-best list is further computed considering context and grammar of the selected word in conjunction with any words of the proposed sequence of words that the user has previously accepted or corrected;
  
  receiving user choice of a word from the revised N-best list in correction of the selected word;
  
  the computer updating the proposed sequence of words to incorporate the user entered correction; and
  
  the computer operating the display to present the updated proposed sequence of words.
- Dependent Claims (23, 24, 25, 26, 27, 28, 29, 30, 31, 32)
- - 23. The apparatus of claim 22, the operations of the computer receiving and processing user entered correction further comprising:
    - based on the user entered correction to the given one of the displayed best words, reinterpreting other words in the proposed sequence including any of (1) reinterpreting a boundary between the words of the proposed sequence, and (2) reinterpreting multiple words in the proposed sequence as being one word.
  - 24. The apparatus of claim 22, where operation of computing any of the original and revised N-best lists of words considers contextual aspects of user's actions, including any of user location and time of day.
  - 25. The apparatus of claim 22, where the operation of the computer performing speech recognition upon the speech utilizes context based upon any of:
    - a N-gram language model; and
      
      a language model of a speech recognition engine.
  - 26. The apparatus of claim 22, further comprising:
    - after user selection of the given word, responsive to receiving user input from the keypad associated with punctuation or symbols, the computer computing and operating the display to present a revised-N-best list of words for the selected word limited to punctuation or one or more symbols.
  - 27. The apparatus of claim 22, where the words comprise alphabetically formed words, and the keys of the keypad correspond to alphabetic letters.
  - 28. The apparatus of claim 22, where the words comprise logographic characters formed by strokes, and the keys of the keypad correspond to said strokes or categories of said strokes.
  - 29. The apparatus of claim 22, the preparing operation comprising refining the revised N-best list substantially in real time given each keypress of user input received.
  - 30. The apparatus of claim 22, where the operation of further computing the revised N-best list considering context and grammar comprises:
    - if the user has previously accepted or corrected multiple words of the proposed sequence of words, the revised N-best list is computed considering context and grammar of the selected word in relation to the multiple words.
  - 31. The apparatus of claim 22, further comprising:
    - responsive to the user accepting or correcting one or more words of the proposed sequence of words, the computer automatically preparing a revised N-best list for one or more other words in the sequence based on context of said one or more words relative to the accepted or corrected one or more words.
  - 32. The apparatus of claim 22, where the revised N-best list is further computed considering context and grammar of the selected word in conjunction with words of the proposed sequence of words, said context and grammar including subject-verb agreement, case, gender, and number agreements.
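
Claim 23, like claims 2 and 12, recites reinterpreting other words after a correction, including treating several proposed words as one word. The following minimal Python sketch covers that second case under a hypothetical rule: the correction absorbs the following words whenever it spells exactly the selected word joined with them. `apply_correction` and the example sentences are illustrative. Shifting a boundary between words (case (1)) is not modeled.

```python
def apply_correction(words: list[str], index: int, correction: str) -> list[str]:
    """Replace words[index] with `correction`. If the correction spells the
    selected word concatenated with one or more following words, those words
    are merged into it (hypothetical rule)."""
    joined = ""
    end = index
    # Extend across following words while their concatenation stays a prefix
    # of the correction; stop at an exact match.
    while end < len(words) and correction.startswith(joined + words[end]):
        joined += words[end]
        end += 1
        if joined == correction:
            return words[:index] + [correction] + words[end:]
    # No merge: plain one-for-one replacement.
    return words[:index] + [correction] + words[index + 1:]


print(apply_correction(["a", "cross", "the", "road"], 0, "across"))  # ['across', 'the', 'road']
print(apply_correction(["the", "cat", "sat"], 1, "hat"))  # ['the', 'hat', 'sat']
```

A merge requires an exact concatenation. An ordinary correction such as "cat" to "hat" therefore falls through to a one-for-one replacement.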

Current Assignee
Cerence Operating Company (Cerence Inc.)
Original Assignee
Tegic Communications, Inc. (Microsoft Corporation)
Inventors
Longé, Michael; Eyraud, Richard; Hullfish, Keith C.
Primary Examiner(s)
Armstrong; Angela A

Application Number
US11/143,409
Publication Number
US 20050283364A1
Time in Patent Office
2,071 Days
Field of Search
704/235, 704/251, 704/257, 704/270
US Class Current
704/257
CPC Class Codes

G06V 30/10   Character recognition

G06V 30/268   Lexical context

G06V 30/36   Matching; Classification

G10L 15/24   Speech recognition using no...

G10L 15/32   Multiple recognisers used i...

G10L 2015/228   of application context
