Multimodal disambiguation of speech recognition
First Claim
1. A computer-implemented method for processing a user's speech using a mobile computer that includes a microphone, a display, and a reduced-character keypad, the method comprising operations of:
the computer receiving user speech via the microphone, the speech comprising a series of spoken words;
the computer performing speech recognition upon the speech to compute an original N-best list of words for each discrete utterance of the speech;
the computer operating the display to present a proposed sequence of multiple words, each word comprising;
for each given one of the discrete utterances, a best word of the N-best list for said discrete utterance;
the computer receiving and processing user entered correction to at least a given one of the displayed best words of the proposed sequence of words, comprising operations of;
in response to the computer receiving user selection of the given word from the proposed sequence of words, the computer presenting a list of alternate hypothesis including others of the N-best list of words for the selected word;
the computer receiving user input from the keypad spelling a desired word, where said user input is inherently ambiguous because the keypad includes multiple letters on some or all keys;
responsive to receiving the user input, preparing a revised N-best list by limiting entries of the N-best list of words to words that are spelled by the user input from the keyboard;
where the revised N-best list is further computed considering context and grammar of the selected word in conjunction with any words of the proposed sequence of words that the user has previously accepted or corrected;
receiving user choice of a word from the revised N-best list in correction of the selected word;
the computer updating the proposed sequence of words to incorporate the user entered correction; and
the computer operating the display to present the updated proposed sequence of words.
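The correction steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`key_sequence`, `revise_n_best`, `rescore_with_context`, `apply_correction`), the scores, and the `bigram_bonus` table are hypothetical, and the sketch assumes a standard 12-key letter layout, lower-case alphabetic words, and prefix matching so that the list can be narrowed before the user finishes spelling the word.

```python
# Assumed standard 12-key phone layout: each digit key carries several letters.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def key_sequence(word):
    """Return the digit sequence that spells `word` (letters a-z only)."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def revise_n_best(n_best, keys):
    """Keep only (word, score) hypotheses consistent with the keys pressed so far."""
    return [(w, s) for w, s in n_best if key_sequence(w).startswith(keys)]


def rescore_with_context(candidates, previous_word, bigram_bonus):
    """Re-rank candidates with a toy context model (hypothetical bonus table)."""
    rescored = [
        (w, s * bigram_bonus.get((previous_word, w), 1.0)) for w, s in candidates
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def apply_correction(sequence, index, chosen):
    """Return a copy of the proposed sequence with one word replaced."""
    updated = list(sequence)
    updated[index] = chosen
    return updated


# The recognizer proposed "keep it meet"; the user selects the third word.
proposed = ["keep", "it", "meet"]
n_best = [("meet", 0.41), ("neat", 0.33), ("meat", 0.17), ("need", 0.09)]

# The user presses 6-3-2-8. Both "neat" and "meat" match those keys.
revised = revise_n_best(n_best, "6328")

# Context from the already accepted word "it" breaks the remaining tie.
rescored = rescore_with_context(revised, "it", {("it", "neat"): 2.0})

updated = apply_correction(proposed, 2, rescored[0][0])
print(updated)
```

In this example the keypad input alone leaves two candidates, so the final ranking depends on the context step; a real system would draw those scores from its own language model rather than a hand-written table.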
11 Assignments
0 Petitions
Abstract
The present invention provides a speech recognition system combined with one or more alternate input modalities to ensure efficient and accurate text input. The speech recognition system achieves less than perfect accuracy due to limited processing power, environmental noise, and/or natural variations in speaking style. The alternate input modalities use disambiguation or recognition engines to compensate for reduced keyboards, sloppy input, and/or natural variations in writing style. The ambiguity remaining in the speech recognition process is mostly orthogonal to the ambiguity inherent in the alternate input modality, such that the combination of the two modalities resolves the recognition errors efficiently and accurately. The invention is especially well suited for mobile devices with limited space for keyboards or touch-screen input.
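The claim that the two kinds of ambiguity are mostly orthogonal can be made concrete with a small Python sketch. Words that sound alike rarely share a key sequence, and words that share a key sequence rarely sound alike, so intersecting the two candidate sets tends to leave very few words. The vocabulary, the list of speech candidates, and the helper names below are illustrative assumptions, as is the standard 12-key layout.

```python
from collections import defaultdict

# Assumed standard 12-key phone layout.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def key_sequence(word):
    """Return the digit sequence that spells `word` (letters a-z only)."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def build_keypad_index(vocabulary):
    """Group vocabulary words by the key sequence that spells them."""
    index = defaultdict(set)
    for word in vocabulary:
        index[key_sequence(word)].add(word)
    return index


def combine(speech_candidates, keypad_candidates):
    """Keep speech hypotheses that the keypad input also allows, in speech order."""
    return [w for w in speech_candidates if w in keypad_candidates]


# Hypothetical toy vocabulary.
vocabulary = ["good", "home", "gone", "hood", "foam", "roam", "dome"]
index = build_keypad_index(vocabulary)

# Acoustic ambiguity: words a recognizer might confuse with spoken "home".
speech_candidates = ["foam", "home", "roam", "dome"]

# Keypad ambiguity: every vocabulary word spelled by the keys 4-6-6-3.
keypad_candidates = index["4663"]

resolved = combine(speech_candidates, keypad_candidates)
print(sorted(keypad_candidates))
print(resolved)
```

Each modality on its own leaves four candidates here, while the intersection leaves one. The example words were picked to show the effect cleanly; on a real vocabulary the overlap would be small rather than guaranteed to be a single word.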
779 Citations
32 Claims
1. A computer-implemented method for processing a user's speech using a mobile computer that includes a microphone, a display, and a reduced-character keypad, the method comprising operations of:
the computer receiving user speech via the microphone, the speech comprising a series of spoken words;
the computer performing speech recognition upon the speech to compute an original N-best list of words for each discrete utterance of the speech;
the computer operating the display to present a proposed sequence of multiple words, each word comprising;
for each given one of the discrete utterances, a best word of the N-best list for said discrete utterance;
the computer receiving and processing user entered correction to at least a given one of the displayed best words of the proposed sequence of words, comprising operations of;
in response to the computer receiving user selection of the given word from the proposed sequence of words, the computer presenting a list of alternate hypothesis including others of the N-best list of words for the selected word;
the computer receiving user input from the keypad spelling a desired word, where said user input is inherently ambiguous because the keypad includes multiple letters on some or all keys;
responsive to receiving the user input, preparing a revised N-best list by limiting entries of the N-best list of words to words that are spelled by the user input from the keyboard;
where the revised N-best list is further computed considering context and grammar of the selected word in conjunction with any words of the proposed sequence of words that the user has previously accepted or corrected;
receiving user choice of a word from the revised N-best list in correction of the selected word;
the computer updating the proposed sequence of words to incorporate the user entered correction; and
the computer operating the display to present the updated proposed sequence of words.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10

11. A computer-readable medium encoded with a program of machine-readable instructions executable by a mobile computer to perform operations to process a user's speech, where the mobile computing system includes a microphone, a display, and a reduced-character keypad, the operations comprising:
a computer receiving user speech via the microphone, the speech comprising a series of spoken words;
the computer performing speech recognition upon the speech to compute an original N-best list of words for each discrete utterance of the speech;
the computer operating the display to present a proposed sequence of multiple words, each word comprising;
for each given one of the discrete utterances, a best word of the N-best list for said discrete utterance;
the computer receiving and processing user entered correction to at least a given one of the displayed best words of the proposed sequence of words, comprising operations of;
in response to the computer receiving user identification of the given word from the proposed sequence of words, the computer presenting a list of alternate hypothesis including others of the N-best list of words for the selected word;
the computer receiving user input from the keypad spelling a desired word, where said user input is inherently ambiguous because the keypad includes multiple letters on some or all keys;
responsive to receiving the user input, preparing a revised N-best list by limiting entries of the N-best list of words to words that are spelled by the user input from the keyboard;
where the revised N-best list is further computed considering context and grammar of the selected word in conjunction with any words of the proposed sequence of words that the user has previously accepted or corrected;
receiving user choice of a word from the revised N-best list in correction of the selected word;
the computer updating the proposed sequence of words to incorporate the user entered correction; and
the computer operating the display to present the updated proposed sequence of words.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20, 21

22. A computer-driven apparatus for processing a user's speech, comprising:
a microphone;
a display;
a reduced-character keypad;
coupled to the microphone, the display, and the keypad, a processor programmed to perform operations comprising;
a computer receiving user speech via the microphone, the speech comprising a series of spoken words;
the computer performing speech recognition upon the speech to compute an original N-best list of words for each discrete utterance of the speech;
the computer operating the display to present a proposed sequence of multiple words, each word comprising;
for each given one of the discrete utterances, a best word of the N-best list for said discrete utterance;
the computer receiving and processing user entered correction to at least a given one of the displayed best words of the proposed sequence of words, comprising operations of;
in response to the computer receiving user selection of the given word from the proposed sequence of words, the computer presenting a list of alternate hypothesis including others of the N-best list of words for the selected word;
the computer receiving user input from the keypad spelling a desired word, where said user input is inherently ambiguous because the keypad includes multiple letters on some or all keys;
responsive to receiving the user input, preparing a revised N-best list by limiting entries of the N-best list of words to words that are spelled by the user input from the keyboard;
where the revised N-best list is further computed considering context and grammar of the selected word in conjunction with any words of the proposed sequence of words that the user has previously accepted or corrected;
receiving user choice of a word from the revised N-best list in correction of the selected word;
the computer updating the proposed sequence of words to incorporate the user entered correction; and
the computer operating the display to present the updated proposed sequence of words.
Dependent claims: 23, 24, 25, 26, 27, 28, 29, 30, 31, 32
Specification