MULTIMODAL DISAMBIGUATION OF SPEECH RECOGNITION
Abstract
The present invention provides a speech recognition system combined with one or more alternate input modalities to ensure efficient and accurate text input. The speech recognition system achieves less than perfect accuracy due to limited processing power, environmental noise, and/or natural variations in speaking style. The alternate input modalities use disambiguation or recognition engines to compensate for reduced keyboards, sloppy input, and/or natural variations in writing style. The ambiguity remaining in the speech recognition process is mostly orthogonal to the ambiguity inherent in the alternate input modality, such that the combination of the two modalities resolves the recognition errors efficiently and accurately. The invention is especially well suited for mobile devices with limited space for keyboards or touch-screen input.
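The idea that the two ambiguities are mostly orthogonal can be illustrated with a minimal sketch. It assumes a hypothetical reduced keypad as the alternate modality and a made-up N-best list of speech hypotheses; the names `KEYPAD`, `key_sequence`, and `disambiguate` are illustrative only and do not come from the patent.

```python
# Hypothetical phone-keypad layout, assumed here purely for illustration.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def key_sequence(word):
    """Map a word to the ambiguous key presses that would spell it."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower() if ch in LETTER_TO_KEY)


def disambiguate(nbest, keys):
    """Keep the speech hypotheses consistent with the keys pressed so far."""
    return [(word, score) for word, score in nbest
            if key_sequence(word).startswith(keys)]


# Made-up N-best list: homophones that the recognizer alone cannot separate.
nbest = [("write", 0.40), ("right", 0.35), ("rite", 0.25)]

print(disambiguate(nbest, "9"))    # one key press leaves only "write"
print(disambiguate(nbest, "744"))  # three key presses leave only "right"
```

The words sound alike but are spelled with different keys, so a few ambiguous key presses are enough to separate candidates that the speech recognizer could not.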
18 Claims
1. A computer implemented method comprising:
receiving, by the mobile device, a voice input;
displaying, by the mobile device, at a text insertion point of a touch screen display, a most likely interpretation of word candidates of the voice input, the most likely interpretation resulting from a speech recognition process;
receiving, by the mobile device, on the touch screen display, a first non-voice input that selects said displayed most likely interpretation of said word candidates;
responsive to the first non-voice input, displaying for selection, by the mobile device, two or more word candidates on the touch screen display; and
receiving, by the mobile device, at said non-voice input field, a second non-voice input that selects an intended word candidate from among said two or more word candidates.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer program product, tangibly embodied in a non-transitory computer-readable storage medium, the computer program product including instructions operable to cause a data processing apparatus to:
receive a voice input;
display, at a text insertion point of a touch screen display, a most likely interpretation of word candidates of the voice input, the most likely interpretation resulting from a speech recognition process;
receive, on the touch screen display, a first non-voice input that selects said displayed most likely interpretation of said word candidates;
responsive to the first non-voice input, display for selection two or more word candidates on the touch screen display; and
receive, at said non-voice input field, a second non-voice input that selects an intended word candidate from among said two or more word candidates.
Dependent claims: 10, 11, 12
13. A mobile device including a processor configured to:
receive a voice input;
display, at a text insertion point of a touch screen display, a most likely interpretation of word candidates of the voice input, the most likely interpretation resulting from a speech recognition process;
receive, on the touch screen display, a first non-voice input that selects said displayed most likely interpretation of said word candidates;
responsive to the first non-voice input, display for selection two or more word candidates on the touch screen display; and
receive, at said non-voice input field, a second non-voice input that selects an intended word candidate from among said two or more word candidates.
Dependent claims: 14, 15, 16, 17, 18
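The sequence of steps shared by independent claims 1, 9, and 13 can be sketched as a small state model. This is a rough illustration under stated assumptions, not the patented implementation: the `TextField` class, its method names, and the sample candidate list are all hypothetical, and the speech recognizer is replaced by a pre-ranked list of candidates.

```python
from dataclasses import dataclass, field


@dataclass
class TextField:
    """Hypothetical model of the text shown on a touch screen display."""
    words: list = field(default_factory=list)
    alternates: dict = field(default_factory=dict)  # word index -> ranked candidates

    def insert_voice_result(self, candidates):
        """Display the most likely interpretation at the text insertion point."""
        index = len(self.words)
        self.words.append(candidates[0])          # best candidate is shown
        self.alternates[index] = list(candidates)  # the rest are kept for later
        return index

    def tap_word(self, index):
        """First non-voice input: selecting the shown word returns the candidates."""
        return self.alternates[index]

    def choose(self, index, candidate):
        """Second non-voice input: replace the shown word with the intended one."""
        if candidate not in self.alternates[index]:
            raise ValueError("candidate was not offered for this word")
        self.words[index] = candidate


text = TextField()
# Assumed recognizer output, ranked from most to least likely.
i = text.insert_voice_result(["their", "there", "they're"])
offered = text.tap_word(i)   # user taps the displayed word
text.choose(i, offered[1])   # user picks the intended candidate
print(text.words)
```

Keeping the ranked candidates alongside each displayed word is one simple way to make the correction step available after the fact; a real system would also need to handle cursor position and multi-word hypotheses.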
Specification