Word-level correction of speech input
First Claim
1. A computer-implemented method comprising:
receiving speech audio data generated by a microphone of a computing device, wherein the speech audio data corresponds to an utterance received by the microphone;
providing a transcription of the utterance for output in an output region of a display of the computing device, wherein the transcription of the utterance is obtained from an automated speech recognizer operable to transcribe the speech audio data corresponding to the utterance;
receiving a user selection of a portion of the transcription of the utterance, the user-selected portion of the transcription of the utterance comprising one or more words;
in response to receiving the user selection of the portion of the transcription of the utterance, presenting one or more controls at the display of the computing device that each correspond to (i) one or more alternate words for the user-selected portion of the transcription of the utterance or (ii) a remove command to remove the user-selected portion of the transcription of the utterance from the transcription of the utterance;
receiving a user selection of a particular control from among the one or more controls; and
updating the transcription of the utterance output in the output region of the display of the computing device based at least on the user selection of the particular control.
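The final three steps of the claim (presenting controls, receiving a selection of one control, and updating the transcription) can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Control` type, the `build_controls` and `apply_control` helpers, and the sample utterance are hypothetical names and data chosen for illustration, and the alternates are simply passed in rather than produced by a recognizer.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Control:
    """One on-screen control for the selected span (hypothetical type)."""
    label: str
    # Words that replace the selection; None stands for the remove command.
    replacement: Optional[Tuple[str, ...]]


def build_controls(alternates: List[Tuple[str, ...]]) -> List[Control]:
    """Create one control per alternate phrase, followed by a remove control."""
    controls = [Control(" ".join(alt), alt) for alt in alternates]
    controls.append(Control("Remove", None))
    return controls


def apply_control(words: List[str], start: int, end: int,
                  control: Control) -> List[str]:
    """Return a new transcription with words[start:end] replaced or removed."""
    if not 0 <= start < end <= len(words):
        raise ValueError("selection must cover at least one word")
    middle = [] if control.replacement is None else list(control.replacement)
    return words[:start] + middle + words[end:]


# Assumed sample data: the user selects the first word of the transcription.
transcript = "we're coming about eleven thirty".split()
controls = build_controls([("we", "are"), ("were",)])

updated = apply_control(transcript, 0, 1, controls[0])   # pick "we are"
removed = apply_control(transcript, 0, 1, controls[-1])  # pick "Remove"
```

Returning a new list instead of editing in place keeps the displayed transcription unchanged until a control is actually chosen, which is one reasonable design choice among several.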
Abstract
The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word.
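As an illustration of how alternate words might be read out of a word lattice, the following Python sketch models the lattice as a list of word edges between numbered nodes and treats every other path spanning the same two nodes as an alternate for the selected word. This representation is an assumption made for the example; the abstract does not specify the lattice format, and `Edge`, `paths_between`, and `alternates_for` are hypothetical names. The sketch also assumes that nodes are numbered in time order, so every edge runs from a lower node to a higher one.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Edge:
    start: int  # node where the word begins
    end: int    # node where the word ends
    word: str


def paths_between(edges: List[Edge], src: int, dst: int) -> List[Tuple[str, ...]]:
    """Enumerate every word sequence that leads from node src to node dst."""
    outgoing: Dict[int, List[Edge]] = {}
    for edge in edges:
        outgoing.setdefault(edge.start, []).append(edge)

    found: List[Tuple[str, ...]] = []

    def walk(node: int, so_far: Tuple[str, ...]) -> None:
        if node == dst:
            found.append(so_far)
            return
        for edge in outgoing.get(node, []):
            if edge.end <= dst:  # relies on the time-ordered node numbering
                walk(edge.end, so_far + (edge.word,))

    walk(src, ())
    return found


def alternates_for(edges: List[Edge], selected: Edge) -> List[Tuple[str, ...]]:
    """Paths covering the same span as the selected word, minus the word itself."""
    spans = paths_between(edges, selected.start, selected.end)
    return [path for path in spans if path != (selected.word,)]


# Assumed sample lattice for an utterance beginning "we're coming".
lattice = [
    Edge(0, 2, "we're"),
    Edge(0, 1, "we"),
    Edge(1, 2, "are"),
    Edge(0, 2, "were"),
    Edge(2, 3, "coming"),
]
displayed = ["we're", "coming"]

alternates = alternates_for(lattice, lattice[0])
# Replace the selected first word with the first alternate the user picked.
corrected = list(alternates[0]) + displayed[1:]
```

Because an alternate is a path rather than a single edge, a one-word selection can be replaced by a multi-word phrase, as with "we're" and "we are" here.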
20 Claims
1. A computer-implemented method comprising:
receiving speech audio data generated by a microphone of a computing device, wherein the speech audio data corresponds to an utterance received by the microphone;
providing a transcription of the utterance for output in an output region of a display of the computing device, wherein the transcription of the utterance is obtained from an automated speech recognizer operable to transcribe the speech audio data corresponding to the utterance;
receiving a user selection of a portion of the transcription of the utterance, the user-selected portion of the transcription of the utterance comprising one or more words;
in response to receiving the user selection of the portion of the transcription of the utterance, presenting one or more controls at the display of the computing device that each correspond to (i) one or more alternate words for the user-selected portion of the transcription of the utterance or (ii) a remove command to remove the user-selected portion of the transcription of the utterance from the transcription of the utterance;
receiving a user selection of a particular control from among the one or more controls; and
updating the transcription of the utterance output in the output region of the display of the computing device based at least on the user selection of the particular control.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system for correcting words in transcribed text, the system comprising:
an automated speech recognizer operable to receive speech audio data and in response transcribe the speech audio data in a word lattice; and
a computing device comprising:
a microphone operable to receive speech audio and generate the speech audio data,
a network interface operable to send the speech audio data to the automated speech recognizer and in response receive the word lattice from the automated speech recognizer,
a display screen operable to present one or more transcribed words from the word lattice,
a user interface operable to receive a user selection of at least one of the transcribed words, and
one or more processors and a memory storing instructions that when executed by the processors cause the computing device to perform operations to:
provide a transcription of an utterance for output in an output region of a display of a computing device;
receive a user selection of a portion of the transcription of the utterance, the user-selected portion of the transcription of the utterance comprising one or more words;
in response to receiving the user selection of the portion of the transcription of the utterance, present one or more controls at the display of the computing device that each correspond to (i) one or more alternate words for the user-selected portion of the transcription of the utterance or (ii) a remove command to remove the user-selected portion of the transcription of the utterance from the transcription of the utterance;
receive a user selection of a particular control from among the one or more controls; and
update the transcription of the utterance output in the output region of the display of the computing device based at least on the user selection of the particular control.
Dependent claims: 9, 10, 11, 12, 13, 14
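The division of labor in claim 8, with a recognizer on one side and a computing device on the other, can be sketched in Python as follows. This is an illustrative assumption rather than the claimed system: the `Recognizer` protocol stands in for the network interface, `StubRecognizer` returns canned data instead of transcribing audio, and the word lattice is simplified to a list of candidate words per position. All class and method names are hypothetical.

```python
from typing import List, Protocol


class Recognizer(Protocol):
    """What the device expects from the recognizer (hypothetical interface)."""
    def transcribe(self, audio: bytes) -> List[List[str]]: ...


class StubRecognizer:
    """Stands in for a remote recognizer; ignores the audio, returns canned data."""
    def transcribe(self, audio: bytes) -> List[List[str]]:
        # Simplified lattice: candidate words per position, best candidate first.
        return [["nine", "wine"], ["men", "man"]]


class Device:
    """Holds the received candidates and the words currently shown."""

    def __init__(self, recognizer: Recognizer) -> None:
        self.recognizer = recognizer
        self.slots: List[List[str]] = []
        self.shown: List[str] = []

    def on_audio(self, audio: bytes) -> None:
        """Send audio to the recognizer and show the top word for each position."""
        self.slots = self.recognizer.transcribe(audio)
        self.shown = [candidates[0] for candidates in self.slots]

    def alternates(self, index: int) -> List[str]:
        """Candidates for the selected position other than the word shown."""
        return [w for w in self.slots[index] if w != self.shown[index]]

    def choose(self, index: int, word: str) -> None:
        """Replace the shown word with an alternate the recognizer proposed."""
        if word not in self.slots[index]:
            raise ValueError("word was not proposed for this position")
        self.shown[index] = word


device = Device(StubRecognizer())
device.on_audio(b"\x00\x01")          # placeholder bytes, not real audio
before = list(device.shown)
offered = device.alternates(0)
device.choose(0, offered[0])
```

Typing the device against a protocol rather than a concrete class means the stub can later be swapped for a client that sends the audio over a network, without changing the device logic.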
15. A computer program product, encoded on a non-transitory computer-readable medium, operable to cause one or more processors to perform operations for correcting words in transcribed text, the operations comprising:
receiving speech audio data generated by a microphone of a computing device, wherein the speech audio data corresponds to an utterance received by the microphone;
providing a transcription of the utterance for output in an output region of a display of the computing device, wherein the transcription of the utterance is obtained from an automated speech recognizer operable to transcribe the speech audio data corresponding to the utterance;
receiving a user selection of a portion of the transcription of the utterance, the user-selected portion of the transcription of the utterance comprising one or more words;
in response to receiving the user selection of the portion of the transcription of the utterance, presenting one or more controls at the display of the computing device that each correspond to (i) one or more alternate words for the user-selected portion of the transcription of the utterance or (ii) a remove command to remove the user-selected portion of the transcription of the utterance from the transcription of the utterance;
receiving a user selection of a particular control from among the one or more controls; and
updating the transcription of the utterance output in the output region of the display of the computing device based at least on the user selection of the particular control.
Dependent claims: 16, 17, 18, 19, 20
Specification