Word-level correction of speech input
First Claim
1. A computer-implemented method comprising:
receiving, by a client device, audio data corresponding to a user utterance;
providing, by the client device, the audio data to a server-based, automated speech recognizer;
receiving, by the client device, multiple candidate transcriptions of the user utterance that are generated by the server-based, automated speech recognizer;
providing, for output by the client device, a user interface that includes a representation of a first transcription of the user utterance that is selected from among the multiple candidate transcriptions of the user utterance that are generated by the server-based, automated speech recognizer;
receiving, by the client device, data indicating a selection, through the user interface, of at least a portion of the representation of the first transcription of the user utterance, wherein the selection identifies the at least the portion of the representation of the first transcription of the user utterance as including at least one incorrect word; and
in response to receiving the data indicating the selection, through the user interface, of the at least the portion of the representation of the first transcription of the user utterance, replacing, by the client device and without (i) presenting an option to select one or more alternate words for the at least one incorrect word or (ii) initiating additional communication with the server-based automated speech recognizer, the representation of the first transcription of the user utterance in the user interface with a representation of a second transcription of the user utterance that (i) is selected from among the multiple candidate transcriptions of the user utterance that are generated by the server-based, automated speech recognizer, and (ii) includes one or more alternate words substituted for the at least one incorrect word.
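The replacement step recited above can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the candidate transcriptions are already held on the client as word lists that align position by position, and the function name `pick_replacement` is hypothetical. The point it shows is that selecting a word swaps in another candidate from the list already received, with no menu of alternates and no further request to the recognizer.

```python
from typing import List, Optional


def pick_replacement(
    candidates: List[List[str]], current: int, word_index: int
) -> Optional[int]:
    """Return the index of the first other candidate transcription whose word
    at word_index differs from the word the user marked as incorrect.

    Assumption: candidates are ranked best-first and align word for word.
    Returns None when no candidate offers a different word at that position.
    """
    rejected = candidates[current][word_index]
    for i, candidate in enumerate(candidates):
        if i == current:
            continue
        # Skip candidates too short to have a word at this position.
        if word_index < len(candidate) and candidate[word_index] != rejected:
            return i
    return None


# Hypothetical n-best list already received from the recognizer.
candidates = [
    "we're coming about eleven thirty".split(),
    "we're coming at eleven thirty".split(),
    "we're coming about seven thirty".split(),
]

shown = 0                                          # first transcription on screen
new = pick_replacement(candidates, shown, 2)       # user taps "about"
if new is not None:
    shown = new                                    # swap in the second transcription
print(" ".join(candidates[shown]))
```

Real candidate lists may differ in length or word alignment, so a production client would need an alignment step that this sketch omits.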
Abstract
The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word.
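The flow in the abstract (present words, select one, present alternates, replace) can be sketched as follows. This is a simplified illustration only: it flattens the word lattice into one ranked list of alternates per word position, which is an assumption made for brevity, and the names `Slot`, `presented_words`, `alternates_for`, and `replace_word` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slot:
    """One word position: ranked alternates (best first) and the one shown."""
    alternates: List[str]
    chosen: int = 0


def presented_words(slots: List[Slot]) -> List[str]:
    """Words currently presented to the user."""
    return [s.alternates[s.chosen] for s in slots]


def alternates_for(slots: List[Slot], position: int) -> List[str]:
    """Alternate words to offer for the selected word, excluding the shown one."""
    slot = slots[position]
    return [w for i, w in enumerate(slot.alternates) if i != slot.chosen]


def replace_word(slots: List[Slot], position: int, alternate: str) -> None:
    """Replace the selected word with the alternate the user picked.

    Raises ValueError if the alternate is not available at that position.
    """
    slots[position].chosen = slots[position].alternates.index(alternate)


# Hypothetical data standing in for a lattice returned by the transcription system.
slots = [
    Slot(["we're"]),
    Slot(["coming"]),
    Slot(["about", "at"]),
    Slot(["eleven", "seven"]),
]
print(presented_words(slots))      # words as first presented
print(alternates_for(slots, 2))    # alternates offered for the selected word
replace_word(slots, 2, "at")
print(presented_words(slots))      # words after the replacement
```

A full lattice also encodes paths that span several words and carries scores on its edges; the per-position lists here drop both.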
19 Claims
1. A computer-implemented method comprising:
receiving, by a client device, audio data corresponding to a user utterance;
providing, by the client device, the audio data to a server-based, automated speech recognizer;
receiving, by the client device, multiple candidate transcriptions of the user utterance that are generated by the server-based, automated speech recognizer;
providing, for output by the client device, a user interface that includes a representation of a first transcription of the user utterance that is selected from among the multiple candidate transcriptions of the user utterance that are generated by the server-based, automated speech recognizer;
receiving, by the client device, data indicating a selection, through the user interface, of at least a portion of the representation of the first transcription of the user utterance, wherein the selection identifies the at least the portion of the representation of the first transcription of the user utterance as including at least one incorrect word; and
in response to receiving the data indicating the selection, through the user interface, of the at least the portion of the representation of the first transcription of the user utterance, replacing, by the client device and without (i) presenting an option to select one or more alternate words for the at least one incorrect word or (ii) initiating additional communication with the server-based automated speech recognizer, the representation of the first transcription of the user utterance in the user interface with a representation of a second transcription of the user utterance that (i) is selected from among the multiple candidate transcriptions of the user utterance that are generated by the server-based, automated speech recognizer, and (ii) includes one or more alternate words substituted for the at least one incorrect word.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer program product, encoded in a non-transitory computer-readable medium, operable to cause one or more processors to perform operations for correcting words in transcribed text, the operations comprising:
receiving, by a client device, audio data corresponding to a user utterance;
providing, by the client device, the audio data to a server-based, automated speech recognizer;
receiving, by the client device, multiple candidate transcriptions of the user utterance that are generated by the server-based, automated speech recognizer;
providing, for output by the client device, a user interface that includes a representation of a first transcription of the user utterance that is selected from among the multiple candidate transcriptions of the user utterance that are generated by the server-based, automated speech recognizer;
receiving, by the client device, data indicating a selection, through the user interface, of at least a portion of the representation of the first transcription of the user utterance, wherein the selection identifies the at least the portion of the representation of the first transcription of the user utterance as including at least one incorrect word; and
in response to receiving the data indicating the selection, through the user interface, of the at least the portion of the representation of the first transcription of the user utterance, replacing, by the client device and without (i) presenting an option to select one or more alternate words for the at least one incorrect word or (ii) initiating additional communication with the server-based automated speech recognizer, the representation of the first transcription of the user utterance in the user interface with a representation of a second transcription of the user utterance that (i) is selected from among the multiple candidate transcriptions of the user utterance that are generated by the server-based, automated speech recognizer, and (ii) includes one or more alternate words substituted for the at least one incorrect word.
Dependent claims: 9, 10, 11, 12, 13
14. A system comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving, by a client device, audio data corresponding to a user utterance;
providing, by the client device, the audio data to a server-based, automated speech recognizer;
receiving, by the client device, multiple candidate transcriptions of the user utterance that are generated by the server-based, automated speech recognizer;
providing, for output by the client device, a user interface that includes a representation of a first transcription of the user utterance that is selected from among the multiple candidate transcriptions of the user utterance that are generated by the server-based, automated speech recognizer;
receiving, by the client device, data indicating a selection, through the user interface, of at least a portion of the representation of the first transcription of the user utterance, wherein the selection identifies the at least the portion of the representation of the first transcription of the user utterance as including at least one incorrect word; and
in response to receiving the data indicating the selection, through the user interface, of the at least the portion of the representation of the first transcription of the user utterance, replacing, by the client device and without (i) presenting an option to select one or more alternate words for the at least one incorrect word or (ii) initiating additional communication with the server-based automated speech recognizer, the representation of the first transcription of the user utterance in the user interface with a representation of a second transcription of the user utterance that (i) is selected from among the multiple candidate transcriptions of the user utterance that are generated by the server-based, automated speech recognizer, and (ii) includes one or more alternate words substituted for the at least one incorrect word.
Dependent claims: 15, 16, 17, 18, 19
Specification