Recognizing different versions of a language
First Claim
1. A computer-implemented method comprising:
receiving audio data that encodes an utterance;
providing the audio data to multiple speech recognizers that are each trained on a different dialect or accent of a same language;
receiving, from each of the multiple speech recognizers that are each trained on a different dialect or accent of a same language, (i) a transcription of the utterance, and (ii) a speech recognition confidence score;
selecting, from among the transcriptions of the utterance that are received from the multiple speech recognizers, one or more particular transcriptions that were each generated by two or more of the multiple speech recognizers;
selecting, from among the one or more particular transcriptions that were each generated by two or more of the multiple speech recognizers, a representative transcription based at least on the speech recognition confidence scores associated with the particular transcriptions; and
providing the representative transcription for output.
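The two selection steps of claim 1 can be illustrated with a minimal Python sketch. It assumes that each recognizer returns one transcription and one confidence score. The `RecognitionResult` type and the dialect labels are hypothetical. The claim only says the representative is chosen "based at least on" the confidence scores, so the rule used here (sum the scores of the agreeing recognizers) is one assumed choice, not the patented method.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RecognitionResult:
    """Output of one dialect- or accent-specific recognizer (hypothetical type)."""
    recognizer: str    # illustrative dialect label, e.g. "en-US"
    transcription: str
    confidence: float  # speech recognition confidence score


def select_representative(results: List[RecognitionResult]) -> Optional[str]:
    # Group the confidence scores by the transcription text they accompany.
    scores: Dict[str, List[float]] = defaultdict(list)
    for result in results:
        scores[result.transcription].append(result.confidence)

    # Keep only transcriptions generated by two or more recognizers.
    agreed = {text: confs for text, confs in scores.items() if len(confs) >= 2}
    if not agreed:
        return None  # the claim does not say what happens without agreement

    # Assumed scoring rule: sum the confidences of the agreeing recognizers.
    return max(agreed, key=lambda text: sum(agreed[text]))


results = [
    RecognitionResult("en-US", "call bob", 0.62),
    RecognitionResult("en-GB", "call bob", 0.58),
    RecognitionResult("en-AU", "call rob", 0.91),
    RecognitionResult("en-IN", "call rob", 0.20),
    RecognitionResult("en-ZA", "cold bob", 0.95),
]
print(select_representative(results))  # call bob
```

In this example, "cold bob" has the single highest score but is discarded because only one recognizer produced it. The agreement filter runs before any confidence comparison.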
2 Assignments
0 Petitions
Abstract
Speech recognition systems may perform the following operations: receiving audio at a computing device; identifying a language associated with the audio; recognizing the audio using recognition models for different versions of the language to produce recognition candidates for the audio, where the recognition candidates are associated with corresponding information; comparing the information of the recognition candidates to identify agreement between at least two of the recognition models; selecting a recognition candidate based on information of the recognition candidate and agreement between the at least two of the recognition models; and outputting data corresponding to the selected recognition candidate as a recognized version of the audio.
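The abstract adds a language-identification step that comes before the version-specific recognition. The sketch below illustrates this flow under several assumptions. `identify_language` is a placeholder that always returns "en". The per-language model registry and the lambda "models" are hypothetical stand-ins. Candidates are ranked first by agreement and then by their own confidence, which is one reading of "based on information of the recognition candidate and agreement."

```python
from collections import Counter
from typing import Callable, Dict, Optional, Tuple

# Hypothetical recognizer interface: audio bytes -> (transcription, confidence).
Recognizer = Callable[[bytes], Tuple[str, float]]


def identify_language(audio: bytes) -> str:
    """Placeholder language identifier; a real system would classify the audio."""
    return "en"


def recognize_versions(
    audio: bytes, models_by_language: Dict[str, Dict[str, Recognizer]]
) -> Optional[str]:
    # Identify the language, then look up the models for its versions.
    versions = models_by_language.get(identify_language(audio), {})
    if not versions:
        return None

    # Each version-specific model produces one recognition candidate.
    candidates = [model(audio) for model in versions.values()]

    # Agreement: how many version models produced each transcription.
    agreement = Counter(text for text, _ in candidates)

    # Rank by agreement first and the candidate's own confidence second.
    best_text, _ = max(candidates, key=lambda c: (agreement[c[0]], c[1]))
    return best_text if agreement[best_text] >= 2 else None


models = {
    "en": {
        "en-US": lambda audio: ("turn left on elm", 0.71),
        "en-GB": lambda audio: ("turn left on elm", 0.66),
        "en-IN": lambda audio: ("turn left on helm", 0.88),
    }
}
print(recognize_versions(b"\x00\x01", models))  # turn left on elm
```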
61 Citations
20 Claims
1. A computer-implemented method comprising:
receiving audio data that encodes an utterance;
providing the audio data to multiple speech recognizers that are each trained on a different dialect or accent of a same language;
receiving, from each of the multiple speech recognizers that are each trained on a different dialect or accent of a same language, (i) a transcription of the utterance, and (ii) a speech recognition confidence score;
selecting, from among the transcriptions of the utterance that are received from the multiple speech recognizers, one or more particular transcriptions that were each generated by two or more of the multiple speech recognizers;
selecting, from among the one or more particular transcriptions that were each generated by two or more of the multiple speech recognizers, a representative transcription based at least on the speech recognition confidence scores associated with the particular transcriptions; and
providing the representative transcription for output.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. One or more non-transitory machine-readable media storing instructions that are executable to perform operations comprising:
receiving audio data that encodes an utterance;
providing the audio data to multiple speech recognizers that are each trained on a different dialect or accent of a same language;
receiving, from each of the multiple speech recognizers that are each trained on a different dialect or accent of a same language, (i) a transcription of the utterance, and (ii) a speech recognition confidence score;
selecting, from among the transcriptions of the utterance that are received from the multiple speech recognizers, one or more particular transcriptions that were each generated by two or more of the multiple speech recognizers;
selecting, from among the one or more particular transcriptions that were each generated by two or more of the multiple speech recognizers, a representative transcription based at least on the speech recognition confidence scores associated with the particular transcriptions; and
providing the representative transcription for output.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. A system comprising:
one or more data processing apparatus;
a non-transitory computer-readable storage device having stored thereon instructions that, when executed by the one or more data processing apparatus, cause the one or more data processing apparatus to perform operations comprising:
receiving audio data that encodes an utterance;
providing the audio data to multiple speech recognizers that are each trained on a different dialect or accent of a same language;
receiving, from each of the multiple speech recognizers that are each trained on a different dialect or accent of a same language, (i) a transcription of the utterance, and (ii) a speech recognition confidence score;
selecting, from among the transcriptions of the utterance that are received from the multiple speech recognizers, one or more particular transcriptions that were each generated by two or more of the multiple speech recognizers;
selecting, from among the one or more particular transcriptions that were each generated by two or more of the multiple speech recognizers, a representative transcription based at least on the speech recognition confidence scores associated with the particular transcriptions; and
providing the representative transcription for output.
Dependent claims: 20
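In a system with one or more data processing apparatus, the same audio could be sent to all of the dialect recognizers in parallel. The sketch below does this with Python's `concurrent.futures.ThreadPoolExecutor`. `transcribe`, `simulated`, and the Spanish dialect labels are hypothetical. This version scores each agreed transcription by its best single confidence. That rule is an assumption and differs from summing the scores, but both rules satisfy "based at least on the confidence scores."

```python
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical recognizer interface: audio bytes -> (transcription, confidence).
Recognizer = Callable[[bytes], Tuple[str, float]]


def transcribe(audio: bytes, recognizers: Dict[str, Recognizer],
               max_workers: int = 4) -> Optional[str]:
    # Provide the same audio to every dialect recognizer concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(rec, audio) for rec in recognizers.values()]
        outputs = [future.result() for future in futures]

    # Collect the confidence scores reported for each distinct transcription.
    by_text: Dict[str, List[float]] = defaultdict(list)
    for text, confidence in outputs:
        by_text[text].append(confidence)

    # Keep transcriptions produced by two or more recognizers; score each by
    # its best single confidence (an assumed rule) and return the top one.
    agreed = {t: max(c) for t, c in by_text.items() if len(c) >= 2}
    return max(agreed, key=agreed.get) if agreed else None


def simulated(text: str, confidence: float, delay: float) -> Recognizer:
    """Hypothetical stand-in recognizer that sleeps to mimic latency."""
    def recognize(audio: bytes) -> Tuple[str, float]:
        time.sleep(delay)
        return text, confidence
    return recognize


recognizers = {
    "es-ES": simulated("buenos días", 0.81, 0.05),
    "es-MX": simulated("buenos días", 0.77, 0.02),
    "es-AR": simulated("buenos tías", 0.93, 0.01),
}
print(transcribe(b"\x00\x01", recognizers))  # buenos días
```

Threads are used here because real recognizers would often be remote or I/O-bound services. For CPU-bound local models, a process pool would be the more natural fan-out.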
Specification