Recognizing speech in multiple languages
First Claim
1. A computer-implemented method comprising:
receiving audio that encodes an utterance;
selecting, from among multiple automated speech recognizers that are each associated with a different natural language, a particular subset of the automated speech recognizers;
providing the audio to (i) a first automated speech recognizer of the particular subset that is associated with a first natural language, (ii) a second automated speech recognizer of the particular subset that is associated with a different, second, natural language, and (iii) an automated language identifier;
receiving (i) a first transcription of the utterance, in the first natural language, and a first speech recognition confidence score for the first transcription, from the first automated speech recognizer, (ii) a second transcription of the utterance, in the second natural language, and a second speech recognition confidence score for the second transcription, from the second automated speech recognizer, and (iii) data indicating a language associated with the utterance, from the automated language identifier;
after receiving (i) the first transcription, (ii) the second transcription, and (iii) the data indicating the language associated with the utterance, selecting, from among the first transcription of the utterance and the second transcription of the utterance, a particular transcription to output as a representative transcription of the utterance based at least on (a) the first speech recognition confidence score, (b) the second speech recognition confidence score, and (c) the data indicating the language; and
providing the representative transcription for output.
Abstract
Speech recognition systems may perform the following operations: receiving audio; recognizing the audio using language models for different languages to produce recognition candidates for the audio, where the recognition candidates are associated with corresponding recognition scores; identifying a candidate language for the audio; selecting a recognition candidate based on the recognition scores and the candidate language; and outputting data corresponding to the selected recognition candidate as a recognized version of the audio.
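The operations in the abstract can be sketched as a short Python program. The recognizer and language-identifier interfaces below are illustrative assumptions for the sketch, not part of the patent text: each per-language recognizer is modeled as a callable returning a transcription and a confidence score, and the language identifier as a callable returning a language code.

```python
# Hypothetical sketch of the recognition flow described in the abstract.
# All interfaces here are assumptions chosen for illustration.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Candidate:
    language: str        # natural language of the recognizer, e.g. "en"
    transcription: str   # that recognizer's transcription of the utterance
    score: float         # recognition confidence score, assumed in [0, 1]

def select_transcription(
    audio: bytes,
    recognizers: dict[str, Callable[[bytes], tuple[str, float]]],
    identify_language: Callable[[bytes], str],
) -> str:
    """Run each per-language recognizer plus a language identifier, then
    select one transcription using both the scores and the identified
    language."""
    candidates = []
    for language, recognize in recognizers.items():
        transcription, score = recognize(audio)
        candidates.append(Candidate(language, transcription, score))

    identified = identify_language(audio)

    # Prefer a candidate whose language matches the identified language;
    # break ties (and handle no match) by the higher confidence score.
    best = max(candidates, key=lambda c: (c.language == identified, c.score))
    return best.transcription
```

Note that the match-then-score ordering is one possible policy; the claims only require that the selection be based at least on the scores and the language data.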
18 Claims
1. A computer-implemented method comprising:
receiving audio that encodes an utterance;
selecting, from among multiple automated speech recognizers that are each associated with a different natural language, a particular subset of the automated speech recognizers;
providing the audio to (i) a first automated speech recognizer of the particular subset that is associated with a first natural language, (ii) a second automated speech recognizer of the particular subset that is associated with a different, second, natural language, and (iii) an automated language identifier;
receiving (i) a first transcription of the utterance, in the first natural language, and a first speech recognition confidence score for the first transcription, from the first automated speech recognizer, (ii) a second transcription of the utterance, in the second natural language, and a second speech recognition confidence score for the second transcription, from the second automated speech recognizer, and (iii) data indicating a language associated with the utterance, from the automated language identifier;
after receiving (i) the first transcription, (ii) the second transcription, and (iii) the data indicating the language associated with the utterance, selecting, from among the first transcription of the utterance and the second transcription of the utterance, a particular transcription to output as a representative transcription of the utterance based at least on (a) the first speech recognition confidence score, (b) the second speech recognition confidence score, and (c) the data indicating the language; and
providing the representative transcription for output.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A non-transitory computer-readable storage device having instructions stored thereon that, when executed by a computing device, cause the computing device to perform operations comprising:
receiving audio that encodes an utterance;
selecting, from among multiple automated speech recognizers that are each associated with a different natural language, a particular subset of the automated speech recognizers;
providing the audio to (i) a first automated speech recognizer of the particular subset that is associated with a first natural language, (ii) a second automated speech recognizer of the particular subset that is associated with a different, second, natural language, and (iii) an automated language identifier;
receiving (i) a first transcription of the utterance, in the first natural language, and a first speech recognition confidence score for the first transcription, from the first automated speech recognizer, (ii) a second transcription of the utterance, in the second natural language, and a second speech recognition confidence score for the second transcription, from the second automated speech recognizer, and (iii) data indicating a language associated with the utterance, from the automated language identifier;
after receiving (i) the first transcription, (ii) the second transcription, and (iii) the data indicating the language associated with the utterance, selecting, from among the first transcription of the utterance and the second transcription of the utterance, a particular transcription to output as a representative transcription of the utterance based at least on (a) the first speech recognition confidence score, (b) the second speech recognition confidence score, and (c) the data indicating the language; and
providing the representative transcription for output.
Dependent claims: 10, 11, 12
13. A system comprising:
one or more data processing apparatus; and
a computer-readable storage device having stored thereon instructions that, when executed by the one or more data processing apparatus, cause the one or more data processing apparatus to perform operations comprising:
receiving audio data that encodes an utterance;
selecting, from among multiple automated speech recognizers that are each associated with a different natural language, a particular subset of the automated speech recognizers;
providing the audio data to (i) a first automated speech recognizer of the particular subset that is associated with a first natural language, (ii) a second automated speech recognizer of the particular subset that is associated with a different, second, natural language, and (iii) an automated language identifier;
receiving (i) a first transcription of the utterance, in the first natural language, from the first automated speech recognizer, (ii) a second transcription of the utterance, in the second natural language, from the second automated speech recognizer, and (iii) data indicating a first portion of the audio data that is associated with the first natural language and a second portion of the audio data that is associated with the second natural language, from the automated language identifier;
after receiving (i) the first transcription, (ii) the second transcription, and (iii) the data indicating the first portion of the audio data that is associated with the first natural language and the second portion of the audio data that is associated with the second natural language, generating a representative transcription of the utterance comprising (a) a portion of the first transcription that corresponds to the first portion of the audio data that is associated with the first natural language, and (b) a portion of the second transcription that corresponds to the second portion of the audio data that is associated with the second natural language; and
providing the representative transcription for output.
Dependent claims: 14, 15, 16, 17, 18
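Claim 13 differs from claim 1 in that it splices per-language portions of two transcriptions into a single representative transcription, guided by the language identifier's per-portion language data. A minimal sketch of that splice, assuming hypothetical word-level timestamps from each recognizer (an illustrative interface, not one the claim specifies):

```python
# Hypothetical sketch of the claim-13 splice. Each recognizer is assumed
# to return word-level timestamps; the language identifier is assumed to
# return time segments tagged with a language code.
from dataclasses import dataclass

@dataclass
class TimedWord:
    word: str
    start: float  # seconds from the beginning of the audio
    end: float

def splice_transcriptions(
    transcripts: dict[str, list[TimedWord]],   # language -> timed words
    segments: list[tuple[float, float, str]],  # (start, end, language)
) -> str:
    """Build one representative transcription by taking, for each
    language-tagged segment of the audio, the words from the matching
    recognizer's transcription that fall inside that segment."""
    out: list[str] = []
    for seg_start, seg_end, language in segments:
        for w in transcripts[language]:
            # Assign a word to a segment if its midpoint lies inside it.
            mid = (w.start + w.end) / 2
            if seg_start <= mid < seg_end:
                out.append(w.word)
    return " ".join(out)
```

For mixed-language audio such as an English command followed by a French phrase, this yields the English recognizer's words for the English portion and the French recognizer's words for the French portion.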
Specification