MIXED MODEL SPEECH RECOGNITION
Abstract
In one aspect, a method comprises accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances. The method further comprises generating a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer that employs a language model based on user-specific data. The method further comprises generating a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer that employs a language model independent of user-specific data. The method further comprises determining that the second transcription of the utterances includes a term from a predefined set of one or more terms. The method further comprises, based on determining that the second transcription of the utterance includes the term, providing an output of the first transcription of the utterance.
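The selection logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `Recognizer` type, the function names, and the stub recognizers in the usage note are all hypothetical, and the whitespace-based term matching is a simplification. The source does not say what is output when no predefined term is found, so the fallback to the second transcription is an assumption made only for illustration.

```python
from typing import Callable, Iterable

# Hypothetical type: a recognizer maps raw audio bytes to a transcription.
Recognizer = Callable[[bytes], str]


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching is word-aligned."""
    return " ".join(text.lower().split())


def contains_predefined_term(transcription: str, terms: Iterable[str]) -> bool:
    """Return True if any predefined term occurs as a whole word or phrase."""
    padded = f" {_normalize(transcription)} "
    for term in terms:
        normalized = _normalize(term)
        if normalized and f" {normalized} " in padded:
            return True
    return False


def select_transcription(
    audio: bytes,
    user_specific_recognizer: Recognizer,
    general_recognizer: Recognizer,
    predefined_terms: Iterable[str],
) -> str:
    """Run both recognizers on the same audio and choose which result to output."""
    # First transcription: language model based on user-specific data.
    first = user_specific_recognizer(audio)
    # Second transcription: language model independent of user-specific data.
    second = general_recognizer(audio)
    # The check is made on the SECOND transcription; a hit selects the FIRST.
    if contains_predefined_term(second, predefined_terms):
        return first
    # Assumption: the source does not specify this branch.
    return second
```

As a usage example with stub recognizers, `select_transcription(b"", lambda a: "call katja", lambda a: "call cat yeah", {"call"})` would select the user-specific result, because the general transcription contains the predefined term "call".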
20 Claims
1. A computer-implemented method comprising:
accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances;
generating a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer, wherein the first speech recognizer employs a language model that is based on user-specific data;
generating a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer, wherein the second speech recognizer employs a language model independent of user-specific data;
determining that the second transcription of the utterances includes a term from a predefined set of one or more terms; and
based on determining that the second transcription of the utterance includes the term from the predefined set of one or more terms, providing an output of the first transcription of the utterance.

8. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances;
generating a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer, wherein the first speech recognizer employs a language model that is developed based on user-specific data;
generating a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer, wherein the second speech recognizer employs a language model developed independent of user-specific data;
determining that the second transcription of the utterances includes a term from a predefined set of one or more terms; and
based on determining that the second transcription of the utterance includes the term from the predefined set of one or more terms, providing an output of the first transcription of the utterance.

15. A computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances;
determining a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer, wherein the first speech recognizer employs a language model that is developed based on user-specific data;
determining a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer, wherein the second speech recognizer employs a language model developed independent of user-specific data;
determining that the second transcription of the utterances includes a term from a predefined set of one or more terms; and
based on determining that the second transcription of the utterance includes the term from the predefined set of one or more terms, providing an output of the first transcription of the utterance.
Specification