Multiple recognizer speech recognition
First Claim
1. A computer-implemented method performed by a data processing apparatus, the method comprising:
receiving audio data that corresponds to an utterance;
obtaining a first transcription of the utterance that was generated using a limited speech recognizer, wherein the limited speech recognizer comprises a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar;
obtaining a second transcription of the utterance that was generated using an expanded speech recognizer, wherein the expanded speech recognizer comprises a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar;
aligning the first and second transcriptions of the utterance to generate an aligned transcription; and
classifying the utterance, based at least on a portion of the aligned transcription, as a voice command or a voice query.
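The aligning and classifying steps of the claim can be illustrated with a minimal sketch. The claim does not name an alignment algorithm or a classification rule, so everything here is an assumption made for illustration: the word-level alignment uses Python's standard `difflib.SequenceMatcher`, the `COMMAND_TERMS` set is a hypothetical command grammar, and the rule "the leading word from the limited recognizer is a command term" is only one possible way to classify on a portion of the aligned transcription.

```python
from difflib import SequenceMatcher

# Hypothetical voice command grammar; the claim does not list specific terms.
COMMAND_TERMS = {"call", "text", "navigate"}


def align(first, second):
    """Align two transcriptions word by word.

    `first` is the limited recognizer's output, `second` the expanded
    recognizer's output. Returns a list of (limited_word, expanded_word)
    pairs, where None marks a position one recognizer did not fill.
    """
    a, b = first.split(), second.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    pairs = []
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        left, right = a[i1:i2], b[j1:j2]
        # Pad the shorter side so every position has a partner.
        width = max(len(left), len(right))
        left += [None] * (width - len(left))
        right += [None] * (width - len(right))
        pairs.extend(zip(left, right))
    return pairs


def classify(aligned):
    """Label the utterance from the leading portion of the aligned transcription.

    Assumed rule: if the limited recognizer produced a command term in the
    first position, treat the utterance as a voice command; otherwise treat
    it as a voice query.
    """
    if not aligned:
        return "voice query"
    limited_word, _expanded_word = aligned[0]
    return "voice command" if limited_word in COMMAND_TERMS else "voice query"
```

For instance, if the limited recognizer returns `"call"` and the expanded recognizer returns `"call john smith"`, the aligned transcription pairs the command word from both recognizers and fills the remaining positions from the expanded recognizer only, and the sketch labels the utterance a voice command.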
Abstract
The subject matter of this specification can be embodied in, among other things, a method that includes receiving audio data that corresponds to an utterance and obtaining a first transcription of the utterance that was generated using a limited speech recognizer. The limited speech recognizer includes a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar. A second transcription of the utterance is obtained that was generated using an expanded speech recognizer. The expanded speech recognizer includes a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar. The utterance is classified based at least on a portion of the first transcription or the second transcription.
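The relationship between the two vocabularies described above can be expressed as set conditions. The following is a sketch only: the function names and the example grammars are hypothetical, and the vocabularies are modeled as plain Python sets of terms rather than as trained language models.

```python
def is_limited_vocabulary(vocab, command_grammar, expanded_grammar):
    """True if `vocab` contains at least one command-grammar term
    but does not contain every term of the expanded grammar."""
    has_command_term = bool(vocab & command_grammar)
    has_all_expanded_terms = expanded_grammar <= vocab
    return has_command_term and not has_all_expanded_terms


def is_expanded_vocabulary(vocab, expanded_grammar):
    """True if `vocab` contains every term of the expanded grammar."""
    return expanded_grammar <= vocab


# Hypothetical grammars used only to exercise the two checks.
command_grammar = {"call", "text"}
expanded_grammar = {"call", "text", "john", "weather"}

limited_vocab = {"call", "text"}
expanded_vocab = {"call", "text", "john", "weather", "paris"}
```

Under these assumptions, `limited_vocab` satisfies the limited condition, and `expanded_vocab` satisfies the expanded condition but not the limited one, because it covers the whole expanded grammar.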
17 Claims
1. A computer-implemented method performed by a data processing apparatus, the method comprising:
receiving audio data that corresponds to an utterance;
obtaining a first transcription of the utterance that was generated using a limited speech recognizer, wherein the limited speech recognizer comprises a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar;
obtaining a second transcription of the utterance that was generated using an expanded speech recognizer, wherein the expanded speech recognizer comprises a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar;
aligning the first and second transcriptions of the utterance to generate an aligned transcription; and
classifying the utterance, based at least on a portion of the aligned transcription, as a voice command or a voice query.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system comprising:
a data processing apparatus; and
a non-transitory memory storage storing instructions executable by the data processing apparatus and that upon such execution cause the data processing apparatus to perform operations comprising:
receiving audio data that corresponds to an utterance;
obtaining a first transcription of the utterance that was generated using a limited speech recognizer, wherein the limited speech recognizer comprises a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar;
obtaining a second transcription of the utterance that was generated using an expanded speech recognizer, wherein the expanded speech recognizer comprises a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar;
aligning the first and second transcriptions of the utterance to generate an aligned transcription; and
classifying the utterance, based at least on a portion of the aligned transcription, as a voice command or a voice query.
Dependent claims: 9, 10, 11, 12, 13
14. A non-transitory computer readable medium storing instructions executable by a data processing apparatus and that upon such execution cause the data processing apparatus to perform operations comprising:
receiving audio data that corresponds to an utterance;
obtaining a first transcription of the utterance that was generated using a limited speech recognizer, wherein the limited speech recognizer comprises a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar;
obtaining a second transcription of the utterance that was generated using an expanded speech recognizer, wherein the expanded speech recognizer comprises a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar;
aligning the first and second transcriptions of the utterance to generate an aligned transcription; and
classifying the utterance, based at least on a portion of the aligned transcription, as a voice command or a voice query.
Dependent claims: 15, 16, 17
Specification