Speech recognition using dock context
Abstract
Methods, systems, and apparatuses, including computer programs encoded on a computer storage medium, for performing speech recognition using dock context. In one aspect, a method includes accessing audio data that includes encoded speech. Information that indicates a docking context of a client device is accessed, the docking context being associated with the audio data. A plurality of language models is identified. At least one of the plurality of language models is selected based on the docking context. Speech recognition is performed on the audio data using the selected language model to identify a transcription for a portion of the audio data.
29 Claims
1. A computer-implemented method, comprising:
receiving, at a server system, audio data that includes encoded speech, the encoded speech having been detected by a client device;
receiving, at the server system, information that indicates a docking context of the client device while the speech encoded in the audio data was detected by the client device;
identifying a plurality of language models, each of the plurality of language models indicating a probability of an occurrence of a term in a sequence of terms based on other terms in the sequence;
for each of the plurality of language models, determining a weighting value to assign to the language model based on the docking context by accessing a stored weighting value associated with the docking context, the weighting value indicating a probability that using the language model will generate a correct transcription of the encoded speech;
selecting at least one of the plurality of language models based on the assigned weighting values; and
performing speech recognition on the audio data using the selected language model to identify a transcription for a portion of the audio data.
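Claim 1 characterizes each language model as giving the probability of a term given the other terms in its sequence. The claims do not fix a model order; a bigram model, which conditions on the single preceding term, is one common instance. The sketch below assumes that choice, uses unsmoothed maximum-likelihood counts, and trains on a hypothetical three-phrase corpus. `BigramLanguageModel` and its methods are illustrative names, not part of the patent.

```python
from collections import Counter, defaultdict


class BigramLanguageModel:
    """Hypothetical model: P(term | previous term) from maximum-likelihood counts, no smoothing."""

    START = "<s>"  # sentence-start marker so the first term also has a context

    def __init__(self, sentences):
        # pair_counts[previous][term] = number of times `term` followed `previous`
        self.pair_counts = defaultdict(Counter)
        for sentence in sentences:
            terms = [self.START] + sentence.lower().split()
            for previous, term in zip(terms, terms[1:]):
                self.pair_counts[previous][term] += 1

    def probability(self, term, previous):
        counts = self.pair_counts.get(previous)
        if not counts:
            return 0.0  # unseen context: no estimate without smoothing
        return counts[term] / sum(counts.values())

    def sequence_probability(self, sentence):
        """Chain-rule product of bigram probabilities over the whole sequence."""
        terms = [self.START] + sentence.lower().split()
        result = 1.0
        for previous, term in zip(terms, terms[1:]):
            result *= self.probability(term, previous)
        return result


# Hypothetical corpus for a model suited to in-car use.
car_lm = BigramLanguageModel(["navigate to home", "navigate to work", "call mom"])
print(car_lm.probability("home", "to"))                 # 0.5
print(car_lm.sequence_probability("navigate to home"))  # 2/3 * 1 * 1/2
```

In this toy model "to" is followed by "home" in one of two training phrases, so P(home | to) is 0.5; a production recognizer would add smoothing so that unseen term pairs do not receive zero probability.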
6. A computer-implemented method, comprising:
accessing audio data that includes encoded speech;
accessing information that indicates a docking context of a client device, the docking context being associated with the audio data;
identifying a plurality of language models;
determining, for each of the plurality of language models, a weighting value based on the docking context, the weighting value indicating a probability that the language model will indicate a correct transcription for the encoded speech;
selecting at least one of the plurality of language models based on the weighting values; and
performing speech recognition on the audio data using the selected at least one language model to identify a transcription for a portion of the audio data.
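Claim 6 determines a weighting value for each language model from the docking context and selects at least one model by those weights; claim 1 adds that the weights are read from storage keyed by the docking context. The following is a minimal sketch under several assumptions: the in-memory table `STORED_WEIGHTS`, the context labels (`car_dock`, `desk_dock`, `undocked`), the model names, and the selection rule (keep every model whose weight reaches a threshold, and always keep the top-weighted one) are all hypothetical.

```python
from typing import Dict, List

# Hypothetical stored weighting values, keyed by docking context. Each weight stands in for
# the estimated probability that the model yields a correct transcription in that context.
STORED_WEIGHTS: Dict[str, Dict[str, float]] = {
    "car_dock": {"navigation": 0.6, "music": 0.3, "general": 0.1},
    "desk_dock": {"navigation": 0.1, "music": 0.3, "general": 0.6},
    "undocked": {"navigation": 0.2, "music": 0.2, "general": 0.6},
}


def determine_weights(models: List[str], docking_context: str) -> Dict[str, float]:
    """Read a stored weight for each model; unknown contexts fall back to "undocked" (assumption)."""
    table = STORED_WEIGHTS.get(docking_context, STORED_WEIGHTS["undocked"])
    return {model: table.get(model, 0.0) for model in models}


def select_models(weights: Dict[str, float], threshold: float = 0.25) -> List[str]:
    """Keep models whose weight reaches the threshold, highest first; else keep the single best."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    selected = [model for model in ranked if weights[model] >= threshold]
    return selected or ranked[:1]


models = ["navigation", "music", "general"]
print(select_models(determine_weights(models, "car_dock")))   # ['navigation', 'music']
print(select_models(determine_weights(models, "desk_dock")))  # ['general', 'music']
```

With these assumed values, a car-docked device selects the navigation and music models, while a desk-docked device selects the general and music models. A threshold rule is only one way to pick "at least one" model; taking just the highest-weighted model is a simpler reading of the same step.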
19. A system comprising:
one or more processors; and
a computer-readable medium coupled to the one or more processors having instructions stored thereon which, when executed by the one or more processors, cause the system to perform operations comprising:
accessing audio data that includes encoded speech;
accessing information that indicates a docking context of a client device, the docking context being associated with the audio data;
identifying a plurality of language models;
determining, for each of the plurality of language models, a weighting value based on the docking context, the weighting value indicating a probability that the language model will indicate a correct transcription for the encoded speech;
selecting at least one of the plurality of language models based on the weighting values; and
performing speech recognition on the audio data using the selected at least one language model to identify a transcription for a portion of the audio data.
22. A non-transitory computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
accessing audio data that includes encoded speech;
accessing information that indicates a docking context of a client device, the docking context being associated with the audio data;
identifying a plurality of language models;
determining, for each of the plurality of language models, a weighting value based on the docking context, the weighting value indicating a probability that the language model will indicate a correct transcription for the encoded speech;
selecting at least one of the plurality of language models based on the weighting values; and
performing speech recognition on the audio data using the selected at least one language model to identify a transcription for a portion of the audio data.
26. A computer-implemented method comprising:
detecting audio containing speech at a client device;
encoding the detected audio as audio data;
transmitting the audio data to a server system;
identifying a docking context of the client device;
transmitting information indicating the docking context to the server system; and
receiving a transcription of at least a portion of the audio data at the client device, the server system having determined, for each of a plurality of language models, a weighting value based on the docking context, the weighting value indicating a probability that the language model will indicate a correct transcription for the encoded speech, selected at least one of the plurality of language models based on the weighting values, and generated the transcription by performing speech recognition on the audio data using the selected at least one language model, and transmitted the transcription to the client device.
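Claim 26 describes the same exchange from the client's side. The sketch below makes several assumptions. It uses 16-bit little-endian PCM as the encoding. It bundles the audio data and docking context into one hypothetical `RecognitionRequest`, whereas the claim lists them as separate transmissions. It replaces the network call with an injected `send` function. `fake_server` is a stand-in that only reports which model it would use; it is not a real recognizer.

```python
import struct
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RecognitionRequest:
    audio_data: bytes      # encoded audio captured at the client
    docking_context: str   # hypothetical labels such as "car_dock" or "undocked"


def encode_audio(samples: List[int]) -> bytes:
    """Encode samples as 16-bit little-endian PCM (assumed format)."""
    return struct.pack(f"<{len(samples)}h", *samples)


def identify_docking_context(dock_type: Optional[str]) -> str:
    """Map a hypothetical dock-type reading to a context label."""
    return f"{dock_type}_dock" if dock_type else "undocked"


def client_transcribe(samples: List[int], dock_type: Optional[str],
                      send: Callable[[RecognitionRequest], str]) -> str:
    """Encode the detected audio, attach the docking context, and return the server's reply."""
    request = RecognitionRequest(
        audio_data=encode_audio(samples),
        docking_context=identify_docking_context(dock_type),
    )
    return send(request)


def fake_server(request: RecognitionRequest) -> str:
    """Stand-in server: picks a model name from the context; performs no real recognition."""
    model = "navigation" if request.docking_context == "car_dock" else "general"
    return f"[{model}] {len(request.audio_data)} bytes transcribed"


print(client_transcribe([0, 1000, -1000], "car", fake_server))  # [navigation] 6 bytes transcribed
print(client_transcribe([0, 1000], None, fake_server))          # [general] 4 bytes transcribed
```

Injecting `send` keeps the client logic independent of the transport, so the same function could wrap an HTTP or RPC call in place of the in-process stub.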
Specification