Speech recognition using device docking context
First Claim
1. A computer-implemented method, comprising:
accessing audio data that includes encoded speech;
accessing information that indicates a docking context of a client device, the docking context being associated with the audio data;
identifying a plurality of language models;
identifying multiple sets of weighting values for the plurality of language models, the multiple sets of weighting values comprising at least
a first set of multiple weighting values that correspond to multiple language models of the plurality of language models, the first set of multiple weighting values being associated with a first key phrase, wherein the first set of multiple weighting values is used to bias selection of a language model when a user utters the first key phrase, and
a second set of multiple weighting values that correspond to multiple language models of the plurality of language models, the second set of multiple weighting values being associated with a second key phrase, the second set of multiple weighting values being different from the first set of multiple weighting values, and the second key phrase being different from the first key phrase;
determining that the docking context indicates docking of the client device with a docking station of a first type;
based on determining that the docking context indicates docking of the client device with the docking station of the first type, selecting, from among the multiple sets of weighting values, the first set of multiple weighting values associated with the first key phrase;
selecting at least a first language model of the plurality of language models using the first set of multiple weighting values associated with the first key phrase; and
performing speech recognition on the audio data using the first language model to identify a transcription for a portion of the audio data.
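The flow recited in claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation: the language-model names, the key phrases ("navigate to", "play"), the docking-station types ("vehicle", "media"), the numeric weighting values, the fallback to a general model when no weighting set applies, and the `recognize` placeholder are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical language-model names; the claim only requires "a plurality".
LANGUAGE_MODELS = ("navigation", "media", "general")


@dataclass(frozen=True)
class WeightSet:
    """One set of weighting values, associated with a key phrase."""
    key_phrase: str
    weights: dict  # language-model name -> weighting value


# Two different weighting sets tied to two different key phrases (values are assumed).
WEIGHT_SETS = {
    "navigate to": WeightSet("navigate to", {"navigation": 0.7, "media": 0.1, "general": 0.2}),
    "play": WeightSet("play", {"navigation": 0.1, "media": 0.7, "general": 0.2}),
}

# Assumed mapping from docking-station type to the key phrase whose weighting set applies.
DOCK_TYPE_TO_KEY_PHRASE = {"vehicle": "navigate to", "media": "play"}


def select_weight_set(docking_context):
    """Pick a weighting set from the docking context, or None if nothing matches."""
    if not docking_context.get("docked"):
        return None
    phrase = DOCK_TYPE_TO_KEY_PHRASE.get(docking_context.get("dock_type"))
    return WEIGHT_SETS.get(phrase)


def select_language_model(weight_set):
    """Select the language model that carries the largest weighting value."""
    return max(weight_set.weights, key=weight_set.weights.get)


def recognize(audio_data, language_model):
    """Placeholder for a real recognizer; returns a dummy transcription string."""
    return f"<transcription of {len(audio_data)} bytes using '{language_model}' model>"


def transcribe(audio_data, docking_context):
    """Docking context -> weighting set -> language model -> transcription."""
    weight_set = select_weight_set(docking_context)
    # Falling back to a general model is an assumption, not part of the claim.
    model = select_language_model(weight_set) if weight_set else "general"
    return model, recognize(audio_data, model)
```

For example, `transcribe(audio, {"docked": True, "dock_type": "vehicle"})` would pick the weighting set tied to "navigate to" and therefore the hypothetical navigation model. Selecting the single highest-weighted model is only one way to read "selecting at least a first language model"; the claim language would also cover selecting several models.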
Abstract
Methods, systems, and apparatuses, including computer programs encoded on a computer storage medium, for performing speech recognition using dock context. In one aspect, a method includes accessing audio data that includes encoded speech. Information that indicates a docking context of a client device is accessed, the docking context being associated with the audio data. A plurality of language models is identified. At least one of the plurality of language models is selected based on the docking context. Speech recognition is performed on the audio data using the selected language model to identify a transcription for a portion of the audio data.
28 Claims
20. A system comprising:
one or more processors; and
a computer-readable medium coupled to the one or more processors having instructions stored thereon which, when executed by the one or more processors, cause the system to perform operations comprising:
accessing audio data that includes encoded speech;
accessing information that indicates a docking context of a client device, the docking context being associated with the audio data;
identifying a plurality of language models;
identifying multiple sets of weighting values for the plurality of language models, the multiple sets of weighting values comprising at least
a first set of multiple weighting values that correspond to multiple language models of the plurality of language models, the first set of multiple weighting values being associated with a first key phrase, wherein the first set of multiple weighting values is used to bias selection of a language model when a user utters the first key phrase, and
a second set of multiple weighting values that correspond to multiple language models of the plurality of language models, the second set of multiple weighting values being associated with a second key phrase, the second set of multiple weighting values being different from the first set of multiple weighting values, and the second key phrase being different from the first key phrase;
determining that the docking context indicates docking of the client device with a docking station of a first type;
based on determining that the docking context indicates docking of the client device with the docking station of the first type, selecting, from among the multiple sets of weighting values, the first set of multiple weighting values associated with the first key phrase;
selecting at least a first language model of the plurality of language models using the first set of multiple weighting values associated with the first key phrase; and
performing speech recognition on the audio data using the first language model to identify a transcription for a portion of the audio data.
24. A non-transitory computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
accessing audio data that includes encoded speech;
accessing information that indicates a docking context of a client device, the docking context being associated with the audio data;
identifying a plurality of language models;
identifying multiple sets of weighting values for the plurality of language models, the multiple sets of weighting values comprising at least
a first set of multiple weighting values that correspond to multiple language models of the plurality of language models, the first set of multiple weighting values being associated with a first key phrase, wherein the first set of multiple weighting values is used to bias selection of a language model when a user utters the first key phrase, and
a second set of multiple weighting values that correspond to multiple language models of the plurality of language models, the second set of multiple weighting values being associated with a second key phrase, the second set of multiple weighting values being different from the first set of multiple weighting values, and the second key phrase being different from the first key phrase;
determining that the docking context indicates docking of the client device with a docking station of a first type;
based on determining that the docking context indicates docking of the client device with the docking station of the first type, selecting, from among the multiple sets of weighting values, the first set of multiple weighting values associated with the first key phrase;
selecting at least a first language model of the plurality of language models using the first set of multiple weighting values associated with the first key phrase; and
performing speech recognition on the audio data using the first language model to identify a transcription for a portion of the audio data.
Specification