Speech recognition models based on location indicia
First Claim
1. A computer-implemented method comprising:
receiving, at a processing system, data corresponding to an utterance;
obtaining, at the processing system, location indicia for an area within a building where the utterance was spoken;
determining, at the processing system, a set of likelihoods based on the location indicia, each likelihood in the set corresponding to a likelihood that the utterance was spoken in a particular area of the building from a plurality of candidate areas of the building;
selecting, at the processing system, one or more candidate areas of the building from the plurality of candidate areas of the building based on the set of likelihoods;
accessing, for each selected candidate area of the building, a model for speech recognition associated with the respective candidate area of the building;
generating, at the processing system, a composite model using the accessed models for speech recognition and the likelihoods associated with the corresponding candidate areas of the building; and
generating, at the processing system, a transcription of the utterance using the composite model.
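As an illustration only, the likelihood and selection steps of claim 1 might be sketched as follows. The claim does not state what form the location indicia take or how the likelihoods are computed. This sketch assumes the indicia arrive as one numeric score per candidate area (for example, a signal-strength reading), converts the scores to likelihoods with a softmax, and keeps the areas above a cutoff. The names `area_likelihoods`, `select_areas`, and the `threshold` parameter are hypothetical.

```python
import math

def area_likelihoods(indicia):
    """Turn per-area scores into likelihoods that sum to 1 (softmax).

    `indicia` maps an area name to a numeric score. This scoring scheme is
    an assumption made for the sketch; the claim does not specify one.
    """
    peak = max(indicia.values())  # subtract the peak for numerical stability
    exps = {area: math.exp(score - peak) for area, score in indicia.items()}
    total = sum(exps.values())
    return {area: value / total for area, value in exps.items()}

def select_areas(likelihoods, threshold=0.2):
    """Keep candidate areas whose likelihood meets a hypothetical threshold."""
    selected = {a: p for a, p in likelihoods.items() if p >= threshold}
    if not selected:
        # Fall back to the single most likely area so at least one is selected.
        best = max(likelihoods, key=likelihoods.get)
        selected = {best: likelihoods[best]}
    return selected

# Hypothetical scores for three candidate areas of a building.
indicia = {"kitchen": 2.0, "living room": 1.0, "bedroom": -1.0}
likelihoods = area_likelihoods(indicia)
selected = select_areas(likelihoods)
```

With these made-up scores, the kitchen and living room pass the cutoff and the bedroom is dropped, so only two area-specific models would need to be accessed in the next step.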
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, are disclosed for performing speech recognition using models that are based on where, within a building, a speaker makes an utterance. The methods, systems, and apparatus include actions of receiving data corresponding to an utterance and obtaining location indicia for an area within a building where the utterance was spoken. Further actions include selecting one or more models for speech recognition based on the location indicia, wherein each of the selected one or more models is associated with a weight based on the location indicia. Additionally, the actions include generating a composite model using the selected one or more models and their respective weights. Finally, the actions include generating a transcription of the utterance using the composite model.
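One way to picture the composite model is as a weighted mix of per-area language models. The sketch below is a simplification under stated assumptions, not the patented implementation: each area model is assumed to be a plain word-probability table, the composite is a linear interpolation using the location-based weights, and "transcription" is reduced to choosing between acoustically similar candidate strings. The function names, the example vocabularies, and the `1e-6` floor for unseen words are all hypothetical.

```python
def composite_model(models, weights):
    """Linearly interpolate word-probability models using normalized weights."""
    total = sum(weights[area] for area in models)
    vocab = set().union(*(m.keys() for m in models.values()))
    return {
        word: sum(
            weights[area] / total * models[area].get(word, 0.0)
            for area in models
        )
        for word in vocab
    }

def transcribe(candidates, model):
    """Pick the candidate whose words score highest under the model."""
    def score(text):
        prob = 1.0
        for word in text.split():
            prob *= model.get(word, 1e-6)  # small floor for unseen words
        return prob
    return max(candidates, key=score)

# Hypothetical per-area models and location-based weights.
models = {
    "kitchen": {"blender": 0.6, "lender": 0.1, "start": 0.3},
    "living room": {"blender": 0.1, "lender": 0.2, "start": 0.7},
}
weights = {"kitchen": 0.75, "living room": 0.25}

model = composite_model(models, weights)
best = transcribe(["start blender", "start lender"], model)
```

Because the kitchen model carries most of the weight here, the composite favors "blender" over the similar-sounding "lender". A real recognizer would combine such a model with acoustic scores rather than compare fixed candidate strings.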
Citations
28 Claims
1. A computer-implemented method comprising:
receiving, at a processing system, data corresponding to an utterance;
obtaining, at the processing system, location indicia for an area within a building where the utterance was spoken;
determining, at the processing system, a set of likelihoods based on the location indicia, each likelihood in the set corresponding to a likelihood that the utterance was spoken in a particular area of the building from a plurality of candidate areas of the building;
selecting, at the processing system, one or more candidate areas of the building from the plurality of candidate areas of the building based on the set of likelihoods;
accessing, for each selected candidate area of the building, a model for speech recognition associated with the respective candidate area of the building;
generating, at the processing system, a composite model using the accessed models for speech recognition and the likelihoods associated with the corresponding candidate areas of the building; and
generating, at the processing system, a transcription of the utterance using the composite model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving data corresponding to an utterance;
obtaining location indicia for an area within a building where the utterance was spoken;
determining a set of likelihoods based on the location indicia, each likelihood in the set corresponding to a likelihood that the utterance was spoken in a particular area of the building from a plurality of candidate areas of the building;
selecting one or more candidate areas of the building from the plurality of candidate areas of the building based on the set of likelihoods;
accessing, for each selected candidate area of the building, a model for speech recognition associated with the respective candidate area of the building;
generating a composite model using the accessed models for speech recognition and the likelihoods associated with the corresponding candidate areas of the building; and
generating a transcription of the utterance using the composite model.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
receiving data corresponding to an utterance;
obtaining location indicia for an area within a building where the utterance was spoken;
determining a set of likelihoods based on the location indicia, each likelihood in the set corresponding to a likelihood that the utterance was spoken in a particular area of the building from a plurality of candidate areas of the building;
selecting one or more candidate areas of the building from the plurality of candidate areas of the building based on the set of likelihoods;
accessing, for each selected candidate area of the building, a model for speech recognition associated with the respective candidate area of the building;
generating a composite model using the accessed models for speech recognition and the likelihoods associated with the corresponding candidate areas of the building; and
generating a transcription of the utterance using the composite model.
Dependent claims: 18, 19, 20, 21, 22, 23, 24
25. A client device comprising:
one or more processors and one or more storage devices storing instructions that are operable, when executed by the one or more processors, to cause the one or more processors to perform operations comprising:
receiving an utterance at a client device;
obtaining, at the client device, location indicia for an area within a building where the utterance was spoken;
communicating, from the client device to a server, data corresponding to the utterance and the location indicia for the area within the building where the utterance was spoken; and
receiving, at the client device, a transcription of the utterance, wherein the transcription of the utterance was generated using a composite model, and wherein the composite model was generated using one or more selected models associated with one or more predetermined areas of the building, each of the one or more selected models being selected based on a likelihood that the utterance was spoken in the corresponding predetermined area of the building.
Dependent claims: 26, 27, 28
Specification