GEOTAGGED ENVIRONMENTAL AUDIO FOR ENHANCED SPEECH RECOGNITION ACCURACY
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing speech recognition accuracy. In one aspect, a method includes receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations, receiving an audio signal that corresponds to an utterance recorded by a particular mobile device, determining a particular geographic location associated with the particular mobile device, and generating a noise model for the particular geographic location using a subset of the geotagged audio signals, where noise compensation is performed on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.
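The subset selection and noise-model steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the subset is chosen or what form the noise model takes, so everything here is an assumption made for illustration: the `GeotaggedClip` record, the fixed-radius rule in `select_subset`, and the use of a per-band mean magnitude spectrum in `build_noise_model` are hypothetical and are not taken from the patent.

```python
import math
from dataclasses import dataclass


@dataclass
class GeotaggedClip:
    """Hypothetical record: one environmental recording plus its geotag."""
    lat: float
    lon: float
    spectrum: list  # assumed: precomputed mean magnitude per frequency band


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def select_subset(clips, lat, lon, radius_km):
    """Keep clips recorded within radius_km of the given location (assumed rule)."""
    return [c for c in clips if haversine_km(c.lat, c.lon, lat, lon) <= radius_km]


def build_noise_model(subset):
    """Average the band magnitudes over the subset; None if the subset is empty."""
    if not subset:
        return None
    n_bands = len(subset[0].spectrum)
    return [sum(c.spectrum[b] for c in subset) / len(subset) for b in range(n_bands)]


# Two clips recorded a few metres apart, one recorded far away.
clips = [
    GeotaggedClip(37.7749, -122.4194, [0.5, 1.0]),
    GeotaggedClip(37.7750, -122.4195, [1.5, 2.0]),
    GeotaggedClip(40.7128, -74.0060, [9.0, 9.0]),
]
nearby = select_subset(clips, 37.7749, -122.4194, radius_km=1.0)
noise_model = build_noise_model(nearby)
```

In this sketch only the two nearby clips contribute to the model, so the distant recording has no influence on the noise estimate for the device's location. A fixed radius is just one possible selection rule.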
20 Claims
1. A system comprising:
one or more computers; and
a computer-readable medium coupled to the one or more computers having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations,
receiving an audio signal that corresponds to an utterance recorded by a particular mobile device,
determining a particular geographic location associated with the particular mobile device,
generating a noise model for the particular geographic location using a subset of the geotagged audio signals, and
performing noise compensation on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.
Dependent claims: 2-18
19. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations;
receiving an audio signal that corresponds to an utterance recorded by a particular mobile device;
determining a particular geographic location associated with the particular mobile device;
generating a noise model for the particular geographic location using a subset of the geotagged audio signals; and
performing noise compensation on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.
20. A computer-implemented method comprising:
receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations;
receiving an audio signal that corresponds to an utterance recorded by a particular mobile device;
determining a particular geographic location associated with the particular mobile device;
generating a noise model for the particular geographic location using a subset of the geotagged audio signals; and
performing noise compensation on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.
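The final step of the independent claims, performing noise compensation with the location-specific noise model, can be illustrated as follows. The claims do not name a compensation technique. This sketch substitutes basic magnitude spectral subtraction with a spectral floor, a common textbook approach, and assumes the utterance has already been converted to frames of per-band magnitudes. The function names and the `floor` parameter are hypothetical.

```python
def compensate_frame(frame, noise_model, floor=0.05):
    """Subtract the per-band noise estimate from one magnitude frame.

    Each band is clamped to floor * original magnitude so that the result
    never goes negative when the noise estimate exceeds the signal.
    """
    if len(frame) != len(noise_model):
        raise ValueError("frame and noise model must have the same number of bands")
    return [max(m - n, floor * m) for m, n in zip(frame, noise_model)]


def compensate_utterance(frames, noise_model, floor=0.05):
    """Apply the same location-specific noise model to every frame."""
    return [compensate_frame(f, noise_model, floor) for f in frames]


# One utterance frame with two bands; the second band is dominated by noise.
cleaned = compensate_utterance([[2.0, 1.0]], noise_model=[0.5, 1.5], floor=0.25)
```

Here the first band keeps most of its energy after subtraction, while the second band falls back to the floor because the noise estimate exceeds the observed magnitude.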
Specification