Geotagged and weighted environmental audio for enhanced speech recognition accuracy
First Claim
1. A system comprising:
one or more computers; and
a computer-readable medium coupled to the one or more computers having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations,
receiving an audio signal that corresponds to an utterance recorded by a particular mobile device,
determining a particular geographic location associated with the particular mobile device,
selecting a subset of geotagged audio signals, and weighting each geotagged audio signal of the subset based on whether the respective audio signal was manually uploaded or automatically updated,
generating a noise model for the particular geographic location using the subset of weighted geotagged audio signals, and
performing noise compensation on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.
Abstract
Speech recognition accuracy in noisy environments is enhanced by receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations, receiving an audio signal that corresponds to an utterance recorded by a particular mobile device, determining a particular geographic location associated with the particular mobile device, selecting a subset of geotagged audio signals and weighting each geotagged audio signal of the subset based on whether the respective audio signal was manually uploaded or automatically updated, and generating a noise model for the particular geographic location using the subset of weighted geotagged audio signals. Noise compensation is then performed on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.
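The selection, weighting, and model-generation steps summarized in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent: the `GeotaggedClip` record, the distance-based subset selection with a `radius_km` parameter, the particular weight values (the text says only that the weight depends on whether a signal was manually uploaded or automatically updated, not which is favored), and the choice of a weighted mean of per-band noise power as the "noise model".

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GeotaggedClip:
    """Hypothetical record for one environmental audio recording."""
    lat: float
    lon: float
    manually_uploaded: bool
    noise_spectrum: List[float]  # per-band noise power, assumed precomputed


# Assumed weights; the source does not state which upload type counts more.
MANUAL_WEIGHT = 2.0
AUTO_WEIGHT = 1.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def build_noise_model(clips: List[GeotaggedClip], lat: float, lon: float,
                      radius_km: float = 1.0) -> Optional[List[float]]:
    """Weighted mean noise spectrum of clips recorded near (lat, lon)."""
    # Select the subset: here, clips within radius_km of the device (an assumption).
    subset = [c for c in clips if haversine_km(c.lat, c.lon, lat, lon) <= radius_km]
    if not subset:
        return None
    # Weight each clip by how it was uploaded.
    weights = [MANUAL_WEIGHT if c.manually_uploaded else AUTO_WEIGHT for c in subset]
    total = sum(weights)
    bands = len(subset[0].noise_spectrum)
    return [
        sum(w * c.noise_spectrum[b] for w, c in zip(weights, subset)) / total
        for b in range(bands)
    ]
```

In this sketch a clip recorded far from the device contributes nothing, and nearby clips contribute in proportion to their assumed upload-type weight.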
204 Citations
20 Claims
1. A system comprising:
one or more computers; and
a computer-readable medium coupled to the one or more computers having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations,
receiving an audio signal that corresponds to an utterance recorded by a particular mobile device,
determining a particular geographic location associated with the particular mobile device,
selecting a subset of geotagged audio signals, and weighting each geotagged audio signal of the subset based on whether the respective audio signal was manually uploaded or automatically updated,
generating a noise model for the particular geographic location using the subset of weighted geotagged audio signals, and
performing noise compensation on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18.
19. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations;
receiving an audio signal that corresponds to an utterance recorded by a particular mobile device;
determining a particular geographic location associated with the particular mobile device;
selecting a subset of geotagged audio signals, and weighting each geotagged audio signal of the subset based on whether the respective audio signal was manually uploaded or automatically updated;
generating a noise model for the particular geographic location using the subset of weighted geotagged audio signals; and
performing noise compensation on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.
20. A computer-implemented method comprising:
receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations;
receiving an audio signal that corresponds to an utterance recorded by a particular mobile device;
determining a particular geographic location associated with the particular mobile device;
selecting a subset of geotagged audio signals, and weighting each geotagged audio signal of the subset based on whether the respective audio signal was manually uploaded or automatically updated;
generating a noise model for the particular geographic location using the subset of weighted geotagged audio signals; and
performing noise compensation on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.
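The final step of the method, performing noise compensation with a location-specific noise model, might look like the following sketch. The claims do not name a compensation technique; spectral subtraction is used here purely as one well-known stand-in. The function name `compensate_noise`, the per-band power representation, and the `floor` parameter are all hypothetical.

```python
from typing import List


def compensate_noise(utterance_power: List[float], noise_model: List[float],
                     floor: float = 0.01) -> List[float]:
    """Spectral subtraction: remove the estimated noise power from each band.

    The result in each band is kept at or above `floor` times the original
    band power so that it never becomes negative.
    """
    if len(utterance_power) != len(noise_model):
        raise ValueError("utterance and noise model must have the same number of bands")
    return [max(s - n, floor * s) for s, n in zip(utterance_power, noise_model)]
```

A band whose estimated noise exceeds the observed power is clamped to the floor rather than going negative, which is a common safeguard in spectral subtraction.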
Specification