Geotagged environmental audio for enhanced speech recognition accuracy
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing speech recognition accuracy. In one aspect, a method includes receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations, receiving an audio signal that corresponds to an utterance recorded by a particular mobile device, determining a particular geographic location associated with the particular mobile device, generating a noise model for the particular geographic location using a subset of the geotagged audio signals, where noise compensation is performed on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.
21 Claims
1. A system comprising:
one or more computers; and
a computer-readable medium coupled to the one or more computers having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
  receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations,
  receiving an audio signal that corresponds to an utterance recorded by a particular mobile device,
  determining a particular geographic location associated with the particular mobile device,
  selecting, as a subset of the geotagged audio signals, the geotagged audio signals that are associated with the particular geographic location, and that were received from two or more of the multiple mobile devices within a predetermined period of time relative to when the utterance was recorded by the mobile device,
  generating a noise model for the particular geographic location using the subset of the geotagged audio signals, and
  performing noise compensation on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.
Dependent claims: 2-17
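The selecting step of claim 1 can be illustrated with a minimal sketch. Everything named here is hypothetical: the `GeotaggedClip` record, the `select_subset` function, the 0.5 km radius standing in for "associated with the particular geographic location", and the 600-second window standing in for the "predetermined period of time". The claim does not fix any of these values, and returning an empty subset when fewer than two devices contributed is one possible reading of the "two or more" limitation.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class GeotaggedClip:
    """Hypothetical record for one environmental-audio upload."""
    device_id: str
    lat: float          # degrees
    lon: float          # degrees
    received_at: float  # seconds since epoch
    samples: tuple      # raw audio samples (not used by the selection step)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def select_subset(clips, lat, lon, utterance_time,
                  radius_km=0.5, window_s=600.0, min_devices=2):
    """Return clips near (lat, lon) received within window_s of the utterance.

    Returns an empty list when fewer than min_devices distinct devices
    contributed. radius_km and window_s are illustrative assumptions,
    not values taken from the claims.
    """
    nearby = [
        c for c in clips
        if haversine_km(lat, lon, c.lat, c.lon) <= radius_km
        and abs(c.received_at - utterance_time) <= window_s
    ]
    if len({c.device_id for c in nearby}) < min_devices:
        return []
    return nearby
```

Filtering on distance and time first, then checking the number of distinct devices, keeps the two-device condition a property of the whole subset rather than of any single clip.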
18. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
receiving an audio signal that corresponds to an utterance recorded by a particular mobile device;
determining a particular geographic location associated with the particular mobile device;
selecting from a set of geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations, a subset of geotagged audio signals that are associated with the particular geographic location and that were received from two or more of the multiple mobile devices within a predetermined period of time relative to when the utterance was recorded by the mobile device; and
performing noise compensation on the audio signal that corresponds to the utterance using the subset of the geotagged audio signals.
Dependent claims: 19
20. A computer-implemented method comprising:
receiving an audio signal that corresponds to an utterance recorded by a particular mobile device;
determining a particular geographic location associated with the particular mobile device;
selecting from a set of geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations, a subset of geotagged audio signals that are associated with the particular geographic location and that were received from two or more of the multiple mobile devices within a predetermined period of time relative to when the utterance was recorded by the mobile device; and
performing noise compensation on the audio signal that corresponds to the utterance using the subset of the geotagged audio signals.
Dependent claims: 21
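The claims do not say how the noise compensation itself is carried out. As one illustration only, the sketch below uses simple spectral subtraction, a technique chosen here for brevity and not stated in the claims. It assumes the environmental clips and the utterance have already been converted to magnitude spectra with the same number of frequency bins (for example by a short-time Fourier transform performed elsewhere). The function names `noise_model` and `compensate` are hypothetical.

```python
def noise_model(spectra):
    """Average per-bin magnitude across the selected environmental clips.

    spectra: non-empty list of equal-length magnitude spectra (lists of floats).
    This averaging is an assumed, simplistic stand-in for a noise model.
    """
    if not spectra:
        raise ValueError("need at least one environmental spectrum")
    n_bins = len(spectra[0])
    if any(len(s) != n_bins for s in spectra):
        raise ValueError("all spectra must have the same number of bins")
    return [sum(s[k] for s in spectra) / len(spectra) for k in range(n_bins)]


def compensate(utterance_frames, model, floor=0.0):
    """Subtract the noise estimate from each utterance frame, bin by bin.

    Magnitudes are clamped at `floor` so the result never goes below it.
    """
    return [
        [max(mag - noise, floor) for mag, noise in zip(frame, model)]
        for frame in utterance_frames
    ]
```

For instance, two environmental spectra `[1.0, 2.0, 3.0]` and `[3.0, 2.0, 1.0]` average to a model of `[2.0, 2.0, 2.0]`, and compensating the utterance frame `[5.0, 1.0, 2.0]` with that model gives `[3.0, 0.0, 0.0]`.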
Specification