Geotagged and weighted environmental audio for enhanced speech recognition accuracy

US 8,175,872 B2
Filed: 09/30/2011
Issued: 05/08/2012
Est. Priority Date: 04/14/2010
Status: Active Grant


Abstract

Enhancing noisy speech recognition accuracy by receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations, receiving an audio signal that corresponds to an utterance recorded by a particular mobile device, determining a particular geographic location associated with the particular mobile device, selecting a subset of geotagged audio signals and weighting each geotagged audio signal of the subset based on whether the respective audio signal was manually uploaded or automatically updated, generating a noise model for the particular geographic location using the subset of weighted geotagged audio signals, where noise compensation is performed on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.
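
The weighting and noise-model steps described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `GeotaggedAudio` record, the per-bin magnitude `spectrum` feature, and the two weight constants are hypothetical. The source says only that the weight depends on whether a signal was manually uploaded or automatically updated; it does not give values or say which kind counts more. A weighted mean spectrum is used as the simplest possible noise model and stands in for whatever model an implementation would actually train.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical weights: the source does not specify values or their ordering.
MANUAL_WEIGHT = 1.0
AUTOMATIC_WEIGHT = 0.5


@dataclass
class GeotaggedAudio:
    """One environmental recording with its geotag (illustrative record)."""
    lat: float
    lon: float
    spectrum: Sequence[float]  # magnitude per frequency bin (assumed feature)
    manual_upload: bool        # True if manually uploaded, False if automatic


def weight_for(signal: GeotaggedAudio) -> float:
    """Pick a weight from the upload mode of the recording."""
    return MANUAL_WEIGHT if signal.manual_upload else AUTOMATIC_WEIGHT


def build_noise_model(subset: List[GeotaggedAudio]) -> List[float]:
    """Return the weighted mean magnitude spectrum of the selected subset."""
    if not subset:
        raise ValueError("subset must not be empty")
    n_bins = len(subset[0].spectrum)
    total = sum(weight_for(s) for s in subset)
    model = [0.0] * n_bins
    for s in subset:
        w = weight_for(s) / total  # normalise so the weights sum to 1
        for k in range(n_bins):
            model[k] += w * s.spectrum[k]
    return model
```

With these assumed weights, a manually uploaded recording contributes twice as much to each frequency bin as an automatically uploaded one.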

204 Citations

20 Claims

1. A system comprising:
- one or more computers; and
  
  a computer-readable medium coupled to the one or more computers having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
  
  receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations,
  
  receiving an audio signal that corresponds to an utterance recorded by a particular mobile device,
  
  determining a particular geographic location associated with the particular mobile device,
  
  selecting a subset of geotagged audio signals, and weighting each geotagged audio signal of the subset based on whether the respective audio signal was manually uploaded or automatically updated,
  
  generating a noise model for the particular geographic location using the subset of weighted geotagged audio signals, and
  
  performing noise compensation on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.
- Dependent claims (2-18)
  - 2. The system of claim 1, wherein the operations further comprise performing speech recognition on the utterance using the noise-compensated audio signal.
  - 3. The system of claim 1, wherein generating the noise model further comprises generating the noise model before receiving the audio signal that corresponds to the utterance.
  - 4. The system of claim 1, wherein generating the noise model further comprises generating the noise model after receiving the audio signal that corresponds to the utterance.
  - 5. The system of claim 1, wherein the operations further comprise:
    - determining, for each of the geotagged audio signals, a distance between the particular geographic location and a geographic location associated with the geotagged audio signal; and
      
      selecting, as the subset of the geotagged audio signals, the geotagged audio signals that are associated with geographic locations which are within a predetermined distance of the particular geographic location, or that are associated with geographic locations which are among the N closest geographic locations to the particular geographic location.
  - 6. The system of claim 1, wherein the operations further comprise selecting the subset of the geotagged audio signals based on the particular geographic location, and based on context data associated with the utterance.
  - 7. The system of claim 6, wherein the context data comprises data that references a time or a date when the utterance was recorded by the mobile device, data that references a speed or an amount of motion measured by the particular mobile device when the utterance was recorded, data that references settings of the mobile device, or data that references a type of the mobile device.
  - 8. The system of claim 1, wherein the operations further comprise:
    - selecting, as the subset of the geotagged audio signals, the geotagged audio signals that are associated with the particular geographic location.
  - 9. The system of claim 1, wherein the utterance represents a voice search query, or an input to a digital dictation application or a dialog system.
  - 10. The system of claim 1, wherein determining the particular geographic location further comprises receiving data referencing the particular geographic location from the mobile device.
  - 11. The system of claim 1, wherein determining the particular geographic location further comprises determining a past geographic location or a default geographic location associated with the device.
  - 12. The system of claim 1, wherein generating the noise model comprises training a Gaussian Mixture Model (GMM) using the subset of weighted geotagged audio signals as a training set.
  - 13. The system of claim 1, wherein the operations further comprise:
    - generating one or more candidate transcriptions of the utterance; and
      
      executing a search query using the one or more candidate transcriptions.
  - 14. The system of claim 1, wherein the operations further comprise selecting the noise model generated for the particular geographic location from among multiple noise models generated for the multiple geographic locations.
  - 15. The system of claim 14, wherein:
    - the operations further comprise:
      
      defining an area surrounding the particular geographic location,
      
      selecting a plurality of noise models associated with geographic locations within the area from among the multiple noise models, and
      
      generating a weighted combination of the selected noise models; and
      
      the noise compensation is performed using the weighted combination of selected noise models.
  - 16. The system of claim 1, wherein the operations further comprise:
    - processing the received geotagged audio signals to exclude portions of the environmental audio that include voices of users of the multiple mobile devices.
  - 17. The system of claim 1, wherein generating the noise model further comprises generating the noise model for the particular geographic location using the subset of weighted geotagged audio signals and using an environmental audio portion of the audio signal that corresponds to the utterance.
  - 18. The system of claim 1, wherein the operations further comprise:
    - defining an area surrounding the particular geographic location; and
      
      selecting, as the subset of the geotagged audio signals, the geotagged audio signals recorded within the area.
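
The two subset-selection alternatives recited in claim 5 (signals within a predetermined distance, or the N closest signals) can be sketched as follows. This is an illustrative sketch, not the patented implementation: the function names, the `(lat, lon)` tuple layout, and the use of the haversine great-circle formula with a mean Earth radius are assumptions. The claims do not say how distance is computed.

```python
import math
from typing import Any, List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

LatLon = Tuple[float, float]


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def select_subset(
    target: LatLon,
    tagged: List[Tuple[LatLon, Any]],
    max_km: Optional[float] = None,
    n_closest: Optional[int] = None,
) -> List[Tuple[LatLon, Any]]:
    """Select geotagged items near `target`, nearest first.

    Pass `max_km` for the predetermined-distance rule or `n_closest`
    for the N-closest rule; `max_km` takes precedence if both are given.
    """
    # Compute each distance once, then order by it.
    ranked = sorted(
        ((haversine_km(target, loc), loc, payload) for loc, payload in tagged),
        key=lambda entry: entry[0],
    )
    if max_km is not None:
        return [(loc, p) for d, loc, p in ranked if d <= max_km]
    if n_closest is not None:
        return [(loc, p) for _, loc, p in ranked[:n_closest]]
    raise ValueError("provide max_km or n_closest")
```

Calling `select_subset(target, tagged, max_km=2.0)` keeps every recording within two kilometres, while `select_subset(target, tagged, n_closest=10)` keeps the ten nearest regardless of how far away they are.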

19. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
- receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations;
  
  receiving an audio signal that corresponds to an utterance recorded by a particular mobile device;
  
  determining a particular geographic location associated with the particular mobile device;
  
  selecting a subset of geotagged audio signals, and weighting each geotagged audio signal of the subset based on whether the respective audio signal was manually uploaded or automatically updated;
  
  generating a noise model for the particular geographic location using the subset of weighted geotagged audio signals; and
  
  performing noise compensation on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.

20. A computer-implemented method comprising:
- receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations;
  
  receiving an audio signal that corresponds to an utterance recorded by a particular mobile device;
  
  determining a particular geographic location associated with the particular mobile device;
  
  selecting a subset of geotagged audio signals, and weighting each geotagged audio signal of the subset based on whether the respective audio signal was manually uploaded or automatically updated;
  
  generating a noise model for the particular geographic location using the subset of weighted geotagged audio signals; and
  
  performing noise compensation on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.
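
The final step of claims 19 and 20, noise compensation using a location-specific noise model, can be illustrated with a short sketch. The claims do not name a compensation technique, so this example swaps in plain magnitude spectral subtraction with a spectral floor. The function name, the frame layout (one list of per-bin magnitudes per frame), and the `floor` parameter are all hypothetical.

```python
from typing import List, Sequence


def compensate(
    utterance_frames: Sequence[Sequence[float]],
    noise_model: Sequence[float],
    floor: float = 0.05,
) -> List[List[float]]:
    """Subtract the noise spectrum from every frame of the utterance.

    Each frame and the noise model hold one magnitude per frequency bin.
    A bin never drops below `floor` times its original magnitude, which
    avoids negative magnitudes when the noise estimate is too large.
    """
    cleaned = []
    for frame in utterance_frames:
        if len(frame) != len(noise_model):
            raise ValueError("frame and noise model must have the same bin count")
        cleaned.append(
            [max(m - n, floor * m) for m, n in zip(frame, noise_model)]
        )
    return cleaned
```

The compensated frames would then be passed to a speech recognizer, as in claim 2.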


Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Kristjansson, Trausti; Lloyd, Matthew I.
Primary Examiner(s): Smits, Talivaldis Ivars
Application Number: US13/250,843
Publication Number: US 20120022870A1
Time in Patent Office: 221 Days
Field of Search: None
US Class Current: 704/227
CPC Class Codes:
G10L 15/20 Speech recognition techniqu...
G10L 21/0208 Noise filtering
