Geotagged environmental audio for enhanced speech recognition accuracy

US 8,265,928 B2
Filed: 04/14/2010
Issued: 09/11/2012
Est. Priority Date: 04/14/2010
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing speech recognition accuracy. In one aspect, a method includes receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations, receiving an audio signal that corresponds to an utterance recorded by a particular mobile device, determining a particular geographic location associated with the particular mobile device, generating a noise model for the particular geographic location using a subset of the geotagged audio signals, where noise compensation is performed on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.
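The selection step described above can be illustrated with a minimal sketch. It assumes one reading of the claim language, namely that the subset must include signals from at least two distinct devices inside the time window. The `GeotaggedClip` type, the string location key, the `select_subset` name, and the one-hour default window are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeotaggedClip:
    device_id: str       # hypothetical identifier of the recording device
    location: str        # hypothetical coarse location key, e.g. a grid cell
    received_at: float   # seconds since the epoch
    samples: tuple       # raw environmental audio samples (unused here)


def select_subset(clips, location, utterance_time, window_s=3600.0):
    """Return clips tagged with `location` that arrived within `window_s`
    seconds of the utterance. If fewer than two distinct devices
    contributed, return an empty list (assumed reading of the claim)."""
    nearby = [
        c for c in clips
        if c.location == location
        and abs(c.received_at - utterance_time) <= window_s
    ]
    devices = {c.device_id for c in nearby}
    return nearby if len(devices) >= 2 else []
```

A caller would pass the utterance's location and recording time, then hand the returned clips to whatever noise-model builder is in use.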

21 Claims

1. A system comprising:
- one or more computers; and
  
  a computer-readable medium coupled to the one or more computers having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
  
  receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations,
  
  receiving an audio signal that corresponds to an utterance recorded by a particular mobile device,
  
  determining a particular geographic location associated with the particular mobile device,
  
  selecting, as a subset of the geotagged audio signals, the geotagged audio signals that are associated with the particular geographic location, and that were received from two or more of the multiple mobile devices within a predetermined period of time relative to when the utterance was recorded by the mobile device,
  
  generating a noise model for the particular geographic location using the subset of the geotagged audio signals, and
  
  performing noise compensation on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.
  - 2. The system of claim 1, wherein the operations further comprise performing speech recognition on the utterance using the noise-compensated audio signal.
  - 3. The system of claim 1, wherein generating the noise model further comprises generating the noise model before receiving the audio signal that corresponds to the utterance.
  - 4. The system of claim 1, wherein generating the noise model further comprises generating the noise model after receiving the audio signal that corresponds to the utterance.
  - 5. The system of claim 1, wherein selecting, as the subset of the geotagged audio signals, the geotagged audio signals that are associated with the particular geographic location further comprises:
    - determining, for each of the geotagged audio signals, a distance between the particular geographic location and a geographic location associated with the geotagged audio signal; and
      
      selecting, as the subset of the geotagged audio signals, the geotagged audio signals that are associated with geographic locations which are within a predetermined distance of the particular geographic location, or that are associated with geographic locations which are among the N closest geographic locations to the particular geographic location.
  - 6. The system of claim 1, wherein selecting, as the subset of the geotagged audio signals, the geotagged audio signals that are associated with the particular geographic location further comprises selecting the subset of the geotagged audio signals based on the particular geographic location, and based on context data associated with the utterance.
  - 7. The system of claim 6, wherein the context data comprises data that references a time or a date when the utterance was recorded by the mobile device, data that references a speed or an amount of motion measured by the particular mobile device when the utterance was recorded, data that references settings of the mobile device, or data that references a type of the mobile device.
  - 8. The system of claim 1, wherein the utterance represents a voice search query, or an input to a digital dictation application or a dialog system.
  - 9. The system of claim 1, wherein determining the particular geographic location further comprises receiving data referencing the particular geographic location from the mobile device.
  - 10. The system of claim 1, wherein determining the particular geographic location further comprises determining a past geographic location or a default geographic location associated with the device.
  - 11. The system of claim 1, wherein generating the noise model comprises training a Gaussian Mixture Model (GMM) using the subset of the geotagged audio signals as a training set.
  - 12. The system of claim 1, wherein the operations further comprise:
    - generating one or more candidate transcriptions of the utterance; and
      
      executing a search query using the one or more candidate transcriptions.
  - 13. The system of claim 1, wherein the operations further comprise:
    - processing the received geotagged audio signals to exclude portions of the environmental audio that include voices of users of the multiple mobile devices.
  - 14. The system of claim 1, wherein the operations further comprise selecting the noise model generated for the particular geographic location from among multiple noise models generated for the multiple geographic locations.
  - 15. The system of claim 14, wherein:
    - the operations further comprise:
      
      defining an area surrounding the particular geographic location,
      
      selecting a plurality of noise models associated with geographic locations within the area from among the multiple noise models, and
      
      generating a weighted combination of the selected noise models; and
      
      the noise compensation is performed using the weighted combination of selected noise models.
  - 16. The system of claim 1, wherein generating the noise model further comprises generating the noise model for the particular geographic location using the subset of the geotagged audio signals and using an environmental audio portion of the audio signal that corresponds to the utterance.
  - 17. The system of claim 1, wherein selecting, as the subset of the geotagged audio signals, the geotagged audio signals that are associated with the particular geographic location further comprises:
    - defining an area surrounding the particular geographic location; and
      
      selecting, as the subset of the geotagged audio signals, the geotagged audio signals recorded within the area.
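The distance-based selection recited in claim 5 (signals within a predetermined distance, or among the N closest locations) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: locations are taken to be (latitude, longitude) pairs in degrees, distance is great-circle distance via the haversine formula on a spherical Earth of radius 6371 km, and the names `haversine_km` and `select_by_distance` are hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def select_by_distance(tagged, here, max_km=None, n_closest=None):
    """Select clips from `tagged`, a list of ((lat, lon), clip) pairs.

    With `max_km` set, keep clips recorded within that distance of `here`;
    otherwise, with `n_closest` set, keep the clips at the N nearest
    locations. Results are ordered from nearest to farthest."""
    ranked = sorted(tagged, key=lambda t: haversine_km(here, t[0]))
    if max_km is not None:
        return [clip for loc, clip in ranked
                if haversine_km(here, loc) <= max_km]
    if n_closest is not None:
        return [clip for _, clip in ranked[:n_closest]]
    raise ValueError("set either max_km or n_closest")
```

The same ranking could also supply the weights for the combination of noise models described in claim 15, for example by weighting nearer models more heavily, although the claims do not prescribe a weighting scheme.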

18. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
- receiving an audio signal that corresponds to an utterance recorded by a particular mobile device;
  
  determining a particular geographic location associated with the particular mobile device;
  
  selecting from a set of geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations, a subset of geotagged audio signals that are associated with the particular geographic location and that were received from two or more of the multiple mobile devices within a predetermined period of time relative to when the utterance was recorded by the mobile device; and
  
  performing noise compensation on the audio signal that corresponds to the utterance using the subset of the geotagged audio signals.
  - 19. The computer storage medium of claim 18, wherein the program comprising the instructions that when executed by one or more computers cause the one or more computers to perform operations further comprising:
    - generating or modifying a noise model for the particular geographic location using the subset of the geotagged audio signals; and
      
      performing noise compensation on the audio signal that corresponds to the utterance using the noise model that has been generated or modified for the particular geographic location.

20. A computer-implemented method comprising:
- receiving an audio signal that corresponds to an utterance recorded by a particular mobile device;
  
  determining a particular geographic location associated with the particular mobile device;
  
  selecting from a set of geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations, a subset of geotagged audio signals that are associated with the particular geographic location and that were received from two or more of the multiple mobile devices within a predetermined period of time relative to when the utterance was recorded by the mobile device; and
  
  performing noise compensation on the audio signal that corresponds to the utterance using the subset of the geotagged audio signals.
  - 21. The computer-implemented method of claim 20, further comprising:
    - generating or modifying a noise model for the particular geographic location using the subset of the geotagged audio signals; and
      
      performing noise compensation on the audio signal that corresponds to the utterance using the noise model that has been generated or modified for the particular geographic location.
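The claims leave the noise model and the compensation method open (claim 11 names a Gaussian Mixture Model as one option). As a deliberately simpler stand-in, the sketch below averages magnitude spectra of the selected environmental frames and applies classic spectral subtraction to an utterance frame. It assumes the audio has already been converted to per-frame magnitude spectra of equal length; the function names and the `alpha` and `floor` parameters are illustrative choices, not taken from the patent.

```python
def mean_noise_spectrum(frames):
    """Build a simple noise model: the per-bin mean magnitude over all
    environmental frames (each frame is a list of bin magnitudes)."""
    if not frames:
        raise ValueError("need at least one noise frame")
    n = len(frames)
    return [sum(bins) / n for bins in zip(*frames)]


def spectral_subtract(frame, noise, alpha=1.0, floor=0.05):
    """Subtract `alpha` times the noise estimate from each bin of `frame`,
    never going below `floor` times the original magnitude so that bins
    are attenuated rather than driven negative."""
    return [max(m - alpha * nz, floor * m) for m, nz in zip(frame, noise)]
```

In use, `mean_noise_spectrum` would be fed frames from the location-specific subset of geotagged signals, and `spectral_subtract` would be applied to each frame of the utterance before recognition.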

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Kristjansson, Trausti; Lloyd, Matthew I.
Primary Examiner(s): Smits, Talivaldis Ivars
Application Number: US12/760,147
Publication Number: US 20110257974A1
Time in Patent Office: 881 Days
Field of Search: 704/226, 704/227, 704/251
US Class Current: 704/227
CPC Class Codes: G10L 15/20 Speech recognition techniqu...; G10L 21/0208 Noise filtering
