Acoustic model adaptation using geographic information
First Claim
1. A system comprising:
- one or more computers; and
a computer-readable medium coupled to the one or more computers having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
receiving an audio signal that corresponds to an utterance recorded by a mobile device;
determining a geographic location associated with the mobile device;
determining a geographic location type associated with the geographic location;
selecting a subset of geotagged audio signals based on the geographic location type associated with the geographic location of the mobile device, and based on context data associated with the utterance, wherein the context data comprises data that references a time or a date when the utterance was recorded by the mobile device, data that references a speed or an amount of motion measured by the mobile device when the utterance was recorded, data that references settings of the mobile device, or data that references a type of the mobile device;
adapting one or more acoustic models for the geographic location type; and
performing speech recognition on the audio signal using the one or more acoustic models that are adapted for the geographic location type.
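The patent does not disclose source code, but the selection step recited above can be sketched as a filter over a corpus of geotagged recordings: keep signals whose location type matches the utterance's, and whose context data (time, motion, device) resembles the utterance's context. All class and function names below (`GeotaggedAudio`, `Context`, `select_subset`) and the matching thresholds are illustrative assumptions, not part of the claim.

```python
from dataclasses import dataclass

# Hypothetical record types; field names are assumptions for illustration.
@dataclass
class GeotaggedAudio:
    signal_id: str
    location_type: str   # e.g. "street", "restaurant", "car"
    hour: int            # hour of day when the signal was recorded
    speed_kmh: float     # device speed when the signal was recorded
    device_type: str

@dataclass
class Context:
    hour: int
    speed_kmh: float
    device_type: str

def select_subset(corpus, location_type, ctx,
                  hour_window=2, speed_tol_kmh=10.0):
    """Select geotagged signals that share the utterance's geographic
    location type and match at least one element of its context data
    (time of day, measured motion, or device type)."""
    subset = []
    for a in corpus:
        if a.location_type != location_type:
            continue  # location type is a hard requirement in this sketch
        matches_context = (
            abs(a.hour - ctx.hour) <= hour_window or
            abs(a.speed_kmh - ctx.speed_kmh) <= speed_tol_kmh or
            a.device_type == ctx.device_type
        )
        if matches_context:
            subset.append(a)
    return subset
```

In this sketch the location type gates the search and the context data refines it; the claim itself does not fix any particular matching rule or thresholds.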
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing speech recognition accuracy. In one aspect, a method includes receiving an audio signal that corresponds to an utterance recorded by a mobile device, determining a geographic location associated with the mobile device, adapting one or more acoustic models for the geographic location, and performing speech recognition on the audio signal using the one or more acoustic models that are adapted for the geographic location.
16 Claims
1. A system as recited above under First Claim. Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.
15. A non-transitory computer storage medium encoded with a computer program, the program comprising instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising:
receiving an audio signal that corresponds to an utterance recorded by a mobile device;
determining a geographic location associated with the mobile device;
determining a geographic location type associated with the geographic location;
after receiving the audio signal that corresponds to the utterance, selecting a subset of geotagged audio signals based on the geographic location type associated with the geographic location of the mobile device, and based on context data associated with the utterance, wherein the context data comprises data that references a time or a date when the utterance was recorded by the mobile device, data that references a speed or an amount of motion measured by the mobile device when the utterance was recorded, data that references settings of the mobile device, or data that references a type of the mobile device;
adapting one or more acoustic models for the geographic location type using the subset of geotagged audio signals selected after receiving the audio signal that corresponds to the utterance; and
performing speech recognition on the audio signal using the one or more acoustic models that are adapted for the geographic location type.
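Claim 15 emphasizes that adaptation uses a subset selected only after the audio arrives. One common way to adapt an acoustic model from a small, freshly selected sample is a MAP-style interpolation between the model's prior statistics and the sample statistics; the patent does not mandate any particular adaptation method, so the function below is a minimal illustrative sketch, with `tau` an assumed prior-weight parameter.

```python
import statistics

def map_adapt_mean(prior_mean, subset_features, tau=10.0):
    """Shift a model's prior Gaussian mean toward the statistics of the
    freshly selected geotagged subset. Larger tau means the prior
    resists the new data more strongly (less adaptation)."""
    n = len(subset_features)
    if n == 0:
        return prior_mean  # nothing selected: leave the model unchanged
    sample_mean = statistics.fmean(subset_features)
    # Standard MAP interpolation between prior and sample statistics.
    return (tau * prior_mean + n * sample_mean) / (tau + n)
```

Because the subset is chosen per utterance, this update can run on demand at recognition time, matching the "selected after receiving the audio signal" ordering in the claim.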
16. A computer-implemented method comprising:
receiving an audio signal that corresponds to an utterance recorded by a mobile device;
receiving a data tag associated with the audio signal, wherein the data tag identifies an accent of a user of the mobile device;
determining a geographic location associated with the mobile device;
determining a geographic location type associated with the geographic location;
selecting a subset of geotagged audio signals based on the geographic location type associated with the geographic location of the mobile device, and based on context data associated with the utterance, wherein the context data comprises data that references a time or a date when the utterance was recorded by the mobile device, data that references a speed or an amount of motion measured by the mobile device when the utterance was recorded, data that references settings of the mobile device, or data that references a type of the mobile device;
adapting one or more acoustic models for the geographic location type based on the accent of the user of the mobile device; and
performing speech recognition on the audio signal using the one or more acoustic models that are adapted for the geographic location type.
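Claim 16 additionally conditions adaptation on an accent identified by a data tag. A simple way to realize this, assuming each geotagged signal carries an accent annotation (an assumption, not something the claim specifies), is to narrow the selected subset to accent-matched signals before adapting, falling back to the full subset when no match exists. The function name and dict schema below are hypothetical.

```python
def filter_by_accent(subset, accent_tag, fallback=True):
    """Prefer geotagged signals whose annotated accent matches the data
    tag received with the utterance; optionally fall back to the
    unfiltered subset when no accent-matched signal exists."""
    matched = [a for a in subset if a.get("accent") == accent_tag]
    if matched or not fallback:
        return matched
    return subset  # no accent match: adapt from the location-type subset
```

The fallback keeps adaptation location-specific even for accents absent from the corpus; whether to fall back or skip accent-based adaptation entirely is a design choice the claim leaves open.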
Specification