Speech and noise models for speech recognition

US 8,666,740 B2
Filed: 06/22/2012
Issued: 03/04/2014
Est. Priority Date: 06/14/2010
Status: Active Grant


2 Assignments

0 Petitions

Abstract

An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed, and a determination may be made that background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.
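
The threshold-gated adaptation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the user speech model is reduced to a hypothetical mean feature vector (`SpeechModel`), the background level is measured as the RMS level in dB of a background-only segment, and adaptation uses a MAP-style mean interpolation with a hypothetical `prior_weight`. The default `threshold_db` is arbitrary, since the abstract only refers to "a defined threshold", and the subsequent noise-compensation step is not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SpeechModel:
    """Hypothetical user speech model: a mean feature vector."""
    mean: List[float]
    frames_seen: int = 0


def background_level_db(samples: Sequence[float]) -> float:
    """RMS level in dB (full scale = 1.0) of a background-only segment."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def maybe_adapt_speech_model(model: SpeechModel,
                             speech_frames: Sequence[Sequence[float]],
                             background_samples: Sequence[float],
                             threshold_db: float = -40.0,      # assumed value
                             prior_weight: float = 10.0) -> SpeechModel:
    """Adapt the speech model only when the background is below the threshold."""
    if background_level_db(background_samples) >= threshold_db or not speech_frames:
        return model  # too noisy, or nothing to learn from: keep the model
    n = len(speech_frames)
    frame_mean = [sum(f[d] for f in speech_frames) / n for d in range(len(model.mean))]
    # MAP-style mean update: (prior_weight * old + n * frame_mean) / (prior_weight + n)
    alpha = n / (n + prior_weight)
    new_mean = [(1 - alpha) * m + alpha * x for m, x in zip(model.mean, frame_mean)]
    return SpeechModel(mean=new_mean, frames_seen=model.frames_seen + n)


quiet_background = [0.001] * 100  # about -60 dB, below the assumed threshold
adapted = maybe_adapt_speech_model(SpeechModel([0.0, 0.0]), [[1.0, 1.0]] * 10, quiet_background)
print(adapted.mean)  # [0.5, 0.5]
```

With ten new frames and `prior_weight=10.0`, the old and new means are weighted equally; a louder background (for example samples of 0.5, about -6 dB) leaves the model unchanged.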

19 Claims

1. A system comprising:
- one or more processing devices; and
  
  one or more storage devices storing instructions that, when executed by the one or more processing devices, cause the system to:
  
  receive an audio signal generated by a device based on audio input from a user, the audio signal including background audio and one or more user utterances recorded by the device, the audio signal comprising a portion that includes background audio without utterances of the user;
  
  identify the user or the device based on an identifier for the user or the device;
  
  determine a location of the user when the one or more utterances are recorded;
  
  determine that a set of stored noise models does not include an adapted noise model that is adapted for the user or the device and that is associated with both (i) the determined location and (ii) the identifier for the user or the device;
  
  select a surrogate noise model in response to determining that the set of stored noise models does not include an adapted noise model that is adapted for the user or the device and that is associated with both (i) the determined location and (ii) the identifier for the user or the device;
  
  generate a filtered audio signal with reduced background audio compared to the received audio signal using the selected surrogate noise model;
  
  adapt the surrogate noise model based on the received audio signal to generate a first adapted noise model that models characteristics of background audio surrounding the user at the location; and
  
  store the first adapted noise model as one of a plurality of adapted noise models that are specific to the identified user or device, each of the plurality of adapted noise models being associated with a different corresponding location, each of the plurality of adapted noise models having been adapted based on audio recorded by the device at the corresponding location with which the adapted noise model is associated, each of the plurality of adapted noise models being stored in association with the identifier for the user or the device and with its corresponding location such that when utterances of the user are recorded by the device at the different corresponding locations, the adapted noise model associated with the location where the device recorded the utterances is used to recognize the utterances of the user.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
  - 2. The system of claim 1, wherein:
    - to determine the location of the user when the one or more utterances are recorded, the instructions, when executed, cause the system to determine a geographical location of the user when the one or more utterances are recorded; and
      
      to store the first adapted noise model, the instructions, when executed, cause the system to store the first adapted noise model as a noise model for the user for audio input recorded at the geographical location.
  - 3. The system of claim 1, wherein:
    - to determine the location of the user when the one or more utterances are recorded, the instructions, when executed, cause the system to determine a type of location where the user is located when the one or more utterances are recorded; and
      
      to store the first adapted noise model, the instructions, when executed, cause the system to store the first adapted noise model as a noise model for the user for audio input recorded at locations of the same type as the determined type of location.
  - 4. The system of claim 1, wherein, to select the surrogate noise model, the instructions, when executed, cause the system to select the noise model based on the determined location.
  - 5. The system of claim 4, wherein, to select the surrogate noise model based on the determined location, the instructions, when executed, cause the system to select the surrogate noise model from among multiple noise models that are each developed to model characteristics of background audio at a different corresponding location based on audio input that is recorded at the corresponding location by other users.
  - 6. The system of claim 5, wherein the audio input that is recorded at the determined location by other users comprises voice queries submitted by the other users.
  - 7. The system of claim 1, wherein the instructions further comprise instructions that, when executed, cause the system to:
    - receive a second audio signal generated by the device based on second audio input from the user, the audio signal including at least second background audio and one or more second user utterances recorded by the device;
      
      determine that, when the one or more second utterances are recorded, the user is at the determined location where the one or more first utterances were recorded; and
      
      perform noise compensation on the second audio signal using the first adapted noise model in response to determining that, when the one or more second utterances are recorded, the user is at the determined location where the one or more first utterances were recorded.
  - 8. The system of claim 1, wherein the instructions further comprise instructions that, when executed, cause the system to:
    - receive a second audio signal generated by the device based on second audio input from the user, the audio signal including at least second background audio and one or more second user utterances recorded by the device;
      
      determine a second location of the user when the one or more second utterances are recorded, the second location being different from the location of the user when the one or more first utterances are recorded;
      
      select, from among the plurality of adapted noise models, a second noise model based on the second location, the second location being the corresponding location for the second noise model, and the second noise model having been adapted based on audio recorded by the device at the second location; and
      
      perform noise compensation on the second audio signal using the second noise model.
  - 9. The system of claim 1, wherein the instructions further comprise instructions that, when executed, cause the system to:
    - access a user speech model associated with the user that models speech characteristics of the user; and
      
      perform noise compensation on the received audio signal using the user speech model and the selected noise model.
  - 10. The system of claim 9, wherein:
    - the instructions comprise instructions that, when executed, cause the system to determine that the background audio in the audio signal is above a defined threshold; and
      
      to adapt the selected noise model, the instructions, when executed, cause the system to adapt the selected noise model in response to determining that the background audio in the audio signal is above the defined threshold.
  - 11. The system of claim 1, wherein, to select the surrogate noise model, the instructions include instructions that, when executed by the one or more processing devices, cause the system to:
    - receive an initial audio signal that includes at least an initial user audio portion that corresponds to one or more user utterances recorded by the device;
      
      determine similarity metrics between multiple surrogate noise models and an expected noise model for the user determined based on the initial audio signal; and
      
      select the surrogate noise model, from among multiple surrogate noise models, based on the similarity metrics.
  - 12. The system of claim 11, wherein each of the multiple surrogate noise models models characteristics of background audio in a particular location.
  - 13. The system of claim 11, wherein each of the multiple surrogate noise models models characteristics of background audio in a particular kind of environmental condition.
  - 14. The system of claim 1, wherein, to select the surrogate noise model, the instructions include instructions that, when executed by the one or more processing devices, cause the system to:
    - receive context information about the device; and
      
      select the surrogate noise model, from among the multiple surrogate noise models, based on the context information.
  - 15. The system of claim 1, wherein, to store the first adapted noise model, the instructions, when executed, cause the system to index the first adapted noise model based on the identifier for the user or the device.
  - 16. The system of claim 1, wherein, to determine the location of the user when the one or more utterances are recorded, the instructions, when executed, cause the system to:
    - receive, from the device, location data indicating a location determined by a Global Positioning System module of the device; and
      
      determine the location based on the received location data.
  - 17. The system of claim 1, wherein, to determine the location of the user when the one or more utterances are recorded, the instructions, when executed, cause the system to determine the location based on at least one of:
    - the user's calendar schedule;
      
      user preferences for the user;
      
      a prior location of the user;
      
      user input provided by the user;
      
      transmission tower triangulation; and
      
      dead reckoning estimation.
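
Claims 11 through 13 describe choosing a surrogate noise model by comparing candidate models against an expected noise model derived from an initial audio signal. The sketch below is one possible reading under assumptions the claims do not state: each noise model is a mean feature vector (for example, per-band log energies), the expected model is the mean of background-only frames, and the similarity metric is negative Euclidean distance. The surrogate names are hypothetical location or condition labels in the sense of claims 12 and 13.

```python
import math
from typing import Dict, List, Sequence


def expected_noise_model(background_frames: Sequence[Sequence[float]]) -> List[float]:
    """Expected noise model for the user: mean of background-only frames
    from the initial audio signal (assumption; needs at least one frame)."""
    n = len(background_frames)
    dim = len(background_frames[0])
    return [sum(f[d] for f in background_frames) / n for d in range(dim)]


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity metric: negative Euclidean distance (higher = more similar)."""
    return -math.dist(a, b)


def select_surrogate(surrogates: Dict[str, List[float]],
                     background_frames: Sequence[Sequence[float]]) -> str:
    """Return the name of the surrogate most similar to the expected model."""
    expected = expected_noise_model(background_frames)
    scores = {name: similarity(expected, model) for name, model in surrogates.items()}
    return max(scores, key=scores.get)


surrogates = {  # hypothetical location / environmental-condition labels
    "car": [5.0, 1.0],
    "cafe": [2.0, 4.0],
    "quiet_office": [0.5, 0.5],
}
print(select_surrogate(surrogates, [[2.1, 3.9], [1.9, 4.1]]))  # cafe
```

Any other distance or likelihood score could serve as the similarity metric; the selection step only needs a score that ranks candidates.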

18. A computer-implemented method comprising:
- receiving an audio signal generated by a device based on audio input from a user, the audio signal including background audio and one or more user utterances recorded by the device, the audio signal comprising a portion that includes background audio without utterances of the user;
  
  identifying the user or the device based on an identifier for the user or the device;
  
  determining a location of the user when the one or more utterances are recorded;
  
  determining that a set of stored noise models does not include an adapted noise model that is adapted for the user or the device and that is associated with both (i) the determined location and (ii) the identifier for the user or the device;
  
  selecting a surrogate noise model in response to determining that the set of stored noise models does not include an adapted noise model that is adapted for the user or the device and that is associated with both (i) the determined location and (ii) the identifier for the user or the device;
  
  generating a filtered audio signal with reduced background audio compared to the received audio signal using the selected surrogate noise model;
  
  adapting the surrogate noise model based on the received audio signal to generate a first adapted noise model that models characteristics of background audio surrounding the user at the location; and
  
  storing the first adapted noise model as one of a plurality of adapted noise models that are specific to the identified user or device, each of the plurality of adapted noise models being associated with a different corresponding location, each of the plurality of adapted noise models having been adapted based on audio recorded by the device at the corresponding location with which the adapted noise model is associated, each of the plurality of adapted noise models being stored in association with the identifier for the user or the device and with its corresponding location such that when utterances of the user are recorded by the device at the different corresponding locations, the adapted noise model associated with the location where the device recorded the utterances is used to recognize the utterances of the user.
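
The per-user, per-location bookkeeping in claim 18 (look up an adapted model by identifier and location, fall back to a surrogate when none exists, adapt it, and store the result) can be sketched as a small keyed store. Everything here is an assumption for illustration: `NoiseModel` and `NoiseModelStore` are hypothetical names, the noise model is a per-bin mean magnitude vector, the surrogate is chosen by location name as one reading of claim 4, and adaptation is a running mean in which the surrogate's `frames_seen` count acts as its prior weight.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class NoiseModel:
    """Hypothetical noise model: mean magnitude per frequency bin."""
    mean: List[float]
    frames_seen: int = 0  # acts as the prior weight when adapting


class NoiseModelStore:
    """Adapted noise models keyed by (user/device identifier, location)."""

    def __init__(self, surrogates: Dict[str, NoiseModel], default_surrogate: str):
        self.surrogates = surrogates
        self.default_surrogate = default_surrogate
        self.adapted: Dict[Tuple[str, str], NoiseModel] = {}

    def lookup(self, identifier: str, location: str) -> Tuple[NoiseModel, bool]:
        """Return (model, is_surrogate) for this identifier and location."""
        key = (identifier, location)
        if key in self.adapted:
            return self.adapted[key], False
        # No adapted model for both the identifier and the location:
        # pick a surrogate by location name, else a default (assumption).
        name = location if location in self.surrogates else self.default_surrogate
        return self.surrogates[name], True

    def adapt_and_store(self, identifier: str, location: str, base: NoiseModel,
                        background_frames: Sequence[Sequence[float]]) -> NoiseModel:
        """Running-mean adaptation of `base` toward background-only frames;
        `base` is copied, never modified, and the result is stored."""
        count = base.frames_seen
        mean = list(base.mean)
        for frame in background_frames:
            count += 1
            mean = [m + (x - m) / count for m, x in zip(mean, frame)]
        adapted = NoiseModel(mean=mean, frames_seen=count)
        self.adapted[(identifier, location)] = adapted
        return adapted


store = NoiseModelStore(
    {"car": NoiseModel([1.0, 1.0], 2), "generic": NoiseModel([0.5, 0.5], 2)},
    default_surrogate="generic",
)
model, is_surrogate = store.lookup("user-1", "car")  # surrogate "car"
store.adapt_and_store("user-1", "car", model, [[4.0, 4.0]])  # stored mean [2.0, 2.0]
```

A second lookup for the same identifier and location then returns the stored adapted model rather than the surrogate, mirroring claim 7; a lookup at a location with no adapted model yet falls back to a surrogate again.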

19. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
- receiving an audio signal generated by a device based on audio input from a user, the audio signal including background audio and one or more user utterances recorded by the device, the audio signal comprising a portion that includes background audio without utterances of the user;
  
  identifying the user or the device based on an identifier for the user or the device;
  
  determining a location of the user when the one or more utterances are recorded;
  
  determining that a set of stored noise models does not include an adapted noise model that is adapted for the user or the device and that is associated with both (i) the determined location and (ii) the identifier for the user or the device;
  
  selecting a surrogate noise model in response to determining that the set of stored noise models does not include an adapted noise model that is adapted for the user or the device and that is associated with both (i) the determined location and (ii) the identifier for the user or the device;
  
  generating a filtered audio signal with reduced background audio compared to the received audio signal using the selected surrogate noise model;
  
  adapting the surrogate noise model based on the received audio signal to generate a first adapted noise model that models characteristics of background audio surrounding the user at the location; and
  
  storing the first adapted noise model as one of a plurality of adapted noise models that are specific to the identified user or device, each of the plurality of adapted noise models being associated with a different corresponding location, each of the plurality of adapted noise models having been adapted based on audio recorded by the device at the corresponding location with which the adapted noise model is associated, each of the plurality of adapted noise models being stored in association with the identifier for the user or the device and with its corresponding location such that when utterances of the user are recorded by the device at the different corresponding locations, the adapted noise model associated with the location where the device recorded the utterances is used to recognize the utterances of the user.
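
The claims do not say how the filtered audio signal is generated from the selected noise model. Magnitude spectral subtraction is one common technique that fits the description, swapped in here purely as an illustration: each bin of each frame's magnitude spectrum has the noise model's mean magnitude subtracted, with a spectral floor to avoid negative values. The sketch assumes the STFT has already been computed and that the noise model is a per-bin mean magnitude; `over_subtraction` and `floor` are hypothetical tuning parameters.

```python
from typing import List, Sequence


def spectral_subtract(frames: Sequence[Sequence[float]],
                      noise_mean: Sequence[float],
                      over_subtraction: float = 1.0,
                      floor: float = 0.05) -> List[List[float]]:
    """Magnitude spectral subtraction: |Y| - a*|N|, floored at floor*|Y|.

    `frames` are STFT magnitude spectra (one list per frame) and `noise_mean`
    is the noise model's mean magnitude per bin; both are assumptions.
    """
    return [
        [max(y - over_subtraction * n, floor * y) for y, n in zip(frame, noise_mean)]
        for frame in frames
    ]


noisy = [[1.0, 0.2, 0.8]]
noise = [0.5, 0.5, 0.3]
clean = spectral_subtract(noisy, noise)  # approximately [[0.5, 0.01, 0.5]]
```

Resynthesis would reuse each frame's original phase with an inverse STFT; that step is omitted here.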


Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Lloyd, Matthew I., Kristjansson, Trausti T.
Primary Examiner(s)
Han, Qi
Application Number
US13/530,614
Publication Number
US 20120259631A1
Time in Patent Office
620 Days
Field of Search
704/233, 704/244, 704/255, 704/235, 704/251, 704/231
US Class Current
704/233
CPC Class Codes
G10L 15/20 Speech recognition techniqu...
G10L 21/0208 Noise filtering