Speech and Noise Models for Speech Recognition

US 20120259631A1
Filed: 06/22/2012
Published: 10/11/2012
Est. Priority Date: 06/14/2010
Status: Active Grant

Abstract

An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed, and a determination may be made that background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.
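The threshold-gated adaptation described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the speech model is reduced to a running mean feature vector, background level is the average energy of non-speech frames, and the names `UserSpeechModel`, `background_level`, and `maybe_adapt` are hypothetical. The noise compensation step itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class UserSpeechModel:
    """Toy stand-in for a user speech model: a running mean feature vector."""
    mean: list[float]
    updates: int = 0


def background_level(frame_energies: list[float], speech_flags: list[bool]) -> float:
    """Average energy of frames not flagged as speech (0.0 if there are none)."""
    noise = [e for e, is_speech in zip(frame_energies, speech_flags) if not is_speech]
    return sum(noise) / len(noise) if noise else 0.0


def maybe_adapt(model: UserSpeechModel, features: list[float],
                level: float, threshold: float, rate: float = 0.1) -> UserSpeechModel:
    """Adapt the speech model only when background audio is below the threshold.

    The update rule (exponential moving average) is an illustrative choice.
    """
    if level >= threshold:
        # Too noisy: leave the model untouched so noise does not leak into it.
        return model
    new_mean = [(1 - rate) * m + rate * f for m, f in zip(model.mean, features)]
    return UserSpeechModel(new_mean, model.updates + 1)
```

Gating on the background level keeps the speech model from absorbing noise characteristics, which is the reason the abstract conditions adaptation on the threshold check.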

21 Claims

1. A system comprising:
- one or more processing devices; and
  
  one or more storage devices storing instructions that, when executed by the one or more processing devices, cause the system to:
  
  receive an audio signal generated by a device based on audio input from a user, the audio signal including at least background audio and one or more user utterances recorded by the device;
  
  determine a location of the user when the one or more utterances are recorded;
  
  select a noise model from a plurality of noise models;
  
  adapt the selected noise model based on the received audio signal to generate an adapted noise model that models characteristics of background audio surrounding the user at the location; and
  
  store the adapted noise model as a noise model for the user in association with the determined location such that the adapted noise model is used for speech recognition when utterances of the user are recorded at the determined location.
- Dependent claims (2-19):
  - 2. The system of claim 1, wherein the instructions further comprise instructions that, when executed, cause the system to generate a filtered audio signal with reduced background audio compared to the received audio signal using the selected noise model.
  - 3. The system of claim 1, wherein:
    - to determine the location of the user when the one or more utterances are recorded, the instructions, when executed, cause the system to determine a geographical location of the user when the one or more utterances are recorded; and
      
      to store the adapted noise model as a noise model for the user for audio input recorded at the location, the instructions, when executed, cause the system to store the adapted noise model as a noise model for the user for audio input recorded at the geographical location.
  - 4. The system of claim 1, wherein:
    - to determine the location of the user when the one or more utterances are recorded, the instructions, when executed, cause the system to determine a type of location where the user is located when the one or more utterances are recorded; and
      
      to store the adapted noise model as a noise model for the user for audio input recorded at the location, the instructions, when executed, cause the system to store the adapted noise model as a noise model for the user for audio input recorded at locations of the same type as the determined type of location.
  - 5. The system of claim 1, wherein, to select the noise model, the instructions, when executed, cause the system to select the noise model based on the determined location.
  - 6. The system of claim 5, wherein, to select the noise model based on the determined location, the instructions, when executed, cause the system to select a noise model associated with the user that was previously adapted to model characteristics of background audio at the determined location.
  - 7. The system of claim 6, wherein, to select the noise model based on the determined location, the instructions, when executed, cause the system to select the noise model from among a plurality of noise models that were each previously adapted to model characteristics of background audio at a corresponding location based on audio input from the user.
  - 8. The system of claim 5, wherein, to select the noise model based on the determined location, the instructions, when executed, cause the system to select the noise model from among a plurality of noise models that are each developed to model characteristics of background audio at a corresponding location based on audio input that is recorded at the corresponding location by other users.
  - 9. The system of claim 8, wherein the audio input that is recorded at the determined location by other users comprises voice queries submitted by the other users.
  - 10. The system of claim 1, wherein, to store the adapted noise model as a noise model for the user, the instructions, when executed, cause the system to store the adapted noise model as one of a plurality of location-specific noise models adapted for the user, each of the location-specific noise models modeling characteristics of background audio at a different location.
  - 11. The system of claim 1, wherein the instructions further comprise instructions that, when executed, cause the system to:
    - receive a second audio signal generated by the device based on second audio input from the user, the audio signal including at least second background audio and one or more second user utterances recorded by the device;
      
      determine that, when the one or more second utterances are recorded, the user is at the determined location where the one or more first utterances were recorded; and
      
      perform noise compensation on the second audio signal using the adapted noise model in response to determining that, when the one or more second utterances are recorded, the user is at the determined location where the one or more first utterances were recorded.
  - 12. The system of claim 1, wherein the instructions further comprise instructions that, when executed, cause the system to:
    - receive a second audio signal generated by the device based on second audio input from the user, the audio signal including at least second background audio and one or more second user utterances recorded by the device;
      
      determine a second location of the user when the one or more second utterances are recorded, the second location being different from the location of the user when the one or more first utterances are recorded;
      
      select a second noise model based on the second location, from among multiple noise models associated with the user, each of the multiple noise models associated with the user being adapted based on audio input from the user and modeling characteristics of background noise at a different location; and
      
      perform noise compensation on the second audio signal using the second noise model.
  - 13. The system of claim 1, wherein the instructions further comprise instructions that, when executed, cause the system to:
    - access a user speech model associated with the user that models speech characteristics of the user; and
      
      perform noise compensation on the received audio signal using the user speech model and the selected noise model.
  - 14. The system of claim 13, wherein:
    - the instructions comprise instructions that, when executed, cause the system to determine that the background audio in the audio signal is above a defined threshold; and
      
      to adapt the selected noise model, the instructions, when executed, cause the system to adapt the selected noise model in response to determining that the background audio in the audio signal is above the defined threshold.
  - 15. The system of claim 1, wherein the selected noise model comprises a surrogate noise model that has not been adapted to model characteristics of background audio surrounding the user.
  - 16. The system of claim 15, wherein, to select the surrogate noise model, the instructions include instructions that, when executed by the one or more processing devices, cause the system to:
    - receive an initial audio signal that includes at least an initial user audio portion that corresponds to one or more user utterances recorded by the device;
      
      determine similarity metrics between multiple surrogate noise models and an expected noise model for the user determined based on the initial audio signal; and
      
      select the surrogate noise model, from among multiple surrogate noise models, based on the similarity metrics.
  - 17. The system of claim 16, wherein each of the multiple surrogate noise models models characteristics of background audio in a particular location.
  - 18. The system of claim 16, wherein each of the multiple surrogate noise models models characteristics of background audio in a particular kind of environmental condition.
  - 19. The system of claim 15, wherein, to select the surrogate noise model, the instructions include instructions that, when executed by the one or more processing devices, cause the system to:
    - receive context information about the device; and
      
      select the surrogate noise model, from among the multiple surrogate noise models, based on the context information.
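The surrogate selection recited in claims 15 and 16 might be sketched as follows. This is only an illustration under stated assumptions: each noise model is reduced to a short feature vector, the similarity metric is negative Euclidean distance (the claims do not name a metric), and `similarity` and `select_surrogate` are hypothetical names.

```python
import math


def similarity(a: list[float], b: list[float]) -> float:
    """Negative Euclidean distance, so a larger value means more similar.

    The choice of metric is an assumption; the claims only require
    'similarity metrics'.
    """
    return -math.dist(a, b)


def select_surrogate(expected: list[float],
                     surrogates: dict[str, list[float]]) -> str:
    """Return the name of the surrogate model most similar to the expected one."""
    if not surrogates:
        raise ValueError("at least one surrogate noise model is required")
    return max(surrogates, key=lambda name: similarity(expected, surrogates[name]))


# Hypothetical surrogate models keyed by location or environment type.
surrogates = {
    "car": [0.9, 0.1],
    "cafe": [0.4, 0.6],
    "street": [0.7, 0.7],
}
chosen = select_surrogate([0.5, 0.5], surrogates)
```

In this example `chosen` is `"cafe"`, because its vector lies closest to the expected noise model estimated from the initial audio signal.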

20. A computer-implemented method comprising:
- receiving an audio signal generated by a device based on audio input from a user, the audio signal including at least background audio and one or more user utterances recorded by the device;
  
  determining a location of the user when the one or more utterances are recorded;
  
  selecting a noise model from a plurality of noise models;
  
  adapting the selected noise model based on the received audio signal to generate an adapted noise model that models characteristics of background audio surrounding the user at the location; and
  
  storing the adapted noise model as a noise model for the user in association with the determined location such that the adapted noise model is used for speech recognition when utterances of the user are recorded at the determined location.

21. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
- receiving an audio signal generated by a device based on audio input from a user, the audio signal including at least background audio and one or more user utterances recorded by the device;
  
  determining a location of the user when the one or more utterances are recorded;
  
  selecting a noise model from a plurality of noise models;
  
  adapting the selected noise model based on the received audio signal to generate an adapted noise model that models characteristics of background audio surrounding the user at the location; and
  
  storing the adapted noise model as a noise model for the user in association with the determined location such that the adapted noise model is used for speech recognition when utterances of the user are recorded at the determined location.
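The select, adapt, and store steps shared by claims 1, 20, and 21 can be sketched as a small per-user, per-location store. This is a simplified illustration, not the patented implementation: the noise model is a mean feature vector, adaptation is an exponential moving average, locations are plain string keys, and `NoiseModel` and `LocationNoiseStore` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoiseModel:
    """Toy noise model: mean feature vector of the background audio."""
    mean: tuple[float, ...]


class LocationNoiseStore:
    """Keeps one adapted noise model per (user, location) pair."""

    def __init__(self, defaults: dict[str, NoiseModel], fallback: NoiseModel):
        self._defaults = defaults      # shared models keyed by location
        self._fallback = fallback      # used when the location is unknown
        self._adapted: dict[tuple[str, str], NoiseModel] = {}

    def select(self, user: str, location: str) -> NoiseModel:
        """Prefer the user's own adapted model, then a shared one, then the fallback."""
        key = (user, location)
        if key in self._adapted:
            return self._adapted[key]
        return self._defaults.get(location, self._fallback)

    def adapt_and_store(self, user: str, location: str,
                        background: list[float], rate: float = 0.2) -> NoiseModel:
        """Adapt the selected model toward the observed background and store it."""
        selected = self.select(user, location)
        adapted = NoiseModel(tuple(
            (1 - rate) * m + rate * b for m, b in zip(selected.mean, background)
        ))
        self._adapted[(user, location)] = adapted
        return adapted
```

A later utterance from the same user at the same location would then retrieve the stored model through `select`, which corresponds to the behavior described in claim 11, while other users at that location still receive the shared default.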

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Lloyd, Matthew I.; Kristjansson, Trausti T.
Granted Patent: US 8,666,740 B2
US Class Current: 704/233
CPC Class Codes:
G10L 15/20 Speech recognition techniqu...
G10L 21/0208 Noise filtering
