Speech and Noise Models for Speech Recognition
First Claim
1. A system comprising:
one or more processing devices; and
one or more storage devices storing instructions that, when executed by the one or more processing devices, cause the system to:
receive an audio signal generated by a device based on audio input from a user, the audio signal including at least background audio and one or more user utterances recorded by the device;
determine a location of the user when the one or more utterances are recorded;
select a noise model from a plurality of noise models;
adapt the selected noise model based on the received audio signal to generate an adapted noise model that models characteristics of background audio surrounding the user at the location; and
store the adapted noise model as a noise model for the user in association with the determined location such that the adapted noise model is used for speech recognition when utterances of the user are recorded at the determined location.
Abstract
An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed, and a determination may be made that background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.
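The threshold-gated adaptation described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `UserSpeechModel` class, the running-mean update, the energy-based background estimate, and the per-frame speech flags are all assumptions introduced here for illustration. The noise compensation step is omitted because the abstract does not say how it is computed.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class UserSpeechModel:
    """Hypothetical stand-in: a running mean of per-utterance feature vectors."""
    mean: List[float]
    weight: float = 1.0  # number of utterances the mean already reflects


def background_energy(frames: Sequence[Sequence[float]],
                      is_speech: Sequence[bool]) -> float:
    """Average energy of the frames flagged as non-speech (0.0 if none)."""
    noise = [sum(x * x for x in frame) / len(frame)
             for frame, speech in zip(frames, is_speech)
             if not speech and len(frame) > 0]
    return sum(noise) / len(noise) if noise else 0.0


def maybe_adapt(model: UserSpeechModel,
                utterance_features: Sequence[float],
                frames: Sequence[Sequence[float]],
                is_speech: Sequence[bool],
                threshold: float) -> UserSpeechModel:
    """Adapt the user speech model only when the background is quiet enough."""
    if background_energy(frames, is_speech) >= threshold:
        return model  # background too loud: leave the model untouched
    w = model.weight
    # Assumed update rule: fold the new utterance into the running mean.
    new_mean = [(w * m + u) / (w + 1.0)
                for m, u in zip(model.mean, utterance_features)]
    return UserSpeechModel(mean=new_mean, weight=w + 1.0)
```

Gating on the background level keeps noisy recordings from contaminating a model that is meant to capture only the user's speech characteristics.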
21 Claims
1. A system comprising:
one or more processing devices; and
one or more storage devices storing instructions that, when executed by the one or more processing devices, cause the system to:
receive an audio signal generated by a device based on audio input from a user, the audio signal including at least background audio and one or more user utterances recorded by the device;
determine a location of the user when the one or more utterances are recorded;
select a noise model from a plurality of noise models;
adapt the selected noise model based on the received audio signal to generate an adapted noise model that models characteristics of background audio surrounding the user at the location; and
store the adapted noise model as a noise model for the user in association with the determined location such that the adapted noise model is used for speech recognition when utterances of the user are recorded at the determined location.
Dependent claims: 2-19.
20. A computer-implemented method comprising:
receiving an audio signal generated by a device based on audio input from a user, the audio signal including at least background audio and one or more user utterances recorded by the device;
determining a location of the user when the one or more utterances are recorded;
selecting a noise model from a plurality of noise models;
adapting the selected noise model based on the received audio signal to generate an adapted noise model that models characteristics of background audio surrounding the user at the location; and
storing the adapted noise model as a noise model for the user in association with the determined location such that the adapted noise model is used for speech recognition when utterances of the user are recorded at the determined location.
21. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
receiving an audio signal generated by a device based on audio input from a user, the audio signal including at least background audio and one or more user utterances recorded by the device;
determining a location of the user when the one or more utterances are recorded;
selecting a noise model from a plurality of noise models;
adapting the selected noise model based on the received audio signal to generate an adapted noise model that models characteristics of background audio surrounding the user at the location; and
storing the adapted noise model as a noise model for the user in association with the determined location such that the adapted noise model is used for speech recognition when utterances of the user are recorded at the determined location.
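The sequence of steps recited in the independent claims (select, adapt, store per user and location, reuse on later utterances) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the claims do not specify how a noise model is represented, selected, or adapted, so the vector representation, the nearest-match selection, the interpolation rate, the string location labels, and every name below are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: mean background level per frequency band.
NoiseModel = List[float]


def select_noise_model(candidates: Dict[str, NoiseModel],
                       observed: NoiseModel) -> str:
    """Assumed rule: pick the candidate closest to the observed background."""
    def distance(name: str) -> float:
        return sum((c - o) ** 2 for c, o in zip(candidates[name], observed))
    return min(candidates, key=distance)


def adapt_noise_model(model: NoiseModel, observed: NoiseModel,
                      rate: float = 0.5) -> NoiseModel:
    """Assumed rule: interpolate the selected model toward the observation."""
    return [(1.0 - rate) * m + rate * o for m, o in zip(model, observed)]


class NoiseModelStore:
    """Keeps one adapted noise model per (user, location) pair."""

    def __init__(self) -> None:
        self._models: Dict[Tuple[str, str], NoiseModel] = {}

    def store(self, user_id: str, location: str, model: NoiseModel) -> None:
        self._models[(user_id, location)] = model

    def lookup(self, user_id: str, location: str) -> Optional[NoiseModel]:
        return self._models.get((user_id, location))


def handle_audio(store: NoiseModelStore,
                 candidates: Dict[str, NoiseModel],
                 user_id: str,
                 location: str,
                 observed_background: NoiseModel) -> NoiseModel:
    """Select, adapt, and store a noise model for this user at this location."""
    name = select_noise_model(candidates, observed_background)
    adapted = adapt_noise_model(candidates[name], observed_background)
    store.store(user_id, location, adapted)
    return adapted
```

On a later utterance, a recognizer in this sketch would call `store.lookup(user_id, location)` and fall back to selecting from the candidate models when nothing has been stored for that location yet.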
Specification