Speech and noise models for speech recognition
Abstract
An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed, and a determination may be made that background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.
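The abstract does not say how the background audio level is measured or what the defined threshold is. As a minimal sketch only, the check could be approximated by treating the quietest frames of the recording as background and comparing their average energy to a threshold in dB. The function names, the 160-sample frame length, the 20% "quietest frames" rule, and the -40 dB threshold are all illustrative assumptions, not values taken from the patent.

```python
import math


def frame_energies_db(samples, frame_len=160):
    """Mean-square energy of each full frame, in dB relative to amplitude 1.0."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Small floor avoids log10(0) on digitally silent frames.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def background_level_db(samples, frame_len=160, quietest_fraction=0.2):
    """Assumed estimator: average energy of the quietest fraction of frames."""
    energies = sorted(frame_energies_db(samples, frame_len))
    if not energies:
        raise ValueError("signal is shorter than one frame")
    k = max(1, int(len(energies) * quietest_fraction))
    return sum(energies[:k]) / k


def background_is_below_threshold(samples, threshold_db=-40.0):
    """True when the estimated background level is below the (assumed) threshold."""
    return background_level_db(samples) < threshold_db
```

A recording whose pauses sit near -60 dB would pass this check, while one whose pauses sit near -20 dB would not, so only the first would be used for adaptation.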
24 Claims
1. A system comprising:
one or more processing devices; and
one or more storage devices storing instructions that, when executed by the one or more processing devices, cause the one or more processing devices to:
receive a first audio signal generated by a device based on audio input from a user, the first audio signal including at least a first user audio portion that corresponds to both first background audio and one or more first user utterances recorded by the device;
access a user speech model associated with the user;
determine that the first background audio in the first user audio portion is below a defined threshold;
in response to determining that the first background audio in the first user audio portion is below the defined threshold, adapt the accessed user speech model based on the first audio signal to generate an adapted user speech model that models speech characteristics of the user;
receive a second audio signal generated by the device based on second audio input from a user, the second audio signal including at least a second user audio portion that corresponds to both second background audio and one or more second user utterances recorded by the device;
determine that the second background audio in the second user audio portion is not below the defined threshold;
in response to determining that the second background audio in the second user audio portion is not below the defined threshold, not adapt the accessed user speech model based on the second audio signal; and
perform noise compensation on a third audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the third audio signal.
Dependent claims: 2 to 21.
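Claim 1 adapts the user speech model on the first (quiet) signal and leaves it untouched on the second (noisy) signal. The claim does not define the model's form or the update rule, so the following is only a sketch under stated assumptions: `UserSpeechModel` is a hypothetical stand-in holding one mean per feature dimension, and adaptation is a simple interpolation toward the mean of the new feature frames. The `rate` and `threshold_db` values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class UserSpeechModel:
    """Hypothetical stand-in for a user speech model: one mean per feature dimension."""
    means: list
    updates: int = 0


def maybe_adapt(model, features, background_db, threshold_db=-40.0, rate=0.1):
    """Return an adapted copy when background is below the threshold.

    When the background is not below the threshold, the same model object is
    returned unchanged, mirroring the "not adapt" branch of the claim.
    """
    if not background_db < threshold_db or not features:
        return model
    dims = len(model.means)
    # Average each feature dimension over the frames of the new signal.
    frame_mean = [sum(f[d] for f in features) / len(features) for d in range(dims)]
    # Assumed update rule: move each model mean a fraction `rate` toward the new data.
    new_means = [(1.0 - rate) * m + rate * fm for m, fm in zip(model.means, frame_mean)]
    return UserSpeechModel(new_means, model.updates + 1)
```

Calling `maybe_adapt` with a background level of -55 dB yields a new model with `updates == 1`; calling it again with -20 dB returns that same model, so later noise compensation still uses the version adapted from the quiet signal.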
22. A system comprising:
a client device configured to send, to an automated speech recognition system, a first audio signal that includes at least a first user audio portion that corresponds to both first background audio and one or more first user utterances recorded by the device, a second audio signal that includes at least a second user audio portion that corresponds to both second background audio and one or more second user utterances recorded by the device, and a third audio signal;
an automated speech recognition system configured to:
receive the first audio signal and the second audio signal from the client device;
access a user speech model associated with the user;
determine that the first background audio in the first user audio portion is below a defined threshold;
in response to determining that the first background audio in the first user audio portion is below the defined threshold, adapt the accessed user speech model based on the first audio signal to generate an adapted user speech model that models speech characteristics of the user;
determine that the second background audio in the second user audio portion is not below the defined threshold;
in response to determining that the second background audio in the second user audio portion is not below the defined threshold, not adapt the accessed user speech model based on the second audio signal; and
perform noise compensation on the received third audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received third audio signal.
Dependent claims: 23.
24. A method comprising:
receiving, by one or more processing devices, a first audio signal generated by a device based on audio input from a user, the first audio signal including at least a first user audio portion that corresponds to both first background audio and one or more first user utterances recorded by the device;
accessing, by the one or more processing devices, a user speech model associated with the user;
determining, by the one or more processing devices, that the first background audio in the first user audio portion is below a defined threshold;
in response to determining that the first background audio in the first user audio portion is below the defined threshold, adapting, by the one or more processing devices, the accessed user speech model based on the first audio signal to generate an adapted user speech model that models speech characteristics of the user;
receiving, by the one or more processing devices, a second audio signal generated by the device based on second audio input from a user, the second audio signal including at least a second user audio portion that corresponds to both second background audio and one or more second user utterances recorded by the device;
determining, by the one or more processing devices, that the second background audio in the second user audio portion is not below the defined threshold;
in response to determining that the second background audio in the second user audio portion is not below the defined threshold, not adapting, by the one or more processing devices, the accessed user speech model based on the second audio signal; and
performing, by the one or more processing devices, noise compensation on a third audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the third audio signal.
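The final step of claim 24 performs noise compensation using the adapted user speech model, but the claim text does not name a compensation algorithm. Purely as an illustration, the sketch below swaps in a generic Wiener-style per-band gain, which is a standard technique and not necessarily the one the patent uses. It assumes the adapted speech model can supply an expected speech power per frequency band (`speech_power`) and that a separate noise estimate supplies `noise_power`; both inputs and the function name are hypothetical.

```python
def noise_compensate(observed_magnitude, speech_power, noise_power):
    """Scale each band magnitude by a Wiener-style gain S / (S + N).

    observed_magnitude: band magnitudes of the noisy (third) audio signal
    speech_power: expected speech power per band (assumed to come from the
                  adapted user speech model)
    noise_power: estimated background power per band (assumed input)
    """
    if not (len(observed_magnitude) == len(speech_power) == len(noise_power)):
        raise ValueError("all inputs must have one value per band")
    filtered = []
    for y, s, n in zip(observed_magnitude, speech_power, noise_power):
        total = s + n
        # Bands the model expects to be mostly speech keep a gain near 1;
        # bands dominated by noise are attenuated toward 0.
        gain = s / total if total > 0 else 0.0
        filtered.append(gain * y)
    return filtered
```

Because every gain lies between 0 and 1, the filtered magnitudes never exceed the observed ones, which matches the claim's requirement of reduced background audio only in this loose, per-band sense.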
Specification