Speech and Noise Models for Speech Recognition
Abstract
An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed, and a determination may be made that background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.
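The control flow described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the `UserSpeechModel` class (a running mean of frame energy), the `adapt` update, and the `compensate` function (a toy energy gate standing in for noise compensation) are hypothetical placeholders. How the sketch behaves when background audio is not below the threshold (it reuses the existing model unchanged) is also an assumption, since this section does not describe that case.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Frame = Sequence[float]


@dataclass(frozen=True)
class UserSpeechModel:
    """Hypothetical per-user model: running mean of speech frame energy."""
    mean_energy: float
    frames_seen: int


def frame_energy(frame: Frame) -> float:
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def adapt(model: UserSpeechModel, frames: Sequence[Frame]) -> UserSpeechModel:
    """Fold the new frames into the running mean (toy adaptation step)."""
    count = model.frames_seen + len(frames)
    if count == 0:
        return model
    total = model.mean_energy * model.frames_seen
    total += sum(frame_energy(f) for f in frames)
    return UserSpeechModel(total / count, count)


def compensate(frames: Sequence[Frame], model: UserSpeechModel,
               gate: float = 0.1) -> List[List[float]]:
    """Toy stand-in for noise compensation: silence frames whose energy
    is far below the user's typical speech energy."""
    floor = gate * model.mean_energy
    return [
        list(f) if frame_energy(f) >= floor else [0.0] * len(f)
        for f in frames
    ]


def process(frames: Sequence[Frame], model: UserSpeechModel,
            background_energy: float,
            threshold: float) -> Tuple[List[List[float]], UserSpeechModel]:
    """Adapt the user model only when background audio is below the
    threshold, then filter the signal with whichever model results."""
    if background_energy < threshold:
        model = adapt(model, frames)
    return compensate(frames, model), model


if __name__ == "__main__":
    model = UserSpeechModel(mean_energy=1.0, frames_seen=1)
    frames = [[1.0, -1.0], [0.1, 0.1]]
    filtered, updated = process(frames, model,
                                background_energy=0.001, threshold=0.01)
    print(filtered, updated)
```

The point of the gate on `background_energy` is that the user model is only updated from audio that is mostly speech, so background sound is not absorbed into the model of the user's voice.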
24 Claims
1. A system comprising:
one or more processing devices; and
one or more storage devices storing instructions that, when executed by the one or more processing devices, cause the one or more processing devices to:
receive an audio signal generated by a device based on audio input from a user, the audio signal including at least a user audio portion that corresponds to one or more user utterances recorded by the device;
access a user speech model associated with the user;
determine that background audio in the audio signal is below a defined threshold;
in response to determining that the background audio in the audio signal is below the defined threshold, adapt the accessed user speech model based on the audio signal to generate an adapted user speech model that models speech characteristics of the user; and
perform noise compensation on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.
Dependent claims: 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
4. The system of claim 4 wherein the audio signal includes an environmental audio portion that corresponds only to background audio surrounding the user and, to determine the signal-to-noise ratio of the audio signal, the instructions include instructions that, when executed, cause the one or more processing devices to:
determine an amount of energy in the user audio portion of the audio signal;
determine an amount of energy in the environmental audio portion of the audio signal; and
determine the signal-to-noise ratio by determining the ratio between the amount of energy in the user audio portion and the environmental audio portion.
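The three steps recited in claim 4 amount to a ratio of two energies. A minimal sketch follows, assuming each portion is available as a sequence of samples. The function names are hypothetical, and measuring the "amount of energy" as mean squared amplitude per sample (so that portions of different lengths stay comparable) is an assumption of the sketch rather than something the claim specifies.

```python
import math
from typing import Sequence


def mean_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of a portion of the audio signal."""
    if not samples:
        raise ValueError("portion must contain at least one sample")
    return sum(s * s for s in samples) / len(samples)


def signal_to_noise_ratio(user_audio: Sequence[float],
                          environmental_audio: Sequence[float]) -> float:
    """Ratio of user-portion energy to environmental-portion energy."""
    noise = mean_energy(environmental_audio)
    if noise == 0.0:
        return math.inf  # no measurable background audio
    return mean_energy(user_audio) / noise


def to_decibels(ratio: float) -> float:
    """Express an energy ratio in decibels."""
    if ratio <= 0.0:
        return -math.inf
    return 10.0 * math.log10(ratio)


if __name__ == "__main__":
    user = [0.5, -0.5, 0.5, -0.5]
    environment = [0.05, -0.05]
    ratio = signal_to_noise_ratio(user, environment)
    print(ratio, to_decibels(ratio))
```

A high ratio corresponds to low background audio relative to the user's speech, which is one way the threshold test in the independent claims could be evaluated.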
22. A system comprising:
a client device configured to send, to an automated speech recognition system, an audio signal that includes at least a user audio portion that corresponds to one or more user utterances recorded by the device;
an automated speech recognition system configured to:
receive the audio signal from the client device;
access a user speech model associated with the user;
determine that background audio in the audio signal is below a defined threshold;
in response to determining that the background audio in the audio signal is below the defined threshold, adapt the accessed user speech model based on the audio signal to generate an adapted user speech model that models speech characteristics of the user; and
perform noise compensation on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.
Dependent claims: 23
24. A method comprising:
receiving an audio signal generated by a device based on audio input from a user, the audio signal including at least a user audio portion that corresponds to one or more user utterances recorded by the device;
accessing a user speech model associated with the user;
determining that background audio in the audio signal is below a defined threshold;
in response to determining that the background audio in the audio signal is below the defined threshold, adapting the accessed user speech model based on the audio signal to generate an adapted user speech model that models speech characteristics of the user; and
performing noise compensation on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.
Specification