Audio output masking for improved automatic speech recognition
Abstract
Features are disclosed for filtering portions of an output audio signal in order to improve automatic speech recognition on an input signal which may include a representation of the output signal. A signal that includes audio content can be received, and a frequency or band of frequencies can be selected to be filtered from the signal. The frequency band may correspond to a desired frequency band for speech recognition. An input signal can be obtained comprising audio data corresponding to a user utterance and presentation of the output signal. Automatic speech recognition can be performed on the input signal. In some cases, an acoustic model trained for use with such frequency band filtering may be used to perform speech recognition.
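As a rough illustration of the filtering step described in the abstract, the sketch below removes a frequency band from an output signal before it is played, so that the band is left largely free for the user's speech. It is not the patented implementation: the abstract does not say how the filter is built, so this example swaps in a simple technique (zeroing discrete Fourier transform bins), and the names `mask_band` and `band_energy`, the 8 kHz sample rate, the two test tones, and the 1500 to 2500 Hz band are all assumptions chosen for the demonstration.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)); fine for short demo signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def bin_frequency(k, n, sample_rate):
    """Frequency in Hz of bin k; bins above n/2 mirror the lower bins."""
    return min(k, n - k) * sample_rate / n


def mask_band(signal, sample_rate, low_hz, high_hz):
    """Return a copy of signal with all energy in [low_hz, high_hz] removed."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        if low_hz <= bin_frequency(k, n, sample_rate) <= high_hz:
            spectrum[k] = 0  # drop this bin from the output signal
    return idft(spectrum)


def band_energy(signal, sample_rate, low_hz, high_hz):
    """Energy of signal inside [low_hz, high_hz], computed from its spectrum."""
    n = len(signal)
    spectrum = dft(signal)
    return sum(abs(spectrum[k]) ** 2 for k in range(n)
               if low_hz <= bin_frequency(k, n, sample_rate) <= high_hz) / n


# Hypothetical output audio: a 500 Hz tone plus a 2000 Hz tone at 8 kHz sampling.
SAMPLE_RATE = 8000
tones = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE)
         + math.sin(2 * math.pi * 2000 * t / SAMPLE_RATE)
         for t in range(160)]

# Assumed band in which the user's utterance is expected (illustrative only).
filtered = mask_band(tones, SAMPLE_RATE, 1500.0, 2500.0)

print("energy in band before:", band_energy(tones, SAMPLE_RATE, 1500.0, 2500.0))
print("energy in band after: ", band_energy(filtered, SAMPLE_RATE, 1500.0, 2500.0))
```

In this sketch the 2000 Hz tone is removed while the 500 Hz tone passes through unchanged. A production system would more plausibly use a streaming band-stop filter than a whole-signal transform.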
187 Citations
25 Claims
1. A system comprising:
- one or more computing devices configured to at least:
receive an audio signal;
determine a frequency band based at least partly on a likelihood that audio input data regarding a user utterance will be present within the frequency band of a subsequent input signal;
generate a filtered output signal by filtering a portion of audio output data within the frequency band from the audio signal, wherein the filtered output signal is generated prior to receiving an input signal comprising audio data regarding the user utterance and presentation of the filtered output signal, and wherein filtering the portion of audio output data from the audio signal reduces energy of the audio signal in the frequency band;
generate audio using the filtered output signal;
receive the input signal, wherein the input signal comprises audio data regarding both the user utterance and presentation of the filtered output signal;
select an acoustic model of a plurality of acoustic models based at least partly on the acoustic model being associated with the frequency band; and
perform speech recognition using the input signal and the acoustic model to generate speech recognition results.
Dependent claims: 2, 3, 4, 5, 6, 15, 16
7. A computer-implemented method comprising:
- as implemented by a computing device comprising one or more processors configured to execute specific instructions,
receiving a first signal comprising data regarding audio content;
determining a frequency band within which audio data regarding a user utterance is expected to be present in an input signal;
generating an output signal comprising a portion of the first signal, wherein the output signal is generated prior to receiving the input signal, wherein the input signal comprises audio data corresponding to the user utterance and presentation of the output signal, and wherein the output signal excludes a portion of the first signal having a frequency within the frequency band;
receiving the input signal, wherein the input signal comprises audio data corresponding to the user utterance and presentation of the output signal, and wherein a portion of the input signal comprising audio data corresponding to the user utterance has a frequency within the frequency band; and
providing the input signal to a speech recognizer.
Dependent claims: 8, 9, 10, 11, 12, 13, 14
17. A device comprising:
- means for receiving a first signal comprising data regarding audio content;
means for determining a frequency band within which audio data regarding a user utterance is expected to be present in an input signal;
means for generating an output signal comprising a portion of the first signal, wherein the output signal is generated prior to receiving the input signal, wherein the input signal comprises audio data corresponding to the user utterance and presentation of the output signal, and wherein the output signal excludes a portion of the first signal having a frequency within the frequency band;
means for receiving the input signal, wherein the input signal comprises audio data corresponding to the user utterance and presentation of the output signal, and wherein a portion of the input signal comprising audio data corresponding to the user utterance has a frequency within the frequency band; and
means for providing the input signal to a speech recognizer.
Dependent claims: 18, 19, 20, 21, 22, 23, 24, 25
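Claim 1 recites selecting an acoustic model "based at least partly on the acoustic model being associated with the frequency band". The claims do not say how that association is stored or compared, so the following is only a minimal sketch under assumed conventions: each model records the band it was trained for as a `(low_hz, high_hz)` pair, and the model whose band overlaps the masked band the most is chosen. The `AcousticModel` class, the `select_model` function, the model names, and the band values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Band = Tuple[float, float]  # (low_hz, high_hz); an assumed representation


@dataclass(frozen=True)
class AcousticModel:
    """Hypothetical record tying a model to the band it was trained for."""
    name: str
    band_hz: Band


def overlap_hz(a: Band, b: Band) -> float:
    """Width in Hz of the intersection of two bands (0.0 if they are disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def select_model(models: Sequence[AcousticModel], masked_band: Band) -> AcousticModel:
    """Pick the model whose associated band overlaps masked_band the most."""
    best = max(models, key=lambda m: overlap_hz(m.band_hz, masked_band))
    if overlap_hz(best.band_hz, masked_band) == 0.0:
        raise LookupError("no acoustic model is associated with this band")
    return best


# Illustrative model inventory; the bands are made up for the example.
MODELS = [
    AcousticModel("low_band_masked", (300.0, 1000.0)),
    AcousticModel("mid_band_masked", (1000.0, 3400.0)),
    AcousticModel("high_band_masked", (3400.0, 8000.0)),
]

chosen = select_model(MODELS, (1200.0, 3000.0))
print(chosen.name)
```

Overlap width is just one plausible matching rule; an exact-match lookup keyed on the band would satisfy the same claim language.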
Specification