Hybrid reference signal for acoustic echo cancellation
First Claim
1. A computer-implemented method for cancelling an echo from an audio signal to isolate received speech, the method comprising:
sending first playback audio data to a first wireless speaker;
receiving first input audio data from a first microphone of a microphone array, the first input audio data including a first representation of audible sound output by the first wireless speaker and speech input;
receiving second input audio data from a second microphone of the microphone array, the second input audio data including a second representation of the audible sound output by the first wireless speaker and the speech input;
determining a first portion of combined input audio data, the combined input audio data comprising at least the first input audio data and the second input audio data, the first portion of the combined input audio data comprising a first portion of the first input audio data corresponding to a first direction and a first portion of the second input audio data corresponding to the first direction;
determining a second portion of the combined input audio data, the second portion of the combined input audio data comprising a second portion of the first input audio data corresponding to a second direction and a second portion of the second input audio data corresponding to the second direction;
selecting at least the first portion of the combined input audio data as a target signal on which to perform echo cancellation;
blocking high frequencies of the first playback audio data using a low pass filter to generate a first reference signal, the first reference signal including the first playback audio data below a first cutoff frequency;
blocking low frequencies of the second portion of the combined input audio data using a high pass filter to generate a second reference signal, the second reference signal including the second portion of the combined input audio data above the first cutoff frequency;
generating an input reference signal by combining the first reference signal and the second reference signal;
removing the input reference signal from the target signal to generate a first output audio signal that includes the speech input;
performing speech recognition processing on the first output audio signal to determine a command; and
executing the command.
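The two filtering steps and the combining step of the claim can be illustrated with a short sketch. This is not the patented implementation: the claim does not name a filter design, so the first-order IIR low-pass and the complementary high-pass used here are assumptions, and the function names (one_pole_lowpass, complementary_highpass, hybrid_reference) are hypothetical.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """First-order IIR low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out, y = [], 0.0
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out


def complementary_highpass(samples, cutoff_hz, sample_rate_hz):
    """High-pass formed as the input minus its low-passed copy.

    Building it this way (an assumption, not from the claim) makes both
    filters share the same cutoff frequency.
    """
    low = one_pole_lowpass(samples, cutoff_hz, sample_rate_hz)
    return [x - l for x, l in zip(samples, low)]


def hybrid_reference(playback, adaptive_ref, cutoff_hz, sample_rate_hz):
    """Combine low-passed playback audio with a high-passed adaptive reference."""
    first_ref = one_pole_lowpass(playback, cutoff_hz, sample_rate_hz)
    second_ref = complementary_highpass(adaptive_ref, cutoff_hz, sample_rate_hz)
    return [a + b for a, b in zip(first_ref, second_ref)]
```

With this construction, content below the cutoff comes mainly from the playback data and content above it comes mainly from the microphone-derived signal. A production system would likely use steeper filters than the first-order ones assumed here.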
Abstract
An echo cancellation system uses a combined reference signal formed from a playback reference signal and an adaptive reference signal. The playback reference signal is generated from a playback signal sent to a loudspeaker, and the adaptive reference signal is generated by beamforming on microphone inputs corresponding to audio received from the loudspeaker. The system applies a low pass filter to the playback reference signal and a high pass filter to the adaptive reference signal to generate the combined reference signal. The system may then remove the combined reference signal from target signals associated with the microphone inputs to isolate speech included in the target signals.
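The abstract does not say how the combined reference signal is removed from a target signal. One common approach in echo cancellation is an adaptive filter, and the sketch below uses normalized least mean squares (NLMS) purely as an assumed stand-in. The function name nlms_cancel, the tap count, the step size, and the synthetic demo signals are all hypothetical.

```python
import random


def nlms_cancel(target, reference, num_taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively filtered copy of `reference` from `target`.

    Returns the error signal, which approximates whatever part of the
    target is not predictable from the reference (ideally the speech).
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference sample first
    output = []
    for d, x in zip(target, reference):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        err = d - estimate
        norm = sum(h * h for h in history) + eps
        weights = [w + step * err * h / norm for w, h in zip(weights, history)]
        output.append(err)
    return output


def energy(samples):
    return sum(v * v for v in samples)


# Hypothetical demo: the "echo" is the reference delayed by 3 samples and
# scaled by 0.6, with no speech present, so the residual should shrink.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo = [0.0] * 3 + [0.6 * x for x in reference[:-3]]
residual = nlms_cancel(echo, reference)
```

In this sketch the reference argument would be the combined reference signal and the target argument one of the beamformed target signals. When speech is present, the residual retains it because speech is uncorrelated with the reference.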
191 Citations
20 Claims
1. A computer-implemented method for cancelling an echo from an audio signal to isolate received speech, the method comprising:
sending first playback audio data to a first wireless speaker;
receiving first input audio data from a first microphone of a microphone array, the first input audio data including a first representation of audible sound output by the first wireless speaker and speech input;
receiving second input audio data from a second microphone of the microphone array, the second input audio data including a second representation of the audible sound output by the first wireless speaker and the speech input;
determining a first portion of combined input audio data, the combined input audio data comprising at least the first input audio data and the second input audio data, the first portion of the combined input audio data comprising a first portion of the first input audio data corresponding to a first direction and a first portion of the second input audio data corresponding to the first direction;
determining a second portion of the combined input audio data, the second portion of the combined input audio data comprising a second portion of the first input audio data corresponding to a second direction and a second portion of the second input audio data corresponding to the second direction;
selecting at least the first portion of the combined input audio data as a target signal on which to perform echo cancellation;
blocking high frequencies of the first playback audio data using a low pass filter to generate a first reference signal, the first reference signal including the first playback audio data below a first cutoff frequency;
blocking low frequencies of the second portion of the combined input audio data using a high pass filter to generate a second reference signal, the second reference signal including the second portion of the combined input audio data above the first cutoff frequency;
generating an input reference signal by combining the first reference signal and the second reference signal;
removing the input reference signal from the target signal to generate a first output audio signal that includes the speech input;
performing speech recognition processing on the first output audio signal to determine a command; and
executing the command.
Dependent claims: 2, 3, 4
5. A computer-implemented method, comprising:
sending first playback audio data to a first wireless speaker;
receiving first input audio data from a first microphone of a microphone array, the first input audio data including a first representation of sound output by the first wireless speaker and speech input;
receiving second input audio data from a second microphone of the microphone array, the second input audio data including a second representation of the audible sound output by the first wireless speaker and the speech input;
determining a first portion of combined input audio data, the combined input audio data comprising at least the first input audio data and the second input audio data, the first portion of the combined input audio data comprising a first portion of the first input audio data corresponding to a first direction and a first portion of the second input audio data corresponding to the first direction;
determining a second portion of the combined input audio data, the second portion of the combined input audio data comprising a second portion of the first input audio data corresponding to a second direction and a second portion of the second input audio data corresponding to the second direction;
generating first reference data based on the first playback audio data, the first reference data having frequencies below a first cutoff frequency;
generating second reference data based on the second portion of the combined input audio data, the second reference data having frequencies above the first cutoff frequency; and
generating a reference signal by combining the first reference data and the second reference data.
Dependent claims: 6, 7, 8, 9, 10, 11, 12
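The "portion of the combined input audio data corresponding to a direction" recited above is what a beamformer produces. The claim does not specify a beamforming method, so the sketch below assumes the simplest one, delay-and-sum with integer sample delays on a two-microphone array. The helper names (delay, delay_and_sum) and the demo geometry are hypothetical.

```python
import math


def delay(samples, k):
    """Delay a signal by k samples (0 <= k < len), zero-padding the start."""
    return [0.0] * k + list(samples[:len(samples) - k])


def delay_and_sum(mic_signals, steering_delays):
    """Average the microphones after per-microphone alignment delays.

    Sound arriving from the steered direction adds coherently, while
    sound from other directions is partly cancelled.
    """
    aligned = [delay(sig, k) for sig, k in zip(mic_signals, steering_delays)]
    count = len(aligned)
    return [sum(ch[i] for ch in aligned) / count for i in range(len(aligned[0]))]


# Hypothetical demo: a tone with an 8-sample period reaches the second
# microphone 2 samples after the first.
tone = [math.sin(2 * math.pi * n / 8) for n in range(64)]
mic1, mic2 = tone, delay(tone, 2)

beam_a = delay_and_sum([mic1, mic2], [2, 0])  # steered toward the source
beam_b = delay_and_sum([mic1, mic2], [0, 2])  # steered the opposite way
```

In this demo beam_a reproduces the tone (delayed by two samples), while beam_b cancels it after the first four samples because the two aligned copies end up half a period apart. A system following the claims could treat one such beam as the target and another as the source of the second reference data.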
13. A device, comprising:
at least one processor;
a memory device including instructions operable to be executed by the at least one processor to configure the device to:
send first playback audio data to a first wireless speaker;
receive first input audio data from a first microphone of a microphone array, the first input audio data including a first representation of sound output by the first wireless speaker and speech input;
receive second input audio data from a second microphone of the microphone array, the second input audio data including a second representation of the audible sound output by the first wireless speaker and the speech input;
determine a first portion of combined input audio data, the combined input audio data comprising at least the first input audio data and the second input audio data, the first portion of the combined input audio data comprising a first portion of the first input audio data corresponding to a first direction and a first portion of the second input audio data corresponding to the first direction;
determine a second portion of the combined input audio data, the second portion of the combined input audio data comprising a second portion of the first input audio data corresponding to a second direction and a second portion of the second input audio data corresponding to the second direction;
generate first reference data based on the first playback audio data, the first reference data having frequencies below a first cutoff frequency;
generate second reference data based on the second portion of the combined input audio data, the second reference data having frequencies above the first cutoff frequency; and
generate a reference signal by combining the first reference data and the second reference data.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
Specification