SYSTEM AND METHOD FOR PERFORMING SPEECH ENHANCEMENT USING A DEEP NEURAL NETWORK-BASED SIGNAL
Abstract
A method for performing speech enhancement using a Deep Neural Network (DNN)-based signal starts with training a DNN offline by exciting a microphone using a target training signal that includes a signal approximation of clean speech. A loudspeaker is driven with a reference signal and outputs a loudspeaker signal. The microphone then generates a microphone signal based on at least one of: a near-end speaker signal, an ambient noise signal, or the loudspeaker signal. An acoustic-echo-canceller (AEC) generates an AEC echo-cancelled signal based on the reference signal and the microphone signal. A loudspeaker signal estimator generates an estimated loudspeaker signal based on the microphone signal and the AEC echo-cancelled signal. The DNN receives the microphone signal, the reference signal, the AEC echo-cancelled signal, and the estimated loudspeaker signal, and generates a speech reference signal that includes signal statistics for residual echo or for noise. A noise suppressor generates a clean speech signal by suppressing noise or residual echo in the microphone signal based on the speech reference signal. Other embodiments are described.
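The signal flow in the abstract, from reference signal to estimated loudspeaker signal, can be illustrated with a minimal sketch. The abstract does not say which adaptive algorithm the AEC uses or how the loudspeaker signal estimator combines its two inputs, so everything below is an assumption made for illustration: the AEC is modeled as an NLMS adaptive filter, the estimator as the difference between the microphone signal and the AEC echo-cancelled signal, and the names `nlms_aec`, `estimate_loudspeaker_signal`, `simulate_echo`, and `make_reference` are hypothetical.

```python
import random


def make_reference(num_samples, seed=0):
    """Hypothetical helper: white-noise reference signal that drives the loudspeaker."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]


def simulate_echo(reference, echo_path):
    """Hypothetical helper: loudspeaker signal at the microphone (FIR echo path)."""
    return [
        sum(echo_path[k] * reference[n - k]
            for k in range(len(echo_path)) if n - k >= 0)
        for n in range(len(reference))
    ]


def nlms_aec(reference, microphone, taps=8, mu=0.5, eps=1e-8):
    """AEC echo-cancelled signal from the reference and microphone signals.

    Assumption: NLMS adaptation; the source does not name an algorithm.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    cancelled = []
    for x, d in zip(reference, microphone):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # this is the echo-cancelled sample
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm
                   for w, h in zip(weights, history)]
        cancelled.append(error)
    return cancelled


def estimate_loudspeaker_signal(microphone, cancelled):
    """Estimated loudspeaker signal from the microphone and AEC output.

    Assumption: whatever the AEC removed is taken as the loudspeaker part.
    """
    return [m - e for m, e in zip(microphone, cancelled)]
```

For example, with `ref = make_reference(2000)` and `mic = simulate_echo(ref, [0.6, 0.3, 0.1])` (no near-end speech or ambient noise), `nlms_aec(ref, mic)` should decay toward zero once the filter has adapted, and `estimate_loudspeaker_signal` then closely tracks `mic`. In the described system both outputs would be passed to the DNN together with the microphone and reference signals.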
20 Claims
1. A system for performing speech enhancement using a Deep Neural Network (DNN)-based signal comprising:
a loudspeaker to output a loudspeaker signal, wherein the loudspeaker is being driven by a reference signal;
at least one microphone to receive at least one of: a near-end speaker signal, an ambient noise signal, or the loudspeaker signal and to generate a microphone signal;
an acoustic-echo-canceller (AEC) to receive the reference signal and the microphone signal, and to generate an AEC echo-cancelled signal;
a loudspeaker signal estimator to receive the microphone signal and the AEC echo-cancelled signal and to generate an estimated loudspeaker signal; and
a deep neural network (DNN) to receive the microphone signal, the reference signal, the AEC echo-cancelled signal, and the estimated loudspeaker signal, and to generate a clean speech signal, wherein the DNN is trained offline by exciting the at least one microphone using a target training signal that includes a signal approximation of clean speech.
Dependent claims: 2, 3, 5, 6, 7, 8, 9
4. (canceled)
10. A system for performing speech enhancement using a Deep Neural Network (DNN)-based signal comprising:
a loudspeaker to output a loudspeaker signal, wherein the loudspeaker is being driven by a reference signal;
at least one microphone to receive at least one of: a near-end speaker signal, an ambient noise signal, or the loudspeaker signal and to generate a microphone signal;
an acoustic-echo-canceller (AEC) to receive the reference signal and the microphone signal, and to generate an AEC echo-cancelled signal;
a loudspeaker signal estimator to receive the microphone signal and the AEC echo-cancelled signal and to generate an estimated loudspeaker signal; and
a deep neural network (DNN) to receive the microphone signal, the reference signal, the AEC echo-cancelled signal, and the estimated loudspeaker signal, and to generate a speech reference signal that includes signal statistics for residual echo or signal statistics for noise, wherein the DNN is trained offline by exciting the at least one microphone using a target training signal that includes a signal approximation of clean speech.
Dependent claims: 11, 12, 14, 15, 16, 17
13. (canceled)
18. A method for performing speech enhancement using a Deep Neural Network (DNN)-based signal comprising:
training a deep neural network (DNN) offline by exciting at least one microphone using a target training signal that includes a signal approximation of clean speech;
driving a loudspeaker with a reference signal, wherein the loudspeaker outputs a loudspeaker signal;
generating by the at least one microphone a microphone signal based on at least one of: a near-end speaker signal, an ambient noise signal, or the loudspeaker signal;
generating by an acoustic-echo-canceller (AEC) an AEC echo-cancelled signal based on the reference signal and the microphone signal;
generating by a loudspeaker signal estimator an estimated loudspeaker signal based on the microphone signal and the AEC echo-cancelled signal;
receiving by the DNN the microphone signal, the reference signal, the AEC echo-cancelled signal, and the estimated loudspeaker signal; and
generating by the DNN a speech reference signal that includes signal statistics for residual echo or signal statistics for noise based on the microphone signal, the reference signal, the AEC echo-cancelled signal, and the estimated loudspeaker signal.
Dependent claims: 19, 20
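The independent claims recite four DNN inputs and, in claims 10 and 18, an output carrying signal statistics for residual echo or noise. The sketch below illustrates one way such inputs might be assembled per frame and how a noise or residual-echo power statistic could drive a suppression gain. None of this is specified by the claims: the log-energy features, the frame length, the Wiener-style gain rule, and the names `frame_log_energies`, `stack_dnn_inputs`, and `suppression_gain` are assumptions chosen for illustration, and the DNN itself is not modeled.

```python
import math


def frame_log_energies(signal, frame_len):
    """Log-energy of each non-overlapping frame (assumed feature type)."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        feats.append(math.log10(energy + 1e-12))  # small offset avoids log(0)
    return feats


def stack_dnn_inputs(microphone, reference, cancelled,
                     loudspeaker_estimate, frame_len=160):
    """One feature vector per frame, ordered as
    [microphone, reference, AEC echo-cancelled, estimated loudspeaker]."""
    per_signal = [
        frame_log_energies(s, frame_len)
        for s in (microphone, reference, cancelled, loudspeaker_estimate)
    ]
    return [list(frame) for frame in zip(*per_signal)]


def suppression_gain(frame_power, noise_power, floor=0.1):
    """Wiener-style gain (assumed rule) computed from a noise or
    residual-echo power statistic such as a DNN might supply."""
    if frame_power <= 0.0:
        return floor
    gain = 1.0 - noise_power / frame_power
    return max(floor, min(1.0, gain))
```

With four signals of 320 samples each and the default `frame_len=160`, `stack_dnn_inputs` returns two vectors of four features. The gain floor is a common design choice in noise suppressors to limit artifacts; the value 0.1 here is arbitrary.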
Specification