Noise spectrum tracking for speech enhancement
Abstract
A spectrum-based speech enhancement system estimates and tracks the noise spectrum of a mixed speech and noise signal. The system frames and windows a digitized signal and applies the frames to a fast Fourier transform (FFT) processor to generate discrete Fourier transformed (DFT) signals representing the speech plus noise signal. The system then calculates the power spectrum of each frame. The speech enhancement system employs a leaky integrator that is responsive to identified noise-only components of the signal. The leaky integrator has an adaptive time constant that compensates for non-stationary environmental noise. In addition, the speech enhancement system identifies noise-only intervals using a technique that monitors the Teager energy of the signal. The transition between noise-only signals and speech plus noise signals is softened by making the decision non-binary. Once the noise spectrum has been estimated, it is used to generate gain factors that multiply the DFT signals to produce noise-reduced DFT signals. The gain factors are generated based on an audible noise threshold: the method generates audible a priori and a posteriori signal-to-noise ratio signals and then calculates audible gain signals from these values.
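The front end described in the abstract (frame, window, transform, power spectrum) can be sketched as follows. This is a minimal illustration, not the patented implementation: the Hann window, the frame length, the hop size and the function names `frame_signal`, `hann`, `dft` and `frame_spectra` are all assumptions, and a naive DFT stands in for the FFT processor so the sketch needs only the standard library.

```python
import cmath
import math


def frame_signal(x, frame_len, hop):
    """Split x into frames of frame_len samples, starting a new frame every hop samples."""
    return [x[start:start + frame_len]
            for start in range(0, len(x) - frame_len + 1, hop)]


def hann(n):
    """Periodic Hann window of length n (the window type is an assumption)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(frame):
    """Naive O(N^2) DFT, used instead of an FFT to keep the sketch dependency-free."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def frame_spectra(x, frame_len=8, hop=4):
    """Return one (DFT, power spectrum) pair per windowed frame of x."""
    window = hann(frame_len)
    spectra = []
    for frame in frame_signal(x, frame_len, hop):
        X = dft([s * w for s, w in zip(frame, window)])
        P = [abs(v) ** 2 for v in X]  # |X_k(f)|^2 for each frequency bin f
        spectra.append((X, P))
    return spectra
```

For a constant input of 16 samples with 8-sample frames and a hop of 4, this yields three frames; the windowed constant puts all its power in bins 0, 1 and 7, as expected for a Hann window.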
11 Claims
1. A speech enhancement method which processes an input signal including both speech components and noise components to produce a noise-reduced output signal, the speech enhancement method comprising the steps of:
receiving the input signal;
segmenting the input signal into frames;
applying a spectral transformation to the input signal to obtain respective spectral representations of each frame of the input signal;
identifying frames that include speech components;
estimating respective noise components of the respective spectral representations of the frames of the input signal using a time-varying forgetting factor that reduces any contribution to the estimated noise components from data frames that exhibit rapid changes in signal power relative to earlier received data frames;
wherein the spectral representations of the frames of the input signal used to estimate the respective noise components exclude the frames that include speech components;
reducing, in magnitude, the spectral representations of each frame by the estimated noise components to produce noise-reduced spectral representations of each frame.
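The noise-estimation and magnitude-reduction steps of claim 1 can be sketched as a leaky integrator whose forgetting factor changes from frame to frame. The specific rule below is hypothetical: the forgetting factor `alpha` is interpolated between assumed bounds `alpha_min` and `alpha_max` according to how far (in decades, capped by `max_log_change`) the frame power departs from the current noise estimate, so abrupt frames contribute little. Speech frames are skipped entirely, and the "reduce in magnitude" step is shown as plain spectral magnitude subtraction, which is one assumed way of realising it. The estimate is assumed to be initialised from an initial noise-only frame.

```python
import math


def update_noise(noise_psd, frame_psd, is_speech,
                 alpha_min=0.9, alpha_max=0.999, max_log_change=1.0, eps=1e-12):
    """One leaky-integrator update of the noise power spectrum estimate.

    P_n <- alpha * P_n + (1 - alpha) * P_x, where alpha varies per frame
    (hypothetical rule): the larger the jump in frame power relative to the
    current estimate, the closer alpha is pushed to alpha_max.
    """
    if is_speech:
        return list(noise_psd)  # speech frames are excluded from the estimate
    ratio = (sum(frame_psd) + eps) / (sum(noise_psd) + eps)
    change = min(abs(math.log10(ratio)) / max_log_change, 1.0)  # 0 = steady, 1 = abrupt
    alpha = alpha_min + (alpha_max - alpha_min) * change
    return [alpha * n + (1.0 - alpha) * p for n, p in zip(noise_psd, frame_psd)]


def subtract_noise(X, noise_psd):
    """Reduce each bin's magnitude by the noise magnitude sqrt(P_n), keeping its phase."""
    out = []
    for v, pn in zip(X, noise_psd):
        mag = abs(v)
        new_mag = max(mag - math.sqrt(pn), 0.0)  # clamp negative magnitudes to zero
        out.append(v * (new_mag / mag) if mag > 0 else 0j)
    return out
```

With a steady frame the estimate moves at the normal rate set by `alpha_min`; a frame 100 times louder than the estimate is weighted by only 1 - `alpha_max`, so a sudden burst barely disturbs the tracked noise floor.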
3. A method according to claim 1 wherein the step of estimating the respective noise components further includes the step of tracking Teager energy in the components of the data frames to more effectively distinguish unvoiced consonant sounds of the speech components from noise components.
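Claim 3 relies on Teager energy. The discrete Teager energy operator is psi[n] = x[n]^2 - x[n-1] x[n+1]; for a sinusoid A cos(Omega n + phi) it returns exactly A^2 sin^2(Omega), so at equal amplitude a higher-frequency component scores higher. That weighting is a plausible reason it helps separate low-level, high-frequency unvoiced consonants from noise. The sketch below is the time-domain operator only; claim 4 applies a per-frequency-bin variant, and the function name `teager_energy` is an assumption.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    The first and last samples have no two-sided neighbourhood, so the
    result has len(x) - 2 values.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
```

For example, a sinusoid of amplitude 2 at Omega = 0.3 rad/sample gives a constant output of 4 sin^2(0.3) at every sample, while a unit-amplitude sinusoid at Omega = 1.0 scores well above one at Omega = 0.1.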
4. A method according to claim 3, wherein the test for determining whether a frame is a noise-only frame or a speech plus noise frame is given by the condition:
where $H_s$ is an indication that the frame includes both speech and noise, $H_n$ is an indication that the frame includes only noise, $\beta$ is a threshold value, $f$ is an index variable that indexes the $L$ frequency bins of frame $k$, $\|X_k(f)\|^2$ is the power spectral component of frequency bin $f$, and $\hat{\Psi}_n(f,k-1)$ is an estimate of the Teager energy of the noise component for frequency bin $f$ of frame $k-1$, where the estimated Teager energy is given by the equation $\hat{\Psi}_n(f,k) = \hat{P}_n(f,k)$, where $\hat{P}_n(f,k)$ is the estimated noise power spectral component of frequency bin $f$ of frame $k$.
5. A method according to claim 1 wherein the step of estimating the respective noise components further includes the steps of:
identifying frames which include noise components to the relative exclusion of speech components;
classifying each frame as containing only noise components or containing both noise components and speech components, wherein the transition between a noise-only signal and a voice and noise signal is made gradual to reduce the sensitivity of the noise tracking system to system parameters.
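The gradual noise/speech transition of claim 5 can be illustrated by replacing the hard noise-or-speech label with a weight between 0 and 1. Everything in this sketch is an assumption: the logistic mapping, its `slope`, the generic detection statistic `stat` compared against a threshold `beta`, and the helper names `speech_presence` and `soft_noise_update` are hypothetical, not the claimed formulas. The weight then scales how much the leaky integrator is allowed to move.

```python
import math


def speech_presence(stat, beta=2.0, slope=4.0):
    """Map a detection statistic to a soft speech-presence weight in (0, 1).

    Hypothetical logistic rule: stat == beta gives 0.5; slope sets how
    gradual the transition is (a very large slope approaches a hard decision).
    """
    z = max(min(-slope * (stat - beta), 700.0), -700.0)  # avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(z))


def soft_noise_update(noise_psd, frame_psd, p_speech, alpha=0.95):
    """Leaky-integrator update weighted by (1 - p_speech).

    p_speech = 1 freezes the estimate; p_speech = 0 is a normal noise update.
    """
    a = alpha + (1.0 - alpha) * p_speech  # effective forgetting factor
    return [a * n + (1.0 - a) * x for n, x in zip(noise_psd, frame_psd)]
```

Because a frame near the threshold only partially updates the estimate, a small shift in `beta` changes the result smoothly instead of flipping the frame between the two classes, which is the reduced parameter sensitivity the claim describes.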
6. A method according to claim 1, wherein the step of estimating the respective noise components employs at least one of calculated a priori and a posteriori signal to noise ratios based on an audible noise threshold.
7. A method according to claim 6, wherein the step of estimating the respective noise components includes the steps of:
calculating the a posteriori audible signal-to-noise ratio, $R_{\text{post}}^{\text{aud}}$, according to the equation $R_{\text{post}}^{\text{aud}} = P_x / \hat{P}_n^{\text{aud}}$, where $P_x$ is a power spectrum of the speech plus noise signal and $\hat{P}_n^{\text{aud}}$ is given by the formula $\hat{P}_n^{\text{aud}} = \left[\min\left(\hat{P}_n,\ P_x - \hat{\Theta}\right)\right]^+$, where $\hat{\Theta}$ is an estimated audible noise threshold, $\hat{P}_n$ is an estimated power spectrum of the noise signal, and the function $[x]^+$ is $\max(x, 0)$.
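The claim-7 formula can be evaluated bin by bin. Here $P_x - \hat{\Theta}$ is the part of the noisy power lying above the audible threshold, so only that much of the noise estimate is counted as audible noise. The sketch follows the equation directly; the function name and the `eps` floor (which keeps the ratio finite when all of the noise is masked and $\hat{P}_n^{\text{aud}} = 0$) are assumptions.

```python
def audible_posteriori_snr(px, pn, theta, eps=1e-12):
    """Per-bin R_post^aud = P_x / P_n^aud, with P_n^aud = [min(P_n, P_x - Theta)]^+.

    eps is an assumed floor that keeps the ratio finite when the audible
    noise power is zero.
    """
    snr = []
    for x, n, t in zip(px, pn, theta):
        pn_aud = max(min(n, x - t), 0.0)  # audible part of the noise power
        snr.append(x / max(pn_aud, eps))
    return snr
```

With $P_x = (10, 6, 3)$, $\hat{P}_n = (2, 4, 2)$ and $\hat{\Theta} = 5$ in every bin, the audible noise powers are 2, 1 and 0. The second bin shows the effect of the threshold: the noise estimate is 4, but only 1 unit of power sits above the threshold, so the audible SNR is 6 rather than 1.5. The third bin is fully masked and receives a very large SNR.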
8. A method according to claim 7 wherein the step of reducing, in magnitude, the spectral representations of each frame by the estimated noise components to produce noise-reduced spectral representations of each frame, includes the steps of:
calculating gain functions, G, to be applied to the spectral representations of the frames according to the equation;
multiplying the spectral representations of the frames by the gain factors to produce the noise-reduced spectral representations.
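Because the gain equation of claim 8 is not reproduced here, the sketch below uses a plainly swapped-in rule: a generic power-subtraction gain computed from an a posteriori SNR value (for instance the audible ratio of claim 7), with an assumed floor `g_min`, of the kind commonly used to limit musical noise. Only the second step, multiplying each spectral bin by its gain, follows the claim directly. The names `standin_gain` and `apply_gains` are hypothetical.

```python
import math


def standin_gain(r_post, g_min=0.1):
    """Stand-in gain per bin: sqrt(max(1 - 1/R, g_min**2)).

    A generic power-subtraction rule, NOT the gain equation of claim 8;
    g_min is an assumed floor on the gain.
    """
    return [math.sqrt(max(1.0 - 1.0 / max(r, 1e-12), g_min ** 2)) for r in r_post]


def apply_gains(X, gains):
    """Multiply each complex spectral bin by its real-valued gain."""
    return [x * g for x, g in zip(X, gains)]
```

A bin with an SNR of 4 keeps sqrt(0.75), about 87 percent, of its amplitude, while a bin with an SNR of 1 (all noise) is held at the floor of 0.1 instead of being zeroed outright.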
9. A method according to claim 7, further including the step of calculating the audible a priori signal to noise ratio, Rprioaud, according to the equation
10. A computer-usable carrier including a computer program that controls a computer to perform speech enhancement, the computer program causing the computer to process an input signal including both speech components and noise components to produce a noise-reduced output signal, by causing the computer to perform the steps of:
receiving the input signal;
segmenting the input signal into frames;
applying a spectral transformation to the input signal to obtain respective spectral representations of each frame of the input signal;
identifying frames that include speech components;
estimating respective noise components of the respective spectral representations of the frames of the input signal using a time-varying forgetting factor that reduces any contribution to the estimated noise components from data frames that exhibit rapid changes in signal power relative to earlier received data frames;
wherein the spectral representations of the frames of the input signal used to estimate the respective noise components exclude the frames that include speech components;
reducing, in magnitude, the spectral representations of each frame by the estimated noise components to produce noise-reduced spectral representations of each frame.
Specification