Noise spectrum tracking for speech enhancement

US 6,289,309 B1
Filed: 12/15/1999
Issued: 09/11/2001
Est. Priority Date: 12/16/1998
Status: Expired due to Term



Abstract

A spectrum-based speech enhancement system estimates and tracks the noise spectrum of a mixed speech and noise signal. The system frames and windows a digitized signal and applies the frames to a fast Fourier transform processor to generate discrete Fourier transformed (DFT) signals representing the speech plus noise signal. The system calculates the power spectrum of each frame. The speech enhancement system employs a leaky integrator that is responsive to identified noise-only components of the signal. The leaky integrator has an adaptive time constant which compensates for non-stationary environmental noise. In addition, the speech enhancement system identifies noise-only intervals by using a technique that monitors the Teager energy of the signal. The transition between noise-only signals and speech plus noise signals is softened by being made non-binary. Once the noise spectrum has been estimated, it is used to generate gain factors that multiply the DFT signals to produce noise-reduced DFT signals. The gain factors are generated based on an audible noise threshold. The method generates audible a priori and a posteriori signal-to-noise ratio signals and then calculates audible gain signals from these values.
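
The front end described in the abstract (framing, windowing, FFT, per-frame power spectrum) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `fft` and `frame_power_spectra`, the Hann window, the frame length, and the hop size are all assumptions, since the abstract does not specify them. The small recursive radix-2 FFT stands in for whatever FFT processor a real system would use.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT (illustrative stand-in for an FFT processor)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def frame_power_spectra(signal, frame_len=64, hop=32):
    """Return (dft, power): DFT bins and squared magnitudes for each frame.

    frame_len and hop are assumed values chosen only for the example.
    """
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    # Hann window: an assumed choice, the abstract only says "windows".
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    dft, power = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = fft(frame)[: frame_len // 2 + 1]  # non-negative frequencies
        dft.append(bins)
        power.append([abs(b) ** 2 for b in bins])
    return dft, power
```

Calling `frame_power_spectra(samples)` on a list of floats yields one list of complex DFT bins and one list of power values per frame; the later stages (noise tracking and gain calculation) would operate on the `power` output.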


11 Claims

1. A speech enhancement method which processes an input signal including both speech components and noise components to produce a noise-reduced output signal, the speech enhancement method comprising the steps of:
- receiving the input signal;
- segmenting the input signal into frames;
- applying a spectral transformation to the input signal to obtain respective spectral representations of each frame of the input signal;
- identifying frames that include speech components;
- estimating respective noise components of the respective spectral representations of the frames of the input signal using a time-varying forgetting factor that reduces any contribution to the estimated noise components from data frames that exhibit rapid changes in signal power relative to earlier received data frames;
- wherein the spectral representations of the frames of the input signal used to estimate the respective noise components exclude the frames that include speech components;
- reducing, in magnitude, the spectral representations of each frame by the estimated noise components to produce noise-reduced spectral representations of each frame.
- 2. A method according to claim 1, wherein the noise components are estimated using a leaky integrator represented by the equation
- 3. A method according to claim 1 wherein the step of estimating the respective noise components further includes the step of tracking Teager energy in the components of the data frames to more effectively distinguish unvoiced consonant sounds of the speech components from noise components.
- 4. A method according to claim 3, wherein the test for determining whether a frame is a noise-only frame or a speech plus noise frame is given by the condition:
  - $\frac{1}{L} \sum_{f = 0}^{L - 1} \left( \frac{\| f X_{k}(f) \|^{2}}{\hat{\Psi}_{n}(f, k - 1)} \right) \begin{matrix} H_{n} \\ < \\ > \\ H_{s} \end{matrix} 1 + \beta$ where $H_s$ is an indication that the frame includes both speech and noise, $H_n$ is an indication that the frame includes only noise, $\beta$ is a threshold value, $f$ is an index variable that indexes $L$ frequency bins of frame $k$, $\| f X_k(f) \|^2$ is the power spectral component of frequency bin $f$, and $\hat{\Psi}_n(f, k - 1)$ is an estimate of the Teager energy of the noise component for frequency bin $f$ of frame $k - 1$, where the estimated Teager energy is given by the equation $\hat{\Psi}_n(f, k) = \hat{P}_n(f, k)$, where $\hat{P}_n(f, k)$ is the estimated noise power spectral component of frequency bin $f$ of frame $k$.
- 5. A method according to claim 1 wherein the step of estimating the respective noise components further includes the steps of:
  - identifying frames which include noise components to the relative exclusion of speech components;
  - classifying each frame as containing only noise components or containing both noise components and speech components, wherein the transition between a noise-only signal and a voice and noise signal is made gradual to reduce the sensitivity of the noise tracking system to system parameters.
- 6. A method according to claim 1, wherein the step of estimating the respective noise components employs at least one of calculated a priori and a posteriori signal to noise ratios based on an audible noise threshold.
- 7. A method according to claim 6, wherein the step of estimating the respective noise components includes the steps of:
  - calculating the a posteriori audible signal to noise ratio, $R_{post}^{aud}$, according to the equation $R_{post}^{aud} = P_x / \hat{P}_n^{aud}$, where $P_x$ is a power spectrum of the speech plus noise signal, $\hat{P}_n^{aud}$ is given by the formula $\hat{P}_n^{aud} = [\min(\hat{P}_n, P_x - \hat{\Theta})]^{+}$, where $\hat{\Theta}$ is an estimated audible noise threshold, $\hat{P}_n$ is an estimated power spectrum of the noise signal, and the function $[x]^{+}$ is the function $\max(x, 0)$.
- 8. A method according to claim 7 wherein the step of reducing, in magnitude, the spectral representations of each frame by the estimated noise components to produce noise-reduced spectral representations of each frame, includes the steps of:
  - calculating gain functions, $G$, to be applied to the spectral representations of the frames according to the equation $G = \frac{1}{2} \left[ 1 + \sqrt{\frac{R_{post}^{aud} - 1}{R_{post}^{aud}}} \right]$;
  - multiplying the spectral representations of the frames by the gain factors to produce the noise-reduced spectral.
- 9. A method according to claim 7, further including the step of calculating the audible a priori signal to noise ratio, R_prio^aud, according to the equation
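
The formulas recited in claims 4, 7, and 8 can be exercised with a short per-bin sketch. The function names (`is_speech_frame`, `audible_post_snr`, `gain`) and argument names are hypothetical, and three behaviors are assumptions not stated in the claims: a tie in the claim 4 test is treated as noise-only, an inaudible noise estimate (zero denominator in claim 7) maps to an infinite ratio, and the ratio is clamped to at least 1 before the square root in claim 8.

```python
import math


def is_speech_frame(weighted_power, noise_teager_prev, beta):
    """Claim 4 test for one frame.

    weighted_power[f]    stands for ||f X_k(f)||^2
    noise_teager_prev[f] stands for Psi_hat_n(f, k-1)
    Returns True for H_s (speech plus noise), False for H_n (noise only).
    """
    num_bins = len(weighted_power)
    stat = sum(p / t for p, t in zip(weighted_power, noise_teager_prev)) / num_bins
    return stat > 1.0 + beta  # assumption: equality counts as noise only


def audible_post_snr(p_x, p_n_hat, theta_hat):
    """Claim 7: R_post_aud = P_x / [min(P_n_hat, P_x - Theta_hat)]^+ for one bin."""
    p_n_aud = max(min(p_n_hat, p_x - theta_hat), 0.0)
    if p_n_aud == 0.0:
        return math.inf  # assumption: no audible noise, ratio is unbounded
    return p_x / p_n_aud


def gain(r_post_aud):
    """Claim 8: G = 0.5 * (1 + sqrt((R - 1) / R)) for one bin."""
    if math.isinf(r_post_aud):
        return 1.0  # limit of G as R grows without bound
    r = max(r_post_aud, 1.0)  # assumption: guard against a negative sqrt argument
    return 0.5 * (1.0 + math.sqrt((r - 1.0) / r))
```

For example, with `p_x = 4.0`, `p_n_hat = 1.0`, and `theta_hat = 0.5`, the audible noise power is `min(1.0, 3.5) = 1.0`, so the ratio is 4 and the gain is about 0.933. When the observed power falls below the audible threshold, the audible noise power is clipped to zero and the gain is 1, so the bin passes through unmodified.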

10. A computer-usable carrier including a computer program that controls a computer to perform speech enhancement, the computer program causing the computer to process an input signal including both speech components and noise components to produce a noise-reduced output signal, by causing the computer to perform the steps of:
- receiving the input signal;
- segmenting the input signal into frames;
- applying a spectral transformation to the input signal to obtain respective spectral representations of each frame of the input signal;
- identifying frames that include speech components;
- estimating respective noise components of the respective spectral representations of the frames of the input signal using a time-varying forgetting factor that reduces any contribution to the estimated noise components from data frames that exhibit rapid changes in signal power relative to earlier received data frames;
- wherein the spectral representations of the frames of the input signal used to estimate the respective noise components exclude the frames that include speech components;
- reducing, in magnitude, the spectral representations of each frame by the estimated noise components to produce noise-reduced spectral representations of each frame.
- 11. A computer usable carrier according to claim 10, wherein the computer program models the noise components as a leaky integrator represented by the equation
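
The leaky-integrator equation itself is not reproduced in the claim text here, so the following sketch assumes a generic first-order form, `noise = lam * noise + (1 - lam) * power`, purely to illustrate the estimating step of claims 1 and 10. The rule used to adapt the forgetting factor `lam`, the parameter names `base` and `sensitivity`, and the choice to initialize from the first frame are all hypothetical; only the qualitative behavior (speech frames excluded, frames with rapid power changes contributing less) comes from the claims.

```python
def forgetting_factor(frame_power, prev_frame_power,
                      base=0.9, sensitivity=4.0, eps=1e-12):
    """Assumed rule: lam equals base for steady power and approaches 1
    (almost no update) as the relative power change grows."""
    change = abs(frame_power - prev_frame_power) / (prev_frame_power + eps)
    return 1.0 - (1.0 - base) / (1.0 + sensitivity * change)


def track_noise(power_frames, is_speech, base=0.9, sensitivity=4.0):
    """Return one noise power spectrum estimate per frame.

    power_frames: list of per-frame power spectra (lists of floats)
    is_speech:    list of booleans, True where a frame contains speech
    """
    noise = list(power_frames[0])  # assumption: first frame is noise only
    prev_total = sum(power_frames[0])
    estimates = [list(noise)]
    for p_x, speech in zip(power_frames[1:], is_speech[1:]):
        total = sum(p_x)
        if not speech:  # speech frames are excluded from the estimate
            lam = forgetting_factor(total, prev_total, base, sensitivity)
            noise = [lam * n + (1.0 - lam) * x for n, x in zip(noise, p_x)]
        prev_total = total
        estimates.append(list(noise))
    return estimates
```

With a steady noise floor of 1.0 followed by a single noise-only frame at 10.0, a fixed factor of 0.9 would move the estimate to 1.9, whereas the adaptive factor in this sketch moves it only to about 1.02, which is the kind of damping the time-varying forgetting factor is claimed to provide.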

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Sarnoff Corporation (SRI International, Inc.)
Inventors: deVries, Albert
Primary Examiner(s): SAX, ROBERT L
Application Number: US09/464,663
Time in Patent Office: 636 Days
Field of Search: 704/226, 704/268, 704/227, 704/217, 704/233, 381/94.3, 381/94
US Class Current: 704/233
CPC Class Codes:
G10L 15/20   Speech recognition techniqu...
G10L 2021/02168   the estimation exclusively ...
G10L 21/0208   Noise filtering
G10L 21/0216   characterised by the method...
