Method for suppressing noise in a digital speech signal
First Claim
1. Method of suppressing noise in a digital speech signal processed by successive frames, comprising the steps of:
computing spectral components of the speech signal of each frame;
computing, for each frame, overestimates of spectral components of noise included in the speech signal; and
performing a spectral subtraction including a first subtraction step in which a respective first quantity dependent on parameters including the overestimate of a corresponding spectral component of the noise for said frame is subtracted from each spectral component of the speech signal of the frame, to obtain spectral components of a first noise-suppressed signal;
computing a masking curve by applying an auditory perception model on the basis of the spectral components of the first noise-suppressed signal;
comparing the overestimates of the spectral components of the noise for the frame to the computed masking curve; and
a second subtraction step in which a respective second quantity depending on parameters including a difference between the overestimate of the corresponding spectral component of the noise and the computed masking curve is subtracted from each spectral component of the speech signal of the frame.
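The two subtraction steps of claim 1 can be sketched on per-bin spectral magnitudes as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The first quantity is assumed to be alpha times the noise overestimate, with a spectral floor of beta times the overestimate. The second quantity is assumed to be the part of the noise overestimate that exceeds the masking curve, clipped at zero. toy_masking_curve is a hypothetical stand-in for the auditory perception model, which the claim does not specify. All function and parameter names (alpha, beta, spread_db_per_bin, offset_db) are illustrative.

```python
# Minimal sketch of claim 1's two subtraction steps on spectral magnitudes.
# Hypothetical forms: first quantity = alpha * B_hat, floor = beta * B_hat,
# second quantity = max(B_hat - M, 0). The masking model is a toy stand-in.


def first_subtraction(S, B_hat, alpha=1.0, beta=0.05):
    """First step: subtract alpha times the noise overestimate, with a spectral floor."""
    return [max(s - alpha * b, beta * b) for s, b in zip(S, B_hat)]


def toy_masking_curve(S1, spread_db_per_bin=10.0, offset_db=10.0):
    """Toy masking model (NOT the patent's auditory model): each bin of S1 masks
    its neighbours with a fixed per-bin decay, offset_db below its own level."""
    n = len(S1)
    M = []
    for f in range(n):
        level = 0.0
        for g in range(n):
            gain = 10 ** (-(spread_db_per_bin * abs(f - g) + offset_db) / 20)
            level = max(level, S1[g] * gain)
        M.append(level)
    return M


def second_subtraction(S, B_hat, M, beta=0.05):
    """Second step: subtract only the part of the noise overestimate above the mask,
    applied to the original spectral components S as claim 1 states."""
    return [max(s - max(b - m, 0.0), beta * b) for s, b, m in zip(S, B_hat, M)]


def two_step_subtraction(S, B_hat):
    S1 = first_subtraction(S, B_hat)   # first noise-suppressed spectrum
    M = toy_masking_curve(S1)          # masking curve computed from S1
    return second_subtraction(S, B_hat, M)


if __name__ == "__main__":
    S = [1.0, 5.0, 1.0, 0.2]
    B_hat = [0.5, 0.5, 0.5, 0.5]
    print(two_step_subtraction(S, B_hat))
```

In this sketch, noise lying below the masking curve (bin 1 in the example) is not subtracted by the second step. This appears to be the purpose of comparing the noise overestimates to the masking curve. A real implementation would replace the toy masking model.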
Abstract
A spectral subtraction is effected including: a first subtraction step in which overestimates of the spectral components of the noise are taken into account, to obtain spectral components of a first noise-suppressed signal; the computation of a masking curve by applying an auditory perception model on the basis of the spectral components of the first noise-suppressed signal; and a second subtraction step in which a respective quantity depending on parameters including a difference between the overestimate of the corresponding spectral component of the noise and the computed masking curve is subtracted from each spectral component of the speech signal in the frame. The result of the spectral subtraction is transformed into the time domain to construct a noise-suppressed speech signal.
64 Citations
21 Claims
where N is the number of samples used to calculate the spectral components on the basis of the conditioned signal and A(k) is the normalized autocorrelation defined by:
Sn,f2 designating the spectral component of rank f computed on the basis of the conditioned signal.
12. Method according to claim 11, wherein the computation of the masking curve uses the degree of voicing measured by the normalized entropy H.
13. Method according to claim 3, wherein, after each frame is processed, the number of samples of the noise-suppressed speech signal retained from that processing is equal to an integer multiple of the ratio between the sampling frequency and the estimated pitch frequency.
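The retention rule of claim 13 reduces to simple arithmetic once the pitch period fs/f0 is known. The sketch below makes two hypothetical choices. It rounds the ratio to a whole number of samples, since the claim does not say how a fractional ratio is handled. It also keeps the largest multiple that fits in a nominal hop (nominal_hop, an illustrative parameter), and always at least one period.

```python
def retained_sample_count(fs_hz, f0_hz, nominal_hop):
    """Number of output samples to keep after a frame: an integer multiple of
    the pitch period fs/f0. Assumptions (hypothetical): the ratio is rounded to
    whole samples, and at least one period is kept even if it exceeds the hop."""
    period = round(fs_hz / f0_hz)
    if period < 1:
        raise ValueError("pitch frequency too high for this sampling rate")
    multiple = max(1, nominal_hop // period)
    return multiple * period


if __name__ == "__main__":
    print(retained_sample_count(8000, 125, 128))
```

For example, at 8 kHz with an estimated pitch of 125 Hz the period is exactly 64 samples, so a nominal hop of 128 samples retains two periods.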
14. Method according to claim 3, wherein the estimation of the pitch frequency of the speech signal over a frame includes the steps of:
estimating time intervals between two consecutive breaks of the signal which can be attributed to glottal closures of the speaker occurring during the frame, the estimated pitch frequency being inversely proportional to said time intervals; and
interpolating the speech signal in said time intervals so that the conditioned signal resulting from such interpolation has a constant time interval between two consecutive breaks.
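Suppose the break (glottal-closure) positions have already been detected. Break detection itself is not shown and is outside this sketch. The pitch estimate and the interpolation of claim 14 can then be sketched as below. Two choices are assumptions made for illustration: linear interpolation, and the rounded mean interval as the common target interval. The function names are hypothetical.

```python
def estimate_pitch(breaks, fs_hz):
    """Pitch estimate from break positions (in samples): inversely
    proportional to the mean interval between consecutive breaks."""
    intervals = [b - a for a, b in zip(breaks, breaks[1:])]
    if not intervals or min(intervals) <= 0:
        raise ValueError("need at least two increasing break positions")
    mean_interval = sum(intervals) / len(intervals)
    return fs_hz / mean_interval, intervals


def condition_signal(signal, breaks, target_interval):
    """Linearly resample each inter-break segment to target_interval samples so
    the conditioned signal has a constant interval between consecutive breaks."""
    out = []
    for a, b in zip(breaks, breaks[1:]):
        length = b - a
        for i in range(target_interval):
            x = a + i * length / target_interval   # position in [a, b)
            j = int(x)
            frac = x - j
            nxt = signal[j + 1] if j + 1 < len(signal) else signal[j]
            out.append((1.0 - frac) * signal[j] + frac * nxt)
    return out


if __name__ == "__main__":
    breaks = [0, 80, 164, 244]
    f0, intervals = estimate_pitch(breaks, 8000)
    target = round(sum(intervals) / len(intervals))
    conditioned = condition_signal([float(n) for n in range(300)], breaks, target)
    print(round(f0, 2), target, len(conditioned))
```

Each break in the input then falls at a multiple of target_interval in the conditioned signal, which is the constant spacing the claim requires.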
15. Method according to claim 14, wherein, after processing each frame, a number of the noise-suppressed speech signal samples supplied by such processing is retained which corresponds to an integer number of estimated time intervals.
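The retention rule of claim 15 can be sketched under two assumptions. The estimated intervals (in samples) are assumed to be listed in order from the start of the frame's output. A hypothetical nominal_hop is assumed to bound how much is kept. Whole intervals are accumulated until the next one would overshoot, and at least one is always kept.

```python
def retained_whole_intervals(intervals, nominal_hop):
    """Keep an integer number of estimated pitch intervals: accumulate whole
    intervals while the total stays within nominal_hop, keeping at least one.
    Returns (samples_retained, intervals_retained)."""
    total, count = 0, 0
    for T in intervals:
        if count > 0 and total + T > nominal_hop:
            break
        total += T
        count += 1
    return total, count


if __name__ == "__main__":
    print(retained_whole_intervals([80, 84, 80], 170))
```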
16. Method according to claim 1, wherein values of a signal-to-noise ratio of the speech signal are estimated in the spectral domain for each frame and the parameters on which the first subtracted quantities depend include the estimated values of the signal-to-noise ratio, the first quantity subtracted from each spectral component of the speech signal in the frame being a decreasing function of the corresponding estimated value of the signal-to-noise ratio.
17. Method according to claim 16, wherein said function decreases toward zero for the highest values of the signal-to-noise ratio.
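Claims 16 and 17 require only that the first subtracted quantity be a decreasing function of the estimated signal-to-noise ratio, tending toward zero at the highest values. The piecewise-linear shape below is one hypothetical choice, as are alpha_max, snr_low_db and snr_high_db. It illustrates the behaviour and is not the patent's function.

```python
def overestimation_factor(snr_db, alpha_max=2.0, snr_low_db=0.0, snr_high_db=20.0):
    """Hypothetical decreasing function of the per-bin SNR estimate: alpha_max at
    low SNR, falling linearly to zero at snr_high_db and staying there."""
    if snr_db <= snr_low_db:
        return alpha_max
    if snr_db >= snr_high_db:
        return 0.0
    return alpha_max * (snr_high_db - snr_db) / (snr_high_db - snr_low_db)


def first_quantities(noise_overestimates, snr_db_values):
    """First subtracted quantity per bin: factor(SNR) times the noise overestimate."""
    return [overestimation_factor(snr) * b
            for b, snr in zip(noise_overestimates, snr_db_values)]


if __name__ == "__main__":
    print(first_quantities([0.5, 0.5, 0.5], [0.0, 10.0, 20.0]))
```

With this shape, bins whose estimated SNR is high have little or nothing subtracted in the first step. This limits distortion where speech dominates, and low-SNR bins are subtracted most aggressively.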
18. Method according to claim 1, further comprising the step of subjecting a result of the spectral subtraction to a transformation to the time domain to construct a noise-suppressed speech signal.
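Claim 18 does not specify the transform to the time domain. One common choice, assumed here, is an inverse DFT of each frame using the modified magnitudes with the noisy phase, followed by windowed overlap-add. The sketch below uses a naive O(N^2) DFT for self-containment. It also uses a square-root periodic Hann window at 50% overlap, so that analysis followed by unmodified synthesis reconstructs the input. The frame length, window and function names are illustrative.

```python
import cmath
import math


def sqrt_hann(N):
    # Square root of a periodic Hann window: analysis * synthesis = Hann,
    # and Hann windows shifted by N/2 sum to 1.
    return [math.sqrt(0.5 - 0.5 * math.cos(2 * math.pi * n / N)) for n in range(N)]


def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    N = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N).real
            for n in range(N)]


def analyze(signal, N):
    """Windowed DFT frames with hop N/2."""
    hop, w = N // 2, sqrt_hann(N)
    return [dft([signal[start + n] * w[n] for n in range(N)])
            for start in range(0, len(signal) - N + 1, hop)]


def synthesize(magnitudes, phases, N, length):
    """Rebuild each frame from (modified) magnitudes and the noisy phases,
    window it again and overlap-add. Keeps only the real part, which assumes
    the magnitudes stay symmetric (|X[k]| == |X[N-k]|)."""
    hop, w = N // 2, sqrt_hann(N)
    out = [0.0] * length
    for i, (mags, phs) in enumerate(zip(magnitudes, phases)):
        frame = idft([m * cmath.exp(1j * p) for m, p in zip(mags, phs)])
        for n in range(N):
            out[i * hop + n] += frame[n] * w[n]
    return out


if __name__ == "__main__":
    sig = [math.sin(0.3 * n) for n in range(64)]
    frames = analyze(sig, 16)
    mags = [[abs(c) for c in fr] for fr in frames]
    phs = [[cmath.phase(c) for c in fr] for fr in frames]
    out = synthesize(mags, phs, 16, len(sig))
    print(max(abs(out[n] - sig[n]) for n in range(8, 56)))
```

In a noise suppressor, the magnitudes passed to synthesize would be the output of the subtraction steps rather than the unmodified ones. Only samples covered by two frames (here indices 8 to 55) are fully reconstructed.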
19. Device for suppressing noise in a digital speech signal processed by successive frames, comprising:
means for computing spectral components of the speech signal for each frame;
means for computing, for each frame, overestimates of spectral components of noise included in the speech signal; and
spectral subtraction means including:
first subtraction means to subtract, from each spectral component of the speech signal of the frame, a respective first quantity dependent on parameters including the overestimate of a corresponding spectral component of the noise for said frame, to obtain spectral components of a first noise-suppressed signal;
means for computing a masking curve by applying an auditory perception model on the basis of the spectral components of the first noise-suppressed signal;
means for comparing the overestimates of the spectral components of the noise for the frame to the computed masking curve; and
second subtraction means to subtract, from each spectral component of the speech signal of the frame, a respective second quantity depending on parameters including a difference between the overestimate of the corresponding spectral component of the noise and the computed masking curve.
Specification