Multi-sensory speech enhancement using a speech-state model
Abstract
A method and apparatus determine a likelihood of a speech state based on an alternative sensor signal and an air conduction microphone signal. The likelihood of the speech state is used, together with the alternative sensor signal and the air conduction microphone signal, to estimate a clean speech value for a clean speech signal.
13 Claims
1. A method of determining an estimate for a noise-reduced value representing a portion of a noise-reduced speech signal, the method comprising:
generating an alternative sensor signal using an alternative sensor;
generating an air conduction microphone signal;
using the alternative sensor signal and the air conduction microphone signal to estimate a likelihood, $L(S_t)$, of a speech state, $S_t$, by estimating a separate likelihood of the speech state for each of a set of frequency components and combining the separate likelihoods to form the likelihood of the speech state; and
using the likelihood of the speech state to estimate the noise-reduced value, $\hat{X}_t$, as:

$$\hat{X}_t = \sum_{s \in S} \pi_s \, E(X_t \mid Y_t, B_t, S_t = s)$$

where $\pi_s$ is a posterior on the state and is given by:

$$\pi_s = \frac{L(S_t = s)}{\sum_{s' \in S} L(S_t = s')}$$

and where: [equation not reproduced]

where $M^*$ is the complex conjugate of $M$, $X_t$ is a noise-reduced value, $Y_t$ is a value for a frame $t$ of the air conduction microphone signal, $B_t$ is a value for a frame $t$ of the alternative sensor signal, $\sigma_u^2$ is a variance of sensor noise in the air conduction microphone, $\sigma_w^2$ is a variance of sensor noise in the alternative sensor, $g^2\sigma_v^2$ is the variance of ambient noise, $G$ is the channel response of the alternative sensor to ambient noise, $H$ is the channel response of the alternative sensor to a clean speech signal, $S$ is the set of all speech states, $\sigma_s^2$ is a variance for a distribution that models a probability of a noise-reduced value given a speech state, and $E(X_t \mid Y_t, B_t, S_t = s)$ is the expectation of $X_t$ given $Y_t$, $B_t$, and a speech state of $s$.
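Claim 1 forms the noise-reduced value by weighting each speech state's conditional expectation by the posterior on that state. The sketch below illustrates only that weighting step, assuming, as in a standard minimum-mean-square-error estimate over discrete states, that $\hat{X}_t = \sum_s \pi_s E(X_t \mid Y_t, B_t, S_t = s)$ with $\pi_s$ the normalized likelihood. The per-state conditional means are taken as precomputed inputs because their closed form is not reproduced in this text; the function names are hypothetical, and likelihoods are handled in the log domain to avoid underflow.

```python
import math
from typing import Sequence


def state_posteriors(log_likelihoods: Sequence[float]) -> list[float]:
    """Normalize per-state log-likelihoods log L(S_t = s) into posteriors pi_s.

    Subtracting the maximum before exponentiating (log-sum-exp) keeps very
    small likelihoods from underflowing to zero.
    """
    peak = max(log_likelihoods)
    weights = [math.exp(ll - peak) for ll in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]


def combine_state_estimates(
    log_likelihoods: Sequence[float],
    conditional_means: Sequence[complex],
) -> complex:
    """Return X_hat_t = sum over s of pi_s * E(X_t | Y_t, B_t, S_t = s).

    conditional_means[s] is assumed to be precomputed for each state s
    (hypothetical input; its closed form is not given here).
    """
    if len(log_likelihoods) != len(conditional_means):
        raise ValueError("need one conditional mean per speech state")
    posteriors = state_posteriors(log_likelihoods)
    return sum(p * m for p, m in zip(posteriors, conditional_means))
```

For example, two equally likely states with conditional means `1+1j` and `3-1j` combine to `2+0j`, their average.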
8. A computer storage medium having stored thereon computer-executable instructions that when executed by a processor cause the processor to perform steps comprising:
receiving an alternative sensor signal generated using an alternative sensor;
receiving an air conduction microphone signal generated using an air conduction microphone;
determining a likelihood of a speech state based on the alternative sensor signal and the air conduction microphone signal by estimating a separate likelihood of the speech state for each frequency, $L(S_t(f))$, of a set of frequency components and forming a product of the separate likelihoods to form the likelihood of the speech state, $L(S_t)$, as:

$$L(S_t) = \prod_{f} L(S_t(f))$$

where the product is taken across all frequency components $f$ in the set of frequency components; and
using the likelihood of the speech state to estimate a clean speech value.
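Claim 8 combines per-frequency likelihoods by taking their product over all frequency components. A direct product of hundreds of small likelihoods underflows in floating point, so the minimal sketch below, which is an implementation assumption and not something the claim specifies, sums logarithms instead. The input layout `per_freq_log_lik[s][f]` (one row per state, one column per frequency bin) and the function name are hypothetical.

```python
import math
from typing import Sequence


def state_log_likelihoods(per_freq_log_lik: Sequence[Sequence[float]]) -> list[float]:
    """Combine per-frequency likelihoods into one likelihood per speech state.

    per_freq_log_lik[s][f] holds log L(S_t(f) = s). Since L(S_t = s) is the
    product over f, its logarithm is the sum of the per-frequency logs.
    math.fsum is used for an accurately rounded sum.
    """
    return [math.fsum(row) for row in per_freq_log_lik]


if __name__ == "__main__":
    # 257 bins, each with likelihood 1e-3: the raw product underflows to 0.0,
    # while the log-domain total stays finite.
    bins = [1e-3] * 257
    print(math.prod(bins))
    print(state_log_likelihoods([[math.log(b) for b in bins]])[0])
```

The log-domain values can then be normalized into state posteriors with a log-sum-exp step.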
11. A method of identifying a clean speech value for a clean speech signal, the method comprising:
receiving an alternative sensor signal generated using an alternative sensor;
receiving an air conduction microphone signal generated using an air conduction microphone;
forming a model wherein the clean speech signal is dependent upon a speech state, the alternative sensor signal is dependent upon the clean speech signal, and the air conduction microphone signal is dependent upon the clean speech signal, wherein forming the model comprises modeling a probability of a value of the clean speech signal given a speech state as a distribution having a variance; and
determining a filtered value of the air conduction microphone signal by applying a value for a current frame of the air conduction microphone signal to a frequency-dependent noise suppression filter that is a function of a variance of ambient noise;
determining the variance of the distribution as a linear combination of an estimate of a value for a clean speech signal for a preceding frame and the filtered value of the air conduction microphone signal as:

$$\hat{\sigma}_s^2 = \tau \, |\hat{X}_{t-1}|^2 + (1 - \tau) \, K_s^2 \, |Y_t|^2$$

where $\hat{\sigma}_s^2$ is the variance of the distribution, $\hat{X}_{t-1}$ is the clean speech estimate from the preceding frame, $\tau$ is a smoothing factor, $|Y_t|^2$ is the value for the current frame of the air conduction microphone signal, and $K_s$ is the noise suppression filter;
determining an estimate of the clean speech value for the current frame based on the model, the variance of the distribution, a value for the alternative sensor signal for the current frame, and a value for the air conduction microphone signal for the current frame.
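Claim 11 updates the speech-state variance recursively, blending the previous frame's clean-speech power with the noise-suppressed power of the current microphone frame. The per-bin sketch below evaluates $\hat{\sigma}_s^2 = \tau |\hat{X}_{t-1}|^2 + (1-\tau) K_s^2 |Y_t|^2$. The claim says only that $K_s$ depends on the ambient noise variance, so the spectral-subtraction style gain `wiener_gain` is a hypothetical stand-in, not the filter used in the patent; both function names are illustrative.

```python
def wiener_gain(y_power: float, noise_var: float, floor: float = 0.0) -> float:
    """Hypothetical noise-suppression gain K_s for one frequency bin.

    Assumed form: K = max(1 - noise_var / |Y|^2, floor). This is an
    illustrative choice; the claim only requires K_s to depend on the
    ambient noise variance.
    """
    if y_power <= 0.0:
        return floor
    return max(1.0 - noise_var / y_power, floor)


def update_speech_variance(prev_clean: complex, y: complex, gain: float, tau: float) -> float:
    """Return tau * |X_hat_{t-1}|^2 + (1 - tau) * K_s^2 * |Y_t|^2 for one bin."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return tau * abs(prev_clean) ** 2 + (1.0 - tau) * gain ** 2 * abs(y) ** 2
```

With $|\hat{X}_{t-1}|^2 = 2$, $|Y_t|^2 = 4$, an ambient noise variance of 2 (giving a gain of 0.5) and $\tau = 0.9$, the update gives approximately $0.9 \cdot 2 + 0.1 \cdot 0.25 \cdot 4 = 1.9$. A $\tau$ close to 1 tracks the previous estimate closely; a smaller $\tau$ reacts faster to the current frame.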
Specification