Method and apparatus for multi-sensory speech enhancement
Abstract
A method and apparatus determine a channel response for an alternative sensor using an alternative sensor signal and an air conduction microphone signal. The channel response is then used to estimate a clean speech value using at least a portion of the alternative sensor signal.
13 Claims
1. A method comprising:
for each time frame of a set of time frames, generating an alternative sensor value representing an alternative sensor signal using an alternative sensor other than an air conduction microphone;
for each time frame of the set of time frames, generating an air conduction microphone value;
identifying which frames in the set of frames do not contain speech from a speaker based on the energy level of the alternative sensor signal;
within the frames identified as not containing speech from the speaker, performing speech detection on the air conduction microphone values to determine which frames contain background speech and which frames do not contain background speech;
using alternative sensor values for the frames identified as not containing speech from the speaker and not containing background speech to determine a variance for noise of the alternative sensor;
using alternative sensor values and air conduction microphone values for the frames identified as not containing speech from the speaker but containing background speech to determine a channel response of the alternative sensor to background speech;
using the alternative sensor values and the air conduction microphone values for the set of time frames to estimate a value for a channel response of the alternative sensor to speech from the speaker; and
using the channel response of the alternative sensor to speech from the speaker, the channel response of the alternative sensor to background speech, and the variance for noise of the alternative sensor to estimate a noise-reduced value for each time frame in the set of time frames.
Dependent claims: 2, 3, 4, 5, 6
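The frame-labelling steps of claim 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the claim does not say how speech detection on the air conduction microphone is performed, so a simple energy threshold stands in for it here, and the names `FrameLabel`, `classify_frames`, and both threshold parameters are hypothetical.

```python
from enum import Enum


class FrameLabel(Enum):
    SPEAKER = "speaker"        # the speaker is talking in this frame
    BACKGROUND = "background"  # no speaker speech, but background speech
    SILENCE = "silence"        # neither speaker nor background speech


def classify_frames(alt_values, air_values, alt_threshold, air_threshold):
    """Label each time frame from one alternative sensor value and one
    air conduction microphone value per frame.

    Assumption: speech detection on the air microphone is approximated
    by an energy threshold; the claim leaves the detector unspecified.
    """
    labels = []
    for b, y in zip(alt_values, air_values):
        if abs(b) ** 2 >= alt_threshold:
            # Enough energy in the alternative sensor: speaker speech.
            labels.append(FrameLabel.SPEAKER)
        elif abs(y) ** 2 >= air_threshold:
            # Quiet alternative sensor, active air microphone.
            labels.append(FrameLabel.BACKGROUND)
        else:
            labels.append(FrameLabel.SILENCE)
    return labels


alt = [0.01, 0.02, 0.9 + 0.1j, 0.03]
air = [0.02, 0.8, 1.0, 0.01]
labels = classify_frames(alt, air, alt_threshold=0.1, air_threshold=0.1)
```

Frames labelled `SILENCE` would feed the noise-variance estimate, and frames labelled `BACKGROUND` would feed the estimate of the channel response to background speech.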
7. A computer-readable storage medium having stored thereon computer-executable instructions that when executed by a processor cause the processor to perform steps comprising:
receiving values for an alternative sensor signal and an air conduction microphone signal for each of a set of time frames, the air conduction microphone signal comprising speech from a speaker and noise;
determining a channel response for a channel from the speaker to an alternative sensor using the values for the entire set of time frames for the alternative sensor signal and the values for the entire set of time frames for the air conduction microphone signal using: [equation not reproduced in this text]
where H is the channel response for a channel from the speaker to the alternative sensor, B_t is the value of the alternative sensor signal for time frame t, B_t* is the complex conjugate of B_t, |B_t| is the magnitude of B_t, Y_t is the value of the air conduction microphone signal for time frame t, |Y_t| is the magnitude of Y_t, σ_z² is a variance for noise in the air conduction microphone signal, σ_w² is a variance for noise in the alternative sensor signal and T is the number of frames in the set of time frames; and
using the channel response and a value for the alternative sensor signal for one time frame in the set of time frames to estimate a clean speech value for the time frame.
Dependent claims: 8
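The equation that claim 7 refers to is not reproduced in the text above. As a sketch only, and not a quotation of the patent, the closed form below uses exactly the symbols the claim defines. It assumes a model the claim does not state: Y_t = X_t + Z_t and B_t = H X_t + W_t, where X_t is the clean speech and Z_t, W_t are zero-mean Gaussian noise terms with variances σ_z² and σ_w².

```latex
\[
H = \frac{\displaystyle \sum_{t=1}^{T}\left(\sigma_z^2 |B_t|^2 - \sigma_w^2 |Y_t|^2\right)
      + \sqrt{\left(\sum_{t=1}^{T}\left(\sigma_z^2 |B_t|^2 - \sigma_w^2 |Y_t|^2\right)\right)^{2}
      + 4\,\sigma_z^2 \sigma_w^2 \left|\sum_{t=1}^{T} B_t^{*} Y_t\right|^{2}}}
     {\displaystyle 2\,\sigma_z^2 \sum_{t=1}^{T} B_t^{*} Y_t}
\]
```

Under that assumed model, optimizing out X_t leaves the objective Σ_t |B_t − H Y_t|² / (σ_w² + σ_z² |H|²). Setting its derivative with respect to H to zero gives the quadratic σ_z² (Σ_t B_t* Y_t) H² − (Σ_t (σ_z² |B_t|² − σ_w² |Y_t|²)) H − σ_w² Σ_t Y_t* B_t = 0, and the root with the plus sign is the minimizer.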
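The two computational steps of claim 7 can be illustrated with a short Python sketch. Because the claim's equation is not reproduced in this text, the closed form for H used here is an assumption: it is the maximum-likelihood solution for the model Y_t = X_t + Z_t, B_t = H X_t + W_t with Gaussian noise, built from the symbols the claim defines. The clean-speech estimate is likewise the minimizer of that assumed model, and both function names are hypothetical.

```python
import math


def estimate_channel_response(alt, air, var_air, var_alt):
    """Estimate H from all frames (assumed closed form, see lead-in).

    alt: alternative sensor values B_t; air: air microphone values Y_t;
    var_air: sigma_z^2 (air microphone noise variance);
    var_alt: sigma_w^2 (alternative sensor noise variance).
    """
    # d = sum_t (sigma_z^2 |B_t|^2 - sigma_w^2 |Y_t|^2), a real number.
    d = sum(var_air * abs(b) ** 2 - var_alt * abs(y) ** 2
            for b, y in zip(alt, air))
    # cross = sum_t conj(B_t) * Y_t, a complex number.
    cross = sum(complex(b).conjugate() * y for b, y in zip(alt, air))
    if cross == 0:
        raise ValueError("signals are uncorrelated; H is undefined")
    root = math.sqrt(d * d + 4.0 * var_air * var_alt * abs(cross) ** 2)
    return (d + root) / (2.0 * var_air * cross)


def estimate_clean_speech(b, y, h, var_air, var_alt):
    """Per-frame clean speech estimate under the same assumed model."""
    h = complex(h)
    numerator = var_alt * y + var_air * h.conjugate() * b
    return numerator / (var_alt + var_air * abs(h) ** 2)


# Noise-free check: if B_t = h_true * Y_t exactly, H should equal h_true.
h_true = 0.5 + 0.25j
air = [1 + 0j, 0.5 - 0.5j, -0.3 + 0.8j]
alt = [h_true * y for y in air]
h_est = estimate_channel_response(alt, air, var_air=0.01, var_alt=0.02)
x_est = estimate_clean_speech(alt[0], air[0], h_est, 0.01, 0.02)
```

In the noise-free check, the estimate recovers `h_true` and the clean speech estimate equals the air microphone value, which is what the assumed model predicts when neither channel carries noise.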
9. A method of identifying a clean speech signal, the method comprising:
using an alternative sensor signal from an alternative sensor other than an air conduction microphone to determine periods when a speaker is producing speech and periods when the speaker is not producing speech;
performing speech detection on portions of an air conduction microphone signal associated with the periods when the speaker is not producing speech to identify which portions of the periods are no-speech portions and which portions of the periods are background speech portions;
estimating a noise variance that describes noise in the alternative sensor signal during no-speech portions of the periods;
using the background speech portions of the alternative sensor signal to estimate a background speech channel response for a channel from a background speaker to the alternative sensor;
receiving values for the alternative sensor signal and the air conduction microphone signal for each of a set of time frames;
using the noise variance, the values for the alternative sensor signal for the set of time frames and the values for the air conduction microphone for the set of time frames to estimate a channel response for a channel representing a path from the speaker to an alternative sensor for at least one time frame in the set of time frames; and
using the channel response and the background speech channel response to estimate a value for the clean speech signal for each time frame in the set of time frames that the channel response was estimated from.
Dependent claims: 10, 11, 12, 13
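The noise-variance and background-response steps of claim 9 could look like the following Python sketch. The claim does not give the estimators, so two simple stand-ins are used and should be read as assumptions: the noise variance is the mean squared magnitude of the alternative sensor values over no-speech portions (zero-mean noise assumed), and the background speech channel response is an ordinary least-squares fit of the alternative sensor values to the air microphone values over background speech portions. Both function names are hypothetical.

```python
def estimate_alt_noise_variance(alt_no_speech):
    """Mean squared magnitude of alternative sensor values taken from
    no-speech portions (assumes zero-mean sensor noise)."""
    if not alt_no_speech:
        raise ValueError("need at least one no-speech frame")
    return sum(abs(b) ** 2 for b in alt_no_speech) / len(alt_no_speech)


def estimate_background_response(alt_background, air_background):
    """Least-squares G minimizing sum_t |B_t - G * Y_t|^2 over the
    background speech portions (a simplification, not the patent's
    stated estimator)."""
    numerator = sum(complex(y).conjugate() * b
                    for b, y in zip(alt_background, air_background))
    denominator = sum(abs(y) ** 2 for y in air_background)
    if denominator == 0:
        raise ValueError("air microphone is silent in these frames")
    return numerator / denominator


# No-speech frames: four values of magnitude 0.1 give a variance of 0.01.
noise_var = estimate_alt_noise_variance([0.1, -0.1j, 0.1j, -0.1])

# Background frames: the alternative sensor sees g_true times the air signal.
g_true = 0.2 - 0.1j
air_bg = [1 + 1j, 2 + 0j, -1j]
alt_bg = [g_true * y for y in air_bg]
g_est = estimate_background_response(alt_bg, air_bg)
```

The least-squares fit ignores noise in the air microphone signal; an estimator that accounts for noise on both channels would give a different value when that noise is significant.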
Specification