Method and apparatus for multi-sensory speech enhancement

US 7,574,008 B2
Filed: 09/17/2004
Issued: 08/11/2009
Est. Priority Date: 09/17/2004
Status: Expired due to Fees

Abstract

A method and apparatus determine a channel response for an alternative sensor using an alternative sensor signal and an air conduction microphone signal. The channel response is then used to estimate a clean speech value using at least a portion of the alternative sensor signal.

107 Citations

13 Claims

1. A method comprising:
- for each time frame of a set of time frames, generating an alternative sensor value representing an alternative sensor signal using an alternative sensor other than an air conduction microphone;
  
  for each time frame of the set of time frames, generating an air conduction microphone value;
  
  identifying which frames in the set of frames do not contain speech from a speaker based on the energy level of the alternative sensor signal;
  
  within the frames identified as not containing speech from the speaker, performing speech detection on the air conduction microphone values to determine which frames contain background speech and which frames do not contain background speech;
  
  using alternative sensor values for the frames identified as not containing speech from the speaker and not containing background speech to determine a variance for noise of the alternative sensor;
  
  using alternative sensor values and air conduction microphone values for the frames identified as not containing speech from the speaker but containing background speech to determine a channel response of the alternative sensor to background speech;
  
  using the alternative sensor values and the air conduction microphone values for the set of time frames to estimate a value for a channel response of the alternative sensor to speech from the speaker; and
  
  using the channel response of the alternative sensor to speech from the speaker, the channel response of the alternative sensor to background speech, and the variance for noise of the alternative sensor to estimate a noise-reduced value for each time frame in the set of time frames.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method of claim 1 wherein estimating a value for a channel response comprises finding an extreme of an objective function.
  - 3. The method of claim 1 further comprising using the estimate of the noise-reduced value to estimate a value for a background speech signal produced by a background speaker.
  - 4. The method of claim 1 wherein estimating a value for the channel response of the alternative sensor to speech from the speaker comprises estimating a single channel response value for all of the time frames in the set of time frames.
  - 5. The method of claim 4 wherein estimating a noise-reduced value comprises estimating a separate noise-reduced value for each time frame in the set of time frames.
  - 6. The method of claim 1 wherein estimating a value for a channel response of the alternative sensor to speech from the speaker comprises estimating the value for a current frame by weighting values for the alternative sensor signal and the air conduction microphone signal in the current frame more heavily than values for the alternative sensor signal and the air conduction microphone signal in a previous frame.
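
The frame-labelling and noise-variance steps of claim 1 can be sketched roughly as follows. This is a minimal illustration under several assumptions that the claim does not state: each frame is reduced to a single complex spectral value, speech detection on the air conduction microphone is replaced by a plain energy threshold, the noise is zero-mean, and the function names and thresholds (`classify_frames`, `alt_sensor_noise_variance`, `alt_energy_threshold`, `mic_energy_threshold`) are hypothetical.

```python
from statistics import fmean


def classify_frames(alt_frames, mic_frames, alt_energy_threshold, mic_energy_threshold):
    """Label each frame as 'speaker', 'background', or 'silence'.

    Assumption: a frame contains the speaker's speech when the alternative
    sensor energy reaches a threshold. Within the remaining frames, an energy
    threshold on the air conduction microphone stands in for a real speech
    detector.
    """
    labels = []
    for b, y in zip(alt_frames, mic_frames):
        if abs(b) ** 2 >= alt_energy_threshold:
            labels.append("speaker")
        elif abs(y) ** 2 >= mic_energy_threshold:
            labels.append("background")
        else:
            labels.append("silence")
    return labels


def alt_sensor_noise_variance(alt_frames, labels):
    """Estimate the alternative sensor noise variance from 'silence' frames.

    Assumption: zero-mean noise, so the variance is the mean squared magnitude.
    """
    silent = [abs(b) ** 2 for b, lab in zip(alt_frames, labels) if lab == "silence"]
    if not silent:
        raise ValueError("no frames without speech or background speech")
    return fmean(silent)
```

Frames labelled `background` would then feed the estimate of the channel response to background speech, and all frames would feed the estimate of the channel response to the speaker.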

7. A computer-readable storage medium having stored thereon computer-executable instructions that when executed by a processor cause the processor to perform steps comprising:
- receiving values for an alternative sensor signal and an air conduction microphone signal for each of a set of time frames, the air conduction microphone signal comprising speech from a speaker and noise;
  
  determining a channel response for a channel from the speaker to an alternative sensor using the values for the entire set of time frames for the alternative sensor signal and the values for the entire set of time frames for the air conduction microphone signal using:
  
  $H = \frac{\sum_{t=1}^{T} \left( \sigma_{z}^{2} |B_{t}|^{2} - \sigma_{w}^{2} |Y_{t}|^{2} \right) \pm \sqrt{\left( \sum_{t=1}^{T} \left( \sigma_{z}^{2} |B_{t}|^{2} - \sigma_{w}^{2} |Y_{t}|^{2} \right) \right)^{2} + 4 \sigma_{z}^{2} \sigma_{w}^{2} \left| \sum_{t=1}^{T} B_{t}^{*} Y_{t} \right|^{2}}}{2 \sigma_{z}^{2} \sum_{t=1}^{T} B_{t}^{*} Y_{t}}$
  
  where $H$ is the channel response for a channel from the speaker to the alternative sensor, $B_{t}$ is the value of the alternative sensor signal for time frame $t$, $B_{t}^{*}$ is the complex conjugate of $B_{t}$, $|B_{t}|$ is the magnitude of $B_{t}$, $Y_{t}$ is the value of the air conduction microphone signal for time frame $t$, $|Y_{t}|$ is the magnitude of $Y_{t}$, $\sigma_{z}^{2}$ is a variance for noise in the air conduction microphone signal, $\sigma_{w}^{2}$ is a variance for noise in the alternative sensor signal, and $T$ is the number of frames in the set of time frames; and
  
  using the channel response and a value for the alternative sensor signal for one time frame in the set of time frames to estimate a clean speech value for the time frame.
- Dependent claims (8):
  - 8. The computer-readable storage medium of claim 7 wherein the channel response comprises a channel response to a clean speech signal.
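
The closed-form expression for $H$ in claim 7 can be evaluated directly. The sketch below is an illustration only: it assumes one complex spectral value per frame (a single frequency bin), and the function name `channel_response` and its `sign` parameter are hypothetical. The claim writes the root with $\pm$ and does not say which sign to take. In the noise-free case $B_t = h Y_t$, the plus sign recovers $h$, so that is the default here.

```python
import math


def channel_response(alt_frames, mic_frames, var_mic, var_alt, sign=1):
    """Evaluate the closed-form H of claim 7 for one frequency bin.

    alt_frames: complex alternative sensor values B_t
    mic_frames: complex air conduction microphone values Y_t
    var_mic:    sigma_z^2, noise variance of the air conduction microphone
    var_alt:    sigma_w^2, noise variance of the alternative sensor
    sign:       +1 or -1, selecting the branch of the +/- in the formula
    """
    # s = sum_t (sigma_z^2 |B_t|^2 - sigma_w^2 |Y_t|^2), a real number
    s = sum(var_mic * abs(b) ** 2 - var_alt * abs(y) ** 2
            for b, y in zip(alt_frames, mic_frames))
    # c = sum_t B_t^* Y_t, a complex number
    c = sum(complex(b).conjugate() * y for b, y in zip(alt_frames, mic_frames))
    if c == 0:
        raise ZeroDivisionError("sum of B_t^* Y_t is zero; H is undefined")
    # The radicand is real and non-negative, so math.sqrt is sufficient.
    root = math.sqrt(s * s + 4.0 * var_mic * var_alt * abs(c) ** 2)
    return (s + sign * root) / (2.0 * var_mic * c)
```

For example, with `mic = [1+2j, -0.5+1j, 2-1j]` and `alt = [(0.5+0.25j) * y for y in mic]`, calling `channel_response(alt, mic, var_mic=0.3, var_alt=0.1)` returns a value equal to `0.5+0.25j` up to floating-point rounding.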

9. A method of identifying a clean speech signal, the method comprising:
- using an alternative sensor signal from an alternative sensor other than an air conduction microphone to determine periods when a speaker is producing speech and periods when the speaker is not producing speech;
  
  performing speech detection on portions of an air conduction microphone signal associated with the periods when the speaker is not producing speech to identify which portions of the periods are no-speech portions and which portions of the periods are background speech portions;
  
  estimating a noise variance that describes noise in the alternative sensor signal during no-speech portions of the periods;
  
  using the background speech portions of the alternative sensor signal to estimate a background speech channel response for a channel from a background speaker to the alternative sensor;
  
  receiving values for the alternative sensor signal and the air conduction microphone signal for each of a set of time frames;
  
  using the noise variance, the values for the alternative sensor signal for the set of time frames and the values for the air conduction microphone for the set of time frames to estimate a channel response for a channel representing a path from the speaker to an alternative sensor for at least one time frame in the set of time frames; and
  
  using the channel response and the background speech channel response to estimate a value for the clean speech signal for each time frame in the set of time frames that the channel response was estimated from.
- Dependent claims (10, 11, 12, 13):
  - 10. The method of claim 9 further comprising using the no-speech portions to estimate noise parameters that describe noise in the air conduction microphone signal.
  - 11. The method of claim 9 further comprising determining an estimate of a background speech value.
  - 12. The method of claim 11 wherein determining an estimate of a background speech value comprises using the estimate of the clean speech value to estimate the background speech value.
  - 13. The method of claim 9 further comprising using a prior model of the channel response to estimate the clean speech value.
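
Claim 9 calls for a background speech channel response estimated from the background speech portions, but it does not give an estimator. As a stand-in, the sketch below uses an ordinary least-squares ratio over the frames labelled as background speech. This is an assumption for illustration and not the estimator from the specification: it ignores sensor noise, assumes one complex spectral value per frame, and the name `background_channel_response` and the `"background"` label are hypothetical.

```python
def background_channel_response(alt_frames, mic_frames, labels):
    """Least-squares estimate of G such that B_t is approximately G * Y_t.

    Only frames labelled 'background' are used. G minimises
    sum_t |B_t - G * Y_t|^2, giving G = sum(Y_t^* B_t) / sum(|Y_t|^2).
    """
    numerator = 0j
    denominator = 0.0
    for b, y, label in zip(alt_frames, mic_frames, labels):
        if label == "background":
            numerator += complex(y).conjugate() * b
            denominator += abs(y) ** 2
    if denominator == 0.0:
        raise ValueError("no background speech frames with non-zero energy")
    return numerator / denominator
```

Frames where the speaker is talking are skipped, so the speaker's own speech does not bias the estimate.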

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Zhang, Zhengyou; Droppo, James G.; Acero, Alejandro; Liu, Zicheng; Huang, Xuedong David
Primary Examiner(s): Chin, Vivian
Assistant Examiner(s): Olaniran, Fatimat O
Application Number: US10/944,235
Publication Number: US 20060072767A1
Time in Patent Office: 1,789 Days
Field of Search: 381/71.1-71.14, 381/94.1-94.9, 381/122, 381/92, 704/233, 704/208, 704/226, 700/94
US Class Current: 381/94.7
CPC Class Codes:
G10L 2021/02161   Number of inputs available ...
G10L 21/0208   Noise filtering
H04R 2460/13   Hearing devices using bone ...
H04R 3/005   for combining the signals o...
