Speech modeling and enhancement based on magnitude-normalized spectra

US 20070150263A1
Filed: 12/23/2005
Published: 06/28/2007
Est. Priority Date: 12/23/2005
Status: Active Grant

Abstract

A frame of a speech signal is converted into the spectral domain to identify a plurality of frequency components and an energy value for the frame is determined. The plurality of frequency components is divided by the energy value for the frame to form energy-normalized frequency components. A model is then constructed from the energy-normalized frequency components and can be used for speech recognition and speech enhancement.
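The normalization steps in the abstract can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function names `dft` and `energy_normalize` are hypothetical, and the "energy value" is assumed here to be the square root of the summed squared magnitudes of the frequency components, which the text above does not specify.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one frame of samples."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def energy_normalize(frame):
    """Return (energy-normalized frequency components, energy value).

    Assumption: the energy value is the L2 norm of the spectrum.
    """
    spectrum = dft(frame)
    energy = math.sqrt(sum(abs(c) ** 2 for c in spectrum))
    if energy == 0.0:
        raise ValueError("cannot normalize an all-zero frame")
    return [c / energy for c in spectrum], energy


frame = [0.0, 1.0, 0.0, -1.0, 0.5, 0.25, -0.5, 0.1]
quiet, quiet_energy = energy_normalize(frame)
loud, loud_energy = energy_normalize([3.0 * x for x in frame])
```

Under this assumption, `quiet` and `loud` agree up to floating-point error even though the second frame is three times louder; the loudness difference shows up only in the energy value. A model built from such components therefore does not need to account for overall signal level.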

20 Claims

1. A method comprising:
- converting a frame of a speech signal into the spectral domain to identify a plurality of frequency components;
  
  determining an energy value for the frame;
  
  dividing the plurality of frequency components of the speech signal by the energy value for the frame to form energy-normalized frequency components; and
  
  constructing a model from the energy-normalized frequency components.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1 wherein constructing the model comprises clustering frames of energy-normalized frequency components into mixture components and forming separate model parameters for each mixture component.
  - 3. The method of claim 2 wherein clustering frames comprises determining the difference between a log magnitude of an energy-normalized frequency component in a first frame and a log magnitude of an energy-normalized frequency component in a second frame.
  - 4. The method of claim 1 further comprising filtering a D.C. frequency component from the plurality of frequency components to form filtered frequency components before determining the energy value.
  - 5. The method of claim 4 wherein dividing the plurality of frequency components by the energy value comprises dividing the filtered frequency components by the energy value.
  - 6. The method of claim 1 further comprising using the model to determine a likelihood for a frame of an input speech signal by:
    - converting the frame of the input speech signal to the spectral domain to produce a plurality of frequency components;
      
      determining an energy value for the frame of the input signal;
      
      dividing the plurality of frequency components of the frame of the input speech signal by the energy value for the frame of the input signal to form input energy-normalized frequency components; and
      
      applying the input energy-normalized frequency components to the model to determine the likelihood.
  - 7. The method of claim 1 further comprising using the model to estimate a clean speech value from a noisy speech signal.
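Claims 2 through 6 can be read together as a small pipeline: remove the D.C. component, normalize, cluster frames by differences of log magnitudes, fit parameters per mixture component, and score a new frame. The sketch below is one possible reading under stated assumptions. All function names are hypothetical, the energy value is assumed to be the L2 norm, cluster assignment is a single nearest-centroid pass, and each mixture component is assumed to be a diagonal Gaussian over log magnitudes with a variance floor. The claims do not prescribe any of these choices.

```python
import math


def normalize_without_dc(spectrum):
    """Zero the D.C. bin (index 0), then divide by the energy of what remains."""
    filtered = [0j] + list(spectrum[1:])
    energy = math.sqrt(sum(abs(c) ** 2 for c in filtered))
    if energy == 0.0:
        raise ValueError("no energy outside the D.C. bin")
    return [c / energy for c in filtered]


def log_mags(frame, floor=1e-12):
    """Log magnitudes of every bin except the zeroed D.C. bin."""
    return [math.log(max(abs(c), floor)) for c in frame[1:]]


def log_mag_distance(frame_a, frame_b):
    """Sum of squared differences between log magnitudes of two frames."""
    return sum((a - b) ** 2 for a, b in zip(log_mags(frame_a), log_mags(frame_b)))


def assign_to_clusters(frames, centroids):
    """Label each frame with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda m: log_mag_distance(f, centroids[m]))
        for f in frames
    ]


def fit_components(frames, labels, var_floor=1e-3):
    """Per-component (weight, means, variances) of the log magnitudes."""
    params = {}
    for m in sorted(set(labels)):
        members = [log_mags(f) for f, lab in zip(frames, labels) if lab == m]
        count = len(members)
        columns = list(zip(*members))
        means = [sum(col) / count for col in columns]
        variances = [
            max(sum((v - mu) ** 2 for v in col) / count, var_floor)
            for col, mu in zip(columns, means)
        ]
        params[m] = (count / len(frames), means, variances)
    return params


def log_likelihood(frame, params):
    """Log-likelihood of one normalized frame under the mixture (log-sum-exp)."""
    x = log_mags(frame)
    scores = []
    for weight, means, variances in params.values():
        score = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            score -= 0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
        scores.append(score)
    peak = max(scores)
    return peak + math.log(sum(math.exp(s - peak) for s in scores))


# Two spectral shapes at three loudness levels each (made-up illustrative data).
low = [0.3 + 0j, 4 + 0j, 2 + 0j, 1 + 0j, 0.5 + 0j]
high = [0.1 + 0j, 0.5 + 0j, 1 + 0j, 2 + 0j, 4 + 0j]
frames = [
    normalize_without_dc([s * c for c in base])
    for base in (low, high)
    for s in (1.0, 2.0, 5.0)
]
labels = assign_to_clusters(frames, [frames[0], frames[3]])
params = fit_components(frames, labels)
probe = normalize_without_dc([7.0 * c for c in low])
flat = normalize_without_dc([1 + 0j] * 5)
```

Because loudness has been divided out, the three scaled copies of each shape fall into the same cluster, and `probe` scores higher than `flat` even though its raw level differs from every training frame.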

8. A computer-readable medium having computer-executable instructions for performing steps comprising:
- receiving values representing a noisy speech signal; and
  
  using a model of energy-normalized clean-speech spectral values to estimate a noise-reduced value from the noisy speech signal.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The computer-readable medium of claim 8 wherein using the model of energy-normalized clean speech spectral values comprises using a mixture model of energy-normalized clean-speech spectral values.
  - 10. The computer-readable medium of claim 8 wherein estimating a noise-reduced value further comprises estimating a gain value that describes the ratio of clean speech spectral values to energy-normalized clean speech spectral values.
  - 11. The computer-readable medium of claim 10 wherein estimating a noise-reduced value comprises estimating energy-normalized noise-reduced speech spectral values and multiplying the energy-normalized noise-reduced speech spectral values by the gain value to produce noise-reduced values.
  - 12. The computer-readable medium of claim 11 further comprising estimating a separate gain value for each of a plurality of frames of the noisy speech signal.
  - 13. The computer-readable medium of claim 8 wherein estimating a noise-reduced value further comprises utilizing a model of a speech state that provides the probability that a frame of the noisy speech signal contains speech.
  - 14. The computer-readable medium of claim 8 wherein estimating a noise-reduced value comprises estimating the noise-reduced value based on an alternative sensor signal.
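Claims 10 through 12 describe a per-frame gain that maps energy-normalized estimates back to ordinary spectral values. A minimal sketch of that relationship follows. The claims do not say how the gain is estimated, so the least-squares fit in `estimate_gain` is an assumed stand-in, and both function names are hypothetical.

```python
def estimate_gain(noisy, normalized):
    """Real, non-negative gain g minimizing sum |noisy[k] - g * normalized[k]|^2.

    Assumption: least squares is used only for illustration; the source does
    not specify the estimator.
    """
    num = sum((y * x.conjugate()).real for y, x in zip(noisy, normalized))
    den = sum(abs(x) ** 2 for x in normalized)
    return max(num / den, 0.0)


def denormalize(normalized, gain):
    """Multiply energy-normalized values by the gain (the step in claim 11)."""
    return [gain * x for x in normalized]


# One gain per frame, as in claim 12 (made-up two-bin frames).
normalized_frames = [[0.6 + 0j, 0.8j], [0.8 + 0j, 0.6j]]
noisy_frames = [[1.8 + 0j, 2.4j], [0.4 + 0j, 0.3j]]
gains = [estimate_gain(y, x) for y, x in zip(noisy_frames, normalized_frames)]
restored = [denormalize(x, g) for x, g in zip(normalized_frames, gains)]
```

Here the two frames receive gains of about 3.0 and 0.5, so the same normalized model can describe both a loud frame and a quiet one.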

15. A method comprising:
- receiving an air conduction microphone signal;
  
  receiving an alternative sensor signal;
  
  using the air conduction microphone signal, the alternative sensor signal, and a model of energy-normalized clean speech spectral values to estimate a noise-reduced speech value.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The method of claim 15 wherein estimating a noise-reduced speech value comprises estimating energy-normalized noise-reduced speech spectral values and converting the energy-normalized noise-reduced speech spectral values into noise-reduced speech values.
  - 17. The method of claim 16 wherein converting the energy-normalized noise-reduced speech spectral values comprises multiplying the energy-normalized noise-reduced speech spectral values by a gain value.
  - 18. The method of claim 17 further comprising iterating between estimating energy-normalized noise-reduced speech spectral values and estimating the gain value.
  - 19. The method of claim 17 wherein estimating the gain value comprises estimating a separate gain value for each frame of the air conduction microphone signal.
  - 20. The method of claim 15 wherein the model of energy-normalized clean speech spectral values comprises a mixture model.
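The iteration in claim 18 can be illustrated with a toy alternating least-squares loop. Everything specific here is an assumption made for the sake of a runnable example and is not taken from the source: the air conduction signal is modeled as `gain * x` plus noise, the alternative sensor signal as `h * gain * x` plus noise with a known per-bin response `h`, the clean-speech model is reduced to a single prior mean with a fixed variance, and the function names and update formulas are hypothetical.

```python
import math


def unit_energy(values):
    """Scale a spectrum so that its squared magnitudes sum to 1."""
    norm = math.sqrt(sum(abs(v) ** 2 for v in values))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return [v / norm for v in values]


def alternate(y, b, h, prior_mean, var_air=1.0, var_alt=1.0, var_prior=1.0, n_iter=10):
    """Alternate between the normalized spectrum x and the scalar gain.

    y: air conduction spectrum, b: alternative sensor spectrum,
    h: assumed-known alternative sensor response per bin,
    prior_mean: mean of the (single-component) normalized clean-speech model.
    """
    gain = 1.0
    x = list(prior_mean)
    for _ in range(n_iter):
        # x step: per-bin weighted least squares with the gain held fixed,
        # followed by renormalization to unit energy.
        x = unit_energy([
            (gain * yk / var_air + gain * hk.conjugate() * bk / var_alt + mk / var_prior)
            / (gain ** 2 / var_air + gain ** 2 * abs(hk) ** 2 / var_alt + 1.0 / var_prior)
            for yk, bk, hk, mk in zip(y, b, h, prior_mean)
        ])
        # gain step: scalar least squares with x held fixed.
        num = sum(
            (xk.conjugate() * yk).real / var_air
            + ((hk * xk).conjugate() * bk).real / var_alt
            for xk, yk, bk, hk in zip(x, y, b, h)
        )
        den = sum(
            abs(xk) ** 2 / var_air + abs(hk * xk) ** 2 / var_alt
            for xk, hk in zip(x, h)
        )
        gain = num / den
    # Noise-reduced values are the normalized estimate times the gain (claim 17).
    return x, gain, [gain * xk for xk in x]


# Noise-free toy data: true gain 4, alternative sensor attenuates by half.
x_true = unit_energy([1 + 1j, 2 - 1j, 0.5j, -1 + 0j])
y = [4.0 * v for v in x_true]
h = [0.5] * 4
b = [0.5 * 4.0 * v for v in x_true]
x_hat, gain_hat, clean_hat = alternate(y, b, h, x_true)
```

On this noise-free input the loop recovers a gain of about 4.0 and the original air conduction spectrum. With real noisy signals the result would depend on the noise variances and on how well the assumed sensor response and prior match the data.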

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Zhengyou Zhang, Alejandro Acero, Zicheng Liu, Amarnag Subramanya

Granted Patent

US 7,930,178 B2
US Class Current

704/205
CPC Class Codes

G10L 15/20 Speech recognition techniqu...

G10L 21/0208 Noise filtering
