Method and apparatus for high resolution speech reconstruction

US 7,596,494 B2
Filed: 11/26/2003
Issued: 09/29/2009
Est. Priority Date: 11/26/2003
Status: Active Grant

Abstract

A method and apparatus identify a clean speech signal from a noisy speech signal. The noisy speech signal is converted into frequency values in the frequency domain. The parameters of at least one posterior probability of at least one component of a clean signal value are then determined based on the frequency values. This determination is made without applying a frequency-based filter to the frequency values. The parameters of the posterior probability distribution are then used to estimate a set of frequency values for the clean speech signal. A clean speech signal is then constructed from the estimated set of frequency values.
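
As an illustration of the first step in the abstract (converting the noisy signal into frequency values), the following is a minimal sketch, not the patented implementation. It assumes a plain discrete Fourier transform in place of an FFT, and the names `frame_signal`, `dft`, `log_magnitude_and_phase`, and the `floor` parameter are hypothetical helpers introduced here for illustration only.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into overlapping frames (hypothetical helper)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def dft(frame):
    """Naive O(n^2) discrete Fourier transform; stands in for an FFT."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(frame))
            for k in range(n)]


def log_magnitude_and_phase(frame, floor=1e-10):
    """Return per-bin log-magnitude and phase for one frame.

    `floor` is an assumed guard against log(0); the source does not specify one.
    """
    spectrum = dft(frame)
    log_mag = [math.log(max(abs(c), floor)) for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]
    return log_mag, phase
```

The phase values are returned alongside the log-magnitudes because the claims describe reusing the noisy signal's phase when transforming back to the time domain.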

16 Claims

1. A method of identifying a clean speech signal from a noisy speech signal, the method comprising:
- a processor identifying a set of log-magnitude frequency values for each of a plurality of frames that represent the noisy speech signal;
  
  the processor filtering the log-magnitude frequency values of the noisy speech signal to smooth the log-magnitude frequency values over time to form filtered noisy values by applying the log magnitude frequency values of the noisy speech signal to a Finite Impulse Responsive Filter having a set of filter parameters wherein at least one of the filter parameters of the set of filter parameters differs from another of the filter parameters of the set of filter parameters;
  
  the processor determining parameters of at least one posterior probability distribution of at least one component of a clean signal value based on the set of filtered noisy values without applying a frequency-based transform to the set of filtered noisy values, the posterior probability distribution providing the probability of a log-magnitude frequency value for a clean speech signal given a filtered noisy value;
  
  the processor using the parameters of the posterior probability distribution to estimate a set of log-magnitude frequency values for a clean speech signal; and
  
  the processor using the log-magnitude values for the clean speech signal to produce an output clean speech signal.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1 further comprising taking the exponent of each of the log-magnitude frequency values in the set of log-magnitude frequency values for the clean speech signal to produce a set of magnitude values for the clean speech signal.
  - 3. The method of claim 2 further comprising transforming the set of magnitude values for the clean speech signal into a set of time domain values representing a frame of the clean speech signal.
  - 4. The method of claim 3 wherein identifying a set of log-magnitude frequency values for a frame of the noisy speech signal comprises transforming a frame of the noisy speech signal into the frequency domain to form frequency values for the noisy speech signal and taking the log of the magnitude of the frequency values.
  - 5. The method of claim 4 wherein transforming a frame of the noisy speech signal into the frequency domain further comprises generating a set of frequency phase values and wherein transforming the set of magnitude values for the clean speech signal into a set of time domain values further comprises using the set of frequency phase values to transform the set of magnitude values.
  - 6. The method of claim 4 wherein transforming a frame of the noisy speech signal into the frequency domain comprises producing a set of more than one hundred frequency magnitude values.
  - 7. The method of claim 1 wherein determining the parameters of at least one posterior probability distribution comprises utilizing an iterative process to determine the parameters.
  - 8. The method of claim 1 wherein determining parameters of at least one posterior distribution comprises determining parameters for each of a set of mixture components.
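
The filtering and estimation steps of claim 1 can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the method of the patent: the observation model here is an assumed additive Gaussian in the log domain (filtered noisy value = clean value + Gaussian error), the clean-speech prior is an assumed scalar Gaussian mixture, and the posterior is computed in closed form rather than iteratively. The function names, the edge handling, and all parameter values are hypothetical.

```python
import math


def fir_smooth_over_time(frames, taps):
    """Smooth each frequency bin across frames with a causal FIR filter.

    frames: list of per-frame log-magnitude vectors of equal length.
    taps:   FIR coefficients; taps[0] weights the current frame. The taps
            may differ from one another, as the claim requires.
    Frames before the start are treated as copies of the first frame
    (an assumed edge-handling choice).
    """
    smoothed = []
    for t in range(len(frames)):
        out = [0.0] * len(frames[0])
        for j, w in enumerate(taps):
            past = frames[max(t - j, 0)]
            for k, value in enumerate(past):
                out[k] += w * value
        smoothed.append(out)
    return smoothed


def gmm_posterior(y, weights, means, variances, noise_var):
    """Posterior over clean value x given y = x + e, e ~ N(0, noise_var).

    The prior on x is a Gaussian mixture. Returns the posterior-mean
    estimate of x and, per mixture component, a tuple
    (responsibility, posterior mean, posterior variance).
    """
    log_resp, post = [], []
    for w, m, v in zip(weights, means, variances):
        total = v + noise_var
        # log of w * N(y; m, v + noise_var), the component's evidence
        log_resp.append(math.log(w) - 0.5 * (
            math.log(2 * math.pi * total) + (y - m) ** 2 / total))
        # Gaussian prior times Gaussian likelihood gives a Gaussian posterior
        post.append(((v * y + noise_var * m) / total, v * noise_var / total))
    peak = max(log_resp)  # subtract the peak for numerical stability
    resp = [math.exp(l - peak) for l in log_resp]
    norm = sum(resp)
    components = [(r / norm, pm, pv) for r, (pm, pv) in zip(resp, post)]
    estimate = sum(r * pm for r, pm, _ in components)
    return estimate, components
```

In this sketch each frequency bin of each smoothed frame would be passed to `gmm_posterior` independently. Claims 7 and 15 recite an iterative process for determining the posterior, which this closed-form simplification does not attempt to reproduce.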

9. A computer storage medium storing computer-executable instructions for performing steps comprising:
- identifying log-magnitude frequency values for each of a plurality of frames that represent a noisy speech signal;
  
  applying the log-magnitude frequency values that represent frames of the noisy speech signal to a Finite Impulse Response filter having a set of filter parameters wherein one of the filter parameters of the set of filter parameters differs from another filter parameter of the set of filter parameters to provide time-based filtering and to produce filtered values representing noisy speech;
  
  determining a posterior probability based on the filtered values, wherein a frequency-based transform is not applied before the filtered values are used to determine the posterior probability and wherein the posterior probability provides the probability of log-magnitude frequency values for a clean speech signal given the filtered values;
  
  using the posterior probability to estimate a log-magnitude frequency value for a frame of a clean speech signal; and
  
  using the log-magnitude frequency value for the frame of the clean speech signal to produce an output clean speech signal.
- Dependent claims (10, 11, 12, 13, 14, 15, 16):
  - 10. The computer storage medium of claim 9 wherein estimating a frame of a clean speech signal comprises estimating log-magnitude frequency values for the frame of the clean speech signal.
  - 11. The computer storage medium of claim 9 further comprising taking the exponent of the log-magnitude frequency values for frames of the clean speech signal to form magnitude values.
  - 12. The computer-readable storage medium of claim 11 further comprising transforming the magnitude values into time-domain values representing a frame of the clean speech signal.
  - 13. The computer storage medium of claim 12 wherein transforming the magnitude values comprises performing an inverse Fast Fourier Transform.
  - 14. The computer storage medium of claim 13 wherein performing an inverse Fast Fourier Transform further comprises using phase values generated by converting the frames of the noisy speech signal from the time domain to the frequency domain.
  - 15. The computer storage medium of claim 9 wherein determining a posterior probability comprises using an iterative process to determine the posterior probability.
  - 16. The computer storage medium of claim 9 wherein determining a posterior probability comprises determining a separate posterior probability for each mixture component in a set of mixture components.
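
The reconstruction steps recited in claims 11 through 14 (exponentiating the estimated log-magnitudes, then inverse-transforming them together with the phase of the noisy signal) might look like the sketch below. It uses a naive inverse discrete Fourier transform as a stand-in for the inverse Fast Fourier Transform named in claim 13, and `reconstruct_frame` is a hypothetical name introduced for illustration.

```python
import cmath
import math


def reconstruct_frame(clean_log_mag, noisy_phase):
    """Rebuild one time-domain frame from log-magnitudes and phases.

    clean_log_mag: estimated log-magnitude per frequency bin.
    noisy_phase:   phase per bin, taken from the noisy frame's transform.
    Uses a naive O(n^2) inverse DFT in place of an inverse FFT.
    """
    # Exponentiate to get magnitudes, then combine with phase in polar form.
    spectrum = [cmath.rect(math.exp(lm), ph)
                for lm, ph in zip(clean_log_mag, noisy_phase)]
    n = len(spectrum)
    # Keep the real part; for a conjugate-symmetric spectrum the
    # imaginary part is only rounding error.
    return [sum(c * cmath.exp(2j * cmath.pi * k * t / n)
                for k, c in enumerate(spectrum)).real / n
            for t in range(n)]
```

If the log-magnitudes are passed through unchanged, this round trip should return the original noisy frame up to rounding error, which is a convenient sanity check before an estimator is plugged in.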

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Kristjansson, Trausti Thor; Hershey, John R.
Primary Examiner(s): Vo, Huyen X.
Application Number: US10/722,937
Publication Number: US 20050114117A1
Time in Patent Office: 2,134 Days
Field of Search: 704/206, 704/223, 704/226, 704/200, 704/227, 704/228, 704/203, 704/205, 704/233
US Class Current: 704/226
CPC Class Codes: G10L 21/0208 Noise filtering
