Monaural noise suppression based on computational auditory scene analysis

US 9,431,023 B2
Filed: 04/09/2013
Issued: 08/30/2016
Est. Priority Date: 07/12/2010
Status: Active Grant


Abstract

The present technology provides a robust noise suppression system that may concurrently reduce noise and echo components in an acoustic signal while limiting the level of speech distortion. A time-domain acoustic signal may be received and transformed to frequency-domain sub-band signals. Features, such as pitch, may be identified and tracked within the sub-band signals. Initial speech and noise models may then be estimated at least in part from a probability analysis based on the tracked pitch sources. Speech and noise models may be resolved from the initial speech and noise models, and noise reduction may be performed on the sub-band signals. An acoustic signal may be reconstructed from the noise-reduced sub-band signals.
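
As an illustration of the first stage described in the abstract (time-domain signal to frequency-domain sub-band signals), the following is a minimal sketch only. The abstract does not say which transform is used; a naive DFT over overlapping frames is swapped in here as a stand-in, and the helper names `split_frames`, `to_subbands`, and `analyze`, as well as the frame length and hop, are assumptions made for this example.

```python
import cmath
import math

def split_frames(signal, frame_len, hop):
    """Slice a time-domain signal into overlapping frames (hypothetical helper)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def to_subbands(frame):
    """Naive DFT of one frame: one complex coefficient per sub-band (bins 0..N/2).
    Assumption: a DFT stands in for whatever analysis transform is actually used."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
            for k in range(n // 2 + 1)]

def analyze(signal, frame_len=64, hop=32):
    """Return a list of frames, each a list of complex sub-band values."""
    return [to_subbands(f) for f in split_frames(signal, frame_len, hop)]

# Usage: a pure tone with 4 cycles per 64-sample frame lands in sub-band 4.
tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(256)]
spectra = analyze(tone)
peak_bin = max(range(len(spectra[0])), key=lambda k: abs(spectra[0][k]))
```

Each inner list holds the sub-band values for one frame, so following a single index across the outer list gives the per-sub-band signal on which the later tracking and noise-reduction stages would operate.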

16 Claims

1. A method for performing noise reduction, the method comprising:
- executing a program stored in a memory to transform a time-domain acoustic signal into a plurality of frequency-domain sub-band signals;
  
  tracking at least one pitch from a plurality of pitch sources within a frequency-domain sub-band signal in the plurality of frequency-domain sub-band signals, wherein the tracking includes;
  
  calculating at least one feature for each of the plurality of pitch sources; and
  
  determining, based on the at least one feature, a probability based at least in part on pitch energy level, pitch salience, and pitch stationarity for each pitch source that the pitch source is a desired speech source, the desired speech source being a speech source associated with a desired talker;
  
  generating a speech model and one or more noise models based on the tracked at least one pitch, the speech model modeling desired speech and the one or more noise models modeling sources other than the desired speech; and
  
  performing the noise reduction on the frequency-domain sub-band signal based on the speech model and the one or more noise models.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1, wherein the tracking includes tracking the at least one pitch across successive frames of the frequency-domain sub-band signal.
  - 3. The method of claim 1, wherein the generating a speech model and one or more noise models is based on at least two tracked pitches from the plurality of pitch sources.
  - 4. The method of claim 1, wherein the generating a speech model and one or more noise models includes combining the multiple models.
  - 5. The method of claim 1, wherein at least one of the one or more noise models is at least one of:
    - not updated for a sub-band in a current frame when speech is dominant in the previous frame; and
      
      not updated in the current frame when speech is dominant in the current frame for the sub-band.
  - 6. The method of claim 1, wherein the noise reduction is performed using an optimal filter.
  - 7. The method of claim 6, wherein the optimal filter is based on a least squares formulation.
  - 8. The method of claim 1, wherein the one or more noise models model undesired speech.
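
Claims 5 to 7 can be pictured with a short sketch, under stated assumptions. The claims do not give an update rule or a filter formula: the exponential smoothing with factor `alpha`, the per-sub-band power representation, and the gain floor are all hypothetical choices for this example. The gain `S / (S + N)` is the standard least-squares (Wiener) real gain for uncorrelated speech and noise, used here as one possible instance of an "optimal filter ... based on a least squares formulation", not as the formula the patent specifies.

```python
def update_noise_model(noise_psd, frame_psd, speech_now, speech_prev, alpha=0.9):
    """Per-sub-band noise power update that is frozen where speech dominates.

    noise_psd, frame_psd: lists of non-negative powers, one per sub-band.
    speech_now, speech_prev: lists of booleans (speech dominant in the
    current / previous frame for that sub-band).
    Assumption: exponential smoothing with factor alpha when not frozen.
    """
    updated = []
    for n, x, now, prev in zip(noise_psd, frame_psd, speech_now, speech_prev):
        if now or prev:
            updated.append(n)  # do not update while speech is dominant
        else:
            updated.append(alpha * n + (1.0 - alpha) * x)
    return updated

def wiener_gain(speech_psd, noise_psd, floor=0.1):
    """Least-squares gain S/(S+N) per sub-band, limited below by a floor
    (the floor is an assumed way of limiting speech distortion)."""
    gains = []
    for s, n in zip(speech_psd, noise_psd):
        g = s / (s + n) if (s + n) > 0 else floor
        gains.append(max(floor, g))
    return gains

# Usage: sub-band 0 is speech-dominant now, so only sub-band 1 is updated.
noise = update_noise_model([1.0, 1.0], [3.0, 3.0],
                           [True, False], [False, False], alpha=0.5)
gains = wiener_gain([3.0, 0.0], [1.0, 1.0], floor=0.1)
```

Claim 5 recites the two freeze conditions as "at least one of"; the sketch applies both, which is one of the permitted combinations.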

9. A system for performing noise reduction in an audio signal, the system comprising:
- a memory;
  
  an analysis module stored in the memory and executed by a processor to transform a time-domain acoustic to frequency-domain sub-band signals;
  
  a source inference engine stored in the memory and executed by the processor to track at least one pitch from a plurality of pitch sources within the frequency-domain sub-band signals and to generate a speech model and one or more noise models based on the tracked at least one pitch, the speech model modeling desired speech and the one or more noise models modeling sources other than the desired speech, wherein the tracking includes;
  
  calculating at least one feature for each of the plurality of pitch sources; and
  
  determining, based on the at least one feature, a probability based at least in part on pitch energy level, pitch salience, and pitch stationarity for each pitch source that the pitch source is a desired speech source, the desired speech source being a speech source associated with a desired talker; and
  
  a modifier module stored in the memory and executed by the processor to perform the noise reduction on the frequency-domain sub-band signals based on the speech model and the one or more noise models.
- Dependent claims (10, 11, 12):
  - 10. The system of claim 9, wherein the source inference engine is executable to generate a speech model and one or more noise models based on at least two tracked pitches from the plurality of pitch sources.
  - 11. The system of claim 9, wherein the source inference engine is executable to at least one of:
    - not update at least one of the one or more noise models for a sub-band in a current frame when speech is dominant in the previous frame; and
      
      not update at least one of the one or more noise models for the sub-band in the current frame when speech is dominant in the current frame for the sub-band.
  - 12. The system of claim 9, wherein a modifier module is executable to apply a first-order filter to each sub-band in each frame.
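
For claim 12, a first-order filter applied to one sub-band across frames might look like the sketch below. The claim does not state the filter structure or how coefficients are chosen; a two-tap FIR form `y[t] = b0*x[t] + b1*x[t-1]` with caller-supplied coefficients is assumed here, and `first_order_filter` is a hypothetical name.

```python
def first_order_filter(subband_frames, b0, b1):
    """Apply y[t] = b0*x[t] + b1*x[t-1] along the frame axis of one sub-band.

    subband_frames: sequence of (complex) sub-band values, one per frame.
    b0, b1: filter coefficients (assumed to come from the speech and
    noise models; how they are derived is not shown here).
    """
    out = []
    prev = 0j  # assume silence before the first frame
    for x in subband_frames:
        out.append(b0 * x + b1 * prev)
        prev = x
    return out

# Usage: filter three frames of a single sub-band.
filtered = first_order_filter([1 + 0j, 2 + 0j, 3 + 0j], 0.5, 0.25)
```

Setting `b1 = 0` reduces this to a plain per-frame gain on the sub-band.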

13. A non-transitory computer readable storage medium having embodied thereon a program, the program being executable by a processor to perform a method for reducing noise in an audio signal, the method comprising:
- transforming an acoustic signal from a time-domain signal to frequency-domain sub-band signals;
  
  tracking at least one pitch from a plurality of pitch sources within the frequency-domain sub-band signals, the tracking including;
  
  calculating at least one feature for each of the plurality of pitch sources; and
  
  determining, based on the at least one feature, a probability based at least in part on pitch energy level, pitch salience, and pitch stationarity for each pitch source that the pitch source is a desired speech source, the desired speech source being a speech source associated with a desired talker;
  
  generating a speech model and one or more noise models based on the tracked at least one pitch, the speech model modeling desired speech and the one or more noise models modeling sources other than the desired speech; and
  
  performing noise reduction on the frequency-domain sub-band signals based on the speech model and one or more noise models.
- Dependent claims (14, 15, 16):
  - 14. The non-transitory computer readable storage medium of claim 13, wherein the tracking includes tracking the at least one pitch across successive frames of the frequency-domain sub-band signals.
  - 15. The non-transitory computer readable storage medium of claim 13, wherein at least one of:
    - a respective one of the one or more noise models is not updated for a sub-band in a current frame when speech is dominant in the previous frame for the sub-band; and
      
      the respective one of the one or more noise models is not updated for a sub-band in a current frame when speech is dominant in the current frame for the sub-band.
  - 16. The non-transitory computer readable storage medium of claim 13, wherein performing the noise reduction includes applying a first-order filter to each sub-band signal.
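
The determining step recited in claims 1, 9, and 13 assigns each pitch source a probability of being the desired speech source, based at least in part on pitch energy level, pitch salience, and pitch stationarity. The claims do not say how these features are combined. The sketch below assumes a logistic combination; the weights, their signs, the bias, and the feature scales are all hypothetical and would need to be tuned or replaced.

```python
import math

def speech_probability(energy_db, salience, stationarity,
                       weights=(0.1, 4.0, -2.0), bias=-3.0):
    """Map three pitch features to a probability in (0, 1).

    Assumption: a logistic model with illustrative weights. The negative
    weight on stationarity is a guess, not something the claims state.
    """
    z = (bias
         + weights[0] * energy_db
         + weights[1] * salience
         + weights[2] * stationarity)
    return 1.0 / (1.0 + math.exp(-z))

def pick_desired(sources):
    """Return the name of the pitch source with the highest probability.

    sources: dict mapping a source name to (energy_db, salience, stationarity).
    """
    return max(sources, key=lambda name: speech_probability(*sources[name]))

# Usage: two tracked pitch sources with made-up feature values.
tracked = {
    "talker": (30.0, 0.75, 0.0),
    "hum": (30.0, 0.75, 1.5),
}
p_talker = speech_probability(*tracked["talker"])
best = pick_desired(tracked)
```

In this example the two sources differ only in stationarity, so the choice between them depends entirely on the assumed sign of that weight.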

Current Assignee: Samsung Electronics Co. Ltd.
Original Assignee: Knowles Electronics LLC (Knowles Corporation)
Inventors: Avendano, Carlos; Laroche, Jean; Goodwin, Michael M.; Solbach, Ludger
Primary Examiner(s): Godbold, Douglas
Application Number: US13/859,186
Publication Number: US 20130231925A1
Time in Patent Office: 1,239 Days
Field of Search: 704224-230
US Class Current: 1/1
CPC Class Codes:
G10L 21/0208 Noise filtering
G10L 21/0272 Voice signal separating
