Noise reduction using multi-feature cluster tracker

US 9,008,329 B1
Filed: 06/08/2012
Issued: 04/14/2015
Est. Priority Date: 01/26/2010
Status: Active Grant

3 Assignments

0 Petitions

Abstract

Provided are methods and systems for noise suppression within multiple time-frequency points of spectral representations. A multi-feature cluster tracker is used to track signal and noise sources and to predict signal versus noise dominance at each time-frequency point. Multiple features, such as binaural and monaural features, may be used for these purposes. A Gaussian mixture model (GMM) is developed and, in some embodiments, dynamically updated for distinguishing signal from noise and performing mask-based noise reduction. Each frequency band may use a different GMM or share a GMM with other frequency bands. A GMM may be combined from two models, with one trained to model time-frequency points in which the target dominates and another trained to model time-frequency points in which the noise dominates. Dynamic updates of a GMM may be performed using an expectation-maximization algorithm in an unsupervised fashion.
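The unsupervised update mentioned in the abstract can be illustrated with a minimal sketch of expectation-maximization on a one-dimensional, two-component GMM. This is not the patented implementation: the function names (`gaussian_pdf`, `em_step`, `track`), the variance floor, the fixed iteration count, and the convention that component 1 is the signal cluster are all assumptions made for illustration, and the input is assumed to be a single feature value per time-frequency point.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step(points, weights, means, variances, var_floor=1e-6):
    """One unsupervised EM iteration for a 1-D GMM.

    Returns the updated weights, means, and variances, plus the
    responsibilities computed in the E-step (from the input parameters).
    """
    k = len(weights)
    # E-step: posterior probability of each component for each point.
    resp = []
    for x in points:
        joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(joint) or 1e-300  # guard against division by zero on underflow
        resp.append([p / total for p in joint])
    # M-step: re-estimate each component from its weighted points.
    new_w, new_m, new_v = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)
        nj_safe = max(nj, 1e-12)
        mean = sum(r[j] * x for r, x in zip(resp, points)) / nj_safe
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, points)) / nj_safe
        new_w.append(nj / len(points))
        new_m.append(mean)
        new_v.append(max(var, var_floor))  # hypothetical floor to avoid collapse
    return new_w, new_m, new_v, resp


def track(points, weights, means, variances, iterations=20):
    """Run several EM iterations; return final parameters and a probabilistic mask."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    for _ in range(iterations):
        weights, means, variances, resp = em_step(points, weights, means, variances)
    # Assumption: component 1 models the signal-dominated cluster.
    mask = [r[1] for r in resp]
    return weights, means, variances, mask
```

Each mask entry is the probability that the corresponding point belongs to the assumed signal cluster; in a running system the same update could be repeated as new frames arrive.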

303 Citations

22 Claims

1. A method for processing acoustic signals, the method comprising:
- receiving a multichannel audio input corresponding to a plurality of audio channels;
  
  generating a spectral representation of the multichannel audio input;
  
  extracting one or more acoustic features from the spectral representation;
  
  performing linear transformation of the one or more acoustic features using a dimensionality reduction technique to generate transformed data; and
  
  classifying by a Gaussian mixture model (GMM) each time-frequency observation in the transformed data, the GMM providing a probabilistic mask of the transformed data, the probabilistic mask being used to identify noise points and signal points in the multichannel audio input.
- Dependent claims (2-18):
  - 2. The method of claim 1, wherein the one or more acoustic features correspond to each individual channel of the plurality of audio channels.
  - 3. The method of claim 1, wherein the one or more acoustic features correspond to interactions between individual channels of the plurality of audio channels.
  - 4. The method of claim 1, wherein the one or more acoustic features comprise one or more of an interaural level difference, an interaural phase difference, a primary microphone energy, an estimated pitch, and an estimated pitch saliency.
  - 5. The method of claim 1, wherein the dimensionality reduction technique comprises a linear support vector machine and performing the linear transformation comprises subtracting a data mean, whitening the data, generating a maximum margin hyperplane separating speech points from the noise points in the multichannel audio input, and projecting the speech points and the noise points onto the maximum margin hyperplane.
  - 6. The method of claim 5, wherein performing the linear transformation is repeated for each of multiple dimensions in the null space of a previous maximum margin hyperplane.
  - 7. The method of claim 6, wherein the multiple dimensions are orthogonal and decorrelated.
  - 8. The method of claim 1, wherein a different GMM is used for each frequency band of the multichannel audio input.
  - 9. The method of claim 1, wherein the noise points and signal points are identified in the multichannel audio input based on a probability of each data point determined with the GMM.
  - 10. The method of claim 1, wherein the noise points and signal points are identified by further processing probabilities of data points determined using the GMM, the further processing comprises incorporating local contextual information.
  - 11. The method of claim 1, further comprising updating the GMM based on the transformed data generated by the linear transformation and repeating the classifying operation using the updated GMM.
  - 12. The method of claim 11, wherein repeating the classifying operation using the updated GMM is performed on a new set of transformed data.
  - 13. The method of claim 1, further comprising repeating receiving, generating, extracting, performing, and classifying operations on a new multichannel audio input to identify new noise points and new signal points.
  - 14. The method of claim 13, wherein the original GMM is used during the repeated classifying operation.
  - 15. The method of claim 1, further comprising generating a binary mask such as a post-filter mask or a canceller adaptation control mask based on the identified noise points and the identified signal points.
  - 16. The method of claim 15, further comprising applying the generated mask to the acoustic signals to suppress noise.
  - 17. The method of claim 1, wherein, prior to being used for classifying, the GMM is trained to optimize generative costs and discriminative costs.
  - 18. The method of claim 1, wherein the GMM comprises two Gaussian mixture models (GMMs), a first GMM trained to identify the noise points in the transformed data and a second GMM trained to identify the signal points in the transformed data.
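Three of the features named in claim 4 (interaural level difference, interaural phase difference, and primary microphone energy) can be sketched for a single time-frequency point as follows. The function name `interaural_features`, the decibel scaling, the `eps` regularizer, and the sign convention (primary relative to secondary) are assumptions for illustration; the claims do not specify them.

```python
import cmath
import math


def interaural_features(primary, secondary, eps=1e-12):
    """Return (ild_db, ipd_rad, primary_energy_db) for one time-frequency point.

    `primary` and `secondary` are the complex spectral coefficients of the
    two channels at the same time frame and frequency band.
    """
    p_energy = abs(primary) ** 2
    s_energy = abs(secondary) ** 2
    # Level difference in dB; eps (hypothetical) avoids log of zero.
    ild_db = 10.0 * math.log10((p_energy + eps) / (s_energy + eps))
    # Phase of the primary relative to the secondary, wrapped to [-pi, pi].
    ipd_rad = cmath.phase(primary * secondary.conjugate())
    energy_db = 10.0 * math.log10(p_energy + eps)
    return ild_db, ipd_rad, energy_db
```

For example, `interaural_features(2 + 0j, 1j)` gives a level difference of about 6.02 dB and a phase difference of -π/2, since the primary has four times the energy of the secondary and lags it by a quarter cycle.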

19. A method of calibrating an apparatus for processing acoustic signals, the method comprising:
- receiving a multichannel training audio input corresponding to a plurality of audio channels;
  
  generating a training spectral representation of the multichannel training audio input;
  
  extracting one or more training acoustic features from the training spectral representation;
  
  performing linear transformation of the one or more training acoustic features using a dimensionality reduction technique to generate a training transformed data; and
  
  training a Gaussian mixture model (GMM) based on the transformed data, the GMM configured to provide a probabilistic mask of the transformed data, the probabilistic mask being used to identify noise points and signal points in the multichannel training audio input.
- Dependent claims (20, 21):
  - 20. The method of claim 19, wherein the linear transformation and GMM are selected from the plurality of linear transformations and GMMs based on a number of microphones and microphone spacing.
  - 21. The method of claim 19, wherein training the GMM comprises an algorithm to optimize generative costs and discriminative costs.
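As a rough illustration of the calibration idea, the sketch below fits one model to signal-dominated training points and another to noise-dominated points, then combines them with Bayes' rule to obtain a signal probability. It makes several simplifying assumptions that are not in the claims: labeled training points are available, each class is modeled by a single 1-D Gaussian rather than a full mixture, both classes are non-empty, and only a generative (maximum-likelihood) fit is shown, with no discriminative cost. All function names are hypothetical.

```python
import math


def fit_gaussian(values, var_floor=1e-6):
    """Maximum-likelihood mean and variance of a list of feature values."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, var_floor)  # hypothetical floor keeps the variance positive


def log_gaussian(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def train_two_class_model(signal_values, noise_values):
    """Fit one Gaussian per class and estimate the class prior from counts."""
    n_s, n_n = len(signal_values), len(noise_values)
    return {
        "signal": fit_gaussian(signal_values),
        "noise": fit_gaussian(noise_values),
        "prior_signal": n_s / (n_s + n_n),
    }


def signal_posterior(x, model):
    """Probability that feature value x comes from the signal class."""
    log_s = math.log(model["prior_signal"]) + log_gaussian(x, *model["signal"])
    log_n = math.log(1.0 - model["prior_signal"]) + log_gaussian(x, *model["noise"])
    diff = log_n - log_s
    if diff > 700.0:  # avoid overflow in exp for points far inside the noise class
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))
```

Evaluating `signal_posterior` at every transformed time-frequency point yields a probabilistic mask of the kind the claim describes.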

22. An apparatus for processing acoustic signals, the apparatus comprising:
- two or more microphones for receiving a multichannel audio input corresponding to two or more audio channels;
  
  an audio processing system for generating a spectral representation of the multichannel audio input, extracting one or more acoustic features from the spectral representation, performing a linear transformation of the one or more acoustic features using a dimensionality reduction technique to generate transformed data, classifying by a Gaussian mixture model (GMM) each time-frequency observation in the transformed data to provide a probabilistic mask of the transformed data, the probabilistic mask being used to identify noise points and signal points in the multichannel audio input, developing another mask for distinguishing the noise points and the signal points, and applying the other mask to the multichannel audio input to generate a processed output.
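The final step of the apparatus claim, applying a mask to produce a processed output, might look like the sketch below. The mapping from signal probability to gain, the `gain_floor` parameter, and the nested-list layout (frames of per-band complex coefficients) are assumptions for illustration only; the claim does not say how the second mask is derived.

```python
def mask_to_gain(prob_signal, gain_floor=0.1):
    """Map a signal probability in [0, 1] to a gain in [gain_floor, 1]."""
    p = min(max(prob_signal, 0.0), 1.0)  # clamp out-of-range probabilities
    return gain_floor + (1.0 - gain_floor) * p


def apply_mask(spectrum, prob_mask, gain_floor=0.1):
    """Scale each time-frequency coefficient by the gain derived from its mask value.

    `spectrum` is a list of frames, each a list of complex coefficients;
    `prob_mask` has the same shape and holds signal probabilities.
    """
    if len(spectrum) != len(prob_mask):
        raise ValueError("spectrum and mask must have the same number of frames")
    output = []
    for frame, probs in zip(spectrum, prob_mask):
        if len(frame) != len(probs):
            raise ValueError("frame and mask row must have the same number of bands")
        output.append([mask_to_gain(p, gain_floor) * x for x, p in zip(frame, probs)])
    return output
```

A nonzero floor attenuates noise-dominated points instead of zeroing them, which is one common way to limit audible artifacts; a floor of 0 with probabilities thresholded to 0 or 1 would correspond to a binary mask.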

Current Assignee: Knowles Electronics LLC (Knowles Corporation)
Original Assignee: Audience Corporation
Inventors: Avendano, Carlos; Mandel, Michael
Primary Examiner(s): Goins, Davetta W
Application Number: US13/492,780
Time in Patent Office: 1,040 Days
Field of Search: 381/71.1-71.14, 381/94.1-94.9, 704/223, 704/221, 704/226, 704/206, 704/236, 704/255
US Class Current: 381/94.2
CPC Class Codes:
G10K 15/00   Acoustics not otherwise pro...
G10L 2021/02166   Microphone arrays; Beamforming
G10L 21/0232   Processing in the frequency...
