Noise reduction using multi-feature cluster tracker
Abstract
Provided are methods and systems for noise suppression within multiple time-frequency points of spectral representations. A multi-feature cluster tracker is used to track signal and noise sources and to predict signal versus noise dominance at each time-frequency point. Multiple features, such as binaural and monaural features, may be used for these purposes. A Gaussian mixture model (GMM) is developed and, in some embodiments, dynamically updated for distinguishing signal from noise and performing mask-based noise reduction. Each frequency band may use a different GMM or share a GMM with other frequency bands. A GMM may be combined from two models, with one trained to model time-frequency points in which the target dominates and another trained to model time-frequency points in which the noise dominates. Dynamic updates of a GMM may be performed using an expectation-maximization algorithm in an unsupervised fashion.
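The mixture model described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a two-component, diagonal-covariance GMM over already-extracted feature vectors, with component 0 standing for target-dominant points and component 1 for noise-dominant points. The function names, the synthetic data, and the initial parameters are hypothetical and are not taken from the specification. The posterior probability of the target component serves as the probabilistic mask, and `em_step` shows one unsupervised expectation-maximization update.

```python
import numpy as np

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian, evaluated per row of x."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def responsibilities(x, weights, means, variances):
    """E-step: posterior probability of each component for each observation."""
    log_p = np.stack(
        [np.log(w) + log_gauss_diag(x, m, v)
         for w, m, v in zip(weights, means, variances)],
        axis=1,
    )
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilise before exponentiating
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def em_step(x, weights, means, variances, var_floor=1e-6):
    """One unsupervised EM update (E-step followed by M-step)."""
    r = responsibilities(x, weights, means, variances)
    n_k = np.maximum(r.sum(axis=0), 1e-12)      # effective count per component
    new_weights = n_k / len(x)
    new_means = (r.T @ x) / n_k[:, None]
    new_vars = (r.T @ x ** 2) / n_k[:, None] - new_means ** 2
    return new_weights, new_means, np.maximum(new_vars, var_floor)

# Synthetic 2-D features (assumption): 500 target-dominant and 500 noise-dominant points.
rng = np.random.default_rng(0)
target = rng.normal([2.0, 0.0], 0.5, size=(500, 2))
noise = rng.normal([-2.0, 0.0], 0.5, size=(500, 2))
x = np.vstack([target, noise])

# Initial model (assumption): component 0 starts on the target side.
weights = np.array([0.5, 0.5])
means = np.array([[1.0, 0.0], [-1.0, 0.0]])
variances = np.ones((2, 2))

for _ in range(20):
    weights, means, variances = em_step(x, weights, means, variances)

# Probabilistic mask: posterior probability that each point is target-dominant.
mask = responsibilities(x, weights, means, variances)[:, 0]
```

Because the update is unsupervised, nothing in the data says which component is the target. In this sketch that identity comes only from the initialization, which loosely mirrors the abstract's idea of combining one model trained on target-dominant points with another trained on noise-dominant points.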
22 Claims
1. A method for processing acoustic signals, the method comprising:
receiving a multichannel audio input corresponding to a plurality of audio channels;
generating a spectral representation of the multichannel audio input;
extracting one or more acoustic features from the spectral representation;
performing linear transformation of the one or more acoustic features using a dimensionality reduction technique to generate transformed data; and
classifying by a Gaussian mixture model (GMM) each time-frequency observation in the transformed data, the GMM providing a probabilistic mask of the transformed data, the probabilistic mask being used to identify noise points and signal points in the multichannel audio input.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
19. A method of calibrating an apparatus for processing acoustic signals, the method comprising:
receiving a multichannel training audio input corresponding to a plurality of audio channels;
generating a training spectral representation of the multichannel training audio input;
extracting one or more training acoustic features from the training spectral representation;
performing linear transformation of the one or more training acoustic features using a dimensionality reduction technique to generate a training transformed data; and
training a Gaussian mixture model (GMM) based on the transformed data, the GMM configured to provide a probabilistic mask of the transformed data, the probabilistic mask being used to identify noise points and signal points in the multichannel training audio input.
Dependent claims: 20, 21
22. An apparatus for processing acoustic signals, the apparatus comprising:
two or more microphones for receiving a multichannel audio input corresponding to two or more audio channels;
an audio processing system for generating a spectral representation of the multichannel audio input, extracting one or more acoustic features from the spectral representation, performing a linear transformation of the one or more acoustic features using a dimensionality reduction technique to generate transformed data, classifying by a Gaussian mixture model (GMM) each time-frequency observation in the transformed data to provide a probabilistic mask of the transformed data, the probabilistic mask being used to identify noise points and signal points in the multichannel audio input, developing another mask for distinguishing the noise points and the signal points, and applying the other mask to the multichannel audio input to generate a processed output.
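The front-end steps recited in the independent claims (spectral representation, feature extraction, linear dimensionality reduction, and mask application) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the claims do not name a transform, a feature set, or a reduction technique, so the short-time Fourier transform, the three features, and the use of principal component analysis are choices made here. All function names and the synthetic two-microphone signal are hypothetical, and the constant mask is only a placeholder for the GMM-derived mask.

```python
import numpy as np

def stft(signal, frame_len=256, hop=128):
    """Short-time Fourier transform with a Hann window; returns (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def extract_features(spec_a, spec_b, eps=1e-12):
    """One feature vector per time-frequency point (feature choice is an assumption)."""
    mag_a, mag_b = np.abs(spec_a) + eps, np.abs(spec_b) + eps
    level_diff = 20.0 * np.log10(mag_a / mag_b)      # binaural: inter-channel level difference (dB)
    phase_diff = np.angle(spec_a * np.conj(spec_b))  # binaural: inter-channel phase difference (rad)
    log_mag = np.log(mag_a)                          # monaural: log magnitude of channel A
    return np.stack([level_diff, phase_diff, log_mag], axis=-1).reshape(-1, 3)

def pca_project(features, n_components=2):
    """Linear transformation: project centred features onto the top principal axes."""
    centred = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T

def apply_mask(spec, mask):
    """Scale each time-frequency point by a mask value clipped to [0, 1]."""
    return np.clip(mask, 0.0, 1.0) * spec

# Synthetic two-microphone input (assumption): a tone that is weaker on microphone B.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
tone = np.sin(2.0 * np.pi * 440.0 * t)
ch_a = tone + 0.1 * rng.standard_normal(t.size)
ch_b = 0.5 * tone + 0.1 * rng.standard_normal(t.size)

spec_a, spec_b = stft(ch_a), stft(ch_b)
transformed = pca_project(extract_features(spec_a, spec_b))

# Placeholder mask (assumption): in the claimed method this would come from the GMM step.
mask = np.full(spec_a.shape, 0.5)
processed = apply_mask(spec_a, mask)
```

Each row of `transformed` corresponds to one time-frequency observation, which is the unit the claims say the GMM classifies. Reshaping a per-row posterior back to `spec_a.shape` would give a mask that `apply_mask` can consume directly.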
Specification