System and method for multi-channel multi-feature speech/noise classification for noise suppression
Abstract
An architecture and framework for speech/noise classification of an audio signal using multiple features with multiple input channels (e.g., microphones) are provided. The architecture may be implemented with noise suppression in a multi-channel environment where noise suppression is based on an estimation of the noise spectrum. The noise spectrum is estimated using a model that classifies each time/frame and frequency component of a signal as speech or noise by applying a speech/noise probability function. The speech/noise probability function estimates a speech/noise probability for each frequency and time bin. A speech/noise classification estimate is obtained by fusing (e.g., combining) data across different input channels using a layered network model. Individual feature data acquired at each channel and/or from a beam-formed signal is mapped to a speech probability, which is combined through layers of the model into a final speech/noise classification for use in noise estimation and filtering processes for noise suppression.
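The abstract says the speech/noise probability for each frequency and time bin drives the noise spectrum estimate, but does not give an update rule here. The following is a minimal sketch of one common soft-decision recursive update, offered only as an illustration. The function name `update_noise_estimate`, the `smoothing` parameter, and the specific update formula are assumptions and are not taken from the patent text.

```python
def update_noise_estimate(noise_prev, magnitude, speech_prob, smoothing=0.9):
    """Soft-decision update of a per-bin noise magnitude estimate.

    Assumed rule (not from the patent): where speech is likely, hold the
    previous noise estimate; where noise is likely, move toward the newly
    observed magnitude. `smoothing` controls how slowly the estimate adapts.
    """
    updated = []
    for n, y, p in zip(noise_prev, magnitude, speech_prob):
        p = min(max(p, 0.0), 1.0)          # keep the probability in [0, 1]
        target = p * n + (1.0 - p) * y     # probability-weighted target value
        updated.append(smoothing * n + (1.0 - smoothing) * target)
    return updated


# Two frequency bins: the first is classified as speech, the second as noise.
noise = update_noise_estimate([1.0, 1.0], [3.0, 3.0], [1.0, 0.0], smoothing=0.5)
```

In this sketch a bin with speech probability 1 leaves the noise estimate unchanged, while a bin with speech probability 0 moves the estimate toward the observed magnitude at a rate set by `smoothing`.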
20 Claims
1. A computer-implemented architecture for classifying an audio signal received at a multi-channel noise suppression system as speech or noise, the architecture comprising:
- a first layer for generating a feature-based speech probability for each of a plurality of signal classification features measured for a frame of the signal input from each of a plurality of input channels;
- a second layer for generating, for each of the plurality of input channels, a speech probability for the input channel by combining the feature-based speech probabilities of the input channel; and
- a third layer for generating a combined speech probability for the frame of the signal using the speech probabilities of the plurality of input channels,
wherein the layers comprise a probabilistic layered network model and an additive model or a multiplicative model is used for the third layer of the probabilistic layered network model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A multi-channel noise suppression system comprising:
- a plurality of input channels; and
- a noise suppression module configured to:
  - measure signal classification features for an audio signal frame input from each of the plurality of input channels;
  - calculate a feature-based speech probability for each of the measured signal classification features of each of the plurality of input channels;
  - generate a speech probability for each of the plurality of input channels by combining the feature-based speech probabilities of the input channel; and
  - generate a combined speech probability for the audio signal frame using at least one of the speech probabilities of the plurality of input channels and an additive model for a top layer of a probabilistic layered network model.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19
20. A method for classifying an audio signal received at a noise suppression module via a plurality of input channels as speech or noise, the method comprising:
- measuring, for each of the plurality of channels, signal classification features for a frame of the signal input from the channel;
- determining, for each of the measured signal classification features of each of the plurality of channels, a first classification state for the signal based on the measured signal classification feature;
- determining, for each of the plurality of channels, a second classification state for the signal by combining the first classification states of the channel using a probabilistic layered network model with an additive model as a top layer; and
- classifying the signal as speech or noise based on the second classification states of the plurality of channels.
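The three-layer structure recited in the independent claims above (feature-based probabilities, per-channel probabilities, then a combined probability from an additive or multiplicative top layer) can be sketched roughly as follows. This is an illustrative sketch only: all function names, the sigmoid mapping in the first layer, the weighted averages, the plain product used for the multiplicative case, and the 0.5 decision threshold are assumptions, since the claims do not specify these details.

```python
import math


def feature_to_prob(value, threshold, width):
    """Layer 1 (assumed form): map a feature value to a speech probability
    with a sigmoid centered on `threshold`."""
    x = (value - threshold) / width
    return 0.5 * (1.0 + math.tanh(0.5 * x))  # numerically stable sigmoid


def channel_prob(feature_probs, weights):
    """Layer 2 (assumed form): weighted average of one channel's
    feature-based speech probabilities."""
    return sum(w * p for w, p in zip(weights, feature_probs)) / sum(weights)


def combined_prob(channel_probs, weights=None, model="additive"):
    """Layer 3: fuse per-channel probabilities with an additive or a
    multiplicative model (the exact formulas here are assumptions)."""
    if model == "additive":
        if weights is None:
            weights = [1.0] * len(channel_probs)
        return sum(w * p for w, p in zip(weights, channel_probs)) / sum(weights)
    if model == "multiplicative":
        return math.prod(channel_probs)
    raise ValueError("model must be 'additive' or 'multiplicative'")


def classify_frame(features_per_channel, params, feature_weights,
                   model="additive", decision_threshold=0.5):
    """Run all three layers for one frame; returns (probability, is_speech)."""
    per_channel = []
    for features in features_per_channel:
        probs = [feature_to_prob(v, t, w) for v, (t, w) in zip(features, params)]
        per_channel.append(channel_prob(probs, feature_weights))
    p = combined_prob(per_channel, model=model)
    return p, p > decision_threshold


# Two channels, two hypothetical features each, same (threshold, width) per feature.
prob, is_speech = classify_frame(
    features_per_channel=[[2.0, 0.0], [2.0, 2.0]],
    params=[(1.0, 0.5), (1.0, 0.5)],
    feature_weights=[1.0, 1.0],
)
```

With the additive model a single confident channel can raise the combined probability, whereas the product form requires every channel to agree before the frame is treated as speech; which behavior is preferable depends on how reliable each channel is.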
Specification