Combined suppression of noise, echo, and out-of-location signals
Abstract
A system, a method, logic embodied in a computer-readable medium, and a computer-readable medium comprising instructions that when executed carry out a method. The method processes: (a) a plurality of input signals, e.g., signals from a plurality of spatially separated microphones; and, for echo suppression, (b) one or more reference signals, e.g., signals from or to be rendered by one or more loudspeakers and that can cause echoes. The method processes the input signals and one or more reference signals to carry out in an integrated manner simultaneous noise suppression and out-of-location signal suppression, and in some versions, echo suppression.
96 Claims
1. A system for processing audio input signals, comprising:
an input processor to accept a plurality of sampled audio input signals to form a mixed-down signal in the sample or frequency domain, and further to form a mixed-down banded instantaneous frequency domain amplitude metric of the input signals for a plurality of frequency bands, at least 90% of the bands having contribution from two or more frequency bins;
a banded spatial feature estimator to estimate banded spatial features from the plurality of sampled input signals;
a gain calculator to calculate a set of banded suppression probability indicators including a banded out-of-location signal probability indicator determined using two or more of the banded spatial features, and a banded noise suppression probability indicator, expressible for each frequency band as a noise suppression gain and determined using a banded estimate of noise spectral content based on the mixed-down banded instantaneous frequency domain amplitude metric of the input signals, the gain calculator further to combine the set of probability indicators to calculate a combined gain for each band of the plurality of frequency bands; and
a suppressor to apply an interpolated final gain determined from the combined gains of the plurality of frequency bands to carry out suppression on the mixed-down signal to form suppressed signal data.
Dependent claims: 2-28.
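The data flow recited in claim 1 (mix down, band, combine per-band indicators, interpolate to bins, suppress) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the triangular banding in `triangular_band_weights`, the averaging mix-down, the product rule with a floor in `combine_gains`, and the weight-based interpolation in `interpolate_to_bins`.

```python
from typing import List, Sequence

Weights = List[List[float]]  # weights[band][bin]; hypothetical banding matrix


def triangular_band_weights(num_bins: int, num_bands: int) -> Weights:
    """Assumed banding: overlapping triangular windows on a linear bin axis."""
    width = num_bins / num_bands
    centers = [(b + 0.5) * width for b in range(num_bands)]
    return [[max(0.0, 1.0 - abs(n + 0.5 - c) / width) for n in range(num_bins)]
            for c in centers]


def fraction_multi_bin_bands(weights: Weights) -> float:
    """Share of bands that receive a contribution from two or more bins."""
    multi = sum(1 for w in weights if sum(1 for v in w if v > 0.0) >= 2)
    return multi / len(weights)


def mix_down(channels: Sequence[Sequence[complex]]) -> List[complex]:
    """Assumed mix-down: plain average of the per-channel frequency bins."""
    return [sum(col) / len(channels) for col in zip(*channels)]


def banded_power(bins: Sequence[complex], weights: Weights) -> List[float]:
    """Banded instantaneous amplitude metric, here weighted power per band."""
    return [sum(wn * abs(x) ** 2 for wn, x in zip(w, bins)) for w in weights]


def combine_gains(indicator_sets: Sequence[Sequence[float]],
                  floor: float = 0.05) -> List[float]:
    """Assumed combination rule: per-band product of indicators, floored."""
    combined = []
    for per_band in zip(*indicator_sets):
        gain = 1.0
        for p in per_band:
            gain *= p
        combined.append(max(floor, gain))
    return combined


def interpolate_to_bins(band_gains: Sequence[float], weights: Weights) -> List[float]:
    """Spread banded gains back onto bins using normalized band weights."""
    out = []
    for n in range(len(weights[0])):
        total = sum(w[n] for w in weights)
        out.append(sum(w[n] * g for w, g in zip(weights, band_gains)) / total)
    return out


def suppress(bins: Sequence[complex], bin_gains: Sequence[float]) -> List[complex]:
    """Apply the interpolated final gain to the mixed-down bins."""
    return [g * x for g, x in zip(bin_gains, bins)]
```

With 16 bins and 4 bands, every band in this sketch draws on several bins, so `fraction_multi_bin_bands` returns 1.0, which satisfies the "at least 90%" condition of the claim.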
29. A method of operating a processing apparatus to suppress undesired signals including noise and out-of-location signals in audio input signals, the method comprising:
accepting in the processing apparatus a plurality of sampled audio input signals;
forming a mixed-down banded instantaneous frequency domain amplitude metric of the input signals for a plurality of frequency bands, the forming including transforming into complex-valued frequency domain values of the input signals or of a mixed down signal for a set of frequency bins;
at least 90% of the bands having contribution from two or more frequency bins;
determining banded spatial features from the plurality of sampled input signals;
calculating a set of banded suppression probability indicators, including a banded out-of-location suppression probability indicator determined using two or more of the banded spatial features, and a banded noise suppression probability indicator expressible for each band as a noise suppression gain and determined using a banded estimate of noise spectral content determined based on the mixed-down banded instantaneous frequency domain amplitude metric of the mixed-down signal;
combining the set of banded probability indicators to determine a combined gain for each band of the plurality of frequency bands;
applying an interpolated final gain determined from the combined gains of the plurality of frequency bands to carry out suppression on the mixed-down signal to form suppressed signal data.
Dependent claims: 30-56.
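Claim 29 derives the out-of-location indicator from two or more banded spatial features but does not fix which features or which mapping. The sketch below is one possible reading for a two-microphone case, with every choice an assumption: the features (level ratio, phase, coherence from a banded covariance), the Gaussian-shaped mapping, and the target and width parameters are all hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def banded_spatial_features(x1: Sequence[complex], x2: Sequence[complex],
                            weights: Sequence[Sequence[float]],
                            eps: float = 1e-12) -> List[Tuple[float, float, float]]:
    """Per band: (level ratio in dB, inter-channel phase, coherence).

    Assumed feature set computed from a weighted two-channel covariance.
    """
    feats = []
    for w in weights:
        r11 = sum(wn * abs(a) ** 2 for wn, a in zip(w, x1))
        r22 = sum(wn * abs(b) ** 2 for wn, b in zip(w, x2))
        r12 = sum(wn * a * b.conjugate() for wn, a, b in zip(w, x1, x2))
        ratio_db = 10.0 * math.log10((r11 + eps) / (r22 + eps))
        phase = cmath.phase(r12)
        coherence = abs(r12) ** 2 / (r11 * r22 + eps)
        feats.append((ratio_db, phase, coherence))
    return feats


def in_location_probability(ratio_db: float, phase: float,
                            target_ratio_db: float = 0.0,
                            target_phase: float = 0.0,
                            ratio_width_db: float = 6.0,
                            phase_width: float = 0.8) -> float:
    """Hypothetical indicator: near 1 at the target location, near 0 far away.

    Uses two spatial features (ratio and phase), multiplied together.
    """
    p_ratio = math.exp(-((ratio_db - target_ratio_db) / ratio_width_db) ** 2)
    p_phase = math.exp(-((phase - target_phase) / phase_width) ** 2)
    return p_ratio * p_phase
```

In this sketch a source that reaches both microphones with equal level and zero phase difference scores close to 1, while a source arriving in anti-phase scores close to 0 and would be suppressed when the indicator is used as a gain.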
57. A method of operating a processing apparatus to suppress undesired signals, the undesired signals including noise, the method comprising:
accepting in the processing apparatus at least one sampled input signals;
forming a banded instantaneous frequency domain amplitude metric of the at least one input signal for a plurality of frequency bands, the forming including transforming into complex-valued frequency domain values of the at least one input signal or of a mixed down signal for a set of frequency bins;
at least 90% of the bands having contribution from two or more frequency bins;
calculating a set of one or more suppression probability indicators, including a noise suppression probability indicator expressible for each frequency band as a noise suppression gain and determined using an estimate of noise spectral content based on the banded instantaneous frequency domain amplitude metric of the at least one input signal;
combining the set of probability indicators to determine a banded combined gain for each band;
applying an interpolated final gain determined from the combined gain to carry out suppression on the frequency domain values of the at least one input signal or of a mixed down signal to form suppressed signal data,
wherein the noise suppression probability indicator for each frequency band is expressible as noise suppression gain function of the banded instantaneous amplitude metric for the band,
wherein for each frequency band, a first range of values of banded instantaneous amplitude metric values is expected for noise, and a second range of values of banded instantaneous amplitude metric values is expected for a desired input, and
wherein the noise suppression gain functions for the frequency bands are configured to:
have a respective minimum value;
have a relatively constant value or a relatively small negative gradient in the first range;
have a relatively constant gain in the second range; and
have a smooth transition from the first range to the second range.
Dependent claims: 58-79.
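The gain-function properties listed at the end of claim 57 (a minimum value, a nearly flat noise range, a nearly flat desired-signal range, and a smooth transition between them) can be met by many curves. The sketch below uses a logistic curve over the band's signal-to-noise ratio in dB; this specific curve and the parameters `min_gain`, `midpoint_db`, and `slope_db` are assumptions chosen for illustration, not the function disclosed in the patent. Note that this curve is very slightly rising in the noise range, whereas the claim also allows a small negative gradient there.

```python
import math


def noise_suppression_gain(band_power: float, noise_power: float,
                           min_gain: float = 0.1,
                           midpoint_db: float = 9.0,
                           slope_db: float = 1.5,
                           eps: float = 1e-12) -> float:
    """Hypothetical per-band gain as a function of the banded power metric.

    - Near the noise estimate (first range) the gain sits close to min_gain.
    - Well above the noise estimate (second range) the gain sits close to 1.
    - A logistic blend gives the smooth transition between the two ranges.
    """
    snr_db = 10.0 * math.log10((band_power + eps) / (noise_power + eps))
    # Clamp the exponent so extreme inputs cannot overflow math.exp.
    z = max(-50.0, min(50.0, (snr_db - midpoint_db) / slope_db))
    blend = 1.0 / (1.0 + math.exp(-z))
    return min_gain + (1.0 - min_gain) * blend
```

Raising `min_gain` limits the maximum attenuation, which trades residual noise against the risk of audible artifacts; that trade-off is a general property of floor-limited suppression gains rather than something specific to this sketch.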
80. A method of operating a processing apparatus to suppress undesired signals, the method comprising:
accepting in the processing apparatus a plurality of sampled input signals;
forming a mixed-down banded instantaneous frequency domain amplitude metric of the input signals for a plurality of frequency bands, the forming including transforming into complex-valued frequency domain values for a set of frequency bins;
at least 90% of the bands having contribution from two or more frequency bins;
determining banded spatial features from the plurality of sampled input signals;
calculating a set of suppression probability indicators, including an out-of-location suppression probability indicator determined using two or more of the spatial features, and a noise suppression probability indicator expressible for each frequency band as a noise suppression gain and determined using an estimate of noise spectral content based on the mixed-down banded instantaneous frequency domain amplitude metric of the input signals;
accepting in the processing apparatus one or more reference signals;
forming a banded frequency domain amplitude metric representation of the one or more reference signals;
predicting a banded frequency domain amplitude metric representation of an echo using adaptively determined echo filter coefficients;
determining a plurality of indications of voice activity from the mixed-down banded instantaneous frequency domain amplitude metric using respective instantiations of a universal voice activity detection method, the universal voice activity detection method being controlled by a set of parameters and using an estimate of noise spectral content, the banded frequency domain amplitude metric representation of the echo, and the banded spatial features;
wherein the set of parameters includes a parameter indicative of whether the estimate of noise spectral content is spatially selective or not;
wherein which indication of voice activity an instantiation determines is controlled by a selection of the parameters; and
combining the set of probability indicators to determine a combined gain for each band;
applying an interpolated final gain determined from the combined gain to carry out suppression on bin data of the mixed-down signal to form suppressed signal data,
wherein different instantiations of the universal voice activity detection method are applied in different steps of the method.
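Claim 80 predicts a banded echo metric "using adaptively determined echo filter coefficients" without naming the adaptation rule. As a minimal sketch, the class below predicts the echo power in one band as a weighted sum of recent reference-band powers and adapts the weights with a normalized LMS update. The NLMS rule, the non-negativity clamp, the tap count, and the names `BandEchoPredictor` and `simulate` are all assumptions for illustration.

```python
import random
from collections import deque
from typing import Deque, List


class BandEchoPredictor:
    """Hypothetical single-band echo power predictor with NLMS adaptation."""

    def __init__(self, taps: int = 4, mu: float = 0.5, eps: float = 1e-9) -> None:
        self.coeffs: List[float] = [0.0] * taps
        # Most recent reference power first; older frames follow.
        self.history: Deque[float] = deque([0.0] * taps, maxlen=taps)
        self.mu = mu
        self.eps = eps

    def predict(self, ref_power: float) -> float:
        """Push the newest reference band power and return the echo estimate."""
        self.history.appendleft(ref_power)
        return sum(f * x for f, x in zip(self.coeffs, self.history))

    def adapt(self, mic_power: float, predicted: float) -> None:
        """NLMS step on the prediction error; powers keep coefficients >= 0."""
        err = mic_power - predicted
        norm = sum(x * x for x in self.history) + self.eps
        self.coeffs = [max(0.0, f + self.mu * err * x / norm)
                       for f, x in zip(self.coeffs, self.history)]


def simulate(frames: int = 2000, true_coupling: float = 0.5,
             seed: int = 0) -> BandEchoPredictor:
    """Toy run: the microphone band power is a scaled copy of the reference."""
    rng = random.Random(seed)
    predictor = BandEchoPredictor()
    for _ in range(frames):
        ref = rng.uniform(0.0, 2.0)
        estimate = predictor.predict(ref)
        predictor.adapt(mic_power=true_coupling * ref, predicted=estimate)
    return predictor
```

In a full system the update would normally be frozen while local speech is present; in the claim that decision is one of the roles given to an instantiation of the voice activity detection method.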
81. A processing apparatus comprising:
one or more processors; and
a computer-readable storage medium coupled to the one or more processors and comprising instructions to cause, when executed by at least one of the processors, the processing apparatus to carry out a method to suppress undesired signals including noise and out-of-location signals in audio input signals, the method comprising:
accepting in the processing apparatus a plurality of sampled audio input signals;
forming a mixed-down banded instantaneous frequency domain amplitude metric of the input signals for a plurality of frequency bands, the forming including transforming into complex-valued frequency domain values of the input signals or of a mixed down signal for a set of frequency bins;
at least 90% of the bands having contribution from two or more frequency bins;
determining banded spatial features from the plurality of sampled input signals;
calculating a set of banded suppression probability indicators, including a banded out-of-location suppression probability indicator determined using two or more of the banded spatial features, and a banded noise suppression probability indicator expressible for each band as a noise suppression gain and determined using a banded estimate of noise spectral content determined based on the mixed-down banded instantaneous frequency domain amplitude metric of the mixed-down signal;
combining the set of banded probability indicators to determine a combined gain for each band of the plurality of frequency bands;
applying an interpolated final gain determined from the combined gains of the plurality of frequency bands to carry out suppression on the mixed-down signal to form suppressed signal data.
Dependent claims: 82-85.
86. A processing apparatus comprising:
one or more processors; and
a computer-readable storage medium coupled to the one or more processors and comprising instructions to cause, when executed by at least one of the processors, the processing apparatus to carry out a method to suppress undesired signals, the undesired signals including noise, the method comprising:
accepting in the processing apparatus at least one sampled input signals;
forming a banded instantaneous frequency domain amplitude metric of the at least one input signal for a plurality of frequency bands, the forming including transforming into complex-valued frequency domain values of the at least one input signal or of a mixed down signal for a set of frequency bins;
at least 90% of the bands having contribution from two or more frequency bins;
calculating a set of one or more suppression probability indicators, including a noise suppression probability indicator expressible for each frequency band as a noise suppression gain and determined using an estimate of noise spectral content based on the banded instantaneous frequency domain amplitude metric of the at least one input signal;
combining the set of probability indicators to determine a banded combined gain for each band;
applying an interpolated final gain determined from the combined gain to carry out suppression on the frequency domain values of the at least one input signal or of a mixed down signal to form suppressed signal data,
wherein the noise suppression probability indicator for each frequency band is expressible as noise suppression gain function of the banded instantaneous amplitude metric for the band,
wherein for each frequency band, a first range of values of banded instantaneous amplitude metric values is expected for noise, and a second range of values of banded instantaneous amplitude metric values is expected for a desired input, and
wherein the noise suppression gain functions for the frequency bands are configured to:
have a respective minimum value;
have a relatively constant value or a relatively small negative gradient in the first range;
have a relatively constant gain in the second range; and
have a smooth transition from the first range to the second range.
Dependent claims: 87-90.
91. A processing apparatus comprising:
one or more processors; and
a computer-readable storage medium coupled to the one or more processors and comprising instructions to cause, when executed by at least one of the processors, the processing apparatus to carry out a method to suppress undesired signals, the method comprising:
accepting in the processing apparatus a plurality of sampled input signals;
forming a mixed-down banded instantaneous frequency domain amplitude metric of the input signals for a plurality of frequency bands, the forming including transforming into complex-valued frequency domain values for a set of frequency bins;
at least 90% of the bands having contribution from two or more frequency bins;
determining banded spatial features from the plurality of sampled input signals;
calculating a set of suppression probability indicators, including an out-of-location suppression probability indicator determined using two or more of the spatial features, and a noise suppression probability indicator expressible for each frequency band as a noise suppression gain and determined using an estimate of noise spectral content based on the mixed-down banded instantaneous frequency domain amplitude metric of the input signals;
accepting in the processing apparatus one or more reference signals;
forming a banded frequency domain amplitude metric representation of the one or more reference signals;
predicting a banded frequency domain amplitude metric representation of an echo using adaptively determined echo filter coefficients;
determining a plurality of indications of voice activity from the mixed-down banded instantaneous frequency domain amplitude metric using respective instantiations of a universal voice activity detection method, the universal voice activity detection method being controlled by a set of parameters and using an estimate of noise spectral content, the banded frequency domain amplitude metric representation of the echo, and the banded spatial features;
wherein the set of parameters includes a parameter indicative of whether the estimate of noise spectral content is spatially selective or not;
wherein which indication of voice activity an instantiation determines is controlled by a selection of the parameters; and
combining the set of probability indicators to determine a combined gain for each band;
applying an interpolated final gain determined from the combined gain to carry out suppression on bin data of the mixed-down signal to form suppressed signal data,
wherein different instantiations of the universal voice activity detection method are applied in different steps of the method.
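The "universal voice activity detection method" recited in claims 80 and 91 is a single detector whose behavior is selected by parameters, including one that says whether the noise estimate is spatially selective. The sketch below shows one way such a parameterized detector could look. The scoring rule, the `VadParams` fields other than the spatial-selectivity flag, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class VadParams:
    """Parameter set selecting one instantiation of the shared detector."""
    spatially_selective_noise: bool  # which noise estimate to use
    noise_weight: float = 1.0        # assumed scaling of the noise estimate
    echo_weight: float = 1.0         # assumed scaling of the predicted echo
    use_spatial_gate: bool = False   # weight bands by in-location probability
    threshold: float = 0.5           # assumed decision threshold


def voice_activity(power: Sequence[float],
                   noise: Sequence[float],
                   spatial_noise: Sequence[float],
                   echo: Sequence[float],
                   in_location_prob: Sequence[float],
                   params: VadParams,
                   eps: float = 1e-12) -> bool:
    """Return True when the average per-band excess-power score is high."""
    chosen_noise = spatial_noise if params.spatially_selective_noise else noise
    score = 0.0
    for b, y in enumerate(power):
        excess = (y - params.noise_weight * chosen_noise[b]
                  - params.echo_weight * echo[b])
        band_score = max(0.0, excess) / (y + eps)
        if params.use_spatial_gate:
            band_score *= in_location_prob[b]
        score += band_score
    return score / len(power) > params.threshold


# Two hypothetical instantiations used at different steps of a method.
ECHO_UPDATE_VAD = VadParams(spatially_selective_noise=False)
OUTPUT_VAD = VadParams(spatially_selective_noise=True, use_spatial_gate=True)
```

Here `ECHO_UPDATE_VAD` ignores location and could gate echo-filter adaptation, while `OUTPUT_VAD` only fires for in-location activity; both share the same code path and differ only in their parameters, which is the structure the claim language describes.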
92. A non-transitory computer-readable medium comprising instructions to cause, when executed by at least one processor of a processing apparatus to carry out a method to suppress undesired signals including noise and out-of-location signals in audio input signals, the method comprising:
accepting in the processing apparatus a plurality of sampled audio input signals;
forming a mixed-down banded instantaneous frequency domain amplitude metric of the input signals for a plurality of frequency bands, the forming including transforming into complex-valued frequency domain values of the input signals or of a mixed down signal for a set of frequency bins;
at least 90% of the bands having contribution from two or more frequency bins;
determining banded spatial features from the plurality of sampled input signals;
calculating a set of banded suppression probability indicators, including a banded out-of-location suppression probability indicator determined using two or more of the banded spatial features, and a banded noise suppression probability indicator expressible for each band as a noise suppression gain and determined using a banded estimate of noise spectral content determined based on the mixed-down banded instantaneous frequency domain amplitude metric of the mixed-down signal;
combining the set of banded probability indicators to determine a combined gain for each band of the plurality of frequency bands;
applying an interpolated final gain determined from the combined gains of the plurality of frequency bands to carry out suppression on the mixed-down signal to form suppressed signal data.
Dependent claims: 93.
94. A non-transitory computer-readable medium comprising instructions to cause, when executed by at least one processor of a processing apparatus to carry out a method to suppress undesired signals including noise and out-of-location signals in audio input signals, the method comprising:
accepting in the processing apparatus at least one sampled input signals;
forming a banded instantaneous frequency domain amplitude metric of the at least one input signal for a plurality of frequency bands, the forming including transforming into complex-valued frequency domain values of the at least one input signal or of a mixed down signal for a set of frequency bins;
at least 90% of the bands having contribution from two or more frequency bins;
calculating a set of one or more suppression probability indicators, including a noise suppression probability indicator expressible for each frequency band as a noise suppression gain and determined using an estimate of noise spectral content based on the banded instantaneous frequency domain amplitude metric of the at least one input signal;
combining the set of probability indicators to determine a banded combined gain for each band;
applying an interpolated final gain determined from the combined gain to carry out suppression on the frequency domain values of the at least one input signal or of a mixed down signal to form suppressed signal data,
wherein the noise suppression probability indicator for each frequency band is expressible as noise suppression gain function of the banded instantaneous amplitude metric for the band,
wherein for each frequency band, a first range of values of banded instantaneous amplitude metric values is expected for noise, and a second range of values of banded instantaneous amplitude metric values is expected for a desired input, and
wherein the noise suppression gain functions for the frequency bands are configured to:
have a respective minimum value;
have a relatively constant value or a relatively small negative gradient in the first range;
have a relatively constant gain in the second range; and
have a smooth transition from the first range to the second range.
Dependent claims: 95.
96. A non-transitory computer-readable medium comprising instructions that cause, when executed by at least one processor of a processing apparatus, to carry out a method to suppress undesired signals including noise and out-of-location signals in audio input signals, the method comprising:
accepting in the processing apparatus a plurality of sampled input signals;
forming a mixed-down banded instantaneous frequency domain amplitude metric of the input signals for a plurality of frequency bands, the forming including transforming into complex-valued frequency domain values for a set of frequency bins;
at least 90% of the bands having contribution from two or more frequency bins;
determining banded spatial features from the plurality of sampled input signals;
calculating a set of suppression probability indicators, including an out-of-location suppression probability indicator determined using two or more of the spatial features, and a noise suppression probability indicator expressible for each frequency band as a noise suppression gain and determined using an estimate of noise spectral content based on the mixed-down banded instantaneous frequency domain amplitude metric of the input signals;
accepting in the processing apparatus one or more reference signals;
forming a banded frequency domain amplitude metric representation of the one or more reference signals;
predicting a banded frequency domain amplitude metric representation of an echo using adaptively determined echo filter coefficients;
determining a plurality of indications of voice activity from the mixed-down banded instantaneous frequency domain amplitude metric using respective instantiations of a universal voice activity detection method, the universal voice activity detection method being controlled by a set of parameters and using an estimate of noise spectral content, the banded frequency domain amplitude metric representation of the echo, and the banded spatial features;
wherein the set of parameters includes a parameter indicative of whether the estimate of noise spectral content is spatially selective or not;
wherein which indication of voice activity an instantiation determines is controlled by a selection of the parameters; and
combining the set of probability indicators to determine a combined gain for each band;
applying an interpolated final gain determined from the combined gain to carry out suppression on bin data of the mixed-down signal to form suppressed signal data,
wherein different instantiations of the universal voice activity detection method are applied in different steps of the method.
Specification