COMBINED SUPPRESSION OF NOISE, ECHO, AND OUT-OF-LOCATION SIGNALS
Abstract
A system, a method, logic embodied in a computer-readable medium, and a computer-readable medium comprising instructions that when executed carry out a method. The method processes: (a) a plurality of input signals, e.g., signals from a plurality of spatially separated microphones; and, for echo suppression, (b) one or more reference signals, e.g., signals from or to be rendered by one or more loudspeakers and that can cause echoes. The method processes the input signals and one or more reference signals to carry out in an integrated manner simultaneous noise suppression and out-of-location signal suppression, and in some versions, echo suppression.
104 Citations
96 Claims
-
1. A system for processing audio input signals, comprising:
-
an input processor to accept a plurality of sampled audio input signals to form a mixed-down signal in the sample or frequency domain, and further to form a mixed-down banded instantaneous frequency domain amplitude metric of the input signals for a plurality of frequency bands, at least 90% of the bands having contribution from two or more frequency bins;
a banded spatial feature estimator to estimate banded spatial features from the plurality of sampled input signals;
a gain calculator to calculate a set of banded suppression probability indicators including a banded out-of-location signal probability indicator determined using two or more of the banded spatial features, and a banded noise suppression probability indicator, expressible for each frequency band as a noise suppression gain and determined using a banded estimate of noise spectral content based on the mixed-down banded instantaneous frequency domain amplitude metric of the input signals, the gain calculator further to combine the set of probability indicators to calculate a combined gain for each band of the plurality of frequency bands; and
a suppressor to apply an interpolated final gain determined from the combined gains of the plurality of frequency bands to carry out suppression on the mixed-down signal to form suppressed signal data.
Dependent claims: 2-28.
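The last two elements of claim 1 (a combined gain per band, then an interpolated final gain applied to the mixed-down signal) can be illustrated with a minimal sketch. It is not the patented implementation. The multiplicative combination rule, the gain floor `min_gain`, the linear interpolation between band centres, and all function names are assumptions made for illustration only.

```python
def combine_band_gains(noise_gain, location_gain, min_gain=0.1):
    """Combine two per-band gains into one combined gain per band.

    Multiplying the gains and flooring the product is an assumed rule,
    not something the claim text prescribes.
    """
    return [max(min_gain, gn * gl) for gn, gl in zip(noise_gain, location_gain)]


def interpolate_to_bins(band_gain, band_centers, num_bins):
    """Linearly interpolate per-band gains to one gain per frequency bin.

    band_centers holds strictly increasing bin indices, one per band.
    Bins below the first centre or above the last centre reuse the
    nearest band gain.
    """
    bin_gain = []
    for k in range(num_bins):
        if k <= band_centers[0]:
            bin_gain.append(band_gain[0])
        elif k >= band_centers[-1]:
            bin_gain.append(band_gain[-1])
        else:
            b = 0
            while band_centers[b + 1] < k:
                b += 1
            lo, hi = band_centers[b], band_centers[b + 1]
            t = (k - lo) / (hi - lo)
            bin_gain.append((1.0 - t) * band_gain[b] + t * band_gain[b + 1])
    return bin_gain


def suppress(mixed_down_bins, bin_gain):
    """Apply the per-bin gain to the (complex) bins of the mixed-down signal."""
    return [g * x for g, x in zip(bin_gain, mixed_down_bins)]
```

Interpolating the banded gain before applying it means that neighbouring bins receive similar gains even where band boundaries fall between them.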
-
-
29. A method of operating a processing apparatus to suppress undesired signals including noise and out-of-location signals in audio input signals, the method comprising:
-
accepting in the processing apparatus a plurality of sampled audio input signals;
forming a mixed-down banded instantaneous frequency domain amplitude metric of the input signals for a plurality of frequency bands, the forming including transforming into complex-valued frequency domain values of the input signals or of a mixed down signal for a set of frequency bins; at least 90% of the bands having contribution from two or more frequency bins;
determining banded spatial features from the plurality of sampled input signals;
calculating a set of banded suppression probability indicators, including a banded out-of-location suppression probability indicator determined using two or more of the banded spatial features, and a banded noise suppression probability indicator expressible for each band as a noise suppression gain and determined using a banded estimate of noise spectral content determined based on the mixed-down banded instantaneous frequency domain amplitude metric of the mixed-down signal;
combining the set of banded probability indicators to determine a combined gain for each band of the plurality of frequency bands;
applying an interpolated final gain determined from the combined gains of the plurality of frequency bands to carry out suppression on the mixed-down signal to form suppressed signal data.
Dependent claims: 30-56.
-
-
57. A method of operating a processing apparatus to suppress undesired signals, the undesired signals including noise, the method comprising:
-
accepting in the processing apparatus at least one sampled input signals;
forming a banded instantaneous frequency domain amplitude metric of the at least one input signal for a plurality of frequency bands, the forming including transforming into complex-valued frequency domain values of the at least one input signal or of a mixed down signal for a set of frequency bins; at least 90% of the bands having contribution from two or more frequency bins;
calculating a set of one or more suppression probability indicators, including a noise suppression probability indicator expressible for each frequency band as a noise suppression gain and determined using an estimate of noise spectral content based on the banded instantaneous frequency domain amplitude metric of the at least one input signal;
combining the set of probability indicators to determine a banded combined gain for each band;
applying an interpolated final gain determined from the combined gain to carry out suppression on the frequency domain values of the at least one input signal or of a mixed down signal to form suppressed signal data,
wherein the noise suppression probability indicator for each frequency band is expressible as noise suppression gain function of the banded instantaneous amplitude metric for the band,
wherein for each frequency band, a first range of values of banded instantaneous amplitude metric values is expected for noise, and a second range of values of banded instantaneous amplitude metric values is expected for a desired input, and
wherein the noise suppression gain functions for the frequency bands are configured to:
have a respective minimum value;
have a relatively constant value or a relatively small negative gradient in the first range;
have a relatively constant gain in the second range; and
have a smooth transition from the first range to the second range.
Dependent claims: 58-79.
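Claim 57 constrains the shape of the noise suppression gain function: a minimum value, a roughly flat region where noise is expected, a roughly flat region where the desired input is expected, and a smooth transition between them. The sketch below shows one curve with that shape, a logistic function of the band signal-to-noise ratio in dB. The logistic form and the parameters `min_gain_db`, `mid_snr_db` and `width_db` are assumptions chosen for illustration, not values or formulas taken from the patent.

```python
import math


def noise_suppression_gain(band_power, noise_power, min_gain_db=-20.0,
                           mid_snr_db=6.0, width_db=2.0):
    """Return an amplitude gain in (0, 1] for one frequency band.

    The curve is flat near min_gain_db when the band power is close to the
    noise estimate, flat near 0 dB when the band power is well above it,
    and changes smoothly in between (assumed logistic transition).
    """
    ratio = max(band_power, 1e-12) / max(noise_power, 1e-12)
    snr_db = 10.0 * math.log10(ratio)
    # Clamp so that math.exp cannot overflow for extreme inputs.
    snr_db = max(-100.0, min(100.0, snr_db))
    s = 1.0 / (1.0 + math.exp(-(snr_db - mid_snr_db) / width_db))
    gain_db = min_gain_db * (1.0 - s)
    return 10.0 ** (gain_db / 20.0)
```

With these assumed defaults, a band whose power equals the noise estimate is attenuated by close to 20 dB, while a band 30 dB above the noise estimate passes almost unchanged.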
wherein the set of suppression probability indicators includes an out-of-location suppression probability indicator determined using two or more of the spatial features, such that the method simultaneously suppresses noise and out-of-location signals.
-
-
65. A method as recited in claim 64, wherein the estimate of noise spectral content is a spatially-selective estimate of noise spectral content determined using two or more of the banded spatial features.
-
66. A method as recited in claim 57, further comprising:
-
accepting one or more reference signals; forming a banded frequency domain amplitude metric representation of the one or more reference signals; and predicting a banded frequency domain amplitude metric representation of an echo using adaptively determined echo filter coefficients, the filter coefficients determined using an estimate of the banded spectral amplitude metric of the noise (124), previously predicted echo spectral content, and an estimate of the banded spectral amplitude metric of the input signals, the filter coefficients updated based on the estimates of the banded spectral amplitude metric of the input signals and of the noise, and the previously predicted echo spectral content, wherein the final gain incorporates at least one banded suppression probability indicator that includes echo suppression, the at least one banded suppression probability indicator determined using the banded frequency domain amplitude metric representation of the echo.
-
-
67. A method as recited in claim 66, wherein determining the coefficients includes voice-activity detecting, and wherein the updating depends on the results of the voice-activity detecting.
-
68. A method as recited in claim 66, wherein the predicting includes time smoothing the results of the filtering.
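Claims 66 to 68 describe predicting the banded echo spectrum from the reference signal with adaptively updated filter coefficients, gating the update on voice activity, and time smoothing the prediction. The following single-band sketch shows how such a predictor could look. The claims do not state the update rule, so the normalized-LMS-style update used here is a substituted, assumed technique. The class name, tap count, step size and smoothing factor are also hypothetical.

```python
class BandEchoPredictor:
    """Predicts echo power in one band from recent reference band powers."""

    def __init__(self, num_taps=4, step=0.1, eps=1e-9):
        self.coeffs = [0.0] * num_taps    # adaptive echo filter coefficients
        self.history = [0.0] * num_taps   # reference powers, most recent first
        self.step = step
        self.eps = eps

    def predict(self, ref_power):
        """Shift in the newest reference power and return the predicted echo."""
        self.history = [ref_power] + self.history[:-1]
        return sum(f * x for f, x in zip(self.coeffs, self.history))

    def update(self, input_power, noise_power, predicted_echo, voice_active=False):
        """Assumed NLMS-style update; skipped while local voice is detected."""
        if voice_active:
            return
        # Error uses the input, the noise estimate and the previous prediction.
        err = input_power - noise_power - predicted_echo
        norm = sum(x * x for x in self.history) + self.eps
        self.coeffs = [f + self.step * err * x / norm
                       for f, x in zip(self.coeffs, self.history)]


def smooth_over_time(previous, current, alpha=0.7):
    """First-order time smoothing of the predicted echo (alpha is assumed)."""
    return alpha * previous + (1.0 - alpha) * current
```

In use, `predict` would be called once per frame and band with the banded reference power, followed by `update` with the banded input power and the current noise estimate for that band.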
-
69. A method as recited in claim 66, wherein the estimate of the banded spectral frequency domain amplitude metric of the noise used by the coefficient updater is determined by a leaky minimum follower with a tracking rate defined by at least one minimum follower leak rate parameter.
-
70. A method as recited in claim 69, wherein the minimum follower is gated by the presence of an echo estimate comparable to or greater than a previous estimate of the banded spectral frequency domain amplitude metric of the noise.
-
71. A method as recited in claim 69, wherein the at least one leak rate parameter of the leaky minimum follower are controlled by the probability of voice being present as determined by voice activity detecting.
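Claims 69 to 71 describe a leaky minimum follower for the banded noise estimate, gated by the echo estimate and with a leak rate controlled by the voice probability. A minimal per-band sketch is given below. The specific leak values, the linear mapping from voice probability to leak rate, and the `gate_ratio` used to decide that the echo is "comparable to" the noise estimate are all assumptions.

```python
def update_noise_estimate(prev_noise, band_power, echo_power=0.0,
                          voice_prob=0.0, slow_leak=1.01, fast_leak=1.2,
                          gate_ratio=0.5):
    """One frame of a leaky minimum follower for a single band.

    prev_noise  : previous noise power estimate for the band
    band_power  : current banded instantaneous power
    echo_power  : predicted echo power for the band
    voice_prob  : probability in [0, 1] that voice is present
    """
    # Gate: hold the estimate while the predicted echo is comparable to or
    # greater than the previous noise estimate (gate_ratio is assumed).
    if echo_power > 0.0 and echo_power >= gate_ratio * prev_noise:
        return prev_noise
    # Leak more slowly when voice is likely, so speech does not raise
    # the noise estimate quickly (assumed linear mapping).
    leak = fast_leak + voice_prob * (slow_leak - fast_leak)
    # Follow the minimum, but let the estimate creep upward by the leak factor.
    return min(band_power, leak * prev_noise)
```

The follower drops immediately to any lower band power and can rise by at most the leak factor per frame.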
-
72. A method as recited in claim 66, further comprising:
calculating an additional echo suppression gain and combining with one or more other determined suppression gains to generate the final gain.
-
73. A method as recited in claim 72, wherein the combining with the one or more other determined suppression gains is to form the first combined gain of the bands.
-
74. A method as recited in claim 73, wherein the method further comprises carrying out post-processing on the first combined gain of the bands to generate a first post-processed gain, and combining the first post-processed gain with the additional echo suppression gain to form the final gain.
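The ordering in claims 72 to 74 (form a first combined gain, post-process it, then combine the result with an additional echo suppression gain) can be sketched as follows. The claims do not say what the post-processing is. The 3-point median across bands used here is an assumed stand-in, and the multiplicative final combination is likewise an assumption.

```python
def median3(values):
    """Assumed post-processing: 3-point median across neighbouring bands."""
    out = []
    n = len(values)
    for i in range(n):
        window = sorted(values[max(0, i - 1):min(n, i + 2)])
        out.append(window[len(window) // 2])
    return out


def final_band_gain(first_combined_gain, extra_echo_gain):
    """Post-process the first combined gain, then apply the extra echo gain."""
    post_processed = median3(first_combined_gain)
    return [p * e for p, e in zip(post_processed, extra_echo_gain)]
```

Because the additional echo gain is applied after the post-processing, it is not altered by the median step in this sketch.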
-
75. A method as recited in claim 57, wherein the banding is such that the frequency spacing of the bands is non monotonically decreasing, and such that 90% or more of the bands have contribution from more than one frequency bin.
-
76. A method as recited in claim 75, wherein the spacing of the bands is log-like.
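Claims 75 and 76 describe a banding whose band widths do not decrease with frequency, are log-like, and give at least 90% of bands more than one bin. One way to build such a banding is sketched below. Rectangular, non-overlapping bands, geometric width growth, and the `ratio` and `min_width` parameters are assumptions made for illustration.

```python
def log_like_band_edges(num_bins, ratio=1.3, min_width=2):
    """Return bin indices [e0, ..., eB] so that band b covers e[b]..e[b+1]-1.

    Widths grow roughly geometrically, so they never decrease with frequency.
    """
    edges = [0]
    width = float(min_width)
    while edges[-1] < num_bins:
        edges.append(edges[-1] + max(min_width, round(width)))
        width *= ratio
    edges[-1] = num_bins  # clip the last band to the available bins
    # If clipping made the last band narrower than its neighbour, merge them.
    if len(edges) > 2 and edges[-1] - edges[-2] < edges[-2] - edges[-3]:
        del edges[-2]
    return edges


def band_powers(bins, edges):
    """Sum |X_k|^2 over the bins of each band (banded power metric)."""
    return [sum(abs(x) ** 2 for x in bins[lo:hi])
            for lo, hi in zip(edges, edges[1:])]
```

With `min_width=2` every band in this sketch spans at least two bins, which satisfies the 90% condition with margin.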
-
77. A method as recited in claim 57, further comprising applying output synthesis to generate output samples.
-
78. A method as recited in claim 57, further comprising:
applying output remapping to generate output frequency bins.
-
79. A method as recited in claim 57, wherein the frequency domain amplitude metric is the frequency domain power.
-
80. A method of operating a processing apparatus to suppress undesired signals, the method comprising:
-
accepting in the processing apparatus a plurality of sampled input signals;
forming a mixed-down banded instantaneous frequency domain amplitude metric of the input signals for a plurality of frequency bands, the forming including transforming into complex-valued frequency domain values for a set of frequency bins; at least 90% of the bands having contribution from two or more frequency bins;
determining banded spatial features from the plurality of sampled input signals;
calculating a set of suppression probability indicators, including an out-of-location suppression probability indicator determined using two or more of the spatial features, and a noise suppression probability indicator expressible for each frequency band as a noise suppression gain and determined using an estimate of noise spectral content based on the mixed-down banded instantaneous frequency domain amplitude metric of the input signals;
accepting in the processing apparatus one or more reference signals;
forming a banded frequency domain amplitude metric representation of the one or more reference signals;
predicting a banded frequency domain amplitude metric representation of an echo using adaptively determined echo filter coefficients;
determining a plurality of indications of voice activity from the mixed-down banded instantaneous frequency domain amplitude metric using respective instantiations of a universal voice activity detection method, the universal voice activity detection method being controlled by a set of parameters and using an estimate of noise spectral content, the banded frequency domain amplitude metric representation of the echo, and the banded spatial features;
wherein the set of parameters includes a parameter indicative of whether the estimate of noise spectral content is spatially selective or not;
wherein which indication of voice activity an instantiation determines is controlled by a selection of the parameters; and
combining the set of probability indicators to determine a combined gain for each band;
applying an interpolated final gain determined from the combined gain to carry out suppression on bin data of the mixed-down signal to form suppressed signal data,
wherein different instantiations of the universal voice activity detection method are applied in different steps of the method.
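The "universal voice activity detection method" of claim 80 is a single routine whose behaviour is selected by parameters, including one that says whether the noise estimate is spatially selective. The sketch below shows what such a parameterized detector might look like. The scoring formula, the weights, the threshold and every name here are hypothetical. The claim only states which inputs and which kind of parameter the method uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VadParams:
    """Hypothetical parameter set selecting one VAD instantiation."""
    use_spatially_selective_noise: bool
    noise_weight: float = 2.0
    echo_weight: float = 2.0
    use_spatial_features: bool = True
    threshold: float = 1.0


def voice_activity(band_power, noise, spatial_noise, echo, in_location_prob, params):
    """Return True if the weighted excess-power score exceeds the threshold.

    band_power, noise, spatial_noise, echo and in_location_prob are
    per-band lists of equal length.
    """
    chosen_noise = spatial_noise if params.use_spatially_selective_noise else noise
    score = 0.0
    for b, power in enumerate(band_power):
        excess = (power - params.noise_weight * chosen_noise[b]
                  - params.echo_weight * echo[b])
        if excess > 0.0:
            contribution = excess / power
            if params.use_spatial_features:
                contribution *= in_location_prob[b]
            score += contribution
    return score > params.threshold
```

Two instantiations then differ only in their `VadParams`, for example a detector using the spatially selective noise estimate to gate one step and a detector using the ordinary noise estimate to gate another.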
-
-
81. A processing apparatus comprising:
-
one or more processors; and
a computer-readable storage medium coupled to the one or more processors and comprising instructions to cause, when executed by at least one of the processors, the processing apparatus to carry out a method to suppress undesired signals including noise and out-of-location signals in audio input signals, the method comprising:
accepting in the processing apparatus a plurality of sampled audio input signals;
forming a mixed-down banded instantaneous frequency domain amplitude metric of the input signals for a plurality of frequency bands, the forming including transforming into complex-valued frequency domain values of the input signals or of a mixed down signal for a set of frequency bins; at least 90% of the bands having contribution from two or more frequency bins;
determining banded spatial features from the plurality of sampled input signals;
calculating a set of banded suppression probability indicators, including a banded out-of-location suppression probability indicator determined using two or more of the banded spatial features, and a banded noise suppression probability indicator expressible for each band as a noise suppression gain and determined using a banded estimate of noise spectral content determined based on the mixed-down banded instantaneous frequency domain amplitude metric of the mixed-down signal;
combining the set of banded probability indicators to determine a combined gain for each band of the plurality of frequency bands;
applying an interpolated final gain determined from the combined gains of the plurality of frequency bands to carry out suppression on the mixed-down signal to form suppressed signal data.
Dependent claims: 82-85.
-
-
86. A processing apparatus comprising:
-
one or more processors; and
a computer-readable storage medium coupled to the one or more processors and comprising instructions to cause, when executed by at least one of the processors, the processing apparatus to carry out a method to suppress undesired signals, the undesired signals including noise, the method comprising:
accepting in the processing apparatus at least one sampled input signals;
forming a banded instantaneous frequency domain amplitude metric of the at least one input signal for a plurality of frequency bands, the forming including transforming into complex-valued frequency domain values of the at least one input signal or of a mixed down signal for a set of frequency bins; at least 90% of the bands having contribution from two or more frequency bins;
calculating a set of one or more suppression probability indicators, including a noise suppression probability indicator expressible for each frequency band as a noise suppression gain and determined using an estimate of noise spectral content based on the banded instantaneous frequency domain amplitude metric of the at least one input signal;
combining the set of probability indicators to determine a banded combined gain for each band;
applying an interpolated final gain determined from the combined gain to carry out suppression on the frequency domain values of the at least one input signal or of a mixed down signal to form suppressed signal data,
wherein the noise suppression probability indicator for each frequency band is expressible as noise suppression gain function of the banded instantaneous amplitude metric for the band,
wherein for each frequency band, a first range of values of banded instantaneous amplitude metric values is expected for noise, and a second range of values of banded instantaneous amplitude metric values is expected for a desired input, and
wherein the noise suppression gain functions for the frequency bands are configured to:
have a respective minimum value;
have a relatively constant value or a relatively small negative gradient in the first range;
have a relatively constant gain in the second range; and
have a smooth transition from the first range to the second range.
Dependent claims: 87-90.
-
-
89. A processing apparatus as recited in claim 86,
wherein the accepting in the processing apparatus is of a plurality of sampled input signals,
wherein the forming of the banded instantaneous frequency domain amplitude metric of the accepted input signals forms a mixed-down banded instantaneous frequency domain amplitude metric of the input signals for a plurality of frequency bands,
wherein the method further comprises determining banded spatial features from the plurality of sampled input signals; and
wherein the set of suppression probability indicators includes an out-of-location suppression probability indicator determined using two or more of the spatial features, such that the method simultaneously suppresses noise and out-of-location signals.
-
-
90. A processing apparatus as recited in claim 86, wherein the method further comprises:
-
accepting one or more reference signals; forming a banded frequency domain amplitude metric representation of the one or more reference signals; and predicting a banded frequency domain amplitude metric representation of an echo using adaptively determined echo filter coefficients, the filter coefficients determined using an estimate of the banded spectral amplitude metric of the noise, previously predicted echo spectral content, and an estimate of the banded spectral amplitude metric of the input signals, the filter coefficients updated based on the estimates of the banded spectral amplitude metric of the input signals and of the noise, and the previously predicted echo spectral content, wherein the final gain incorporates at least one banded suppression probability indicator that includes echo suppression, the at least one banded suppression probability indicator determined using the banded frequency domain amplitude metric representation of the echo.
-
-
91. A processing apparatus comprising:
-
one or more processors; and
a computer-readable storage medium coupled to the one or more processors and comprising instructions to cause, when executed by at least one of the processors, the processing apparatus to carry out a method to suppress undesired signals, the method comprising:
accepting in the processing apparatus a plurality of sampled input signals;
forming a mixed-down banded instantaneous frequency domain amplitude metric of the input signals for a plurality of frequency bands, the forming including transforming into complex-valued frequency domain values for a set of frequency bins; at least 90% of the bands having contribution from two or more frequency bins;
determining banded spatial features from the plurality of sampled input signals;
calculating a set of suppression probability indicators, including an out-of-location suppression probability indicator determined using two or more of the spatial features, and a noise suppression probability indicator expressible for each frequency band as a noise suppression gain and determined using an estimate of noise spectral content based on the mixed-down banded instantaneous frequency domain amplitude metric of the input signals;
accepting in the processing apparatus one or more reference signals;
forming a banded frequency domain amplitude metric representation of the one or more reference signals;
predicting a banded frequency domain amplitude metric representation of an echo using adaptively determined echo filter coefficients;
determining a plurality of indications of voice activity from the mixed-down banded instantaneous frequency domain amplitude metric using respective instantiations of a universal voice activity detection method, the universal voice activity detection method being controlled by a set of parameters and using an estimate of noise spectral content, the banded frequency domain amplitude metric representation of the echo, and the banded spatial features;
wherein the set of parameters includes a parameter indicative of whether the estimate of noise spectral content is spatially selective or not;
wherein which indication of voice activity an instantiation determines is controlled by a selection of the parameters; and
combining the set of probability indicators to determine a combined gain for each band;
applying an interpolated final gain determined from the combined gain to carry out suppression on bin data of the mixed-down signal to form suppressed signal data,
wherein different instantiations of the universal voice activity detection method are applied in different steps of the method.
-
-
92. A non-transitory computer-readable medium comprising instructions to cause, when executed by at least one processor of a processing apparatus to carry out a method to suppress undesired signals including noise and out-of-location signals in audio input signals, the method comprising:
-
accepting in the processing apparatus a plurality of sampled audio input signals;
forming a mixed-down banded instantaneous frequency domain amplitude metric of the input signals for a plurality of frequency bands, the forming including transforming into complex-valued frequency domain values of the input signals or of a mixed down signal for a set of frequency bins; at least 90% of the bands having contribution from two or more frequency bins;
determining banded spatial features from the plurality of sampled input signals;
calculating a set of banded suppression probability indicators, including a banded out-of-location suppression probability indicator determined using two or more of the banded spatial features, and a banded noise suppression probability indicator expressible for each band as a noise suppression gain and determined using a banded estimate of noise spectral content determined based on the mixed-down banded instantaneous frequency domain amplitude metric of the mixed-down signal;
combining the set of banded probability indicators to determine a combined gain for each band of the plurality of frequency bands;
applying an interpolated final gain determined from the combined gains of the plurality of frequency bands to carry out suppression on the mixed-down signal to form suppressed signal data.
Dependent claims: 93.
-
-
94. A non-transitory computer-readable medium comprising instructions to cause, when executed by at least one processor of a processing apparatus to carry out a method to suppress undesired signals including noise and out-of-location signals in audio input signals, the method comprising:
-
accepting in the processing apparatus at least one sampled input signals;
forming a banded instantaneous frequency domain amplitude metric of the at least one input signal for a plurality of frequency bands, the forming including transforming into complex-valued frequency domain values of the at least one input signal or of a mixed down signal for a set of frequency bins; at least 90% of the bands having contribution from two or more frequency bins;
calculating a set of one or more suppression probability indicators, including a noise suppression probability indicator expressible for each frequency band as a noise suppression gain and determined using an estimate of noise spectral content based on the banded instantaneous frequency domain amplitude metric of the at least one input signal;
combining the set of probability indicators to determine a banded combined gain for each band;
applying an interpolated final gain determined from the combined gain to carry out suppression on the frequency domain values of the at least one input signal or of a mixed down signal to form suppressed signal data,
wherein the noise suppression probability indicator for each frequency band is expressible as noise suppression gain function of the banded instantaneous amplitude metric for the band,
wherein for each frequency band, a first range of values of banded instantaneous amplitude metric values is expected for noise, and a second range of values of banded instantaneous amplitude metric values is expected for a desired input, and
wherein the noise suppression gain functions for the frequency bands are configured to:
have a respective minimum value;
have a relatively constant value or a relatively small negative gradient in the first range;
have a relatively constant gain in the second range; and
have a smooth transition from the first range to the second range.
Dependent claims: 95.
wherein the final gain incorporates at least one banded suppression probability indicator that includes echo suppression, the at least one banded suppression probability indicator determined using the banded frequency domain amplitude metric representation of the echo.
-
-
96. A non-transitory computer-readable medium comprising instructions that cause, when executed by at least one processor of a processing apparatus, to carry out a method to suppress undesired signals including noise and out-of-location signals in audio input signals, the method comprising:
-
accepting in the processing apparatus a plurality of sampled input signals;
forming a mixed-down banded instantaneous frequency domain amplitude metric of the input signals for a plurality of frequency bands, the forming including transforming into complex-valued frequency domain values for a set of frequency bins; at least 90% of the bands having contribution from two or more frequency bins;
determining banded spatial features from the plurality of sampled input signals;
calculating a set of suppression probability indicators, including an out-of-location suppression probability indicator determined using two or more of the spatial features, and a noise suppression probability indicator expressible for each frequency band as a noise suppression gain and determined using an estimate of noise spectral content based on the mixed-down banded instantaneous frequency domain amplitude metric of the input signals;
accepting in the processing apparatus one or more reference signals;
forming a banded frequency domain amplitude metric representation of the one or more reference signals;
predicting a banded frequency domain amplitude metric representation of an echo using adaptively determined echo filter coefficients;
determining a plurality of indications of voice activity from the mixed-down banded instantaneous frequency domain amplitude metric using respective instantiations of a universal voice activity detection method, the universal voice activity detection method being controlled by a set of parameters and using an estimate of noise spectral content, the banded frequency domain amplitude metric representation of the echo, and the banded spatial features;
wherein the set of parameters includes a parameter indicative of whether the estimate of noise spectral content is spatially selective or not;
wherein which indication of voice activity an instantiation determines is controlled by a selection of the parameters; and
combining the set of probability indicators to determine a combined gain for each band;
applying an interpolated final gain determined from the combined gain to carry out suppression on bin data of the mixed-down signal to form suppressed signal data,
wherein different instantiations of the universal voice activity detection method are applied in different steps of the method.
-
Specification