Monaural noise suppression based on computational auditory scene analysis
Abstract
The present technology provides a robust noise suppression system that may concurrently reduce noise and echo components in an acoustic signal while limiting the level of speech distortion. A time-domain acoustic signal may be received and transformed into frequency-domain sub-band signals. Features, such as pitch, may be identified and tracked within the sub-band signals. Initial speech and noise models may then be estimated at least in part from a probability analysis based on the tracked pitch sources. Speech and noise models may be resolved from the initial speech and noise models, and noise reduction may be performed on the sub-band signals. An acoustic signal may be reconstructed from the noise-reduced sub-band signals.
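The analysis, modification, and reconstruction stages described in the abstract can be pictured with a minimal sketch. It is not the patented implementation: it assumes a short-time Fourier transform as the sub-band analysis, a Wiener-style gain as the noise reduction rule, and takes the speech and noise power estimates as given inputs. The function names `stft`, `istft`, `wiener_gain`, and `reduce_noise`, and the `frame`, `hop`, and `floor` parameters, are hypothetical.

```python
import numpy as np

def stft(x, frame=256, hop=128):
    """Split a time-domain signal into windowed frames and FFT each one."""
    # Periodic Hann window: overlapping copies sum to 1 at 50% overlap.
    window = np.hanning(frame + 1)[:-1]
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, frame // 2 + 1)

def istft(spec, frame=256, hop=128):
    """Rebuild a time-domain signal by overlap-adding the inverse FFT frames."""
    frames = np.fft.irfft(spec, n=frame, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + frame)
    for i, f in enumerate(frames):
        out[i * hop : i * hop + frame] += f
    return out

def wiener_gain(speech_power, noise_power, floor=0.1):
    """Per-bin gain from speech and noise power estimates (assumed rule).

    The floor limits how far any bin is attenuated, which is one common
    way to limit speech distortion.
    """
    gain = speech_power / (speech_power + noise_power + 1e-12)
    return np.maximum(gain, floor)

def reduce_noise(x, speech_power, noise_power, frame=256, hop=128):
    """Analysis, gain application, and reconstruction in one pass."""
    spec = stft(x, frame, hop)
    return istft(spec * wiener_gain(speech_power, noise_power), frame, hop)
```

In this sketch the speech and noise power arrays must broadcast against the spectrogram shape `(n_frames, frame // 2 + 1)`. How those estimates are derived from tracked pitch sources is the subject of the claims and is not modeled here.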
16 Claims
1. A method for performing noise reduction, the method comprising:
- executing a program stored in a memory to transform a time-domain acoustic signal into a plurality of frequency-domain sub-band signals;
- tracking at least one pitch from a plurality of pitch sources within a frequency-domain sub-band signal in the plurality of frequency-domain sub-band signals, wherein the tracking includes;
  - calculating at least one feature for each of the plurality of pitch sources; and
  - determining, based on the at least one feature, a probability based at least in part on pitch energy level, pitch salience, and pitch stationarity for each pitch source that the pitch source is a desired speech source, the desired speech source being a speech source associated with a desired talker;
- generating a speech model and one or more noise models based on the tracked at least one pitch, the speech model modeling desired speech and the one or more noise models modeling sources other than the desired speech; and
- performing the noise reduction on the frequency-domain sub-band signal based on the speech model and the one or more noise models.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system for performing noise reduction in an audio signal, the system comprising:
- a memory;
- an analysis module stored in the memory and executed by a processor to transform a time-domain acoustic to frequency-domain sub-band signals;
- a source inference engine stored in the memory and executed by the processor to track at least one pitch from a plurality of pitch sources within the frequency-domain sub-band signals and to generate a speech model and one or more noise models based on the tracked at least one pitch, the speech model modeling desired speech and the one or more noise models modeling sources other than the desired speech, wherein the tracking includes;
  - calculating at least one feature for each of the plurality of pitch sources; and
  - determining, based on the at least one feature, a probability based at least in part on pitch energy level, pitch salience, and pitch stationarity for each pitch source that the pitch source is a desired speech source, the desired speech source being a speech source associated with a desired talker; and
- a modifier module stored in the memory and executed by the processor to perform the noise reduction on the frequency-domain sub-band signals based on the speech model and the one or more noise models.
Dependent claims: 10, 11, 12
13. A non-transitory computer readable storage medium having embodied thereon a program, the program being executable by a processor to perform a method for reducing noise in an audio signal, the method comprising:
- transforming an acoustic signal from a time-domain signal to frequency-domain sub-band signals;
- tracking at least one pitch from a plurality of pitch sources within the frequency-domain sub-band signals, the tracking including;
  - calculating at least one feature for each of the plurality of pitch sources; and
  - determining, based on the at least one feature, a probability based at least in part on pitch energy level, pitch salience, and pitch stationarity for each pitch source that the pitch source is a desired speech source, the desired speech source being a speech source associated with a desired talker;
- generating a speech model and one or more noise models based on the tracked at least one pitch, the speech model modeling desired speech and the one or more noise models modeling sources other than the desired speech; and
- performing noise reduction on the frequency-domain sub-band signals based on the speech model and one or more noise models.
Dependent claims: 14, 15, 16
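Each independent claim recites a probability, based at least in part on pitch energy level, pitch salience, and pitch stationarity, that a pitch source is the desired speech source. The claims do not state how those features are combined. The sketch below assumes a simple logistic combination purely for illustration. The `PitchTrack` type, the `speech_probability` function, and every weight value are hypothetical, including the choice to give stationarity a negative weight on the assumption that a very steady pitch is less speech-like.

```python
import math
from dataclasses import dataclass

@dataclass
class PitchTrack:
    """Features for one tracked pitch source (hypothetical representation)."""
    energy_db: float     # pitch energy level, in dB relative to some reference
    salience: float      # 0..1, how strongly the pitch stands out
    stationarity: float  # 0..1, where 1 means the pitch barely changes over time

def speech_probability(track, w_energy=0.1, w_salience=4.0,
                       w_stationarity=-3.0, bias=0.0):
    """Map the three features to a probability with a logistic function.

    The weights are placeholders, not values taken from the patent.
    """
    score = (w_energy * track.energy_db
             + w_salience * track.salience
             + w_stationarity * track.stationarity
             + bias)
    return 1.0 / (1.0 + math.exp(-score))

# Usage: compare two candidate pitch sources at the same energy level.
talker = PitchTrack(energy_db=10.0, salience=0.9, stationarity=0.2)
steady_tone = PitchTrack(energy_db=10.0, salience=0.9, stationarity=0.9)
likely_speech = speech_probability(talker) > speech_probability(steady_tone)
```

Under these assumed weights, the source with the more variable pitch receives the higher probability. A real system would need to fit or tune such a mapping against data.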
Specification