Method and apparatus for increasing speech intelligibility in noisy environments
Abstract
A method (400, 500) and apparatus (220) seek to improve the intelligibility of speech emitted into a noisy environment. Formants are identified (426), and a perceptual frequency scale band that includes at least one of the identified formants is selected (502). The signal-to-noise ratio (SNR) in each band is compared (504) to a threshold and, if the SNR for that band is less than the threshold, the method increases a formant enhancement gain for that band. A set of high-pass filter gains (338) is combined (516) with the formant enhancement gains, yielding combined gains that are then clipped (518), scaled (520) according to a total SNR, normalized (526), smoothed across time (530) and frequency (532), and used to reconstruct (532, 534) an audio signal.
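The post-processing chain described in the abstract (combine, clip, scale, normalize, smooth across time and frequency) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: gains are held in dB per perceptual band, combining is done by adding dB values, the clip limits and time-smoothing coefficient are invented placeholders, normalization is assumed to hold the mean linear power gain at 0 dB, and frequency smoothing is assumed to be a 3-tap moving average. The scale factor is passed in because the abstract only says it depends on the total SNR. The function name `postprocess_gains` is hypothetical.

```python
import math

def postprocess_gains(hp_gains_db, formant_gains_db, prev_db, scale,
                      min_db=-6.0, max_db=15.0, alpha=0.7):
    """Combine, clip, scale, normalize and smooth per-band gains, all in dB.

    Hypothetical parameters (assumptions, not from the patent): min_db/max_db
    are clip limits, alpha is the time-smoothing coefficient, prev_db holds the
    previous segment's smoothed gains, and scale (0..1) is assumed to be
    derived elsewhere from the total SNR.
    """
    # Combine: adding dB values multiplies the corresponding linear gains.
    combined = [h + f for h, f in zip(hp_gains_db, formant_gains_db)]
    # Clip each band to an allowed range.
    clipped = [min(max(g, min_db), max_db) for g in combined]
    # Scale by the SNR-dependent factor (0 disables enhancement entirely).
    scaled = [scale * g for g in clipped]
    # Normalize so the mean linear power gain across bands is 1 (0 dB).
    mean_power = sum(10.0 ** (g / 10.0) for g in scaled) / len(scaled)
    offset_db = 10.0 * math.log10(mean_power)
    normalized = [g - offset_db for g in scaled]
    # Smooth across time: first-order recursive average with the last segment.
    timed = [alpha * p + (1.0 - alpha) * g for p, g in zip(prev_db, normalized)]
    # Smooth across frequency: 3-tap moving average, shorter at the edges.
    smoothed = []
    for i in range(len(timed)):
        window = timed[max(0, i - 1):i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed

# One segment, three bands, with a 10 dB formant boost in the middle band.
gains = postprocess_gains([0.0, 0.0, 0.0], [0.0, 10.0, 0.0],
                          prev_db=[0.0, 0.0, 0.0], scale=1.0)
```

Working in dB keeps combining and scaling to simple additions and multiplications; a real system would convert back to linear gains before filtering.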
22 Claims
1. A method of improving intelligibility of speech that is included in audio that is emitted into a noisy environment, the method comprising:
determining if one or more voice formants are present in each ith audio segment of a plurality of audio segments;
if one or more formants are determined to be present in the ith audio segment:
selecting a perceptual frequency scale band (L) including at least one of the one or more formants from a plurality of perceptual frequency scale bands of a perceptual scale ambient noise spectrum of the noisy environment;
comparing, to a threshold, a signal-to-noise ratio of the perceptual frequency scale band, and if the signal-to-noise ratio is less than the threshold, increasing a formant enhancement gain for the perceptual frequency scale band;
computing a summed signal-to-noise ratio across at least a portion of the perceptual scale ambient noise spectrum wherein a plurality of speech magnitudes in each of the plurality of perceptual frequency scale bands are used as signal magnitudes;
scaling a set of overall gains that include at least the formant enhancement gains as a function of the summed signal-to-noise ratio;
smoothing the set of overall gains;
filtering the ith audio segment with the set of overall gains; and
outputting the ith audio segment into the noisy environment.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15.
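The per-band threshold test, the summed SNR and the SNR-dependent scaling in claim 1 can be illustrated as follows. Everything beyond the claim language is an assumption: band SNRs are computed from magnitudes as 20 log10(S/N), the "summed" SNR is taken as total speech energy over total noise energy, the threshold, step and cap values are placeholders, and the mapping from summed SNR to a scale factor (full enhancement at low SNR, none at high SNR) is one plausible choice rather than the claimed function. All function names are hypothetical.

```python
import math

EPS = 1e-12  # avoids log10(0) and division by zero for silent bands

def band_snr_db(speech_mag, noise_mag):
    """SNR of one band in dB, computed from magnitudes (not powers)."""
    return 20.0 * math.log10((speech_mag + EPS) / (noise_mag + EPS))

def update_formant_gains(gains_db, speech_mags, noise_mags, formant_bands,
                         threshold_db=6.0, step_db=1.0, max_db=12.0):
    """Raise the enhancement gain of each formant band whose SNR is below threshold.

    threshold_db, step_db and max_db are hypothetical placeholder values.
    """
    new_gains = list(gains_db)
    for band in formant_bands:
        if band_snr_db(speech_mags[band], noise_mags[band]) < threshold_db:
            new_gains[band] = min(new_gains[band] + step_db, max_db)
    return new_gains

def summed_snr_db(speech_mags, noise_mags):
    """Broadband SNR (assumed definition): total speech energy over total noise energy, in dB."""
    speech_energy = sum(s * s for s in speech_mags)
    noise_energy = sum(n * n for n in noise_mags)
    return 10.0 * math.log10((speech_energy + EPS) / (noise_energy + EPS))

def snr_scale(total_snr_db, low_db=0.0, high_db=20.0):
    """Assumed mapping: 1 at or below low_db, 0 at or above high_db, linear between."""
    if total_snr_db <= low_db:
        return 1.0
    if total_snr_db >= high_db:
        return 0.0
    return (high_db - total_snr_db) / (high_db - low_db)

def scale_gains(gains_db, total_snr_db):
    """Scale every overall gain (in dB) by the SNR-dependent factor."""
    factor = snr_scale(total_snr_db)
    return [factor * g for g in gains_db]

speech = [1.0, 1.0, 1.0]
noise = [2.0, 0.1, 2.0]
gains = update_formant_gains([0.0, 0.0, 0.0], speech, noise, formant_bands=[0, 1])
# -> [1.0, 0.0, 0.0]: band 0 sits near -6 dB SNR, band 1 near +20 dB, band 2 has no formant
```

Because the gain only grows by a fixed step per segment and is capped, repeated low-SNR segments ramp the enhancement up gradually rather than jumping to the maximum.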
16. An audio apparatus adapted for outputting speech in a noisy environment, the audio apparatus comprising:
a speaker for outputting the speech;
a microphone for receiving ambient noise from the noisy environment;
a source of audio to be output into the noisy environment;
a processor coupled to the source of audio, the speaker, and the microphone, wherein the processor is programmed to:
determine if one or more voice formants are present in each ith audio segment of a plurality of audio segments;
if one or more formants are determined to be present in the ith audio segment:
select a perceptual frequency scale band (L) including at least one of the one or more formants from a plurality of perceptual frequency scale bands of a perceptual scale ambient noise spectrum of the noisy environment;
compare, to a threshold, a signal-to-noise ratio of the perceptual frequency scale band, and if the signal-to-noise ratio is less than the threshold, increase a formant enhancement gain for the perceptual frequency scale band;
compute a summed signal-to-noise ratio across at least a portion of the perceptual scale ambient noise spectrum wherein a plurality of speech magnitudes are used as signal magnitudes;
scale a set of overall gains that include at least the formant enhancement gains as a function of the summed signal-to-noise ratio;
smooth the set of overall gains;
filter the ith audio segment with the set of overall gains; and
output the ith audio segment into the noisy environment.
Dependent claims: 17, 18, 19, 20, 21, 22.
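Claim 16 places the same steps on a processor fed by a microphone. The sketch below shows two pieces the claim leaves abstract: turning a block of microphone samples into a perceptual scale spectrum, and filtering an audio segment with per-band gains. Assumptions: the Bark scale (Zwicker and Terhardt approximation) stands in for the unspecified perceptual frequency scale; a naive O(n²) DFT keeps the example dependency-free; and a single block is processed without windowing or overlap-add, which a real implementation would need to avoid frame-edge artifacts. All function names are hypothetical.

```python
import cmath
import math

def bark(freq_hz):
    """Zwicker and Terhardt approximation of the Bark (critical-band) scale."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)

def dft(x):
    """Naive discrete Fourier transform (O(n^2)); an FFT would be used in practice."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT matching dft() above."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def bin_to_band(k, n, sample_rate):
    """Bark band index of DFT bin k; bins above Nyquist fold onto their mirror bin."""
    k_eff = min(k, n - k)
    return int(bark(k_eff * sample_rate / n))

def band_magnitudes(x, sample_rate, num_bands):
    """Perceptual-scale spectrum: RMS magnitude of the DFT bins (0..n/2) in each Bark band."""
    spectrum = dft(x)
    n = len(x)
    acc = [0.0] * num_bands
    cnt = [0] * num_bands
    for k in range(n // 2 + 1):
        b = min(bin_to_band(k, n, sample_rate), num_bands - 1)
        acc[b] += abs(spectrum[k]) ** 2
        cnt[b] += 1
    return [math.sqrt(a / c) if c else 0.0 for a, c in zip(acc, cnt)]

def filter_segment(x, band_gains_db, sample_rate):
    """Apply per-band gains (dB) to one segment in the frequency domain and resynthesize.

    Mirrored bins get the same real gain, so the spectrum stays conjugate-symmetric
    and the output is real up to rounding error.
    """
    n = len(x)
    spectrum = dft(x)
    num_bands = len(band_gains_db)
    shaped = []
    for k in range(n):
        b = min(bin_to_band(k, n, sample_rate), num_bands - 1)
        shaped.append(spectrum[k] * 10.0 ** (band_gains_db[b] / 20.0))
    return [v.real for v in idft(shaped)]

# 16-sample segment at 8 kHz; 18 Bark bands cover 0 to 4 kHz.
segment = [math.sin(0.3 * t) + 0.5 for t in range(16)]
noise_spectrum = band_magnitudes(segment, 8000, 18)
louder = filter_segment(segment, [20.0 * math.log10(2.0)] * 18, 8000)
```

Mapping mirrored bins to the same band is the design choice that keeps the filtered output real; giving the two halves of the spectrum different gains would leave an imaginary residue after the inverse transform.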
Specification