Method and apparatus of increasing speech intelligibility in noisy environments
Abstract
A method (400, 600, 700) and apparatus (220) for enhancing the intelligibility of speech emitted into a noisy environment. After filtering (408) ambient noise with a filter (304) that simulates the physical blocking of noise by at least a part of a voice communication device (102), a frequency-dependent SNR of received voice audio relative to ambient noise is computed (424) on a perceptual (e.g., Bark) frequency scale. Formants are identified (426, 600, 700) and the SNR in bands including certain formants is modified (508, 510) with formant enhancement gain factors in order to improve intelligibility. A set of high-pass filter gains (338) is combined (516) with the formant enhancement gain factors, yielding combined gains which are clipped (518), scaled (520) according to a total SNR, normalized (526), smoothed across time (530) and frequency (532), and used to reconstruct (532, 534) an audio signal.
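The frequency-dependent SNR on a Bark scale described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the one-Bark band width, the 18-band default, and the power floor are all assumptions made for illustration, and the noise-blocking filter (304) is not modeled. The Bark mapping used is the widely cited Zwicker-Terhardt approximation.

```python
import math


def hz_to_bark(freq_hz):
    """Zwicker-Terhardt approximation of the Bark scale."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def band_powers(power_spectrum, sample_rate, num_bands=18):
    """Sum FFT-bin powers into one-Bark-wide bands (hypothetical layout).

    power_spectrum[i] is assumed to hold the power at i * bin_hz,
    covering 0 Hz up to and including the Nyquist frequency.
    """
    bin_hz = (sample_rate / 2.0) / (len(power_spectrum) - 1)
    bands = [0.0] * num_bands
    for i, power in enumerate(power_spectrum):
        # Band k collects bins whose Bark value lies in [k, k + 1);
        # anything above the last band is folded into it.
        band = min(int(hz_to_bark(i * bin_hz)), num_bands - 1)
        bands[band] += power
    return bands


def band_snr_db(speech_bands, noise_bands, floor=1e-12):
    """Per-band SNR in dB, with the speech power used as the signal."""
    return [10.0 * math.log10(max(s, floor) / max(n, floor))
            for s, n in zip(speech_bands, noise_bands)]
```

In use, the speech and noise power spectra of a segment would each be passed through `band_powers`, and the two results through `band_snr_db`. The floor only guards against taking the logarithm of zero in silent bands.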
30 Claims
1. A method of improving the intelligibility of speech that is included in audio that is emitted into a noisy environment, the method comprising:
for each ith audio segment of a plurality of audio segments:
analyzing ambient noise in said noisy environment to produce a noise spectrum on a perceptual scale that comprises a plurality of frequency bands, wherein said noise spectrum includes a plurality of noise magnitudes including a noise magnitude in each of said plurality of frequency bands;
analyzing audio, that is to be emitted into the noisy environment and that includes speech, to produce a speech spectrum on said perceptual scale, wherein said speech spectrum comprises a plurality of speech magnitudes including a speech magnitude in each of said plurality of frequency bands;
for each of said plurality of frequency bands, computing a signal-to-noise ratio wherein said plurality of speech magnitudes are used as signal magnitudes;
determining if one or more formants are present in each ith audio segment;
if one or more formants are determined to be present in said ith audio segment:
comparing the signal-to-noise ratio in each frequency band that includes one of the one or more formants to a threshold, and if said signal-to-noise ratio is less than said threshold increasing a formant enhancement gain for said frequency band that includes said one of the one or more formants;
computing a summed signal-to-noise ratio across at least a portion of said perceptual scale wherein said plurality of speech magnitudes are used as signal magnitudes;
scaling a set of overall gains that include at least said formant enhancement gains as a function of said summed signal-to-noise ratio;
smoothing said set of overall gains;
filtering said ith audio segment with said overall gains; and
outputting said ith audio segment into said noisy environment.
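The gain steps recited in claim 1 (threshold comparison in formant bands, scaling by a summed SNR, smoothing) can be sketched as follows. This is an illustration under stated assumptions, not the claimed method itself: the threshold, the boost size, the linear scaling rule, the definition of the summed SNR as total speech power over total noise power, and the 3-point smoothing window are all hypothetical choices, as are the function names. Formant detection is assumed to have been done elsewhere and is passed in as a list of band indices.

```python
import math


def formant_gains(snr_db, formant_bands, threshold_db=6.0, boost_db=3.0):
    """Per-band gain in dB: raise the gain of each formant band whose
    SNR falls below the threshold (threshold and boost are assumed)."""
    gains = [0.0] * len(snr_db)
    for band in formant_bands:
        if snr_db[band] < threshold_db:
            gains[band] += boost_db
    return gains


def summed_snr_db(speech_bands, noise_bands, floor=1e-12):
    """Summed SNR, assumed here to be total speech power over total
    noise power across the given bands, expressed in dB."""
    return 10.0 * math.log10(max(sum(speech_bands), floor)
                             / max(sum(noise_bands), floor))


def scale_gains(gains_db, total_snr_db, low_db=0.0, high_db=20.0):
    """Scale gains as a function of the summed SNR (assumed rule):
    full gains at or below low_db, none at or above high_db,
    linear in between."""
    weight = (high_db - total_snr_db) / (high_db - low_db)
    weight = min(1.0, max(0.0, weight))
    return [g * weight for g in gains_db]


def smooth_across_bands(gains_db):
    """3-point moving average over neighbouring bands; edge bands
    average over the neighbours that exist."""
    smoothed = []
    for k in range(len(gains_db)):
        window = gains_db[max(0, k - 1):k + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

For example, with per-band SNRs of `[10.0, 2.0, -3.0, 8.0]` dB and formants in bands 1 and 3, only band 1 is below the assumed 6 dB threshold, so `formant_gains` boosts band 1 alone. Scaling the gains down as the summed SNR rises reflects the idea that less enhancement is needed when the speech already dominates the noise.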
16. An audio apparatus adapted for outputting speech with enhanced intelligibility in a noisy environment, the apparatus comprising:
a speaker for outputting said speech;
a microphone for inputting noise in said noisy environment;
a source of audio to be output into said noisy environment;
a processor coupled to said source of audio, said speaker and said microphone, wherein said microprocessor is programmed to:
for each ith audio segment of a plurality of audio segments:
analyze ambient noise in said noisy environment to produce a noise spectrum on a perceptual scale that comprises a plurality of frequency bands, wherein said noise spectrum includes a plurality of noise magnitudes including a noise magnitude in each of said plurality of frequency bands;
analyze audio, that is to be emitted into the noisy environment and that includes speech, to produce a speech spectrum on said perceptual scale, wherein said speech spectrum comprises a plurality of speech magnitudes including a speech magnitude in each of said plurality of frequency bands;
for each of said plurality of frequency bands, compute a signal-to-noise ratio wherein said plurality of speech magnitudes are used as signal magnitudes;
determine if one or more formants are present in each ith audio segment;
if one or more formants are determined to be present in said ith audio segment:
compare the signal-to-noise ratio in each frequency band that includes one of the one or more formants to a threshold, and if said signal-to-noise ratio is less than said threshold, increase a formant enhancement gain for said frequency band that includes said one of the one or more formants;
compute a summed signal-to-noise ratio across at least a portion of said perceptual scale wherein said plurality of speech magnitudes are used as signal magnitudes;
scale a set of overall gains that include at least said formant enhancement gains as a function of said summed signal-to-noise ratio;
smooth said set of overall gains;
filter said ith audio segment with said overall gains; and
output said ith audio segment into said noisy environment.
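The per-segment loop that the processor of claim 16 runs needs some state carried from one segment to the next if gains are to be smoothed over time, as the abstract mentions. A minimal sketch of that state handling is given below. The class name, the one-pole (exponential) smoother, the `alpha` value, and the application of gains to per-band magnitudes rather than to individual FFT bins are all assumptions made for illustration; the claim does not specify them.

```python
class SegmentProcessor:
    """Keeps the previous segment's gains so that each new segment's
    gains can be smoothed over time (hypothetical helper)."""

    def __init__(self, num_bands, alpha=0.5):
        # alpha near 1.0 follows new gains quickly; near 0.0 changes slowly.
        self.alpha = alpha
        self.prev_db = [0.0] * num_bands

    def process(self, band_magnitudes, target_gains_db):
        """Smooth the target gains against the previous segment's gains,
        then apply them to the segment's per-band magnitudes."""
        smoothed = [self.alpha * target + (1.0 - self.alpha) * prev
                    for target, prev in zip(target_gains_db, self.prev_db)]
        self.prev_db = smoothed
        # Convert dB gains to linear amplitude factors before applying.
        return [mag * 10.0 ** (gain / 20.0)
                for mag, gain in zip(band_magnitudes, smoothed)]
```

A caller would create one `SegmentProcessor` per audio stream and call `process` once per segment, in order. Smoothing over time in this way limits abrupt gain jumps between consecutive segments, which could otherwise be audible.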
Specification