Method and apparatus of increasing speech intelligibility in noisy environments
First Claim
1. A method of improving the intelligibility of speech that is included in audio that is emitted by a voice communication device into a noisy environment, the method comprising:
for each ith audio segment of a plurality of audio segments received from a remote terminal that is to be emitted by the voice communication device into the noisy environment and that includes speech:
analyzing ambient noise in said noisy environment using an intelligibility enhancer to produce a noise spectrum on a perceptual scale that comprises a plurality of frequency bands, wherein said noise spectrum includes a plurality of noise magnitudes including a noise magnitude in each of said plurality of frequency bands;
analyzing said ith audio segment to produce a speech spectrum on said perceptual scale, wherein said speech spectrum comprises a plurality of speech magnitudes including a speech magnitude in each of said plurality of frequency bands;
for each of said plurality of frequency bands, computing a signal-to-noise ratio from said plurality of speech magnitudes used as signal magnitudes and said plurality of noise magnitudes;
determining if one or more formants are present in said ith audio segment;
if one or more formants are determined to be present in said ith audio segment:
comparing the signal-to-noise ratio in each frequency band that includes one of the one or more formants to a threshold, and if said signal-to-noise ratio is less than said threshold increasing a formant enhancement gain for said frequency band that includes said one of the one or more formants;
computing a summed signal-to-noise ratio across at least a portion of said perceptual scale wherein said plurality of speech magnitudes are used as signal magnitudes;
scaling a set of overall gains that include at least said formant enhancement gains as a function of said summed signal-to-noise ratio;
smoothing said set of overall gains;
filtering said ith audio segment with said overall gains; and
outputting said ith audio segment into said noisy environment by the voice communication device.
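The per-band gain logic recited in the claim can be illustrated with a minimal Python sketch. Everything numeric here is an assumption made for illustration, not a value taken from the patent: the threshold (`threshold_db`), the boost step (`boost_db`), the linear fade used to scale gains by the summed SNR (`lo_db`, `hi_db`), and the one-pole time smoothing (`alpha`). The claim also leaves open how the summed SNR is formed; this sketch assumes a ratio of band powers summed across the scale. Function names are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0) and division by zero


def band_snr_db(speech_mag, noise_mag):
    """Per-band SNR in dB, with speech magnitudes used as the signal."""
    return [20.0 * math.log10((s + EPS) / (n + EPS))
            for s, n in zip(speech_mag, noise_mag)]


def formant_gains_db(snr_db, formant_bands, threshold_db=6.0, boost_db=3.0):
    """Increase the gain only in formant bands whose SNR is below threshold."""
    gains = [0.0] * len(snr_db)
    for b in formant_bands:
        if snr_db[b] < threshold_db:
            gains[b] += boost_db
    return gains


def summed_snr_db(speech_mag, noise_mag):
    """Assumed form of the summed SNR: ratio of powers summed over all bands."""
    speech_pow = sum(s * s for s in speech_mag)
    noise_pow = sum(n * n for n in noise_mag)
    return 10.0 * math.log10((speech_pow + EPS) / (noise_pow + EPS))


def scale_gains(gains_db, total_snr_db, lo_db=0.0, hi_db=20.0):
    """Assumed scaling: full gain at or below lo_db, fading to zero at hi_db."""
    weight = (hi_db - total_snr_db) / (hi_db - lo_db)
    weight = min(1.0, max(0.0, weight))
    return [g * weight for g in gains_db]


def smooth(prev_db, cur_db, alpha=0.5):
    """One-pole smoothing of the gains from one segment to the next."""
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(prev_db, cur_db)]


def apply_gains_db(band_mag, gains_db):
    """Filter a segment's band magnitudes with the overall gains."""
    return [m * 10.0 ** (g / 20.0) for m, g in zip(band_mag, gains_db)]


if __name__ == "__main__":
    speech = [1.0, 0.5, 0.1, 0.05]   # made-up band magnitudes
    noise = [0.1, 0.5, 0.2, 0.05]
    snr = band_snr_db(speech, noise)
    gains = formant_gains_db(snr, formant_bands=[0, 2])
    gains = scale_gains(gains, summed_snr_db(speech, noise))
    gains = smooth([0.0] * len(gains), gains)
    print(apply_gains_db(speech, gains))
```

In this made-up example only band 2 is boosted: band 0 also holds a formant, but its SNR is already above the assumed threshold.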
Abstract
A method (400, 600, 700) and apparatus (220) for enhancing the intelligibility of speech emitted into a noisy environment. After filtering (408) ambient noise with a filter (304) that simulates the physical blocking of noise by at least a part of a voice communication device (102), a frequency-dependent SNR of received voice audio relative to ambient noise is computed (424) on a perceptual (e.g., Bark) frequency scale. Formants are identified (426, 600, 700), and the SNR in bands that include certain formants is modified (508, 510) with formant enhancement gain factors in order to improve intelligibility. A set of high-pass filter gains (338) is combined (516) with the formant enhancement gain factors, yielding combined gains that are clipped (518), scaled (520) according to a total SNR, normalized (526), smoothed across time (530) and frequency (532), and used to reconstruct (532, 534) an audio signal.
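Two parts of the abstract lend themselves to a short sketch: mapping a spectrum onto the Bark scale, and the combine, clip, normalize and frequency-smooth chain applied to the gains. The Python below is an illustration under stated assumptions, not the patent's implementation. The Bark conversion uses the widely cited Zwicker-Terhardt approximation. The one-Bark band width, the clipping ceiling `max_db`, normalization to unit mean power gain, and the 3-point frequency smoother are all assumed here, and the function names are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_band_magnitudes(bin_mags, sample_rate, n_fft):
    """Group FFT bin magnitudes into one-Bark-wide bands (root of summed power)."""
    power = {}
    for k, mag in enumerate(bin_mags):
        band = int(hz_to_bark(k * sample_rate / n_fft))
        power[band] = power.get(band, 0.0) + mag * mag
    return [math.sqrt(power.get(b, 0.0)) for b in range(max(power) + 1)]


def combine_clip_normalize(formant_db, highpass_db, max_db=12.0):
    """Add both gain sets in dB, clip to [0, max_db], then normalize.

    Assumed normalization: shift all gains so the mean linear power gain is 1,
    which keeps the overall output level roughly unchanged.
    """
    combined = [min(max_db, max(0.0, f + h))
                for f, h in zip(formant_db, highpass_db)]
    mean_pow = sum(10.0 ** (g / 10.0) for g in combined) / len(combined)
    offset_db = 10.0 * math.log10(mean_pow)
    return [g - offset_db for g in combined]


def smooth_across_bands(gains_db):
    """3-point moving average over neighbouring bands (shorter at the edges)."""
    out = []
    for i in range(len(gains_db)):
        window = gains_db[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out


if __name__ == "__main__":
    # Made-up inputs: five FFT bins at 8 kHz, and three bands of gains in dB.
    print(bark_band_magnitudes([1.0, 2.0, 3.0, 4.0, 5.0], 8000, 8))
    gains = combine_clip_normalize([0.0, 6.0, 20.0], [0.0, 3.0, 0.0])
    print(smooth_across_bands(gains))
```

Working in dB makes combining the two gain sets a simple addition. Normalizing after clipping means the boost in some bands is paid for by attenuation in others rather than by a louder output overall.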
77 Citations
30 Claims
1. A method of improving the intelligibility of speech that is included in audio that is emitted by a voice communication device into a noisy environment, the method comprising:
for each ith audio segment of a plurality of audio segments received from a remote terminal that is to be emitted by the voice communication device into the noisy environment and that includes speech:
analyzing ambient noise in said noisy environment using an intelligibility enhancer to produce a noise spectrum on a perceptual scale that comprises a plurality of frequency bands, wherein said noise spectrum includes a plurality of noise magnitudes including a noise magnitude in each of said plurality of frequency bands;
analyzing said ith audio segment to produce a speech spectrum on said perceptual scale, wherein said speech spectrum comprises a plurality of speech magnitudes including a speech magnitude in each of said plurality of frequency bands;
for each of said plurality of frequency bands, computing a signal-to-noise ratio from said plurality of speech magnitudes used as signal magnitudes and said plurality of noise magnitudes;
determining if one or more formants are present in said ith audio segment;
if one or more formants are determined to be present in said ith audio segment:
comparing the signal-to-noise ratio in each frequency band that includes one of the one or more formants to a threshold, and if said signal-to-noise ratio is less than said threshold increasing a formant enhancement gain for said frequency band that includes said one of the one or more formants;
computing a summed signal-to-noise ratio across at least a portion of said perceptual scale wherein said plurality of speech magnitudes are used as signal magnitudes;
scaling a set of overall gains that include at least said formant enhancement gains as a function of said summed signal-to-noise ratio;
smoothing said set of overall gains;
filtering said ith audio segment with said overall gains; and
outputting said ith audio segment into said noisy environment by the voice communication device.
(Dependent claims: 2-15)
16. An audio apparatus adapted for outputting speech with enhanced intelligibility in a noisy environment, the apparatus comprising:
a speaker for outputting said speech;
a microphone for inputting noise from said noisy environment;
a source of audio that is received from a remote terminal and to be output into said noisy environment;
a processor coupled to said source of audio, said speaker and said microphone, wherein said microprocessor is programmed to:
for each ith audio segment of a plurality of audio segments from the source of audio:
analyze ambient noise from said noisy environment to produce a noise spectrum on a perceptual scale that comprises a plurality of frequency bands, wherein said noise spectrum includes a plurality of noise magnitudes including a noise magnitude in each of said plurality of frequency bands;
analyze said ith audio segment to produce a speech spectrum on said perceptual scale, wherein said speech spectrum comprises a plurality of speech magnitudes including a speech magnitude in each of said plurality of frequency bands;
for each of said plurality of frequency bands, compute a signal-to-noise ratio from said plurality of speech magnitudes used as signal magnitudes and said plurality of noise magnitudes;
determine if one or more formants are present in said ith audio segment;
if one or more formants are determined to be present in said ith audio segment:
compare the signal-to-noise ratio in each frequency band that includes one of the one or more formants to a threshold, and if said signal-to-noise ratio is less than said threshold increasing a formant enhancement gain for said frequency band that includes said one of the one or more formants;
compute a summed signal-to-noise ratio across at least a portion of said perceptual scale wherein said plurality of speech magnitudes are used as signal magnitudes;
scale a set of overall gains that include at least said formant enhancement gains as a function of said summed signal-to-noise ratio;
smooth said set of overall gains;
filter said ith audio segment with said overall gains; and
output said ith audio segment into said noisy environment.
(Dependent claims: 17-30)
Specification