Method and apparatus of increasing speech intelligibility in noisy environments

US 8,280,730 B2
Filed: 05/25/2005
Issued: 10/02/2012
Est. Priority Date: 05/25/2005
Status: Active Grant

Abstract

A method (400, 600, 700) and apparatus (220) for enhancing the intelligibility of speech emitted into a noisy environment. After filtering (408) ambient noise with a filter (304) that simulates the physical blocking of noise by at least a part of a voice communication device (102), a frequency-dependent SNR of received voice audio relative to ambient noise is computed (424) on a perceptual (e.g. Bark) frequency scale. Formants are identified (426, 600, 700) and the SNR in bands including certain formants is modified (508, 510) with formant enhancement gain factors in order to improve intelligibility. A set of high pass filter gains (338) is combined (516) with the formant enhancement gain factors, yielding combined gains which are clipped (518), scaled (520) according to a total SNR, normalized (526), smoothed across time (530) and frequency (532) and used to reconstruct (532, 534) an audio signal.
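
As a rough illustration of the band-wise SNR described in the abstract, the sketch below groups FFT-bin powers into bands on the Bark scale and computes one SNR value per band in dB. The Zwicker-Terhardt Bark approximation, the 1-Bark band width, and the function names `hz_to_bark`, `band_energies` and `band_snr_db` are assumptions made for illustration; the text above does not specify them, and the noise pre-filter (304) is left out.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale (assumed choice)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def band_energies(power_spectrum, sample_rate, n_bands):
    """Sum FFT-bin powers into n_bands bands, each 1 Bark wide (assumed layout).

    power_spectrum holds bin powers from 0 Hz up to sample_rate / 2.
    """
    n_bins = len(power_spectrum)
    energies = [0.0] * n_bands
    for k, p in enumerate(power_spectrum):
        f = k * (sample_rate / 2.0) / max(n_bins - 1, 1)  # bin frequency in Hz
        band = min(int(hz_to_bark(f)), n_bands - 1)       # clamp to last band
        energies[band] += p
    return energies

def band_snr_db(speech_bands, noise_bands, floor=1e-12):
    """Per-band SNR in dB, with speech magnitudes used as the signal."""
    return [10.0 * math.log10(max(s, floor) / max(n, floor))
            for s, n in zip(speech_bands, noise_bands)]
```

Calling `band_energies` once on a speech frame and once on a noise frame, then passing both results to `band_snr_db`, gives the kind of frequency-dependent SNR the abstract refers to.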

30 Claims

1. A method of improving the intelligibility of speech that is included in audio that is emitted by a voice communication device into a noisy environment, the method comprising:
- for each i^th audio segment of a plurality of audio segments received from a remote terminal that is to be emitted by the voice communication device into the noisy environment and that includes speech;
  
  analyzing ambient noise in said noisy environment using an intelligibility enhancer to produce a noise spectrum on a perceptual scale that comprises a plurality of frequency bands, wherein said noise spectrum includes a plurality noise magnitudes including a noise magnitude in each of said plurality of frequency bands;
  
  analyzing said i^th audio segment to produce a speech spectrum on said perceptual scale, wherein said speech spectrum comprises a plurality of speech magnitudes including a speech magnitude in each of said plurality of frequency bands;
  
  for each of said plurality of frequency bands, computing a signal-to-noise ratio from said plurality of speech magnitudes used as signal magnitudes and said plurality of noise magnitudes;
  
  determining if one or more formants are present in said i^th audio segment;
  
  if one or more formants are determined to be present in said i^th audio segment;
  
  comparing the signal-to-noise ratio in each frequency band that includes one of the one or more formants to a threshold, and if said signal-to-noise ratio is less than said threshold increasing a formant enhancement gain for said frequency band that includes said one of the one or more formants;
  
  computing a summed signal-to-noise ratio across at least a portion of said perceptual scale wherein said plurality of speech magnitudes are used as signal magnitudes;
  
  scaling a set of overall gains that include at least said formant enhancement gains as a function of said summed signal-to-noise ratio;
  
  smoothing said set of overall gains;
  
  filtering said i^th audio segment with said overall gains; and
  
  outputting said i^th audio segment into said noisy environment by the voice communication device.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
  - 2. The method according to claim 1 further comprising:
    - reading a plurality of high pass filter gains that include a high pass filter gain for each of said plurality of frequency bands of said perceptual scale, using the intelligibility enhancer;
      
      for each frequency band that includes one of the one or more formants, combining said formant enhancement gain with said high pass filter gain for said frequency band to produce a combined gain for said frequency band, whereby, said set of overall gains including an overall gain for each of said frequency bands of said perceptual scale is obtained.
  - 3. The method according to claim 1 wherein analyzing ambient noise in said noisy environment using the intelligibility enhancer to produce a noise spectrum on a perceptual scale comprises:
    - analyzing ambient noise in said noisy environment to produce a noise spectrum on the Bark scale.
  - 4. The method according to claim 1 wherein determining if one or more formants are present in said i^th audio segment comprises:
    - computing a spectral flatness measure for said i^th audio segment; and
      
      comparing said spectral flatness measure to a bound.
  - 5. The method according to claim 1 wherein determining if one or more formants are present in said i^th audio segment comprises:
    - determining if two formants are present in said i^th audio segment.
  - 6. The method according to claim 5 wherein determining if two formants are present in said i^th audio segment comprises:
    - searching a first frequency range for a first formant;
      
      if said first formant is found at a first frequency in said first frequency range;
      
      searching for a second formant in a second frequency range that is spaced from said first frequency by a predetermined frequency offset, and if said second formant is not located in said second frequency range, searching for said second formant in a third frequency range and if said second formant is found in said third frequency range, testing if a ratio of magnitude of said second formant relative to a magnitude in a defined neighborhood of said second formant is less than a predetermined value, and if said ratio is less than said predetermined value rejecting said second formant.
  - 7. The method according to claim 6 wherein searching said first frequency range for said first formant and searching said second and third frequency ranges for said second formant comprise:
    - searching for a spectral peak on a second spectral scale that is finer than said first spectral scale; and
      
      if said spectral peak is at a boundary of a first frequency band of said perceptual scale and a second frequency band of said perceptual scale, testing if said spectral peak has a highest magnitude among a first plurality frequency bands of said second spectral scale that are located in said first frequency band of said perceptual scale and a second plurality of frequency bands of said second spectral scale that are located in said second frequency band of said perceptual scale; and
      
      if said spectral peak is located in said first frequency band of said perceptual scale, not at a boundary of said first frequency band of said perceptual scale and said second frequency band of said perceptual scale, testing if said spectral peak is highest among said first plurality frequency bands of said second spectral scale that are located in said first frequency band.
  - 8. The method according to claim 1 wherein:
    - if one or more formants are determined to be present in said i^th audio segment and said signal-to-noise ratio is less than said threshold, the method further comprises;
      
      decreasing said formant enhancement gain for a pair of frequency band on opposite sides of said frequency band that includes said one of said one or more formants.
  - 9. The method according to claim 1 wherein computing a summed signal-to-noise ratio across at least a portion of said perceptual scale comprises computing said summed signal-to-noise ratio across said perceptual scale.
  - 10. The method according to claim 1 further comprising clipping said summed signal-to-noise ratio to a predetermined range.
  - 11. The method according to claim 1 further comprising:
    - normalizing said set of overall gains to maintain an energy of said i^th audio segment.
  - 12. The method according to claim 1 wherein smoothing said set of formant enhancement gains comprises smoothing said set of formant enhancement gains across said perceptual scale.
  - 13. The method according to claim 1 wherein smoothing said set of formant enhancement gains comprises temporally smoothing said set of formant enhancement gains.
  - 14. The method according to claim 13 wherein smoothing said set of formant enhancement gains comprises smoothing said set of formant enhancement gains across said perceptual scale.
  - 15. The method according to claim 1 further comprising filtering said noise spectrum with a filter that matches an average frequency response of a physical obstruction proximate a user's ear.
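
The gain steps recited in claims 1, 10, 12 and 13 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the threshold, boost size, clipping range, the linear scaling rule, the 3-tap frequency smoother, the one-pole temporal smoother and the name `overall_gains_db` are all hypothetical, since the claims do not give concrete values or formulas. Filtering the audio segment with the resulting gains is omitted.

```python
import math

def overall_gains_db(speech, noise, formant_bands, prev_gains_db=None,
                     threshold_db=15.0, boost_db=6.0,
                     snr_lo_db=0.0, snr_hi_db=30.0, alpha=0.5):
    """Return smoothed per-band gains in dB for one audio segment.

    speech, noise: per-band powers on the perceptual scale (linear units).
    formant_bands: indices of bands found to contain a formant.
    All numeric defaults are illustrative assumptions.
    """
    eps = 1e-12
    n = len(speech)
    snr_db = [10.0 * math.log10(max(s, eps) / max(v, eps))
              for s, v in zip(speech, noise)]

    # Increase the formant enhancement gain where the band SNR is below threshold.
    gains = [0.0] * n
    for b in formant_bands:
        if snr_db[b] < threshold_db:
            gains[b] += boost_db

    # Summed SNR across the scale, clipped to a range (claim 10), then used
    # to scale the gains: full effect at low SNR, none at high SNR (assumed rule).
    total_db = 10.0 * math.log10(max(sum(speech), eps) / max(sum(noise), eps))
    clipped = min(max(total_db, snr_lo_db), snr_hi_db)
    scale = (snr_hi_db - clipped) / (snr_hi_db - snr_lo_db)
    gains = [g * scale for g in gains]

    # Smooth across the perceptual scale with a 3-tap moving average (claim 12).
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - 1), min(n, i + 2)
        smoothed.append(sum(gains[lo:hi]) / (hi - lo))

    # Smooth across time against the previous segment's gains (claim 13).
    if prev_gains_db is not None:
        smoothed = [alpha * p + (1.0 - alpha) * g
                    for p, g in zip(prev_gains_db, smoothed)]
    return smoothed
```

Passing each segment's output back in as `prev_gains_db` for the next segment gives the temporal smoothing; leaving it as `None` applies frequency smoothing only.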

16. An audio apparatus adapted for outputting speech with enhanced intelligibility in a noisy environment, the apparatus comprising:
- a speaker for outputting said speech;
  
  a microphone for inputting noise from said noisy environment;
  
  a source of audio that is received from a remote terminal and to be output into said noisy environment;
  
  a processor coupled to said source of audio, said speaker and said microphone, wherein said microprocessor is programmed to;
  
  for each i^th audio segment of a plurality of audio segments from the source of audio;
  
  analyze ambient noise from said noisy environment to produce a noise spectrum on a perceptual scale that comprises a plurality of frequency bands, wherein said noise spectrum includes a plurality noise magnitudes including a noise magnitude in each of said plurality of frequency bands;
  
  analyze said i^th audio segment to produce a speech spectrum on said perceptual scale, wherein said speech spectrum comprises a plurality of speech magnitudes including a speech magnitude in each of said plurality of frequency bands;
  
  for each of said plurality of frequency bands, compute a signal-to-noise ratio from said plurality of speech magnitudes used as signal magnitudes and said plurality of noise magnitudes;
  
  determine if one or more formants are present in said i^th audio segment;
  
  if one or more formants are determined to be present in said i^th audio segment;
  
  compare the signal-to-noise ratio in each frequency band that includes one of the one or more formants to a threshold, and if said signal-to-noise ratio is less than said threshold increasing a formant enhancement gain for said frequency band that includes said one of the one or more formants;
  
  compute a summed signal-to-noise ratio across at least a portion of said perceptual scale wherein said plurality of speech magnitudes are used as signal magnitudes;
  
  scale a set of overall gains that include at least said formant enhancement gains as a function of said summed signal-to-noise ratio;
  
  smooth said set of overall gains;
  
  filter said i^th audio segment with said overall gains; and
  
  output said i^th audio segment into said noisy environment.
- Dependent Claims (17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)
  - 17. The audio apparatus according to claim 16 wherein said processor is further programmed to:
    - read a plurality of high pass filter gains that include a high pass filter gain for each of said plurality of frequency bands of said perceptual scale;
      
      for each frequency band that includes one of the one or more formants, combine said formant enhancement gain with said high pass filter gain for said frequency band to produce a combined gain for said frequency band, whereby, a set of overall gains including a gain for each of said frequency bands of said perceptual scale is obtained.
  - 18. The audio apparatus according to claim 16 wherein, in analyzing ambient noise in said noisy environment to produce a noise spectrum on a perceptual scale said processor is programmed to:
    - analyze ambient noise in said noisy environment to produce a noise spectrum on the Bark scale.
  - 19. The apparatus to claim 16 wherein, in determining if one or more formants are present in said i^th audio segment said processor is programmed to:
    - compute a spectral flatness measure for said i^th audio segment; and
      
      compare said spectral flatness measure to a bound.
  - 20. The apparatus according to claim 16 wherein, in determining if one or more formants are present in said i^th audio segment said processor is programmed to:
    - determine if two formants are present in said i^th audio segment.
  - 21. The apparatus according to claim 16 wherein in determining if two formants are present in said i^th audio segment said processor is programmed to:
    - search a first frequency range for a first formant;
      
      if said first formant is found at a first frequency in said first frequency range;
      
      search for a second formant in a second frequency range that is spaced from said first frequency by a predetermined frequency offset, and if said second formant is not located in said second frequency range, searching for said second formant in a third frequency range and if said second formant is found in said third frequency range, testing if a ratio of magnitude of said second formant relative to a magnitude in a defined neighborhood of said second formant is less than a predetermined value, and if said ratio is less than said predetermined value rejecting said second formant.
  - 22. The apparatus according to claim 21 wherein, in searching said first frequency range for said first formant and searching said second and third frequency ranges for said second formant said processor is programmed to:
    - search for a spectral peak on a second spectral scale that is finer than said first spectral scale; and
      
      if said spectral peak is at a boundary of a first frequency band of said perceptual scale and a second frequency band of said perceptual scale, test if said spectral peak has a highest magnitude among a first plurality frequency bands of said second spectral scale that are located in said first frequency band of said perceptual scale and a second plurality of frequency bands of said second spectral scale that are located in said second frequency band of said perceptual scale; and
      
      if said spectral peak is located in said first frequency band of said perceptual scale, not at a boundary of said first frequency band of said perceptual scale and said second frequency band of said perceptual scale, test if said spectral peak is highest among said first plurality frequency bands of said second spectral scale that are located in said first frequency band.
  - 23. The apparatus according to claim 16 wherein:
    - if one or more formants are determined to be present in said i^th audio segment and said signal-to-noise ratio is less than said threshold, said processor is further programmed to;
      
      decrease said formant enhancement gain for a pair of frequency band on opposite sides of said frequency band that includes said one of said one or more formants.
  - 24. The apparatus according to claim 16 wherein, in computing a summed signal-to-noise ratio across at least a portion of said perceptual scale said processor is programmed to compute said summed signal-to-noise ratio across said perceptual scale.
  - 25. The apparatus according to claim 16 wherein said processor is further programmed to clip said summed signal-to-noise ratio to a predetermined range.
  - 26. The apparatus according to claim 16 further wherein said processor is further programmed to:
    - normalize said set of overall gains to maintain an energy of said i^th audio segment.
  - 27. The apparatus according to claim 16 wherein, in smoothing said set of overall gains said processor is programmed to smooth said set of formant enhancement gains across said perceptual scale.
  - 28. The apparatus according to claim 16 wherein in smoothing said set of overall gain said processor is programmed to temporally smooth said set of formant enhancement gains.
  - 29. The apparatus according to claim 28 wherein in smoothing said set of overall gains said processor is programmed to smooth said set of formant enhancement gains across set perceptual scale.
  - 30. The apparatus according to claim 16 further wherein said processor is further programmed to filter said noise spectrum with a filter that matches an average frequency response of a physical obstruction proximate a user's ear.
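
Claims 4 and 19 test for formants by comparing a spectral flatness measure to a bound. A minimal sketch of that test is shown below, using the common definition of spectral flatness as the ratio of the geometric mean to the arithmetic mean of the power spectrum. The bound of 0.5, the direction of the comparison (a low flatness taken to indicate formants) and the names `spectral_flatness` and `formants_likely` are assumptions for illustration; the claims do not state them.

```python
import math

def spectral_flatness(power_spectrum, floor=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum.

    Values near 1 indicate a flat, noise-like spectrum; values near 0
    indicate a peaky spectrum.
    """
    vals = [max(p, floor) for p in power_spectrum]  # avoid log(0)
    log_mean = sum(math.log(v) for v in vals) / len(vals)
    return math.exp(log_mean) / (sum(vals) / len(vals))

def formants_likely(power_spectrum, bound=0.5):
    """Assumed decision rule: a peaky spectrum suggests formants are present."""
    return spectral_flatness(power_spectrum) < bound
```

A segment that passes this check would then go on to the formant search of claims 5 to 7 (or 20 to 22); a segment that fails it would skip the formant enhancement step.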

Current Assignee: Google Technology Holdings LLC (Alphabet Inc.)
Original Assignee: Motorola Mobility LLC (Lenovo Group Ltd.)
Inventors: Song, Jianming J., Johnson, John C.
Primary Examiner(s): COLUCCI, MICHAEL C
Application Number: US11/137,182
Publication Number: US 20060270467A1
Time in Patent Office: 2,687 Days
Field of Search: 704/226, 704/231, 704/200.1, 704/201, 704/203, 704/207, 704/208, 704/209, 704/224, 704/248, 704/251, 704/254, 381/106, 381/94.2, 379/29.03, 379/406.01, 379/406.08
US Class Current: 704/225
CPC Class Codes:
G10L 21/0208   Noise filtering
G10L 21/0232   Processing in the frequency...
G10L 25/15   the extracted parameters be...
H03G 3/3089   Control of digital or coded...
H04M 1/6025   implemented as integrated s...