Method and apparatus for increasing speech intelligibility in noisy environments

US 8,364,477 B2
Filed: 08/30/2012
Issued: 01/29/2013
Est. Priority Date: 05/25/2005
Status: Active Grant

Abstract

A method (400, 500) and apparatus (220) seek to improve the intelligibility of speech emitted into a noisy environment. Formants are identified (426), and a perceptual frequency scale band that includes at least one of the identified formants is selected (502). The SNR in each band is compared (504) to a threshold and, if the SNR for that band is less than the threshold, the formant enhancement gain for that band is increased. A set of high pass filter gains (338) is combined (516) with the formant enhancement gains, yielding combined gains that are then clipped (518), scaled (520) according to a total SNR, normalized (526), smoothed across time (530) and frequency (532), and used to reconstruct (532, 534) an audio signal.
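
The gain chain described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `overall_gains`, every default constant (threshold, boost, clip range, SNR mapping range, smoothing factor), the linear mapping from total SNR to a scale factor, and the 3-tap frequency smoother are all assumptions chosen only to make the order of operations concrete.

```python
import math

def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def overall_gains(speech, noise, formant_bands, hp_gain_db, prev_gains=None,
                  thresh_db=6.0, boost_db=3.0, clip_db=12.0,
                  snr_lo_db=0.0, snr_hi_db=20.0, alpha=0.5):
    """Return smoothed linear per-band gains for one audio segment.

    speech, noise: per-band magnitudes (positive floats).
    All default constants are illustrative assumptions, not patent values.
    """
    n = len(speech)
    # Per-band SNR in dB, using speech magnitudes as the signal magnitudes.
    snr_db = [20.0 * math.log10(s / v) for s, v in zip(speech, noise)]

    # Formant enhancement gain: raised only where a formant band has low SNR.
    formant_db = [0.0] * n
    for b in formant_bands:
        if snr_db[b] < thresh_db:
            formant_db[b] += boost_db

    # Combine with the high pass filter gains, then clip.
    combined_db = [clamp(f + h, -clip_db, clip_db)
                   for f, h in zip(formant_db, hp_gain_db)]

    # Scale by the total SNR (assumed mapping: full effect at or below
    # snr_lo_db, no effect at or above snr_hi_db).
    total_snr_db = 10.0 * math.log10(sum(s * s for s in speech) /
                                     sum(v * v for v in noise))
    scale = clamp((snr_hi_db - total_snr_db) / (snr_hi_db - snr_lo_db), 0.0, 1.0)
    gains = [10.0 ** (scale * g / 20.0) for g in combined_db]

    # Normalize so the energy of the segment is unchanged.
    e_in = sum(s * s for s in speech)
    e_out = sum((g * s) ** 2 for g, s in zip(gains, speech))
    norm = math.sqrt(e_in / e_out)
    gains = [g * norm for g in gains]

    # Smooth across time (one-pole average with the previous segment's gains)
    # and then across frequency (3-tap moving average over neighboring bands).
    if prev_gains is not None:
        gains = [alpha * p + (1.0 - alpha) * g for p, g in zip(prev_gains, gains)]
    smoothed = []
    for i in range(n):
        window = gains[max(0, i - 1):i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

In this sketch, a segment whose total SNR is already at or above `snr_hi_db` receives unit gains, so clean conditions leave the audio untouched, while a low-SNR segment gets a relative boost in and around the formant band.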

22 Claims

1. A method of improving intelligibility of speech that is included in audio that is emitted into a noisy environment, the method comprising:
- determining if one or more voice formants are present in each i^th audio segment of a plurality of audio segments;
  
  if one or more formants are determined to be present in the i^th audio segment;
  
  selecting a perceptual frequency scale band (L) including at least one of the one or more formants from a plurality of perceptual frequency scale bands of a perceptual scale ambient noise spectrum of the noisy environment;
  
  comparing, to a threshold, a signal-to-noise ratio of the perceptual frequency scale band, and if the signal-to-noise ratio is less than the threshold, increasing a formant enhancement gain for the perceptual frequency scale band;
  
  computing a summed signal-to-noise ratio across at least a portion of the perceptual scale ambient noise spectrum wherein a plurality of speech magnitudes in each of the plurality of perceptual frequency scale bands are used as signal magnitudes;
  
  scaling a set of overall gains that include at least the formant enhancement gains as a function of the summed signal-to-noise ratio;
  
  smoothing the set of overall gains;
  
  filtering the i^th audio segment with the set of overall gains; and
  
  outputting the i^th audio segment into the noisy environment.
- Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
  - 2. The method according to claim 1 further comprising:
    - reading a plurality of high pass filter gains that include a high pass filter gain for each of the plurality of perceptual frequency scale bands;
      
      for each perceptual frequency scale band that includes at least one of the one or more formants, combining the formant enhancement gain with the high pass filter gain for the perceptual frequency scale band to produce a combined gain for the perceptual frequency scale band to obtain the set of overall gains for the plurality of perceptual frequency scale bands.
  - 3. The method according to claim 1 further comprising:
    - creating the perceptual scale ambient noise spectrum of the noisy environment by analyzing ambient noise in the noisy environment to produce a noise spectrum on the Bark scale.
  - 4. The method according to claim 1 wherein determining if one or more voice formants are present in each i^th audio segment of a plurality of audio segments comprises:
    - computing a spectral flatness measure for the i^th audio segment; and
      
      comparing the spectral flatness measure to a bound.
  - 5. The method according to claim 1 wherein determining if one or more formants are present in each i^th audio segment of a plurality of audio segments comprises:
    - determining if two voice formants are present in each i^th audio segment.
  - 6. The method according to claim 5 wherein determining if two voice formants are present in each i^th audio segment comprises:
    - searching a first frequency range for a first voice formant;
      
      if the first voice formant is found at a first frequency in the first frequency range;
      
      searching for a second voice formant in a second frequency range that is spaced from the first frequency by a predetermined frequency offset, and if the second voice formant is not located in the second frequency range, searching for the second voice formant in a third frequency range, and if the second voice formant is found in the third frequency range, testing if a ratio of magnitude of the second voice formant relative to a magnitude in a defined neighborhood of the second voice formant is less than a predetermined value, and if the ratio is less than the predetermined value, rejecting the second voice formant.
  - 7. The method according to claim 1 wherein the threshold is related to the voice formant.
  - 8. The method according to claim 1 wherein if one or more voice formants are determined to be present in the i^th audio segment and if the signal-to-noise ratio is less than the threshold, the method further comprises:
    - decreasing a formant enhancement gain for a pair of perceptual frequency scale bands on opposite sides of the perceptual frequency scale band (L).
  - 9. The method according to claim 1 wherein computing a summed signal-to-noise ratio across at least a portion of the perceptual scale ambient noise spectrum comprises:
    - computing the summed signal-to-noise ratio across the perceptual scale ambient noise spectrum.
  - 10. The method according to claim 1 further comprising:
    - clipping the summed signal-to-noise ratio to a predetermined range.
  - 11. The method according to claim 1 further comprising:
    - normalizing the set of overall gains to maintain an energy of the i^th audio segment.
  - 12. The method according to claim 1 wherein smoothing the set of overall gains comprises:
    - smoothing the set of overall gains across the perceptual scale ambient noise spectrum.
  - 13. The method according to claim 1 wherein smoothing the set of overall gains comprises:
    - temporally smoothing the set of formant enhancement gains.
  - 14. The method according to claim 13 wherein smoothing the set of overall gains comprises:
    - smoothing the set of formant enhancement gains across the perceptual scale ambient noise spectrum.
  - 15. The method according to claim 1 further comprising:
    - filtering a noise spectrum with a filter that matches an average frequency response of a physical obstruction proximate a user's ear.
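
Claim 4 tests for formants by computing a spectral flatness measure and comparing it to a bound. The following Python sketch uses the common definition of spectral flatness (geometric mean divided by arithmetic mean of the power spectrum). The claim does not state the direction of the comparison or the value of the bound, so the `bound=0.5` default, the "below the bound means formants may be present" rule, and both function names are assumptions made for illustration.

```python
import math

def spectral_flatness(power):
    """Geometric mean over arithmetic mean of a power spectrum.

    Close to 1 for a flat (noise-like) spectrum, close to 0 for a
    peaky spectrum such as voiced speech with strong formants.
    """
    eps = 1e-12  # assumed floor that guards against log(0)
    log_mean = sum(math.log(p + eps) for p in power) / len(power)
    arith_mean = sum(power) / len(power)
    return math.exp(log_mean) / (arith_mean + eps)

def may_contain_formants(power, bound=0.5):
    # Assumed decision rule: a peaky spectrum (low flatness) suggests formants.
    return spectral_flatness(power) < bound
```

With these assumptions, a flat spectrum such as `[1.0, 1.0, 1.0, 1.0]` is rejected, while a spectrum dominated by one strong peak is passed on to the formant search.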

16. An audio apparatus adapted for outputting speech in a noisy environment, the audio apparatus comprising:
- a speaker for outputting the speech;
  
  a microphone for receiving ambient noise from the noisy environment;
  
  a source of audio to be output into the noisy environment;
  
  a processor coupled to the source of audio, the speaker, and the microphone, wherein the processor is programmed to;
  
  determine if one or more voice formants are present in each i^th audio segment of a plurality of audio segments;
  
  if one or more formants are determined to be present in the i^th audio segment;
  
  select a perceptual frequency scale band (L) including at least one of the one or more formants from a plurality of perceptual frequency scale bands of a perceptual scale ambient noise spectrum of the noisy environment;
  
  compare, to a threshold, a signal-to-noise ratio of the perceptual frequency scale band, and if the signal-to-noise ratio is less than the threshold, increase a formant enhancement gain for the perceptual frequency scale band;
  
  compute a summed signal-to-noise ratio across at least a portion of the perceptual scale ambient noise spectrum wherein a plurality of speech magnitudes are used as signal magnitudes;
  
  scale a set of overall gains that include at least the formant enhancement gains as a function of the summed signal-to-noise ratio;
  
  smooth the set of overall gains;
  
  filter the i^th audio segment with the set of overall gains; and
  
  output the i^th audio segment into the noisy environment.
- Dependent claims: 17, 18, 19, 20, 21, 22
  - 17. The audio apparatus according to claim 16 wherein the processor is further programmed to:
    - read a plurality of high pass filter gains that include a high pass filter gain for each of the plurality of perceptual frequency scale bands;
      
      for each perceptual frequency scale band that includes at least one of the one or more formants, combine the formant enhancement gain with the high pass filter gain for the perceptual frequency scale band to produce a combined gain for the perceptual frequency scale band to obtain the set of overall gains for the plurality of perceptual frequency scale bands.
  - 18. The audio apparatus according to claim 16 wherein the processor is programmed to:
    - analyze the ambient noise in the noisy environment to produce a noise spectrum on the Bark scale to create the perceptual scale ambient noise spectrum of the noisy environment.
  - 19. The audio apparatus according to claim 16 wherein the processor is programmed to:
    - determine if two voice formants are present in each i^th audio segment.
  - 20. The audio apparatus according to claim 19 wherein in determining if two formants are present in each i^th audio segment, the processor is programmed to:
    - search a first frequency range for a first voice formant;
      
      if the first voice formant is found at a first frequency in the first frequency range;
      
      search for a second voice formant in a second frequency range that is spaced from the first frequency by a predetermined frequency offset, and if the second voice formant is not located in the second frequency range, searching for the second voice formant in a third frequency range, and if the second voice formant is found in the third frequency range, testing if a ratio of magnitude of the second voice formant relative to a magnitude in a defined neighborhood of the second voice formant is less than a predetermined value, and if the ratio is less than the predetermined value, rejecting the second voice formant.
  - 21. The audio apparatus according to claim 16 wherein the threshold is related to the voice formant.
  - 22. The audio apparatus according to claim 16 wherein if one or more voice formants are determined to be present in the i^th audio segment and if the signal-to-noise ratio is less than the threshold, the processor is further programmed to:
    - decrease a formant enhancement gain for a pair of perceptual frequency scale bands on opposite sides of the perceptual frequency scale band (L).
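
Claims 3 and 18 recite a noise spectrum on the Bark scale. As a rough illustration, the Python sketch below pools FFT-bin noise power into integer Bark bands using the widely used Zwicker-Terhardt approximation of the Bark scale. The claims do not say which Bark approximation or band edges are used, so the formula choice, the integer-band pooling, the assumed bin layout (bins 0 through N/2 of a real FFT), and the function names are all assumptions.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def bark_noise_spectrum(bin_power, sample_rate):
    """Sum FFT-bin noise power into integer Bark bands.

    bin_power holds the power of bins 0..N/2 of a real FFT (assumed layout),
    so the last bin sits at the Nyquist frequency.
    """
    n_bins = len(bin_power)
    nyquist = sample_rate / 2.0
    n_bands = int(hz_to_bark(nyquist)) + 1
    bands = [0.0] * n_bands
    for k, p in enumerate(bin_power):
        f = k * nyquist / (n_bins - 1)   # center frequency of bin k
        bands[int(hz_to_bark(f))] += p   # band index = integer part of Bark value
    return bands
```

At an 8 kHz sample rate this yields 18 bands, which is the kind of coarse, perceptually spaced noise spectrum against which a per-band SNR can be compared.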

Current Assignee: Google Technology Holdings LLC (Alphabet Inc.)
Original Assignee: Motorola Mobility LLC (Lenovo Group Ltd.)
Inventors: Song, Jianming J; Johnson, John C
Primary Examiner(s): COLUCCI, MICHAEL C
Application Number: US13/599,587
Publication Number: US 20120323571A1
Time in Patent Office: 152 Days
Field of Search: 704/226, 704/231, 704/200.1, 704/201, 704/203, 704/207, 704/208, 704/209, 704/224, 704/248, 704/251, 704/254, 381/106, 381/94.2, 379/29.03, 379/406.01, 379/406.08
US Class Current: 704/225
CPC Class Codes:
G10L 21/0208   Noise filtering
G10L 21/0232   Processing in the frequency...
G10L 25/15   the extracted parameters be...
H03G 3/3089   Control of digital or coded...
H04M 1/6025   implemented as integrated s...
