Adaptive voice intelligibility processor

US 9,117,455 B2
Filed: 07/26/2012
Issued: 08/25/2015
Est. Priority Date: 07/29/2011
Status: Active Grant

6 Assignments

0 Petitions

Abstract

Systems and methods for adaptively processing speech to improve voice intelligibility are described. These systems and methods can adaptively identify and track formant locations, thereby enabling formants to be emphasized as they change. As a result, these systems and methods can improve near-end intelligibility, even in noisy environments. The systems and methods can be implemented in Voice-over IP (VoIP) applications, telephone and/or video conference applications (including on cellular phones, smart phones, and the like), laptop and tablet communications, and the like. The systems and methods can also enhance non-voiced speech, which can include speech generated without the vocal tract, such as transient speech.
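The formant tracking described in the abstract can be illustrated with a small linear predictive coding sketch. This is only an illustration under stated assumptions, not the patented implementation: it assumes the autocorrelation method with a Levinson-Durbin recursion, an order-2 model (a single formant per frame), and a synthetic resonator as the "voice" frame. The helper names `autocorr`, `levinson`, `formant_hz` and `resonator_frame` are hypothetical.

```python
import math

def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion; returns [1, a1, ..., ap] for A(z)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction error
    return a

def formant_hz(a, fs):
    """Resonance frequency of an order-2 model, read from its pole angle."""
    radius = math.sqrt(a[2])
    return math.acos(-a[1] / (2.0 * radius)) * fs / (2.0 * math.pi)

def resonator_frame(freq_hz, fs, radius=0.95, n=600):
    """Impulse response of one resonance (assumed stand-in for a voiced frame)."""
    a1 = -2.0 * radius * math.cos(2.0 * math.pi * freq_hz / fs)
    a2 = radius * radius
    y = []
    for i in range(n):
        s = 1.0 if i == 0 else 0.0
        if i >= 1:
            s -= a1 * y[i - 1]
        if i >= 2:
            s -= a2 * y[i - 2]
        y.append(s)
    return y

# Track a formant that moves from frame to frame.
for f in (700.0, 1200.0):
    frame = resonator_frame(f, 8000.0)
    coeffs = levinson(autocorr(frame, 2), 2)
    print(round(formant_hz(coeffs, 8000.0), 1))
```

Re-running the analysis on every frame is what lets the estimate follow a formant as it moves. A practical system would likely use a higher model order and windowed, overlapping frames.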

118 Citations

21 Claims

1. A method of adjusting a voice intelligibility enhancement, the method comprising:
- receiving an input voice signal;
  
  obtaining a spectral representation of the input voice signal with a linear predictive coding (LPC) process, the spectral representation comprising one or more formant frequencies;
  
  adjusting the spectral representation of the input voice signal with one or more processors to produce an enhancement filter configured to emphasize the one or more formant frequencies, wherein the adjusting comprises decreasing a distance between line spectral pairs of at least one formant frequency obtained from the LPC process and thereby increasing a gain of a spectral peak associated with the at least one formant frequency;
  
  applying an inverse filter to the input voice signal to obtain an excitation signal;
  
  applying the enhancement filter to the excitation signal to produce a first modified voice signal with enhanced formant frequencies;
  
  applying the enhancement filter to the input voice signal to produce a second modified voice signal;
  
  combining at least a portion of the first modified voice signal with at least a portion of the second modified voice signal to produce a combined modified voice signal;
  
  detecting an envelope based on the input voice signal;
  
  analyzing the detected envelope to determine one or more temporal enhancement parameters;
  
  applying the one or more temporal enhancement parameters to the combined modified voice signal to emphasize peaks in one or more time domain envelopes of the combined modified voice signal by increasing a slope of the peaks to produce an output voice signal with emphasized consonant sounds; and
  
  outputting the output voice signal for playback;
  
  wherein at least said applying the one or more temporal enhancement parameters is performed by one or more processors.
- Dependent claims (2, 20):
  - 2. The method of claim 1, wherein said detecting the envelope comprises detecting an envelope of one or more of the following: the input voice signal and the combined modified voice signal.
  - 20. The method of claim 1, wherein the combining comprises adding at least a portion of the first modified voice signal with at least a portion of the second modified voice signal to produce the combined modified voice signal.
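The envelope detection and temporal enhancement steps of claim 1 can be sketched as follows. This is one possible reading, offered as an assumption rather than the method the patent actually uses: the envelope is a rectified signal smoothed with separate attack and release coefficients, and the "temporal enhancement parameter" is a per-sample gain that grows with how fast the envelope is rising. The names `envelope` and `sharpen_onsets` and the parameters `attack`, `release` and `amount` are hypothetical.

```python
def envelope(x, attack=0.5, release=0.01):
    """Rectify-and-smooth envelope follower with fast attack, slow release."""
    env, out = 0.0, []
    for s in x:
        mag = abs(s)
        coeff = attack if mag > env else release
        env += coeff * (mag - env)
        out.append(env)
    return out

def sharpen_onsets(x, amount=2.0, eps=1e-9):
    """Boost samples while the envelope is rising, steepening onset slopes."""
    env = envelope(x)
    y, prev = [], 0.0
    for s, e in zip(x, env):
        rise = max(0.0, e - prev)            # positive envelope slope only
        gain = 1.0 + amount * rise / (e + eps)
        y.append(s * gain)
        prev = e
    return y

# A burst that starts abruptly, loosely imitating a consonant onset.
burst = [0.0] * 10 + [1.0] * 50
shaped = sharpen_onsets(burst)
print(shaped[10] > burst[10], abs(shaped[-1] - burst[-1]) < 1e-6)
```

Only the rising part of the burst is amplified, and the steady part passes through almost unchanged. In this sketch the envelope comes from the signal being shaped; claim 2 also allows it to be taken from the input voice signal.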

3. A system for adjusting a voice intelligibility enhancement, the system comprising:
- an analysis module configured to obtain a spectral representation of at least a portion of an input audio signal, the spectral representation comprising one or more formant frequencies;
  
  an inverse filter configured to be applied to the input audio signal to obtain an excitation signal;
  
  a formant enhancement module configured to generate an enhancement filter configured to emphasize the one or more formant frequencies, wherein the enhancement filter is configured to decrease a distance between line spectral pairs of at least one formant frequency and thereby increase a gain of a spectral peak associated with the at least one formant frequency;
  
  the enhancement filter configured to be applied to the excitation signal with one or more processors to produce a first modified voice signal, the enhancement filter further configured to be applied to the input audio signal with the one or more processors to produce a second modified voice signal;
  
  a combiner configured to combine at least a portion of the first modified voice signal with at least a portion of the second modified voice signal to produce a combined modified voice signal;
  
  a temporal envelope shaper configured to apply a temporal enhancement to one or more time domain envelopes of the combined modified voice signal with the one or more processors to produce an output signal, the temporal enhancement configured to emphasize peaks in the one or more time domain envelopes by increasing a slope of the peaks to thereby emphasize one or more consonant sounds in the combined modified voice signal; and
  
  an output module configured to output the output signal for playback.
- Dependent claims (4, 5, 6, 7, 8, 9, 10, 11, 12, 18, 19, 21):
  - 4. The system of claim 3, wherein the analysis module is further configured to obtain the spectral representation of the input audio signal using a linear predictive coding technique configured to generate coefficients that correspond to the spectral representation.
  - 5. The system of claim 4, further comprising a mapping module configured to map the coefficients to line spectral pairs.
  - 6. The system of claim 5, further comprising modifying the line spectral pairs using a modulation factor to increase gain in the spectral representation corresponding to the formant frequencies.
  - 7. The system of claim 3, wherein the enhancement filter is further configured to be applied to one or more of the following: the input audio signal and the excitation signal derived from the input audio signal.
  - 8. The system of claim 3, wherein the temporal envelope shaper is further configured to subdivide the combined modified voice signal into a plurality of bands, and wherein the one or more envelopes correspond to an envelope for at least some of the plurality of bands.
  - 9. The system of claim 3, further comprising a voice enhancement controller configured to adjust a gain of the enhancement filter based at least partly on an amount of detected environmental noise in an input microphone signal.
  - 10. The system of claim 9, further comprising a voice activity detector configured to detect voice in the input microphone signal and to control the voice enhancement controller responsive to the detected voice.
  - 11. The system of claim 10, wherein the voice activity detector is further configured to cause the voice enhancement controller to adjust the gain of the enhancement filter based on a previous noise input responsive to detecting voice in the input microphone signal.
  - 12. The system of claim 9, further comprising a microphone calibration module configured to set a gain of a microphone configured to receive the input microphone signal, wherein the microphone calibration module is further configured to set the gain based at least in part on a reference signal and a recorded noise signal.
  - 18. The system of claim 3, wherein the combiner is configured to add at least a portion of the first modified voice signal with at least a portion of the second modified voice signal to produce the combined modified voice signal.
  - 19. The system of claim 18, further comprising a gain module configured to adjust, based at least partly on an amount of detected environmental noise, a gain of one or more of the first modified voice signal and the second modified voice signal.
  - 21. The system of claim 18, wherein the combiner is configured to add at least a portion of the first modified voice signal with at least a portion of the second modified voice signal to produce the combined modified voice signal.
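The inverse filter, the two applications of the enhancement filter, and the additive combiner of claims 3 and 18 can be sketched as a two-path structure. This sketch rests on assumptions the claims do not state: the enhancement filter is taken to be an all-pole filter 1/A'(z), the enhanced coefficients are hand-picked by moving the pole radius outward (a stand-in for the line-spectral-pair adjustment), and the mixing weights `w_excitation` and `w_input` are hypothetical.

```python
import math

def inverse_filter(a, x):
    """FIR analysis filter A(z): strips the LPC envelope, leaving excitation."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]

def all_pole(a, x):
    """All-pole filter 1/A(z) applied to x, with zero initial state."""
    y = []
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

def two_path_enhance(x, a, a_enh, w_excitation=0.5, w_input=0.5):
    """Filter the excitation and the input with the same enhancement filter, then add."""
    excitation = inverse_filter(a, x)
    first = all_pole(a_enh, excitation)      # first modified voice signal
    second = all_pole(a_enh, x)              # second modified voice signal
    return [w_excitation * f + w_input * s for f, s in zip(first, second)]

# Assumed example: one resonance at pi/4 rad, sharpened from radius 0.90 to 0.95.
theta = math.pi / 4.0
a = [1.0, -2.0 * 0.90 * math.cos(theta), 0.90 ** 2]
a_enh = [1.0, -2.0 * 0.95 * math.cos(theta), 0.95 ** 2]
x = [math.sin(0.3 * n) for n in range(50)]
y = two_path_enhance(x, a, a_enh)
print(len(y))
```

When `a_enh` equals `a`, the first path reconstructs the input exactly, which is a convenient sanity check. The weights could also be driven by a noise estimate, in the spirit of claim 19.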

13. A system for adjusting a voice intelligibility enhancement, the system comprising:
- a linear predictive coding analysis module configured to apply a linear predictive coding (LPC) technique to obtain LPC coefficients that correspond to a spectrum of an input voice signal, the spectrum comprising one or more formant frequencies;
  
  a mapping module configured to map the LPC coefficients to line spectral pairs;
  
  a formant enhancement module configured to modify the line spectral pairs with one or more processors by at least applying a modulation factor to the line spectral pairs to decrease a distance between the line spectral pairs and thereby produce an enhancement filter configured to emphasize the formant frequency;
  
  an inverse filter configured to be applied to the input audio signal to obtain an excitation signal;
  
  the enhancement filter configured to be applied to the excitation signal to produce a first modified voice signal, the enhancement filter further configured to be applied to the input voice signal to produce a second modified voice signal;
  
  a combiner configured to combine at least a portion of the first modified voice signal with at least a portion of the second modified voice signal to produce a combined modified voice signal; and
  
  an output module configured to output an audio signal based on the combined modified voice signal for playback.
- Dependent claims (14, 15, 16, 17):
  - 14. The system of claim 13, further comprising a voice activity detector configured to detect voice in an input microphone signal and to cause a gain of the enhancement filter to be adjusted responsive to detecting voice in the input microphone signal.
  - 15. The system of claim 14, further comprising a microphone calibration module configured to set a gain of a microphone configured to receive the input microphone signal, wherein the microphone calibration module is further configured to set the gain based at least in part on a reference signal and a recorded noise signal.
  - 16. The system of claim 13, wherein the enhancement filter is further configured to be applied to one or more of the following: the input voice signal and the excitation signal derived from the input voice signal.
  - 17. The system of claim 13, further comprising a temporal envelope shaper configured to apply a temporal enhancement to the combined modified voice signal at least by increasing a slope of a temporal envelope in the combined modified voice signal.
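The mapping of LPC coefficients to line spectral pairs and the modulation factor of claim 13 can be illustrated with a toy order-2 case. This is a sketch under assumptions, not the patent's procedure. For A(z) = 1 + a1 z^-1 + a2 z^-2 the single pair has a closed form, so no root finding is needed; a higher-order model would need a numerical conversion. The pair is narrowed toward its midpoint, which is only one way to decrease the distance. The function names and the `modulation` parameter are hypothetical.

```python
import cmath
import math

def lpc2_to_lsp(a1, a2):
    """Line spectral pair (radians) of A(z) = 1 + a1 z^-1 + a2 z^-2."""
    wp = math.acos((1.0 - a1 - a2) / 2.0)    # root angle of the symmetric polynomial
    wq = math.acos((a2 - a1 - 1.0) / 2.0)    # root angle of the antisymmetric polynomial
    return wp, wq

def lsp_to_lpc2(wp, wq):
    """Inverse mapping back to (a1, a2)."""
    cp, cq = math.cos(wp), math.cos(wq)
    return -(cp + cq), 1.0 - (cp - cq)

def narrow_pair(wp, wq, modulation):
    """Scale the pair spacing by `modulation` (0 < modulation <= 1) about its midpoint."""
    mid = 0.5 * (wp + wq)
    half = 0.5 * (wq - wp) * modulation
    return mid - half, mid + half

def peak_gain(a1, a2, points=2048):
    """Largest magnitude of 1/A on a frequency grid over (0, pi)."""
    return max(1.0 / abs(1.0 + a1 * cmath.exp(-1j * w) + a2 * cmath.exp(-2j * w))
               for w in (math.pi * i / points for i in range(1, points)))

# Assumed example: a resonance with pole radius 0.9 at pi/4 rad.
a1, a2 = -2.0 * 0.9 * math.cos(math.pi / 4.0), 0.81
wp, wq = lpc2_to_lsp(a1, a2)
b1, b2 = lsp_to_lpc2(*narrow_pair(wp, wq, 0.5))
print(peak_gain(b1, b2) > peak_gain(a1, a2))
```

Halving the spacing pushes the pole closer to the unit circle, so the spectral peak of 1/A(z) grows. In this toy the resonance frequency also shifts slightly, so a real design would have to choose how the pair is moved.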

Current Assignee: DTS, Inc. (Adeia Inc.)
Original Assignee: DTS, Inc. (Adeia Inc.)
Inventors: Tracey, James; Noh, Daekyong; He, Xing
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Thomas-Homescu, Anne
Application Number: US13/559,450
Publication Number: US 20130030800A1
Time in Patent Office: 1,125 Days
Field of Search: 704/219, 704/207, 704/223, 704/201, 704/200, 704/226, 704/206, 704/225, 704/214, 704/233, 381/57, 381/320, 381/94.3
US Class Current: 1/1

CPC Class Codes
G10L 19/07   Line spectrum pair [LSP] vo...
G10L 21/003   Changing voice quality, e.g...
G10L 21/0316   by changing the amplitude
G10L 21/0364   for improving intelligibility
G10L 25/15   the extracted parameters be...
