Adaptive voice intelligibility processor
6 Assignments
0 Petitions
Abstract
Systems and methods for adaptively processing speech to improve voice intelligibility are described. These systems and methods can adaptively identify and track formant locations, so that formants can be emphasized as they change. As a result, they can improve near-end intelligibility, even in noisy environments. The systems and methods can be implemented in Voice over IP (VoIP) applications, telephone and/or video conference applications (including on cellular phones, smart phones, and similar devices), laptop and tablet communications, and the like. They can also enhance non-voiced speech, which can include speech generated without the vocal tract, such as transient speech.
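This excerpt does not spell out how formants are identified and tracked. One common approach, assumed here for illustration only, is frame-by-frame LPC analysis (autocorrelation method with the Levinson-Durbin recursion), reading formant frequencies from the angles of the poles of 1/A(z). The function names, the Hamming window, the 90 Hz floor, and the 400 Hz bandwidth limit are hypothetical choices, not the patented method.

```python
import numpy as np


def lpc_autocorrelation(frame: np.ndarray, order: int) -> np.ndarray:
    """Return [1, a1, ..., ap] for A(z) = 1 + sum_k a_k z^-k (Levinson-Durbin)."""
    windowed = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    n = len(windowed)
    # Biased autocorrelation r[0..order] of the windowed frame.
    r = np.array([np.dot(windowed[: n - k], windowed[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    if r[0] <= 0.0:  # silent frame: return the trivial (flat) predictor
        return a
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i of the recursion.
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def estimate_formants(a: np.ndarray, fs: float, max_bandwidth_hz: float = 400.0) -> np.ndarray:
    """Formant candidates from the pole angles of 1/A(z), discarding broad poles."""
    poles = np.roots(a)
    poles = poles[np.imag(poles) > 0.0]  # keep one pole from each conjugate pair
    freqs = np.angle(poles) * fs / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(poles)) * fs / np.pi  # 3 dB bandwidth of each pole
    keep = (freqs > 90.0) & (bandwidths < max_bandwidth_hz)
    return np.sort(freqs[keep])


def track_formants(x, fs, order=10, frame_len=256, hop=128):
    """Re-estimate formants frame by frame so the estimates follow the changing spectrum."""
    tracks = []
    for start in range(0, len(x) - frame_len + 1, hop):
        a = lpc_autocorrelation(x[start:start + frame_len], order)
        tracks.append(estimate_formants(a, fs))
    return tracks
```

Re-running the analysis on every hop is what makes the estimates adaptive: each frame yields its own A(z), so the pole angles move as the vocal-tract resonances move.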
118 Citations
21 Claims
1. A method of adjusting a voice intelligibility enhancement, the method comprising:
receiving an input voice signal;
obtaining a spectral representation of the input voice signal with a linear predictive coding (LPC) process, the spectral representation comprising one or more formant frequencies;
adjusting the spectral representation of the input voice signal with one or more processors to produce an enhancement filter configured to emphasize the one or more formant frequencies, wherein the adjusting comprises decreasing a distance between line spectral pairs of at least one formant frequency obtained from the LPC process and thereby increasing a gain of a spectral peak associated with the at least one formant frequency;
applying an inverse filter to the input voice signal to obtain an excitation signal;
applying the enhancement filter to the excitation signal to produce a first modified voice signal with enhanced formant frequencies;
applying the enhancement filter to the input voice signal to produce a second modified voice signal;
combining at least a portion of the first modified voice signal with at least a portion of the second modified voice signal to produce a combined modified voice signal;
detecting an envelope based on the input voice signal;
analyzing the detected envelope to determine one or more temporal enhancement parameters;
applying the one or more temporal enhancement parameters to the combined modified voice signal to emphasize peaks in one or more time domain envelopes of the combined modified voice signal by increasing a slope of the peaks to produce an output voice signal with emphasized consonant sounds; and
outputting the output voice signal for playback;
wherein at least said applying the one or more temporal enhancement parameters is performed by one or more processors.
Dependent claims: 2, 20.
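The inverse-filter, enhancement-filter, and combining steps of claim 1 can be sketched as below. This assumes the enhancement filter is the all-pole filter 1/Â(z) built from modified LPC coefficients `a_enh`, and that the two modified signals are mixed with a single weight; both assumptions, the weight `mix`, and all function names are illustrative, since the claim does not fix how the portions are combined.

```python
import numpy as np


def inverse_filter(a: np.ndarray, x: np.ndarray) -> np.ndarray:
    """LPC analysis (inverse) filter A(z), a[0] == 1: e[n] = x[n] + sum_{k>=1} a[k] x[n-k]."""
    return np.convolve(x, a)[: len(x)]


def all_pole_filter(a: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Synthesis filter 1/A(z) with zero initial state: y[n] = x[n] - sum_{k>=1} a[k] y[n-k]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, min(len(a), n + 1)):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y


def formant_enhance(x, a, a_enh, mix=0.5):
    """Combine the two modified signals; `mix` is a hypothetical weighting parameter."""
    x = np.asarray(x, dtype=float)
    excitation = inverse_filter(a, x)           # inverse filter -> excitation signal
    first = all_pole_filter(a_enh, excitation)  # enhancement filter on the excitation
    second = all_pole_filter(a_enh, x)          # enhancement filter on the input
    return mix * first + (1.0 - mix) * second   # combined modified voice signal
```

With `a_enh` equal to `a` and `mix=1.0`, the inverse and synthesis filters cancel and the input is returned unchanged, which is a convenient check that the excitation path is wired correctly.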
3. A system for adjusting a voice intelligibility enhancement, the system comprising:
an analysis module configured to obtain a spectral representation of at least a portion of an input audio signal, the spectral representation comprising one or more formant frequencies;
an inverse filter configured to be applied to the input audio signal to obtain an excitation signal;
a formant enhancement module configured to generate an enhancement filter configured to emphasize the one or more formant frequencies, wherein the enhancement filter is configured to decrease a distance between line spectral pairs of at least one formant frequency and thereby increase a gain of a spectral peak associated with the at least one formant frequency;
the enhancement filter configured to be applied to the excitation signal with one or more processors to produce a first modified voice signal, the enhancement filter further configured to be applied to the input audio signal with the one or more processors to produce a second modified voice signal;
a combiner configured to combine at least a portion of the first modified voice signal with at least a portion of the second modified voice signal to produce a combined modified voice signal;
a temporal envelope shaper configured to apply a temporal enhancement to one or more time domain envelopes of the combined modified voice signal with the one or more processors to produce an output signal, the temporal enhancement configured to emphasize peaks in the one or more time domain envelopes by increasing a slope of the peaks to thereby emphasize one or more consonant sounds in the combined modified voice signal; and
an output module configured to output the output signal for playback.
Dependent claims: 4, 5, 6, 7, 8, 9, 10, 11, 12, 18, 19, 21.
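The temporal envelope shaping recited in claim 3 (and the envelope detection and temporal-enhancement steps of claim 1) can be illustrated with a fast/slow envelope ratio: while the envelope rises, a fast follower leads a slow one, so using their ratio as a gain steepens the rising edge. This is an assumed technique, not the patent's stated one; the time constants, `strength`, `max_gain`, and function names are hypothetical. The envelope is measured on a `reference` signal (for claim 1, the input voice signal) and the gain is applied to `y` (the combined modified signal).

```python
import numpy as np


def envelope(x, fs, time_const_s):
    """Envelope detector: one-pole low-pass smoothing of the rectified signal."""
    alpha = np.exp(-1.0 / (time_const_s * fs))
    env = np.zeros(len(x))
    state = 0.0
    for n, v in enumerate(np.abs(x)):
        state = alpha * state + (1.0 - alpha) * v
        env[n] = state
    return env


def temporal_shaper(y, reference, fs, fast_s=0.002, slow_s=0.030,
                    strength=1.0, max_gain=4.0, eps=1e-9):
    """Boost rising envelope edges of `y` using envelopes measured on `reference`.

    Time constants and gain limits are hypothetical defaults.
    """
    fast = envelope(reference, fs, fast_s)
    slow = envelope(reference, fs, slow_s)
    # The fast envelope exceeds the slow one mainly while the envelope is rising
    # (for example at a consonant onset), so the gain is above 1 there and the
    # attack becomes steeper; elsewhere the gain is held at (or near) 1.
    gain = np.clip((fast / (slow + eps)) ** strength, 1.0, max_gain)
    return np.asarray(y, dtype=float) * gain, gain
```

For a step onset, the gain jumps to `max_gain` just after the step and decays back toward 1 as the slow envelope catches up, so only the leading edge is emphasized.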
13. A system for adjusting a voice intelligibility enhancement, the system comprising:
a linear predictive coding analysis module configured to apply a linear predictive coding (LPC) technique to obtain LPC coefficients that correspond to a spectrum of an input voice signal, the spectrum comprising one or more formant frequencies;
a mapping module configured to map the LPC coefficients to line spectral pairs;
a formant enhancement module configured to modify the line spectral pairs with one or more processors by at least applying a modulation factor to the line spectral pairs to decrease a distance between the line spectral pairs and thereby produce an enhancement filter configured to emphasize the one or more formant frequencies;
an inverse filter configured to be applied to the input voice signal to obtain an excitation signal;
the enhancement filter configured to be applied to the excitation signal to produce a first modified voice signal, the enhancement filter further configured to be applied to the input voice signal to produce a second modified voice signal;
a combiner configured to combine at least a portion of the first modified voice signal with at least a portion of the second modified voice signal to produce a combined modified voice signal; and
an output module configured to output an audio signal based on the combined modified voice signal for playback.
Dependent claims: 14, 15, 16, 17.
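Claim 13 maps LPC coefficients to line spectral pairs and applies a modulation factor that decreases the distance between them. A minimal sketch, assuming an even LPC order, a minimum-phase A(z), the usual convention that sorted line spectral frequencies (LSFs) alternate between the symmetric polynomial P(z) and the antisymmetric polynomial Q(z) starting with P, and a single hypothetical `factor` that shrinks the gap of the most closely spaced adjacent pair (which usually brackets the strongest formant) toward its midpoint. Function names are illustrative.

```python
import numpy as np


def lpc_to_lsf(a):
    """Map A(z) = [1, a1, ..., ap] (p even, minimum phase) to sorted LSFs in (0, pi)."""
    a_ext = np.append(np.asarray(a, dtype=float), 0.0)
    p_poly = a_ext + a_ext[::-1]  # P(z) = A(z) + z^-(p+1) A(1/z), symmetric
    q_poly = a_ext - a_ext[::-1]  # Q(z) = A(z) - z^-(p+1) A(1/z), antisymmetric
    angles = np.angle(np.concatenate([np.roots(p_poly), np.roots(q_poly)]))
    # Drop the trivial roots at z = -1 and z = +1 and the lower half-plane.
    keep = (angles > 1e-6) & (angles < np.pi - 1e-6)
    return np.sort(angles[keep])


def lsf_to_lpc(lsf):
    """Rebuild A(z) = (P(z) + Q(z)) / 2 from interlaced LSFs (even indices belong to P)."""
    lsf = np.sort(np.asarray(lsf, dtype=float))
    p_poly = np.array([1.0, 1.0])   # trivial factor (1 + z^-1)
    q_poly = np.array([1.0, -1.0])  # trivial factor (1 - z^-1)
    for w in lsf[0::2]:
        p_poly = np.convolve(p_poly, [1.0, -2.0 * np.cos(w), 1.0])
    for w in lsf[1::2]:
        q_poly = np.convolve(q_poly, [1.0, -2.0 * np.cos(w), 1.0])
    return (0.5 * (p_poly + q_poly))[:-1]  # the z^-(p+1) terms cancel


def narrow_closest_pair(lsf, factor=0.5):
    """Shrink the gap of the tightest adjacent LSF pair by `factor` (0 < factor < 1)."""
    lsf = np.sort(np.asarray(lsf, dtype=float))
    i = int(np.argmin(np.diff(lsf)))
    mid = 0.5 * (lsf[i] + lsf[i + 1])
    half = 0.5 * (lsf[i + 1] - lsf[i]) * factor
    lsf[i], lsf[i + 1] = mid - half, mid + half
    return lsf
```

Because the narrowed pair moves inward and stays between its neighbours, the LSFs remain interlaced, so the rebuilt A(z) stays minimum phase and 1/A(z) is a stable enhancement filter with a taller spectral peak near that formant.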
Specification