Detecting transients to emphasize formant peaks
Abstract
Nasalized sound effects during reproduction of low-pitch sounds are suppressed to produce playback sounds of high clarity. Amplitude data is processed with high range formant emphasis of crests and valleys of the envelope of the frequency spectrum on the high frequency range and with deepening of the valley of the frequency spectrum over the entire frequency range, above all, over the low to mid frequency range. Next, the amplitude data is processed for emphasizing the peak values of the formant of the voiced frame in the portion of the speech signal which is rising in magnitude and for unconditionally emphasizing the spectral envelope on the high frequency range. The voiced speech spectrum is generated by synthesizing the cosine wave based upon the emphasized amplitude data.
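The final step of the abstract, generating voiced speech from cosine waves weighted by the emphasized amplitude data, can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that each amplitude belongs to one harmonic of a fundamental frequency and that all phases are zero; the function name `synthesize_voiced` and its parameters are hypothetical.

```python
import math


def synthesize_voiced(amplitudes, f0_hz, sample_rate, num_samples):
    """Sum cosine waves weighted by (already emphasized) amplitude data.

    Assumption for illustration: amplitudes[m - 1] is the amplitude of the
    m-th harmonic of the fundamental f0_hz, and every phase is zero.
    """
    samples = []
    for n in range(num_samples):
        t = n / sample_rate
        value = 0.0
        for m, amp in enumerate(amplitudes, start=1):
            value += amp * math.cos(2.0 * math.pi * m * f0_hz * t)
        samples.append(value)
    return samples


if __name__ == "__main__":
    # Two harmonics of a 100 Hz fundamental, sampled at 8 kHz.
    print(synthesize_voiced([1.0, 0.5], 100.0, 8000, 4))
```

At time zero every cosine equals 1, so the first sample is simply the sum of the amplitudes.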
51 Citations
7 Claims
1. A speech signal processing method for decoding a speech signal encoded by a speech encoding method in which a speech signal is represented by parameters in at least a frequency domain, comprising the steps of:
smoothing on the frequency axis a signal representing an intensity of the frequency spectrum;
comparing a signal representing an intensity of the frequency spectrum with the smoothed version of the signal obtained in the smoothing step;
taking the difference between the signal representing the intensity of the spectrum and the version of said signal obtained on smoothing on the frequency axis;
performing a processing of deepening valley portions between formants of a transmitted frequency spectrum using the results of the comparing step;
wherein said step of processing of deepening the valley portions between the formants of the frequency spectrum is performed using the result of the step of taking the difference between the signal representing the intensity of the spectrum and the version of said signal obtained on smoothing on the frequency axis.
Dependent claims: 2, 3
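The steps of claim 1 (smooth, compare, take the difference, deepen the valleys) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the spectrum is assumed to be a list of per-band levels in dB, the smoothing is a simple centered average, and the names `smooth`, `deepen_valleys`, `depth` and `half_width` are hypothetical.

```python
def smooth(levels, half_width=1):
    """Smooth a spectrum along the frequency axis (assumed: centered average)."""
    n = len(levels)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = levels[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def deepen_valleys(levels_db, depth=0.5, half_width=1):
    """Lower the bands that lie below the smoothed spectrum.

    depth is an assumed tuning factor: how strongly the (negative)
    difference from the smoothed curve is added back to the level.
    """
    smoothed = smooth(levels_db, half_width)
    result = []
    for level, avg in zip(levels_db, smoothed):
        diff = level - avg              # difference step
        if diff < 0:                    # comparing step: band is in a valley
            result.append(level + depth * diff)
        else:                           # peaks are left untouched here
            result.append(level)
    return result


if __name__ == "__main__":
    print(deepen_valleys([10.0, 4.0, 10.0, 4.0, 10.0]))
```

With the alternating input above, the two valley bands drop from 4.0 to 2.0 while the peaks stay at 10.0, because only bands whose difference from the smoothed curve is negative are modified.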
4. A speech signal processing method employed in a speech synthesis system centered about processing in the frequency domain, comprising the steps of:
dividing the speech signal into a plurality of frames;
calculating an energy of the speech signal for each of the frames sequentially;
comparing the calculated energy of the current frame with the calculated energy of the previous frame in order to detect a transient portion where speech energy rapidly increases in the time domain; and
emphasizing formant peaks of the frequency spectrum in the detected transient portion by directly acting on frequency domain parameters when the transient portion is detected in the comparing step.
Dependent claims: 5, 6
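A minimal sketch of the transient detection and peak emphasis in claim 4 follows. The claim does not give a detection threshold or an emphasis rule, so the energy ratio `ratio`, the gain `gain`, the local-maximum test for a formant peak, and all function names are assumptions made for illustration.

```python
def split_frames(signal, frame_len):
    """Divide the signal into consecutive, non-overlapping frames."""
    last_start = len(signal) - frame_len + 1
    return [signal[i:i + frame_len] for i in range(0, last_start, frame_len)]


def frame_energy(frame):
    """Energy of one frame as the sum of squared samples."""
    return sum(s * s for s in frame)


def detect_transients(signal, frame_len, ratio=2.0):
    """Flag frames whose energy exceeds ratio times the previous frame's energy."""
    flags = []
    previous = None
    for frame in split_frames(signal, frame_len):
        energy = frame_energy(frame)
        flags.append(previous is not None and energy > ratio * previous)
        previous = energy
    return flags


def emphasize_peaks(amplitudes, gain=1.5):
    """Scale local maxima of the spectral amplitudes (assumed peak test)."""
    out = list(amplitudes)
    for k in range(1, len(amplitudes) - 1):
        if amplitudes[k] > amplitudes[k - 1] and amplitudes[k] > amplitudes[k + 1]:
            out[k] = amplitudes[k] * gain
    return out


if __name__ == "__main__":
    signal = [0.1] * 4 + [1.0] * 4 + [1.0] * 4
    print(detect_transients(signal, frame_len=4))
    print(emphasize_peaks([1.0, 3.0, 1.0, 2.0, 4.0, 2.0]))
```

In a decoder, `emphasize_peaks` would be applied to the frequency domain parameters of a frame only when `detect_transients` flags that frame, which matches the claim's wording that the emphasis acts directly on frequency domain parameters.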
7. A speech signal processing method for decoding a speech signal encoded by a speech encoding method in which a speech signal is represented by parameters in at least a frequency domain, comprising the steps of:
smoothing on the frequency axis a signal representing an intensity of the frequency spectrum;
comparing a signal representing an intensity of the frequency spectrum and the smoothed version of the signal obtained in the smoothing step; and
performing a processing of deepening valley portions between formants of a transmitted frequency spectrum using the result of the comparing step,
wherein said smoothing step is carried out by taking moving averages obtained by averaging spectrum intensity values in predetermined frequency windows successively defined in the frequency domain.
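The moving-average smoothing named in claim 7 can be illustrated with a short sketch. The window length, the shortening of the window at the spectrum edges, and the names `moving_average` and `valley_flags` are assumptions; the claim only requires averaging intensity values in predetermined windows successively defined along the frequency axis.

```python
def moving_average(intensity, window=3):
    """Average spectrum intensity over a window slid along the frequency axis.

    Assumption: the window is centered on each band and is shortened at
    the two ends of the spectrum rather than padded.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    n = len(intensity)
    smoothed = []
    for k in range(n):
        start = max(0, k - half)
        stop = min(n, k + half + 1)
        band = intensity[start:stop]
        smoothed.append(sum(band) / len(band))
    return smoothed


def valley_flags(intensity, window=3):
    """Comparing step: mark bands that lie below their moving average."""
    smoothed = moving_average(intensity, window)
    return [x < s for x, s in zip(intensity, smoothed)]


if __name__ == "__main__":
    spectrum = [3.0, 6.0, 9.0, 6.0, 3.0]
    print(moving_average(spectrum))
    print(valley_flags(spectrum))
```

The flags mark where the valley deepening would act; a wider window follows the overall envelope more loosely, so more bands between formants fall below it.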
Specification