Speech segment coding and pitch control methods for speech synthesis systems
Abstract
The present invention relates to a method and system for synthesizing speech using a periodic waveform decomposition and relocation coding scheme. Under this scheme, the voiced-sound intervals of the original speech are decomposed into wavelets, each of which corresponds to the speech waveform of one period produced by a single glottal pulse. Each wavelet is coded and stored. During synthesis, the stored wavelets nearest to the positions at which wavelets are to be placed are selected and decoded. The decoded wavelets are then superposed, so that the original sound quality is maintained while the duration and pitch frequency of a speech segment can be controlled arbitrarily.
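The relocation step described in the abstract can be sketched as follows. This is a minimal sketch, assuming the wavelets have already been decoded into sample lists, that the target pitch pulse positions are given, and that the original time axis is mapped onto the new duration by simple linear scaling; the abstract does not say how target positions or the time mapping are derived. `StoredWavelet`, `target_pulses`, and `relocate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StoredWavelet:
    pulse: int      # pitch pulse location in the original segment (samples)
    start: int      # original sample index of samples[0]
    samples: list   # decoded wavelet samples


def target_pulses(duration, period):
    """Hypothetical helper: pulse positions for a constant pitch period."""
    return list(range(0, duration, period))


def relocate(stored, orig_duration, new_duration, targets):
    """Overlap-add one stored wavelet at every target pitch pulse position.

    Each target position is mapped back onto the original time axis by
    linear scaling, the stored wavelet whose pulse is nearest to that point
    is selected, and it is shifted so that its pulse lands on the target.
    """
    out = [0.0] * new_duration
    scale = orig_duration / new_duration
    for t in targets:
        pos = t * scale
        w = min(stored, key=lambda s: abs(s.pulse - pos))
        offset = t - (w.pulse - w.start)   # output index of samples[0]
        for i, v in enumerate(w.samples):
            if 0 <= offset + i < new_duration:
                out[offset + i] += v
    return out


# Usage: two toy wavelets; a pitch period of 4 samples over a 20-sample segment.
stored = [StoredWavelet(pulse=0, start=0, samples=[1.0, 0.5]),
          StoredWavelet(pulse=10, start=10, samples=[2.0, 1.0])]
out = relocate(stored, orig_duration=20, new_duration=20,
               targets=target_pulses(20, period=4))
print(out[:10])  # [1.0, 0.5, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 2.0, 1.0]
```

Shortening the period passed to `target_pulses` raises the pitch, and changing `new_duration` stretches or compresses the segment. Since each wavelet carries the envelope response of its own period, the spectral envelope is largely preserved while the pulse spacing changes.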
324 Citations
8 Claims
1. A speech coding method for use in speech synthesis, comprising:
obtaining a set of spectral envelope parameters that represents an estimated spectral envelope of a voiced speech signal by using a spectrum estimation technique;
deconvolving said voiced speech signal, with an impulse response that is a time-domain representation of said estimated spectral envelope of said voiced speech signal, into a pitch pulse train signal having a sequence of periodically located pitch pulses;
forming an excitation signal by appending zero-valued samples to each pitch pulse signal of one period such that one pitch pulse is contained in each period;
convolving said excitation signal with said impulse response into wavelets;
obtaining wavelet codes by coding the wavelets of all periods; and
storing in memory wavelet codes and information of corresponding pitch pulse locations of all wavelets, for use in speech synthesis.
Dependent claims: 2, 3, 4, 5.
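The first two steps of claim 1 (envelope estimation and deconvolution) can be sketched as below. This is a minimal sketch, assuming linear prediction (autocorrelation method with the Levinson-Durbin recursion) as the spectrum estimation technique; the claim does not name one. Under that assumption the impulse response h(n) is that of the all-pole filter 1/A(z), so deconvolving by h(n) reduces to FIR filtering with A(z). All function names and the prediction order are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of x (biased, no normalisation)."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """LPC by the Levinson-Durbin recursion.

    Returns (a, err): a = [1, a1, ..., ap] are the coefficients of the
    prediction-error filter A(z) = 1 + a1 z^-1 + ... + ap z^-p, so the
    envelope estimate is proportional to 1/|A|; err is the residual power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # i-th reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def deconvolve(x, a):
    """Undo convolution with h(n), the impulse response of 1/A(z).

    Filtering by the FIR filter A(z) flattens the envelope and leaves an
    approximate pitch pulse train (the prediction residual).
    """
    p = len(a) - 1
    return [sum(a[k] * x[n - k] for k in range(p + 1) if n - k >= 0)
            for n in range(len(x))]


def all_pole_filter(e, a):
    """Convolve e with h(n), the impulse response of 1/A(z) (zero initial state)."""
    y = []
    for n, value in enumerate(e):
        acc = value
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


# Usage: toy "voiced" signal = pulses every 50 samples through 1/(1 - 0.9 z^-1).
pulses = [1.0 if n % 50 == 0 else 0.0 for n in range(200)]
speech = all_pole_filter(pulses, [1.0, -0.9])
a, _ = levinson_durbin(autocorrelation(speech, 1), order=1)
residual = deconvolve(speech, a)
print([n for n, v in enumerate(residual) if abs(v) > 0.5])  # [0, 50, 100, 150]
```

Because A(z) is FIR, this deconvolution is exactly invertible: passing the residual back through `all_pole_filter` reproduces the input up to rounding.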
6. A speech coding method for use in speech synthesis, comprising:
obtaining a set of spectral envelope parameters of a voiced speech signal by spectrum estimation;
deconvolving the voiced speech signal, with an impulse response that is representative of the set of spectral envelope parameters of the voiced speech signal, into a pitch pulse train signal having a plurality of pitch pulses;
forming an excitation signal by segmenting the pitch pulse train signal such that one pitch pulse is contained in each period;
convolving the excitation signal with the impulse response into a plurality of wavelets; and
storing the plurality of wavelets for use in speech synthesis.
Dependent claim: 7.
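The segmentation and convolution steps can be sketched as follows: the pitch pulse train is cut into periods that each hold one pulse, each period is zero-padded, and each padded excitation is filtered into one wavelet. This is a minimal sketch that keeps the LPC assumption (the envelope filter is 1/A(z)), assumes the pitch pulse locations are already known, and uses a hypothetical boundary rule (cut halfway between neighbouring pulses); the claims do not say where period boundaries fall. Function names and `wavelet_len` are hypothetical, and the coding step of claim 1 is not shown.

```python
def all_pole_filter(e, a):
    """Convolve e with h(n), the impulse response of 1/A(z) (zero initial state)."""
    y = []
    for n, value in enumerate(e):
        acc = value
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


def period_boundaries(pulse_locations, length):
    """Hypothetical rule: cut halfway between neighbouring pitch pulses so
    that every period [b[i], b[i+1]) contains exactly one pulse."""
    mids = [(p + q) // 2 for p, q in zip(pulse_locations, pulse_locations[1:])]
    return [0] + mids + [length]


def make_wavelets(residual, a, pulse_locations, wavelet_len):
    """Zero-pad each one-pulse period of the residual to wavelet_len samples
    and filter it by 1/A(z). Returns (start, pulse_location, samples) tuples,
    so each wavelet is kept together with its pitch pulse location."""
    bounds = period_boundaries(pulse_locations, len(residual))
    wavelets = []
    for start, end, pulse in zip(bounds, bounds[1:], pulse_locations):
        segment = residual[start:end]
        if len(segment) > wavelet_len:
            raise ValueError("wavelet_len is shorter than a pitch period")
        excitation = segment + [0.0] * (wavelet_len - len(segment))
        wavelets.append((start, pulse, all_pole_filter(excitation, a)))
    return wavelets


def overlap_add(wavelets, length):
    """Superpose the wavelets at their stored start positions."""
    out = [0.0] * length
    for start, _pulse, samples in wavelets:
        for i, v in enumerate(samples):
            if start + i < length:
                out[start + i] += v
    return out


# Usage with an assumed envelope A(z) = 1 - 0.9 z^-1 and known pulse locations.
a = [1.0, -0.9]
pulses = [10, 60, 110, 160]
residual = [1.0 if n in pulses else 0.0 for n in range(200)]
speech = all_pole_filter(residual, a)
wavelets = make_wavelets(residual, a, pulses, wavelet_len=200)
rebuilt = overlap_add(wavelets, len(speech))
print(max(abs(u - v) for u, v in zip(rebuilt, speech)))  # ~0 (rounding only)
```

Because convolution is linear, superposing all wavelets at their original positions rebuilds the input exactly when `wavelet_len` reaches the end of the signal; a practical coder would instead truncate each wavelet once its tail has decayed, trading a small error for storage.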
8. A speech coding method for use in speech synthesis, comprising:
obtaining a set of spectral envelope parameters of a voiced speech signal by spectrum estimation;
deconvolving the voiced speech signal, with an impulse response that is representative of the set of spectral envelope parameters, into a pitch pulse train signal having a substantially flat spectral envelope and a sequence of periodically located pitch pulses;
forming an excitation signal by adding zero-valued samples to each pitch pulse train signal of one period such that one pitch pulse is contained in each period;
convolving the excitation signal with the impulse response into wavelets with each wavelet being associated with one pitch pulse; and
storing the wavelets and the locations of the associated pitch pulses in memory for use in speech synthesis.
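Assuming, as a sketch, that the envelope is modelled by an all-pole filter 1/A(z), the decomposition these claims rely on follows from the linearity of convolution. Here s(n) is the voiced signal, h(n) the impulse response of 1/A(z), e(n) the pitch pulse train, e_k(n) the zero-padded excitation of period k, and w_k(n) its wavelet; these symbols are introduced for illustration and are not the patent's notation.

```latex
\begin{aligned}
s(n) &= (h * e)(n), \qquad e(n) = \sum_{k} e_k(n), \\
s(n) &= \sum_{k} (h * e_k)(n) = \sum_{k} w_k(n), \\
E(e^{j\omega}) &= \frac{S(e^{j\omega})}{H(e^{j\omega})} = A(e^{j\omega})\, S(e^{j\omega}).
\end{aligned}
```

Because |H(e^{jω})| approximates the spectral envelope of |S(e^{jω})|, dividing it out leaves |E(e^{jω})| with a substantially flat envelope, the property claim 8 states for the pitch pulse train, while the harmonic structure at multiples of the pitch frequency remains. Storing each w_k(n) with the location of its pitch pulse is what allows the wavelets to be shifted to new pulse positions and superposed at synthesis time.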
Specification