Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system
First Claim
1. A method performed by a processing circuit for identification of sub-band Eigen pulses from a glottal pulse database for training a speech synthesis system, wherein the method comprises:
a. receiving pulses from the glottal pulse database;
b. decomposing each pulse into a plurality of sub-band components;
c. distributing the plurality of sub-band components into a plurality of databases based on a frequency level of sub-band component of the plurality of sub-band components, wherein each database of the plurality of databases corresponds to a frequency level of a sub-band component of the plurality of sub-band components;
d. determining a vector representation of each database wherein the determining a vector representation of each database further comprises a set of distances from a set of fixed number of points of a metric space, obtained as centroids after a metric based clustering of a large set of signals from the metric space;
e. determining Eigen pulse values, from the vector representation, for each database;
f. selecting a best Eigen pulse for each database for use in synthesis; and
g. applying the selected Eigen pulse from the speech signal to form an excitation signal, wherein the excitation signal is applied in the speech synthesis system to synthesize speech.
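The claim does not fix the sub-band filter bank, the metric, the clustering algorithm, or the criterion for the "best" Eigen pulse. The following is a minimal sketch of steps b to f under stated assumptions, not the patented implementation. It assumes an ideal equal-width FFT band split, Euclidean k-means as the "metric based clustering", the database itself as the "large set of signals", and a hypothetical selection rule (the stored pulse with the largest score on the first principal component of the distance vectors). All function names are hypothetical, and the input is random stand-in data rather than real glottal pulses.

```python
import numpy as np


def split_into_subbands(pulse, n_bands):
    """Step b (assumed method): ideal FFT split into equal-width bands.
    Because the bands partition the spectrum, the components sum back to the pulse."""
    spectrum = np.fft.rfft(pulse)
    edges = np.linspace(0, len(spectrum), n_bands + 1).astype(int)
    components = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spectrum)
        masked[lo:hi] = spectrum[lo:hi]
        components.append(np.fft.irfft(masked, n=len(pulse)))
    return components


def build_subband_databases(pulses, n_bands):
    """Step c: database k collects the k-th lowest-frequency component of every pulse."""
    databases = [[] for _ in range(n_bands)]
    for pulse in pulses:
        for k, component in enumerate(split_into_subbands(pulse, n_bands)):
            databases[k].append(component)
    return [np.array(db) for db in databases]


def kmeans_centroids(signals, n_centroids, n_iter=20, seed=0):
    """Plain Euclidean k-means, standing in for the unspecified metric clustering."""
    rng = np.random.default_rng(seed)
    centroids = signals[rng.choice(len(signals), n_centroids, replace=False)].copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(signals[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(n_centroids):
            members = signals[labels == j]
            if len(members):  # keep the previous centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def vector_representation(database, centroids):
    """Step d: each signal becomes its vector of distances to the fixed centroids."""
    return np.linalg.norm(database[:, None, :] - centroids[None, :, :], axis=2)


def select_eigen_pulse(database, vectors):
    """Steps e and f (hypothetical criterion): PCA on the distance vectors, then
    return the stored pulse with the largest |score| on the first principal component."""
    centered = vectors - vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[0]
    return database[np.argmax(np.abs(scores))]


# Usage with random stand-in data (not real glottal pulses).
rng = np.random.default_rng(1)
pulses = rng.standard_normal((200, 64))
databases = build_subband_databases(pulses, n_bands=4)
eigen_pulses = []
for db in databases:
    centroids = kmeans_centroids(db, n_centroids=8)
    vectors = vector_representation(db, centroids)
    eigen_pulses.append(select_eigen_pulse(db, vectors))
```

Selecting an actual stored pulse, rather than reconstructing a waveform from the principal components, keeps the chosen template a real waveform; whether the patented method does this is not stated in the claim.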
3 Assignments
0 Petitions
Abstract
A system and method are presented for forming the excitation signal for a glottal pulse model based parametric speech synthesis system. The excitation signal may be formed from a plurality of sub-band templates instead of a single one. The sub-band templates may be combined to form the excitation signal, with the proportion in which each template is added determined dynamically from energy coefficients. These coefficients vary from frame to frame and are learned, along with the spectral parameters, during feature training. The coefficients are appended to the feature vector, which comprises the spectral parameters and is modeled using HMMs, and the excitation signal is then determined from them.
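The abstract does not say how an energy coefficient maps to a template amplitude or how pulses are placed in time. The following minimal sketch assumes that each coefficient is the energy assigned to its band (so a unit-energy template is scaled by the coefficient's square root), that one mixed pulse is overlap-added per pitch period, and that a pulse uses the coefficients of the frame containing its pitch mark. The function names, frame length, pitch periods, and random coefficients are hypothetical; in the described system the per-frame coefficients would come from the HMMs.

```python
import numpy as np


def combine_templates(templates, energy_coeffs):
    """Mix sub-band templates for one frame.
    Assumption: each coefficient is the energy given to its band, so each
    unit-energy template is scaled by the square root of its coefficient."""
    unit = templates / np.linalg.norm(templates, axis=1, keepdims=True)
    return np.sqrt(energy_coeffs) @ unit


def excitation_signal(templates, frame_coeffs, pitch_periods, frame_len):
    """Overlap-add one mixed pulse per pitch period.
    frame_coeffs: (n_frames, n_bands) energy coefficients, one row per frame.
    pitch_periods: (n_frames,) pitch period in samples for each frame."""
    n_samples = len(frame_coeffs) * frame_len
    pulse_len = templates.shape[1]
    out = np.zeros(n_samples + pulse_len)  # headroom so the last pulse fits
    t = 0
    while t < n_samples:
        frame = t // frame_len  # frame that contains this pitch mark
        out[t:t + pulse_len] += combine_templates(templates, frame_coeffs[frame])
        t += int(pitch_periods[frame])
    return out[:n_samples]


# Usage: four templates from an ideal FFT split of one random pulse.
rng = np.random.default_rng(2)
pulse_len, n_bands = 64, 4
spectrum = np.fft.rfft(rng.standard_normal(pulse_len))
edges = np.linspace(0, len(spectrum), n_bands + 1).astype(int)
templates = []
for lo, hi in zip(edges[:-1], edges[1:]):
    masked = np.zeros_like(spectrum)
    masked[lo:hi] = spectrum[lo:hi]
    templates.append(np.fft.irfft(masked, n=pulse_len))
templates = np.array(templates)

frame_coeffs = rng.uniform(0.1, 1.0, size=(10, n_bands))
pitch_periods = np.full(10, 80)
excitation = excitation_signal(templates, frame_coeffs, pitch_periods, frame_len=200)
```

Because templates taken from disjoint FFT bins are orthogonal, the energy of each mixed pulse in this sketch equals the sum of that frame's coefficients, which makes the square-root scaling easy to check.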
31 Citations
6 Claims
Claim 1 is reproduced above under First Claim; claims 2, 3, 4, 5, and 6 depend from it.
Specification