Rule based speech synthesis method and apparatus
Abstract
A rule based speech synthesis apparatus by which concatenation distortion may be less than a preset value without dependency on utterance, wherein a parameter correction unit reads out a target parameter for a vowel from a target parameter storage, responsive to the phoneme at a leading end and at a trailing end of a speech element and acoustic feature parameters output from a speech element selector, and accordingly corrects the acoustic feature parameters of the speech element. The parameter correction unit corrects the parameters, so that the parameters ahead and behind the speech element are equal to the target parameter for the vowel of the corresponding phoneme, and outputs the corrected parameters.
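The correction described above can be illustrated with a minimal sketch. The abstract does not give the correction equation, so the linear interpolation of edge offsets used here is an assumption: it is simply one equation that makes the corrected parameter equal the first target at the leading edge and the second target at the trailing edge. The function name `correct_parameters` and the scalar parameter track are hypothetical.

```python
from typing import List


def correct_parameters(track: List[float],
                       lead_target: float,
                       trail_target: float) -> List[float]:
    """Shift a parameter track so that its two ends hit the given targets.

    Assumption: the offset required at each edge is interpolated linearly
    across the speech element. The source only requires that the corrected
    value equal the target at each edge.
    """
    n = len(track)
    if n < 2:
        raise ValueError("a speech element needs at least two frames")

    lead_offset = lead_target - track[0]     # correction needed at the leading edge
    trail_offset = trail_target - track[-1]  # correction needed at the trailing edge

    corrected = []
    for i, value in enumerate(track):
        w = i / (n - 1)  # 0.0 at the leading edge, 1.0 at the trailing edge
        corrected.append(value + (1.0 - w) * lead_offset + w * trail_offset)
    return corrected


corrected = correct_parameters([1.0, 2.0, 4.0], lead_target=2.0, trail_target=3.0)
```

With this choice, frames in the middle of the element keep most of their original shape, while the two edges are pinned to the vowel targets, so any two elements that meet on the same vowel share the same boundary value.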
13 Claims
1. A rule based speech synthesis apparatus comprising
speech element set storage means for storing a plurality of phoneme strings, each having a vowel phoneme on a boundary thereof, as a speech element, along with feature parameters, as a speech element set;
speech element selection means for reading out acoustic feature parameters of a corresponding speech element from said speech element set storage means, based on an input phoneme string;
target parameter storage means having stored therein representative acoustic feature parameters from one vowel to another;
parameter correction means for reading out a target parameter comprising acoustic parameters from one vowel to another for a vowel from said target parameter storage means in response to the acoustic feature parameter of the speech element output from said speech element selection means and for correcting the acoustic feature parameter of said speech element based on said target parameters, the acoustic feature parameter being corrected according to at least one predetermined equation wherein the corrected acoustic feature parameter is a function of at least a first target value for a parameter at a leading edge of said speech element and a second target value for a parameter at a trailing edge of said speech element, the corrected acoustic feature having a value equal to said first target value at said leading edge of said speech element and a value equal to said second target value at said trailing edge of said speech element;
time-series data generating means for concatenating plural acoustic feature parameters output from said parameter correction means to generate time series data of the acoustic feature parameters; and
speech synthesizing means for uttering and outputting speech signals of the synthesized speech corresponding to the input phoneme strings in accordance with time-series data of the acoustic feature parameters, corresponding to the input phoneme strings, generated by said time-series data generating means.
Dependent claims: 2, 3, 4
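The chain of elements in claim 1 (selection, correction against vowel targets, concatenation) can be sketched roughly as follows. Everything concrete here is an assumption made for illustration: the vowel inventory, the stored tracks in `ELEMENT_SET`, the values in `VOWEL_TARGETS`, the linear offset interpolation, and all function names. The final speech synthesizing stage is omitted.

```python
from typing import Dict, List

VOWELS = {"a", "i", "u", "e", "o"}  # assumed vowel inventory

# Hypothetical speech element set: each element starts and ends on a vowel.
ELEMENT_SET: Dict[str, List[float]] = {
    "aka": [1.0, 5.0, 3.0],
    "ami": [2.0, 4.0, 6.0],
}

# Hypothetical representative (target) parameter for each vowel.
VOWEL_TARGETS: Dict[str, float] = {"a": 2.0, "i": 5.0}


def split_into_elements(phonemes: List[str]) -> List[str]:
    """Cut the input into units that run from one vowel to the next vowel."""
    vowel_positions = [i for i, p in enumerate(phonemes) if p in VOWELS]
    return ["".join(phonemes[s:e + 1])
            for s, e in zip(vowel_positions, vowel_positions[1:])]


def correct_to_targets(track: List[float], lead: float, trail: float) -> List[float]:
    """Pin both edges of the track to the targets (assumed linear offset blend)."""
    n = len(track)
    lead_offset, trail_offset = lead - track[0], trail - track[-1]
    return [v + (1 - i / (n - 1)) * lead_offset + (i / (n - 1)) * trail_offset
            for i, v in enumerate(track)]


def build_time_series(phonemes: List[str]) -> List[float]:
    """Select, correct, and concatenate the elements for an input phoneme string."""
    series: List[float] = []
    for name in split_into_elements(phonemes):
        track = ELEMENT_SET[name]                       # speech element selection
        corrected = correct_to_targets(                 # parameter correction
            track, VOWEL_TARGETS[name[0]], VOWEL_TARGETS[name[-1]])
        series.extend(corrected)                        # time-series generation
    return series


series = build_time_series(list("akami"))
```

In this toy run the two elements `aka` and `ami` meet on the vowel `a`, and after correction both sides of the join carry the same target value, which is the property the claim relies on to limit concatenation distortion.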
5. A rule based speech synthesis method of using a processor to perform steps comprising
a speech element selecting step of reading out an acoustic feature parameter corresponding to a speech element, based on input phoneme strings, from a speech element set storage storing a plurality of phoneme strings, each having a vowel phoneme on the boundary, as a speech element, along with feature parameters, as a speech element set;
a parameter correction step of reading out a target parameter comprising acoustic parameters from one vowel to another for a vowel, in response to the acoustic feature parameters of the speech element output in said speech element selecting step from the target parameter storage having stored therein representative acoustic feature parameters from one vowel to another for correcting the acoustic feature parameters of said speech element based on said target parameter, the acoustic feature parameters being corrected according to at least one predetermined equation wherein the corrected acoustic feature parameters are a function of at least a first target value for a parameter at a leading edge of said speech element and a second target value for a parameter at a trailing edge of said speech element, the corrected acoustic feature having a value equal to said first target value at said leading edge of said speech element and a value equal to said second target value at said trailing edge of said speech element;
a time series data generating step of generating time series data of the acoustic feature parameters by concatenating the acoustic feature parameters output from said parameter correction step; and
a speech synthesis step of uttering and outputting a speech signal of the synthesized speech, corresponding to said input of phoneme strings, in accordance with the acoustic feature parameters, corresponding to said input phoneme strings, generated in said time series data generating step.
6. A rule based speech synthesis apparatus comprising
speech element set storage means for storing a plurality of phoneme strings, each having a vowel phoneme on a boundary thereof, as a speech element, along with feature parameters of each speech element, as a speech element set;
speech element selection means for reading out acoustic feature parameters of a corresponding speech element from said speech element set storage means based on an input phoneme string;
target parameter storage means having stored therein a plurality of acoustic feature parameters from one vowel to another;
parameter correction means for selecting a specified acoustic feature parameter in response to an acoustic feature parameter of said speech element selection means, from target parameters comprising acoustic parameters from one vowel to another stored in said target parameter storage means and for correcting the acoustic feature parameter of the speech element responsive to the selected specified acoustic feature parameter, the acoustic feature parameter being corrected according to at least one predetermined equation wherein the corrected acoustic feature parameter is a function of at least a first target value for a parameter at a leading edge of said speech element and a second target value for a parameter at a trailing edge of said speech element, the corrected acoustic feature having a value equal to said first target value at said leading edge of said speech element and a value equal to said second target value at said trailing edge of said speech element;
time-series data generating means for concatenating plural acoustic feature parameters output from said parameter correction means to generate time series data of the acoustic feature parameters; and
speech synthesizing means for uttering and outputting speech signals of synthesized speech corresponding to the input phoneme strings, based on time-series data of the acoustic feature parameters, corresponding to the input phoneme strings, generated by said time-series data generating means.
Dependent claims: 7, 8, 9, 10
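Claim 6 differs from claim 1 in that the target storage holds a plurality of parameters per vowel and the correction means selects one of them in response to the speech element's own parameter. The claim does not state the selection criterion; the sketch below assumes the simplest one, picking the stored candidate closest (in squared Euclidean distance) to the element's boundary frame. The two-component vectors and all numeric values are illustrative only.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_target(candidates: List[Vector], boundary_frame: Vector) -> Vector:
    """Pick the stored target nearest to the element's boundary frame.

    Assumption: "nearest" is the selection rule; the claim only says the
    choice is made in response to the element's acoustic feature parameter.
    """
    return min(candidates, key=lambda c: squared_distance(c, boundary_frame))


# Hypothetical stored targets for one vowel, and one element's leading frame.
targets_for_a: List[Vector] = [(700.0, 1200.0), (800.0, 1300.0)]
leading_frame: Vector = (780.0, 1250.0)

chosen = select_target(targets_for_a, leading_frame)
```

Choosing the nearest candidate keeps the size of the correction small, which is one plausible motivation for storing several targets per vowel; the selected vector would then serve as the first or second target value in the boundary correction.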
11. A rule based speech synthesis method of using a processor to perform steps comprising
a speech element set selecting step of reading out and outputting an acoustic feature parameter of a corresponding speech element, based on input phoneme strings, from a speech element set storage adapted for storing plural phoneme strings each having a vowel phoneme on the boundary, as a speech element, as a set of the speech element with the acoustic feature parameter;
a parameter correcting step of selecting, from target parameters comprising acoustic parameters from one vowel to another stored in a target parameter storage, a specified acoustic feature parameter, responsive to the acoustic feature parameter of the speech element output from the speech element selecting step, and for correcting the acoustic feature parameter of the speech element based on the selected specified acoustic feature parameter, the acoustic feature parameter being corrected according to at least one predetermined equation wherein the corrected acoustic feature parameter is a function of at least a first target value for a parameter at a leading edge of said speech element and a second target value for a parameter at a trailing edge of said speech element, the corrected acoustic feature having a value equal to said first target value at said leading edge of said speech element and a value equal to said second target value at said trailing edge of said speech element;
a time-series data generating step of concatenating plural acoustic feature parameters output from said parameter correction step to generate time series data of the acoustic feature parameters; and
speech synthesizing means for uttering and outputting speech signals of the synthesized speech, corresponding to the input phoneme strings, in accordance with time-series data of acoustic feature parameters, corresponding to the input phoneme strings, generated by said time-series data generating step.
12. A rule based speech synthesis apparatus comprising
speech element set storage means for storing a plurality of phoneme strings, each having a consonant phoneme on a boundary thereof, as a speech element, along with feature parameters, as a speech element set;
speech element selection means for reading out acoustic feature parameters of a corresponding speech element, from said speech element set storage means, based on input phoneme strings;
target parameter storage means having stored therein a representative acoustic feature parameter from one consonant to another;
parameter correction means for reading out a target parameter for a consonant from said target parameter storage means having stored therein target parameters comprising acoustic parameters from one consonant to another, responsive to the acoustic feature parameters of the speech element, output from said speech element selection means, and for correcting the acoustic feature parameters of said speech element based on said target parameters, the acoustic feature parameters being corrected according to at least one predetermined equation wherein the corrected acoustic feature parameters are a function of at least a first target value for a parameter at a leading edge of said speech element and a second target value for a parameter at a trailing edge of said speech element, the corrected acoustic feature having a value equal to said first target value at said leading edge of said speech element and a value equal to said second target value at said trailing edge of said speech element;
time-series data generating means for concatenating plural acoustic feature parameters output from said parameter correction means to generate time series data of the acoustic feature parameters; and
speech synthesizing means for uttering and outputting speech signals of synthesized speech corresponding to the input phoneme strings in accordance with time-series data of the acoustic feature parameters, corresponding to the input phoneme strings, generated by said time-series data generating means.
13. A rule based speech synthesis method of using a processor to perform steps comprising
a speech element selecting step of reading out acoustic feature parameters of a corresponding speech element, based on an input phoneme string from a speech element set storage adapted for storing a plurality of phoneme strings, each having a consonant phoneme on the boundary, as a speech element, along with feature parameters, as a speech element set;
a parameter correction step of reading out a target parameter for a consonant, responsive to the acoustic feature parameters of the speech element output in said speech element selecting step from the target parameter storage having stored therein target parameters comprising acoustic parameters from one consonant to another, and for correcting the acoustic feature parameters of said speech element based on said target parameter, the acoustic feature parameters being corrected according to at least one predetermined equation wherein the corrected acoustic feature parameters are a function of at least a first target value for a parameter at a leading edge of said speech element and a second target value for a parameter at a trailing edge of said speech element, the corrected acoustic feature having a value equal to said first target value at said leading edge of said speech element and a value equal to said second target value at said trailing edge of said speech element;
a time series data generating step of generating time series data of the acoustic feature parameters by concatenating the acoustic feature parameters output from said parameter correction step; and
a speech synthesis step of uttering and outputting a speech signal of synthesized speech, corresponding to said input phoneme strings, in accordance with the time series data of the acoustic feature parameters, corresponding to said input phoneme strings, generated in said time series data generating step.
Specification