Speech synthesis system by rule using phonemes as systhesis units
First Claim
1. A speech synthesis system comprising:
- code converter means (22) for accepting at an input terminal (21) text code comprising spelling, accent code and intonation code of a word, and producing therefrom a phonetic symbol for pronunciation (phoneme of speech) including a text string and aprosodic string for each phoneme of speech;
a feature vector table (24) including means for storing feature vector information comprising speech parameters for each phoneme, including a time duration period, pitch frequency pattern, formant frequency, formant bandwidth, strength of a voice source, and speech rate,wherein each of said speech parameters is defined by two target points (r1 and r2) during said time duration period, a value at each of the target points, and a connection curve between said two target point values,and wherein said said speech rate is defined for each phoneme by parameters of a speech rate adjustment curve including a start point (d1), an end point (d2) and a ratio of adjustment, stored in said feature vector table (24);
feature vector selection means (23) for selecting an address of said feature vector table (24) in accordance with each phonetic symbol input thereto from said code converter means (22);
a speech rate table generator means (25) for calculating, in response to speech rate parameters stored in said address selected from said feature vector table (24) by said selection means (23), a relationship between relative time which defines a speech parameter and absolute time, according to said speech rate adjustment curve;
a speech rate table (26) for storing the output of said speech rate table generator means (25) for successive short increments of time defined by said generator means (25);
speech synthesizing parameter calculation means (27) for calculating, from feature vector information stored in said feature vector table (24) and speech rate information stored in said speech rate table (26), an instant value of a speech parameter at each increment of time defined in said speech rate table (26);
speech synthesizer means (28) including voice sources and filters for generating a synthesized voice output by actuating voice source and filter combinations according to said speech parameter values calculated by said speech synthesizer parameter calculation means (27); and
an output terminal (29) coupled with an output of said speech synthesizer means (28) for providing said synthesized speech.
1 Assignment
0 Petitions
Accused Products
Abstract
A speech synthesizer that synthesizes speech by actuating a voice source and a filter which processes output of the voice source according to speech parameters in each successive short interval of time according to feature vectors which include formant frequencies, formant bandwidth, speech rate and so on. Each feature vector, or speech parameter is defined by two target points (r1, r2), and a value at each target point together with a connection curve between target points. A speech rate is defined by a speech rate curve which defines elongation or shortening of the speech rate, by start point (d1) of elongation (or shorteninng), end point (d2), and elongation ratio between d1 and d2. The ratios between the relative time of each speech parameter and absolute time are preliminarily calculated according to the speech rate table in each predetermined short interval.
-
Citations
4 Claims
-
1. A speech synthesis system comprising:
-
code converter means (22) for accepting at an input terminal (21) text code comprising spelling, accent code and intonation code of a word, and producing therefrom a phonetic symbol for pronunciation (phoneme of speech) including a text string and aprosodic string for each phoneme of speech; a feature vector table (24) including means for storing feature vector information comprising speech parameters for each phoneme, including a time duration period, pitch frequency pattern, formant frequency, formant bandwidth, strength of a voice source, and speech rate, wherein each of said speech parameters is defined by two target points (r1 and r2) during said time duration period, a value at each of the target points, and a connection curve between said two target point values, and wherein said said speech rate is defined for each phoneme by parameters of a speech rate adjustment curve including a start point (d1), an end point (d2) and a ratio of adjustment, stored in said feature vector table (24); feature vector selection means (23) for selecting an address of said feature vector table (24) in accordance with each phonetic symbol input thereto from said code converter means (22); a speech rate table generator means (25) for calculating, in response to speech rate parameters stored in said address selected from said feature vector table (24) by said selection means (23), a relationship between relative time which defines a speech parameter and absolute time, according to said speech rate adjustment curve; a speech rate table (26) for storing the output of said speech rate table generator means (25) for successive short increments of time defined by said generator means (25); speech synthesizing parameter calculation means (27) for calculating, from feature vector information stored in said feature vector table (24) and speech rate information stored in said speech rate table (26), an instant value of a speech parameter at each increment of time defined in said speech rate table (26); speech synthesizer means (28) including voice sources and filters for generating a synthesized voice output by actuating voice source and filter combinations according to said speech parameter values calculated by said speech synthesizer parameter calculation means (27); and an output terminal (29) coupled with an output of said speech synthesizer means (28) for providing said synthesized speech. - View Dependent Claims (2, 3, 4)
-
Specification