Text-to-speech synthesizer having formant-rule and speech-parameter synthesis modes
First Claim
1. A text-to-speech synthesizer comprising:
analyzer means for decomposing a sequence of input characters into phoneme components and classifying the decomposed phoneme components as a first group of phoneme components if each phoneme component is to be synthesized by a speech parameter and classifying said phoneme components as a second group of phoneme components if each phoneme component is to be synthesized by a formant rule;
first memory means for storing speech parameters derived from natural human speech, said speech parameters corresponding to the phoneme components of said first group and being retrievable from said first memory means in response to each of the phoneme components of the first group;
second memory means for storing formant rules for generating formant transition patterns, said formant rules corresponding to the phoneme components of said second group and being retrievable from said second memory means in response to each of the phoneme components of the second group;
means for retrieving a speech parameter from said first memory means in response to one of the phoneme components of the first group;
means for retrieving a formant rule from said second memory means in response to one of said phoneme components of the second group and deriving a formant transition pattern from the retrieved formant rule;
parameter converter means for converting a formant of said derived formant transition pattern into a corresponding speech parameter; and
speech synthesizer means for synthesizing a human speech utterance from the speech parameter retrieved from said first memory means and synthesizing a human speech utterance from the speech parameter converted by said parameter converter means, wherein said speech parameters stored in said first memory means are represented by auto-regressive (AR) parameters, and said formant of said derived formant transition patterns are represented by frequency and bandwidth values, wherein said parameter converter means comprises:
means for converting the frequency value of said formant into a value equal to C = cos(2πF/fs), where F is said frequency value and fs represents a sampling frequency, and converting the bandwidth value of said formant into a value equal to R = exp(-πB/fs), where B is the bandwidth value;
means for generating a first signal representative of a value 2×C×R and a second signal representative of a value R²;
unit impulse generator for generating a unit impulse; and
a series of second-order transversal filters connected in series from said unit impulse generator to said speech synthesizer means, each of said second-order transversal filters including a tapped delay line, first and second tap-weight multipliers connected respectively to successive taps of said tapped delay line, and an adder for summing the outputs of said multipliers with said unit impulse, said first and second multipliers multiplying signals at said successive taps with said first and second signals, respectively.
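The parameter converter recited above can be illustrated with a minimal Python sketch. It assumes one second-order section per formant and a sign convention of y[n] = x[n] - 2CR·x[n-1] + R²·x[n-2], which is the usual denominator of a digital resonator; the claim itself gives only the tap-weight magnitudes, so the signs are an assumption. The function names are hypothetical and chosen for illustration.

```python
import math


def formant_to_tap_weights(freq_hz, bandwidth_hz, fs_hz):
    """Return the first and second signals (2*C*R, R**2) for one formant."""
    c = math.cos(2.0 * math.pi * freq_hz / fs_hz)   # C = cos(2*pi*F/fs)
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)   # R = exp(-pi*B/fs)
    return 2.0 * c * r, r * r


def second_order_transversal(signal, first_weight, second_weight):
    """Two-tap delay line with an adder.

    Assumed sign convention (not stated in the claim):
    y[n] = x[n] - first_weight * x[n-1] + second_weight * x[n-2]
    """
    out = []
    for n in range(len(signal)):
        x1 = signal[n - 1] if n >= 1 else 0.0
        x2 = signal[n - 2] if n >= 2 else 0.0
        out.append(signal[n] - first_weight * x1 + second_weight * x2)
    return out


def formants_to_ar(formants, fs_hz):
    """Drive a cascade of sections with a unit impulse.

    The impulse response of the cascade is the list of AR
    coefficients a[0..2K] for K formants, with a[0] = 1.
    """
    order = 2 * len(formants)
    signal = [1.0] + [0.0] * order  # unit impulse, long enough for all taps
    for freq_hz, bandwidth_hz in formants:
        w1, w2 = formant_to_tap_weights(freq_hz, bandwidth_hz, fs_hz)
        signal = second_order_transversal(signal, w1, w2)
    return signal


if __name__ == "__main__":
    # Illustrative values only: two formants at an 8 kHz sampling rate.
    print(formants_to_ar([(500.0, 60.0), (1500.0, 90.0)], 8000.0))
```

Because each section is a short finite-impulse-response filter, the cascade's impulse response ends after 2K+1 samples, so the converter yields a fixed-length coefficient set in the same AR form as the parameters held in the first memory.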
Abstract
A text-to-speech synthesizer comprises an analyzer that decomposes a sequence of input characters into phoneme components and classifies them as a first group of phoneme components or a second group if they are to be synthesized by a speech parameter or by a formant rule, respectively. Speech parameters derived from natural human speech are stored in first memory locations corresponding to the phoneme components of the first group and the stored speech parameters are recalled from the first memory in response to each of the phoneme components of the first group. Formant rules capable of generating formant transition patterns are stored in second memory locations corresponding to the phoneme components of the second group, the formant rules being recalled from the second memory in response to each of the phoneme components of the second group. Formant transition patterns are derived from the formant rule recalled from the second memory, and formants of the derived transition patterns are converted into corresponding speech parameters. Spoken words are digitally synthesized from the speech parameters recalled from the first memory as well as from those supplied from the converted speech parameters.
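The two synthesis modes described in the abstract can be sketched in Python as follows. This is only an illustration under several assumptions: the tables, phoneme labels and numeric values are hypothetical, the analyzer is assumed to classify a phoneme component by which memory holds an entry for it, and a formant rule is assumed to be a start target, an end target and a frame count joined by linear interpolation. The source does not specify any of these details.

```python
# Hypothetical contents of the two memories (illustrative values only).
# First memory: phoneme component -> stored AR speech parameters.
SPEECH_PARAMETERS = {"s": [1.0, -0.5, 0.25]}
# Second memory: phoneme component -> formant rule, assumed here to be
# (start formants, end formants, number of frames), each formant being
# a (frequency in Hz, bandwidth in Hz) pair.
FORMANT_RULES = {"a": ([(300.0, 60.0)], [(700.0, 80.0)], 3)}


def classify(phonemes):
    """Tag each phoneme component as belonging to the first or second group."""
    tagged = []
    for phoneme in phonemes:
        if phoneme in SPEECH_PARAMETERS:
            tagged.append(("first", phoneme))
        elif phoneme in FORMANT_RULES:
            tagged.append(("second", phoneme))
        else:
            raise KeyError(f"no entry for phoneme component {phoneme!r}")
    return tagged


def derive_transition_pattern(rule):
    """Expand a rule into per-frame formants by linear interpolation."""
    start, end, frames = rule
    pattern = []
    for i in range(frames):
        t = i / (frames - 1) if frames > 1 else 0.0
        pattern.append([
            (f0 + t * (f1 - f0), b0 + t * (b1 - b0))
            for (f0, b0), (f1, b1) in zip(start, end)
        ])
    return pattern


def plan_synthesis(phonemes):
    """List what the synthesizer would receive, in order.

    "ar" items come straight from the first memory; "formants" items
    still need conversion into speech parameters before synthesis.
    """
    plan = []
    for group, phoneme in classify(phonemes):
        if group == "first":
            plan.append(("ar", SPEECH_PARAMETERS[phoneme]))
        else:
            for frame in derive_transition_pattern(FORMANT_RULES[phoneme]):
                plan.append(("formants", frame))
    return plan


if __name__ == "__main__":
    for item in plan_synthesis(["s", "a"]):
        print(item)
```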
6 Claims
1. A text-to-speech synthesizer comprising:
analyzer means for decomposing a sequence of input characters into phoneme components and classifying the decomposed phoneme components as a first group of phoneme components if each phoneme component is to be synthesized by a speech parameter and classifying said phoneme components as a second group of phoneme components if each phoneme component is to be synthesized by a formant rule;
first memory means for storing speech parameters derived from natural human speech, said speech parameters corresponding to the phoneme components of said first group and being retrievable from said first memory means in response to each of the phoneme components of the first group;
second memory means for storing formant rules for generating formant transition patterns, said formant rules corresponding to the phoneme components of said second group and being retrievable from said second memory means in response to each of the phoneme components of the second group;
means for retrieving a speech parameter from said first memory means in response to one of the phoneme components of the first group;
means for retrieving a formant rule from said second memory means in response to one of said phoneme components of the second group and deriving a formant transition pattern from the retrieved formant rule;
parameter converter means for converting a formant of said derived formant transition pattern into a corresponding speech parameter; and
speech synthesizer means for synthesizing a human speech utterance from the speech parameter retrieved from said first memory means and synthesizing a human speech utterance from the speech parameter converted by said parameter converter means, wherein said speech parameters stored in said first memory means are represented by auto-regressive (AR) parameters, and said formant of said derived formant transition patterns are represented by frequency and bandwidth values, wherein said parameter converter means comprises:
means for converting the frequency value of said formant into a value equal to C = cos(2πF/fs), where F is said frequency value and fs represents a sampling frequency, and converting the bandwidth value of said formant into a value equal to R = exp(-πB/fs), where B is the bandwidth value;
means for generating a first signal representative of a value 2×C×R and a second signal representative of a value R²;
unit impulse generator for generating a unit impulse; and
a series of second-order transversal filters connected in series from said unit impulse generator to said speech synthesizer means, each of said second-order transversal filters including a tapped delay line, first and second tap-weight multipliers connected respectively to successive taps of said tapped delay line, and an adder for summing the outputs of said multipliers with said unit impulse, said first and second multipliers multiplying signals at said successive taps with said first and second signals, respectively.
4. A text-to-speech synthesizer comprising:
analyzer means for decomposing a sequence of input characters into phoneme components and classifying the decomposed phoneme components as a first group of phoneme components if each phoneme component is to be synthesized by a speech parameter and classifying said phoneme components as a second group of phoneme components if each phoneme component is to be synthesized by a formant rule;
first memory means for storing speech parameters derived from natural human speech, said speech parameters corresponding to the phoneme components of said first group and being retrievable from said first memory means in response to each of the phoneme components of the first group;
second memory means for storing formant rules for generating formant transition patterns, said formant rules corresponding to the phoneme components of said second group and being retrievable from said second memory means in response to each of the phoneme components of the second group;
means for retrieving a speech parameter from said first memory means in response to one of the phoneme components of the first group;
means for retrieving a formant rule from said second memory means in response to one of said phoneme components of the second group and deriving a formant transition pattern from the retrieved formant rule;
parameter converter means for converting a formant of said derived formant transition pattern into a corresponding speech parameter; and
speech synthesizer means for synthesizing a human speech utterance from the speech parameter retrieved from said first memory means and synthesizing a human speech utterance from the speech parameter converted by said parameter converter means, wherein said speech parameters in said first memory means are represented by auto-regressive (AR) parameters and auto-regressive moving average (ARMA) parameters, and said formant rules in said second memory means being further capable of generating antiformant transition patterns, each of said formants and said antiformants being represented by frequency and bandwidth values, wherein said parameter converter means comprises:
means for converting the frequency value of said formant into a value equal to C = cos(2πF/fs), where F is said frequency value and fs represents a sampling frequency, and converting the bandwidth value of said formant into a value equal to R = exp(-πB/fs), where B is the bandwidth value;
means for generating a first signal representative of a value 2×C×R and a second signal representative of a value R²;
unit impulse generator means for generating a unit impulse; and
a series of second-order transversal filters connected in series from said unit impulse generator to said speech synthesizer means, each of said second-order transversal filters including a tapped delay line, first and second tap-weight multipliers connected respectively to successive taps of said tapped delay line, and an adder for summing the outputs of said multipliers with said unit impulse, said first and second multipliers multiplying signals at said successive taps with said first and second signals, respectively.
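Claim 4 adds antiformants and ARMA parameters. The sketch below shows one way a formant set and an antiformant set could each be turned into a coefficient polynomial. It assumes that the conversion recited for formants (C, R, 2×C×R, R²) is applied unchanged to antiformants to obtain the moving-average part, and that each section has the form 1 - 2CR·z⁻¹ + R²·z⁻²; neither assumption is stated in the claim. Polynomial multiplication stands in for the impulse-driven filter cascade, since both give the same coefficients. All function names are hypothetical.

```python
import cmath
import math


def section(freq_hz, bandwidth_hz, fs_hz):
    """Coefficients [1, -2CR, R^2] of one second-order section (assumed signs)."""
    c = math.cos(2.0 * math.pi * freq_hz / fs_hz)
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)
    return [1.0, -2.0 * c * r, r * r]


def multiply(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def resonances_to_polynomial(resonances, fs_hz):
    """Cascade one section per (frequency, bandwidth) pair."""
    poly = [1.0]
    for freq_hz, bandwidth_hz in resonances:
        poly = multiply(poly, section(freq_hz, bandwidth_hz, fs_hz))
    return poly


def arma_from_formants(formants, antiformants, fs_hz):
    """Return (AR coefficients, MA coefficients).

    Formants give the AR (denominator) part; antiformants are assumed
    to give the MA (numerator) part through the same conversion.
    """
    ar = resonances_to_polynomial(formants, fs_hz)
    ma = resonances_to_polynomial(antiformants, fs_hz)
    return ar, ma


def gain_at(ar, ma, freq_hz, fs_hz):
    """Magnitude of MA(z)/AR(z) on the unit circle at freq_hz."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs_hz)
    num = sum(c * z_inv ** k for k, c in enumerate(ma))
    den = sum(c * z_inv ** k for k, c in enumerate(ar))
    return abs(num / den)


if __name__ == "__main__":
    # Illustrative values only: one formant, one antiformant, 8 kHz sampling.
    ar, ma = arma_from_formants([(500.0, 60.0)], [(1500.0, 90.0)], 8000.0)
    print(ar, ma)
    print(gain_at(ar, ma, 500.0, 8000.0), gain_at(ar, ma, 1500.0, 8000.0))
```

With these illustrative values the response peaks near the formant frequency and dips near the antiformant frequency, which is the behaviour an ARMA representation is meant to capture.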
Specification