Speech synthesis method utilizing auxiliary information, medium recorded thereon the method and apparatus utilizing the method
First Claim
1. A text speech synthesis method by rule which synthesizes arbitrary speech through the use of an input text, said method comprising the steps of:
(a) analyzing said input text by reference to a word dictionary and identifying a sequence of words in said input text to obtain a sequence of phonemes of each word;
(b) setting a fundamental frequency, a power and a phoneme duration specified for each phoneme of said each word as first prosodic parameters on the basis of said word dictionary;
(c) selecting from a speech waveform dictionary phoneme waveforms corresponding to said phonemes in said each word to thereby generate a sequence of phoneme waveforms;
(d) extracting a fundamental frequency, a speech power and a phoneme duration as second prosodic parameters from input actual speech;
(e) selecting at least one of said first prosodic parameters or at least one of said second prosodic parameters as a selected prosodic parameter; and
(f) generating synthesized speech by controlling said sequence of phoneme waveforms with said selected prosodic parameter.
Abstract
In a method and apparatus which use actual speech as auxiliary information and synthesize speech by speech synthesis by rule, prosodic information for a phoneme sequence of each word of a word sequence obtained by an analysis of an input text is set by referring to a word dictionary, and a speech waveform sequence is obtained from the phoneme sequence of each word by referring to a speech waveform dictionary. Additional prosodic information is extracted from input actual speech, and at least one of the set prosodic information or at least one of the extracted prosodic information is selected and used to control the speech waveform sequence to create synthesized speech.
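The selection step described in the abstract can be pictured with a minimal sketch. It assumes one set of prosodic values per phoneme and a per-parameter choice between the rule-based values (from the word dictionary) and the values extracted from actual speech. The names `Prosody`, `select_prosody` and `use_actual`, the field names and the units are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Iterable

FIELDS = ("f0_hz", "power", "duration_ms")  # hypothetical field names


@dataclass
class Prosody:
    f0_hz: float        # fundamental frequency
    power: float        # speech power
    duration_ms: float  # phoneme duration


def select_prosody(rule: Prosody, actual: Prosody, use_actual: Iterable[str]) -> Prosody:
    """Take each named parameter from `actual`, and every other one from `rule`."""
    chosen = set(use_actual)
    unknown = chosen - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown prosodic parameter(s): {sorted(unknown)}")
    return Prosody(**{
        name: getattr(actual if name in chosen else rule, name)
        for name in FIELDS
    })


# Rule-based values for one phoneme versus values measured from a recording.
rule = Prosody(f0_hz=120.0, power=60.0, duration_ms=80.0)
actual = Prosody(f0_hz=135.0, power=65.0, duration_ms=95.0)

# Keep the rule-based power, but follow the speaker's pitch and timing.
selected = select_prosody(rule, actual, {"f0_hz", "duration_ms"})
print(selected)
```

Passing an empty set reproduces plain synthesis by rule, while naming all three parameters copies the prosody of the recorded speech onto the synthesized waveform sequence.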
20 Claims
1. A text speech synthesis method by rule which synthesizes arbitrary speech through the use of an input text, said method comprising the steps of:
(a) analyzing said input text by reference to a word dictionary and identifying a sequence of words in said input text to obtain a sequence of phonemes of each word;
(b) setting a fundamental frequency, a power and a phoneme duration specified for each phoneme of said each word as first prosodic parameters on the basis of said word dictionary;
(c) selecting from a speech waveform dictionary phoneme waveforms corresponding to said phonemes in said each word to thereby generate a sequence of phoneme waveforms;
(d) extracting a fundamental frequency, a speech power and a phoneme duration as second prosodic parameters from input actual speech;
(e) selecting at least one of said first prosodic parameters or at least one of said second prosodic parameters as a selected prosodic parameter; and
(f) generating synthesized speech by controlling said sequence of phoneme waveforms with said selected prosodic parameter.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A speech synthesizer for synthesizing speech corresponding to input text by speech synthesis by rule, said synthesizer comprising:
text analysis means for sequentially identifying a sequence of words forming said input text by reference to a word dictionary to thereby obtain a sequence of phonemes of each word;
prosodic parameter setting means for setting first prosodic parameters for each phoneme in said each word that is set in said word dictionary in association with said each word, said prosodic parameter setting means including fundamental frequency setting means, speech power setting means and duration setting means for setting, respectively, a fundamental frequency, speech power and duration of each phoneme as said first prosodic parameters for said each word provided in said word dictionary in association with said each word;
speech segment select means for selectively reading out of a speech waveform dictionary a speech waveform corresponding to said each phoneme in each of said identified words;
prosodic parameter extracting means for extracting second prosodic parameters from input actual speech, said prosodic parameter extracting means including fundamental frequency extracting means, speech power extracting means and duration extracting means for extracting, respectively, a fundamental frequency, a speech power and a phoneme duration as said second prosodic parameters from said input actual speech through a fixed analysis window at a regular time interval;
prosodic parameter select means for selecting at least one of said first prosodic parameters or at least one of said second prosodic parameters as a selected prosodic parameter; and
speech synthesizing means for controlling said selected speech waveform by said selected prosodic parameters and for outputting said synthesized speech.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A recording medium which has recorded thereon a procedure for synthesizing arbitrary speech by rule from an input text, said procedure comprising the steps of:
(a) analyzing said input text by reference to a word dictionary and identifying a sequence of words in said input text to obtain a sequence of phonemes of each word;
(b) setting first prosodic parameters for each of said phonemes in said each word;
(c) selecting from a speech waveform dictionary phoneme waveforms corresponding to said phonemes in said each word to thereby generate a sequence of phoneme waveforms;
(d) extracting a fundamental frequency, a speech power and a phoneme duration from input actual speech as second prosodic parameters;
(e) selecting at least one of said first prosodic parameters or at least one of said second prosodic parameters as a selected prosodic parameter; and
(f) generating synthesized speech by controlling said sequence of phoneme waveforms with said selected prosodic parameters.
Dependent claims: 18, 19, 20
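Claim 9 recites extracting the second prosodic parameters "through a fixed analysis window at a regular time interval". The sketch below illustrates that framing idea only. It estimates fundamental frequency with a plain autocorrelation peak search and speech power as the mean square of each frame. Both estimators are assumptions chosen for brevity, since the claims do not name an extraction algorithm. The function name `frame_features`, the 30 ms window, the 10 ms interval and the 50 to 400 Hz search range are likewise hypothetical. Phoneme duration extraction needs phoneme boundaries in the recording and is left out.

```python
import math


def frame_features(samples, rate_hz, window=240, hop=80, f0_min=50.0, f0_max=400.0):
    """Return (time_s, f0_hz, power) for each fixed-length analysis frame.

    window and hop are in samples; at 8 kHz the defaults are 30 ms and 10 ms.
    """
    lag_lo = int(rate_hz / f0_max)                    # shortest pitch period searched
    lag_hi = min(int(rate_hz / f0_min), window - 1)   # longest pitch period searched
    features = []
    for start in range(0, len(samples) - window + 1, hop):
        frame = samples[start:start + window]
        power = sum(x * x for x in frame) / window    # mean-square power
        best_lag, best_r = 0, 0.0
        for lag in range(lag_lo, lag_hi + 1):
            # Unnormalized autocorrelation at this lag.
            r = sum(frame[n] * frame[n + lag] for n in range(window - lag))
            if r > best_r:
                best_lag, best_r = lag, r
        f0 = rate_hz / best_lag if best_lag else 0.0  # 0.0 marks "no pitch found"
        features.append((start / rate_hz, f0, power))
    return features


# Usage: a 100 Hz unit-amplitude tone, 0.1 s long, sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(800)]
for time_s, f0, power in frame_features(tone, 8000):
    print(f"{time_s:.2f} s  f0={f0:.1f} Hz  power={power:.3f}")
```

A practical extractor would also need a voiced/unvoiced decision and smoothing across frames, which this sketch omits.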
Specification