Speech synthesis system and method
Abstract
A speech synthesis system in a preferred embodiment includes a speech unit storage section, a phonetic environment storage section, a phonetic sequence/prosodic information input section, a plural-speech-unit selection section, a fused-speech-unit sequence generation section, and a fused-speech-unit modification/concatenation section. The fused-speech-unit sequence generation section generates a fused speech unit by fusing a plurality of selected speech units. In that section, average power information is calculated over the M selected speech units, N speech units are fused together, and the power information of the fused speech unit is corrected so that it equals the average power information of the M speech units.
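The power correction described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the text above does not state: power is taken to be mean squared amplitude, the N fused units are taken to be the first N of the M selected units, and fusion is replaced by a plain sample-wise average of equal-length units. The function names `mean_power`, `fuse` and `fuse_with_power_correction` are hypothetical.

```python
import math


def mean_power(unit):
    """Mean squared amplitude of one speech unit (a list of samples).

    Assumption: the abstract does not define "power information".
    """
    return sum(s * s for s in unit) / len(unit)


def fuse(units):
    """Stand-in fusion: sample-wise average of equal-length units.

    Assumption: the actual fusion procedure is not given in this excerpt.
    """
    count = len(units)
    return [sum(samples) / count for samples in zip(*units)]


def fuse_with_power_correction(selected, n):
    """Fuse the first n of the M selected units, then rescale the result
    so that its power equals the average power of all M units."""
    target = sum(mean_power(u) for u in selected) / len(selected)
    fused = fuse(selected[:n])
    current = mean_power(fused)
    if current == 0.0:
        return fused  # a silent unit cannot be rescaled
    gain = math.sqrt(target / current)  # amplitude gain for a power ratio
    return [s * gain for s in fused]


selected_units = [[1.0, -1.0], [2.0, -2.0], [3.0, -3.0]]  # M = 3
corrected = fuse_with_power_correction(selected_units, n=2)
```

Averaging waveforms that are not identical generally lowers their power, which is one plausible reason a correction step of this kind is useful; the gain is the square root of the power ratio because power scales with the square of amplitude.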
15 Claims
1. A speech synthesis system for generating synthesized speech by segmenting a phonetic sequence derived from an input text by predetermined synthesis units, and by concatenation of representative speech units each of which is extracted from respective one of the synthesis units, the system comprising:
a storage unit configured to store a plurality of speech units corresponding to the synthesis units;
a selector configured to select, with respect to each of the synthesis units of the phonetic sequence derived from the input text, a plurality of speech units from those stored in the storage unit based on a level of distortion of the synthesized speech;
a representative speech generator configured to generate the representative speech unit corresponding to the synthesis unit by calculating a statistics of power information from the speech units, and by correcting the power information based on the statistics of the power information to increase the synthesized speech in sound quality; and
a speech waveform generator configured to generate a speech waveform by concatenating the generated representative speech units.
14. A speech synthesis method for generating a synthesized speech by segmenting a phonetic sequence derived from an input text by predetermined synthesis units, and by concatenating representative speech units each of which is extracted from respective one of the synthesis units, the method comprising the steps of:
storing a plurality of speech units corresponding to the synthesis unit;
selecting, with respect to each of the synthesis units of the phonetic sequence derived from the input text, a plurality of speech units from the speech units stored in the storage step based on a level of distortion of the synthesized speech;
generating the representative speech unit corresponding to the synthesis unit by calculating a statistics of power information from the speech units, and by correcting the power information based on the statistics of the power information to increase the synthesized speech in sound quality; and
generating a speech waveform by concatenating the generated representative speech unit.
15. A program for use with a computer to implement a speech synthesis method for generating a synthesized speech by segmenting a phonetic sequence derived from an input text by a predetermined synthesis unit, and by concatenation of representative speech units each of which is extracted from respective one of the synthesis units, the program implementing the functions of:
storing a plurality of speech units corresponding to the synthesis unit;
selecting, with respect to each of the synthesis units of the phonetic sequence derived from the input text, a plurality of speech units from the speech units stored in the storage function based on a level of distortion of the synthesized speech;
generating the representative speech unit corresponding to the synthesis unit by calculating a statistics of power information from the speech units, and by correcting the power information based on the statistics of the power information to increase the synthesized speech in sound quality; and
generating a speech waveform by concatenating the generated representative speech unit.
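The selecting and waveform-generating steps recited in the independent claims can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the claims do not say how the level of distortion is measured, so the sketch uses a hypothetical cost (`pitch_distance`, the absolute difference in fundamental frequency), and the dictionary keys `id` and `f0` are likewise invented for the example.

```python
def select_units(candidates, target, m, cost):
    """Return the m stored candidates with the lowest estimated distortion.

    `cost(candidate, target)` is a caller-supplied distortion estimate.
    """
    return sorted(candidates, key=lambda c: cost(c, target))[:m]


def concatenate(units):
    """Join representative speech units end to end into one waveform."""
    waveform = []
    for unit in units:
        waveform.extend(unit)
    return waveform


def pitch_distance(candidate, target):
    """Hypothetical distortion measure: distance in fundamental frequency."""
    return abs(candidate["f0"] - target["f0"])


stored = [
    {"id": "a", "f0": 100.0},
    {"id": "b", "f0": 130.0},
    {"id": "c", "f0": 118.0},
    {"id": "d", "f0": 122.0},
]
chosen = select_units(stored, {"f0": 120.0}, m=3, cost=pitch_distance)
chosen_ids = [unit["id"] for unit in chosen]
```

Passing the cost in as a function keeps the selection step independent of any particular distortion measure, so a different estimate can be swapped in without changing `select_units`.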
Specification