Text-to-speech conversion system for interlocking with multimedia and a method for organizing input data of the same
First Claim
1. A text-to-speech conversion system for interlocking with multimedia comprising:
a multimedia information input unit for organizing text, prosody information, information on synchronization with a moving picture, lip-shape information, picture information, and individual property information including a gender, age, accent, pronunciation and speech rate of synthesized speech;
a data distributor for distributing the information from said multimedia information input unit into information for each media;
a language processor for converting the text distributed by said data distributor into a phoneme stream, presuming prosody information and symbolizing the presumed prosody information;
a prosody processor for calculating a prosody control parameter value from the symbolized prosody information from the language processor;
a synchronization adjuster for adjusting a duration of each phoneme using the synchronization information distributed by said data distributor;
a synthesis unit database for receiving the individual property information from said data distributor, selecting synthesis units adaptable to gender and age and outputting data required for synthesis;
a signal processor for producing a synthesized speech using the prosody control parameter and the data output from said synthesis unit database; and
a picture output apparatus for outputting the picture information distributed by said data distributor onto a screen.
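The data distributor recited above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `MultimediaInput` record, its field names and types, and the `distribute` function are hypothetical and do not come from the patent. The sketch only shows the idea of splitting one organized input into one stream per kind of information.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class MultimediaInput:
    """Hypothetical record for what the multimedia information input unit organizes."""
    text: str
    prosody: Dict[str, float] = field(default_factory=dict)
    # Assumed shape: (phoneme index, time in seconds within the moving picture).
    synchronization: List[Tuple[int, float]] = field(default_factory=list)
    lip_shape: List[str] = field(default_factory=list)
    picture: bytes = b""
    # Gender, age, accent, pronunciation, speech rate.
    individual_property: Dict[str, Any] = field(default_factory=dict)


def distribute(info: MultimediaInput) -> Dict[str, Any]:
    """Split the organized input into one stream per kind of information."""
    return {
        "text": info.text,                                # read by the language processor
        "prosody": info.prosody,
        "synchronization": info.synchronization,          # read by the synchronization adjuster
        "lip_shape": info.lip_shape,
        "picture": info.picture,                          # read by the picture output apparatus
        "individual_property": info.individual_property,  # read by the synthesis unit database
    }
```

The comments naming a consumer follow the wording of claim 1. The claim does not name a consumer for the lip-shape stream, so the sketch leaves it unassigned.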
Abstract
The present invention provides a text-to-speech conversion system (TTS) for interlocking with multimedia and a method for organizing input data of the TTS. The system and method enhance the naturalness of synthesized speech and synchronize multimedia with the TTS by defining additional prosody information, the information required to interlock the TTS with multimedia, and the interface between this information and the TTS for use in producing the synthesized speech.
13 Claims
2. A method for organizing input data of a text-to-speech conversion system for interlocking with multimedia, said method comprising the steps of:
(a) classifying multimedia input information organized for enhancing natural synthesized speech and implementing synchronization of multimedia with text-to-speech into text, prosody information, information on synchronization with a moving picture, lip-shaped information, picture information, and individual property information using a multimedia information input unit;
(b) distributing using a data distributor the multimedia input information classified in the multimedia information input unit based on respective information;
(c) converting the text distributed by the data distributor into a phoneme stream, presuming prosody information and symbolizing the presumed prosody information using a language processor;
(d) calculating a prosody control parameter value which is not included in the multimedia input information using a prosody processor;
(e) adjusting a duration of each phoneme using a synchronization adjuster so as to synchronize a processing result of the prosody processor with a picture signal according to the synchronization information distributed by the data distributor;
(f) selecting synthesis units adaptable to gender and age based on the individual property information from the data distributor using a synthesis unit database and outputting data required for synthesis;
(g) producing synthesized speech using a signal processor based on the prosody information distributed by the data distributor, a processing result of the synchronization adjuster, and the data from the synthesis unit database; and
(h) outputting the picture information distributed by the data distributor onto a screen using a picture output unit.
Dependent claims: 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
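Step (e) can be illustrated with a small sketch. The claim does not say how the durations are adjusted, so uniform proportional scaling is used here purely as an assumption; the function name `adjust_durations`, the `(phoneme, seconds)` tuple format, and the `target_total` parameter are likewise hypothetical.

```python
from typing import List, Tuple


def adjust_durations(
    phonemes: List[Tuple[str, float]], target_total: float
) -> List[Tuple[str, float]]:
    """Scale every phoneme duration so the total matches the picture span.

    phonemes: (phoneme, duration in seconds) pairs from the prosody processor.
    target_total: span in seconds taken from the synchronization information.
    """
    current_total = sum(duration for _, duration in phonemes)
    if current_total <= 0 or target_total <= 0:
        raise ValueError("durations and target span must be positive")
    # Assumption: stretch or compress all phonemes by the same factor.
    scale = target_total / current_total
    return [(phoneme, duration * scale) for phoneme, duration in phonemes]
```

For example, three phonemes predicted to last 0.2 seconds in total would each be stretched by a factor of two to fill a 0.4 second picture span. A real adjuster might instead stretch vowels more than consonants or align individual phonemes to lip-shape timestamps; the sketch takes no position on that.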
Specification