Text-to-speech conversion system
First Claim
1. A text-to-speech conversion system comprising:
a conversion processing unit for converting inputted text into a synthesized speech waveform;
a phrase dictionary containing a plurality of sound-related terms that correspond to a plurality of waveform data generated from recorded sounds; and
a waveform dictionary containing the waveform data generated from the sound-related terms, wherein said conversion system outputs just the speech waveform synthesized in the conversion processing unit from the inputted text, except in a case where a term in the inputted text matches one of the terms registered in said phrase dictionary, whereupon said conversion system substitutes the waveform from the waveform dictionary based on the waveform data corresponding to the one matching sound-related term, and outputs just the waveform without any overlap with the synthesized speech waveform.
Abstract
The system according to the invention comprises a text-to-speech conversion processing unit, a phrase dictionary, and a waveform dictionary, with the two dictionaries connected to the conversion processing unit independently of each other. The conversion processing unit converts any Japanese text inputted from outside into speech. The phrase dictionary holds previously registered voice-related terms, that is, notations of terms that represent actually recorded sounds, such as onomatopoeic words, background sounds, lyrics, and music titles. The waveform dictionary holds previously registered waveform data obtained from the actually recorded sounds corresponding to those voice-related terms. The conversion processing unit is constituted such that, when a term in the text is collated with the phrase dictionary and matches a registered voice-related term, the actually recorded waveform data registered in the waveform dictionary for that term is outputted as the speech waveform of the term.
49 Claims
1. A text-to-speech conversion system comprising:
a conversion processing unit for converting inputted text into a synthesized speech waveform; a phrase dictionary containing a plurality of sound-related terms that correspond to a plurality of waveform data generated from recorded sounds; and a waveform dictionary containing the waveform data generated from the sound-related terms, wherein said conversion system outputs just the speech waveform synthesized in the conversion processing unit from the inputted text, except in a case where a term in the inputted text matches one of the terms registered in said phrase dictionary, whereupon said conversion system substitutes the waveform from the waveform dictionary based on the waveform data corresponding to the one matching sound-related term, and outputs just the waveform without any overlap with the synthesized speech waveform.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
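The substitution behavior recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the term segmentation, the stand-in synthesizer `fake_synthesize`, the dictionary keys, and the sample values are all hypothetical and exist only to show the control flow (synthesize by default, substitute the recorded waveform on a phrase-dictionary match).

```python
from typing import Callable, Dict, List

Waveform = List[float]


def fake_synthesize(term: str) -> Waveform:
    """Hypothetical stand-in synthesizer: one 0.1 sample per character."""
    return [0.1] * len(term)


def convert_with_substitution(
    terms: List[str],
    phrase_dict: Dict[str, str],
    waveform_dict: Dict[str, Waveform],
    synthesize: Callable[[str], Waveform] = fake_synthesize,
) -> Waveform:
    """Concatenate per-term waveforms, swapping in recorded data on a match."""
    out: Waveform = []
    for term in terms:
        key = phrase_dict.get(term)
        if key is not None:
            # Matched term: output only the recorded waveform, no overlap.
            out.extend(waveform_dict[key])
        else:
            # Unmatched term: output only the synthesized waveform.
            out.extend(synthesize(term))
    return out


# Assumed example data: "woof" is registered, the other terms are not.
phrase_dict = {"woof": "dog_bark"}
waveform_dict = {"dog_bark": [0.9, -0.9, 0.5]}
result = convert_with_substitution(
    ["the", "dog", "said", "woof"], phrase_dict, waveform_dict
)
```

Keeping the phrase dictionary (term to key) separate from the waveform dictionary (key to samples) mirrors the two independent dictionaries named in the claim.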
12. A text-to-speech conversion system comprising:
a conversion processing unit for converting inputted text into a synthesized speech waveform; a phrase dictionary containing a plurality of sound-related terms that correspond to a plurality of waveform data generated from recorded sounds; and a waveform dictionary containing the waveform data generated from the sound-related terms, wherein said conversion system outputs just the speech waveform synthesized in the conversion processing unit from the inputted text, except in the case where there is a match between a term in the inputted text and one of the sound-related terms registered in said phrase dictionary, whereupon said conversion system overlaps the waveform based on the recorded waveform data corresponding to the one matching sound-related term and the speech waveform synthesized from the inputted text.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27
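Claim 12 differs from claim 1 in that the recorded waveform is overlapped with the synthesized speech rather than replacing it. One plausible reading of "overlaps" is a sample-wise sum, sketched below. The summation itself, the zero padding of the shorter signal, and the function name `overlap` are assumptions; the claim does not say how differing lengths are handled.

```python
from itertools import zip_longest
from typing import List


def overlap(speech: List[float], recorded: List[float]) -> List[float]:
    """Mix two waveforms sample by sample (assumed: simple addition).

    The shorter input is padded with silence (0.0) so no samples are lost.
    """
    return [s + r for s, r in zip_longest(speech, recorded, fillvalue=0.0)]


# Assumed example: background sound shorter than the spoken text.
mixed = overlap([0.5, 0.5, 0.5], [0.25, 0.25])
```

A real mixer would likely also scale or clip the sum to stay within the output range, which this sketch leaves out.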
28. A text-to-speech conversion system comprising:
a conversion processing unit for converting a text inputted into a speech waveform; a phrase dictionary for registering a plurality of voice-related terms that correspond to a plurality of actual waveform data generated from actually recorded voices; and a waveform dictionary for registering the actual waveform data corresponding to the voice-related terms, wherein said conversion processing unit has a function such that in the case where there is a match between a term in the inputted text and one of the voice-related terms registered in said phrase dictionary, said conversion processing unit outputs an overlapped speech waveform including the speech waveform based on the actual waveform data corresponding to the one matching voice-related term and the speech waveform synthesized therein from the inputted text, and otherwise said conversion processing unit outputs the speech waveform synthesized therein from the inputted text; wherein said conversion processing unit has a function of adjusting the time length of the waveform data read out from said waveform dictionary; and wherein in case the time length of the read-out waveform data is longer than that of the speech waveform synthesized from the inputted text, the time length of the read-out waveform data is adjusted by truncating said waveform data at a time when said speech waveform comes to an end.
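The length adjustment in claim 28 amounts to cutting the recorded waveform off where the synthesized speech ends. A minimal sketch, assuming both waveforms are sample lists at the same sample rate (the function name is hypothetical):

```python
from typing import List


def truncate_to_speech(recorded: List[float], speech_len: int) -> List[float]:
    """Drop every recorded sample that falls after the end of the speech."""
    return recorded[:speech_len]


# Assumed example: 5 recorded samples against 3 samples of speech.
clipped = truncate_to_speech([0.1, 0.2, 0.3, 0.4, 0.5], 3)
```

A hard cut like this can produce an audible click, which is presumably what the gradual attenuation of claim 29 is meant to avoid.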
29. A text-to-speech conversion system comprising:
a conversion processing unit for converting a text inputted into a speech waveform; a phrase dictionary for registering a plurality of voice-related terms that correspond to a plurality of actual waveform data generated from actually recorded voices; and a waveform dictionary for registering the actual waveform data corresponding to the voice-related terms, wherein said conversion processing unit has a function such that in the case where there is a match between a term in the inputted text and one of the voice-related terms registered in said phrase dictionary, said conversion processing unit outputs an overlapped speech waveform including the speech waveform based on the actual waveform data corresponding to the one matching voice-related term and the speech waveform synthesized therein from the inputted text, and otherwise said conversion processing unit outputs the speech waveform synthesized therein from the inputted text; wherein said conversion processing unit has a function of adjusting the time length of the waveform data read out from said waveform dictionary; and wherein in case the time length of the read-out waveform data is longer than that of the speech waveform synthesized from the inputted text, said time length is adjusted by gradually attenuating the sound volume of said waveform data so as to become zero at a time when said speech waveform comes to an end.
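The gradual attenuation in claim 29 can be sketched as a fade-out whose gain reaches zero on the last sample of the speech. The linear ramp, the `fade_len` parameter, and the function name are assumptions; the claim only requires that the volume decrease gradually and reach zero when the speech ends.

```python
from typing import List


def fade_to_end(recorded: List[float], speech_len: int, fade_len: int) -> List[float]:
    """Clip the recording to the speech length and fade its tail to zero.

    The last `fade_len` samples are scaled by a linear gain that is
    exactly 0.0 on the final sample (assumed ramp shape).
    """
    clipped = recorded[:speech_len]
    n = len(clipped)
    fade_len = min(fade_len, n)  # the fade cannot be longer than the signal
    fade_start = n - fade_len
    out: List[float] = []
    for i, sample in enumerate(clipped):
        gain = (n - 1 - i) / fade_len if i >= fade_start else 1.0
        out.append(sample * gain)
    return out


# Assumed example: constant-level recording, speech lasting 6 samples,
# fade applied over the final 2 samples.
faded = fade_to_end([1.0] * 8, speech_len=6, fade_len=2)
```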
30. A text-to-speech conversion system comprising:
a conversion processing unit for converting a text inputted into a speech waveform; a phrase dictionary for registering a plurality of voice-related terms that correspond to a plurality of actual waveform data generated from actually recorded voices; and a waveform dictionary for registering the actual waveform data corresponding to the voice-related terms, wherein said conversion processing unit has a function such that in the case where there is a match between a term in the inputted text and one of the voice-related terms registered in said phrase dictionary, said conversion processing unit outputs an overlapped speech waveform including the speech waveform based on the actual waveform data corresponding to the one matching voice-related term and the speech waveform synthesized therein from the inputted text, and otherwise said conversion processing unit outputs the speech waveform synthesized therein from the inputted text; wherein said conversion processing unit has a function of adjusting the time length of the waveform data read out from said waveform dictionary; and wherein in case the time length of the read-out waveform data is shorter than that of the speech waveform synthesized from the inputted text, said time length is adjusted by coupling together successive repetitions of said waveform data.
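For the opposite case in claim 30, where the recording is shorter than the speech, the recorded data is repeated back to back. The sketch below assumes the final repetition is cut so the result matches the speech length exactly; the claim itself only recites coupling successive repetitions, and the function name is hypothetical.

```python
from typing import List


def loop_to_length(recorded: List[float], target_len: int) -> List[float]:
    """Repeat the recording end to end until it covers `target_len` samples."""
    if not recorded or target_len <= 0:
        return []
    repetitions = -(-target_len // len(recorded))  # ceiling division
    return (recorded * repetitions)[:target_len]


# Assumed example: a 3-sample recording stretched to cover 7 samples.
looped = loop_to_length([1.0, 2.0, 3.0], 7)
```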
31. A text-to-speech conversion system comprising:
a conversion processing unit for converting inputted text, containing lyrics, into a synthesized speech and song waveform; a song phrase dictionary containing a plurality of pairs of lyrics or lyric phrases and song phoneme rhythm symbol strings corresponding thereto; and a song phoneme rhythm symbol string processing unit for analyzing the song phoneme rhythm symbol strings in order to convert said song phoneme rhythm symbol strings into a plurality of synthesized song/speech waveforms, wherein said conversion processing unit outputs just the speech waveform synthesized therein from the inputted text, except in a case where one of the lyrics in the inputted text matches with one of the lyrics registered in said song phrase dictionary, whereupon said conversion processing unit outputs just the synthesized song/speech waveforms, without overlapping said speech waveform.
Dependent claims: 32, 33, 34, 35, 36, 37, 38
39. A text-to-speech conversion system comprising:
a conversion processing unit for converting inputted text containing a music title into a synthesized speech waveform; a music title dictionary containing a plurality of music titles; and a musical sound waveform generator for generating a musical sound waveform corresponding to one of the music titles, said musical sound waveform generator including a music dictionary for registering music data corresponding to the music titles, and a musical sound synthesizer for converting one of the music data into a musical sound waveform, wherein said conversion processing unit outputs just the speech waveform synthesized therein from the inputted text, except in a case where the music title in the inputted text matches one of the registered music titles, whereupon the musical sound waveform corresponding to the one matching registered music title is superimposed on the speech waveform of the text before being outputted.
Dependent claims: 40, 41, 42, 43, 44, 45, 46, 47
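The music-title path of claim 39 chains three lookups: title dictionary, music dictionary, then a musical sound synthesizer whose output is superimposed on the speech. The sketch below is illustrative only. The title "Example Tune", the note-list format `(frequency in Hz, duration in seconds)`, the sine-wave synthesizer, the 8000 Hz sample rate, and the substring title match are all assumptions, since the claim does not define the music data format or the matching method.

```python
import math
from typing import Dict, List, Set, Tuple

SAMPLE_RATE = 8000  # assumed sample rate in Hz

Note = Tuple[float, float]  # (frequency in Hz, duration in seconds), hypothetical

music_title_dict: Set[str] = {"Example Tune"}
music_dict: Dict[str, List[Note]] = {"Example Tune": [(440.0, 0.01), (660.0, 0.01)]}


def synthesize_music(notes: List[Note], sample_rate: int = SAMPLE_RATE) -> List[float]:
    """Hypothetical musical sound synthesizer: one sine tone per note."""
    wave: List[float] = []
    for freq, seconds in notes:
        count = int(round(seconds * sample_rate))
        wave.extend(math.sin(2 * math.pi * freq * i / sample_rate) for i in range(count))
    return wave


def superimpose(speech: List[float], music: List[float]) -> List[float]:
    """Sum the two waveforms, padding the shorter one with silence."""
    n = max(len(speech), len(music))
    padded_speech = speech + [0.0] * (n - len(speech))
    padded_music = music + [0.0] * (n - len(music))
    return [s + m for s, m in zip(padded_speech, padded_music)]


def convert(text: str, speech: List[float]) -> List[float]:
    """Return speech alone, or speech plus music when a registered title appears."""
    for title in music_title_dict:
        if title in text:
            return superimpose(speech, synthesize_music(music_dict[title]))
    return speech


# Assumed example: 100 samples of silent "speech" stand in for real synthesis.
speech = [0.0] * 100
with_music = convert("Now playing Example Tune", speech)
without_music = convert("Hello there", speech)
```

Padding to the longer of the two waveforms is a choice made here for the sketch; other ways of reconciling the two lengths would fit the same structure.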
48. A text-to-speech conversion system comprising:
a conversion processing unit for converting inputted text containing a music title into a speech waveform; a music title dictionary for registering a plurality of music titles; and a musical sound waveform generator for generating a musical sound waveform corresponding to one of the music titles, said musical sound waveform generator including a music dictionary for registering music data corresponding to the music titles, and a musical sound synthesizer for converting one of the music data into a musical sound waveform, wherein said conversion processing unit has a function such that in a case where the music title in the inputted text matches one of the registered music titles, the musical sound waveform corresponding to the one matching registered music title is superimposed on the speech waveform of the text before being outputted; wherein said conversion processing unit has a function of adjusting the time length of the musical sound waveform sent from said musical sound synthesizer; and wherein in case the time length of the musical sound waveform differs from the time length of the speech waveform of the text, the time length of the superimposed output is adjusted to be the longer of both the waveform time lengths.
49. A text-to-speech conversion system comprising:
a conversion processing unit for converting inputted text containing a music title into a speech waveform; a music title dictionary for registering a plurality of music titles; and a musical sound waveform generator for generating a musical sound waveform corresponding to one of the music titles, said musical sound waveform generator including a music dictionary for registering music data corresponding to the music titles, and a musical sound synthesizer for converting one of the music data into a musical sound waveform, wherein said conversion processing unit has a function such that in a case where the music title in the inputted text matches one of the registered music titles, the musical sound waveform corresponding to the one matching registered music title is superimposed on the speech waveform of the text before being outputted; wherein said conversion processing unit has a function of adjusting the time length of the musical sound waveform sent from said musical sound synthesizer; and wherein in case the time length of the musical sound waveform is shorter than that of the speech waveform of the inputted text, said time length of the musical sound waveform is adjusted by coupling together successive repetitions of said musical sound waveform data.
Specification