Method of controlling high-speed reading in a text-to-speech conversion system
Abstract
A method of high-speed reading in a text-to-speech conversion system including a text analysis module (101) for generating a phoneme and prosody character string from an input text; a prosody generation module (102) for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for the phoneme and prosody character string; and a speech generation module (103) for generating a synthetic waveform by waveform superimposition by referring to a voice segment dictionary (105). The prosody generation module is provided with both a duration rule table containing empirically found phoneme durations and a duration prediction table containing phoneme durations predicted by statistical analysis. To determine the phoneme duration, it uses the duration rule table when the user-designated utterance speed exceeds a threshold and the duration prediction table when the threshold is not exceeded.
16 Claims
1. A method of controlling high-speed reading in a text-to-speech conversion system including a text analysis module for generating a phoneme and prosody character string from an input text;
a prosody generation module for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for the phoneme and prosody character string;
a voice segment dictionary in which voice segments as a source of voice are registered; and
a speech generation module for generating a synthetic waveform by waveform superimposition by referring to said voice segment dictionary,
said method comprising the step of providing said prosody generation module with a sound quality coefficient determination unit that has a sound quality conversion coefficient table for changing said voice segment to switch sound quality and selects from said sound quality conversion coefficient table such a coefficient that sound quality does not change when a user-designated utterance speed exceeds a threshold.
Dependent claims: 2
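The coefficient selection in claim 1 can be illustrated with a minimal sketch. The table contents, the quality levels, and the convention that a coefficient of 1.0 leaves the voice segment unchanged are assumptions made for illustration; the claim itself does not specify them, and `select_quality_coefficient` is a hypothetical name.

```python
# Hypothetical sound quality conversion coefficient table:
# user-selectable quality level -> conversion coefficient.
# All values are illustrative assumptions, not taken from the patent.
QUALITY_COEFFICIENTS = {0: 0.90, 1: 0.95, 2: 1.00, 3: 1.05, 4: 1.10}

# Assumed level whose coefficient (1.0) leaves the voice segment unchanged.
NEUTRAL_LEVEL = 2


def select_quality_coefficient(level, speed, threshold):
    """Return the conversion coefficient for the requested quality level.

    When the user-designated utterance speed exceeds the threshold, the
    neutral coefficient is returned so that sound quality does not change.
    """
    if speed > threshold:
        return QUALITY_COEFFICIENTS[NEUTRAL_LEVEL]
    return QUALITY_COEFFICIENTS[level]
```

With these assumed values, a request for level 4 yields 1.10 at normal speed and 1.00 once the speed passes the threshold.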
3. A method of controlling high-speed reading in a text-to-speech conversion system including a text analysis module for generating a phoneme and prosody character string from an input text;
a prosody generation module for generating a synthesis parameter of at least a voice segment, phoneme duration, and fundamental frequency for the phoneme and prosody character string;
a voice segment dictionary in which voice segments as a source of voice are registered; and
a speech generation module for generating a synthetic waveform by waveform superimposition by referring to said voice segment dictionary,
said method comprising the step of providing said prosody generation module with both a pitch contour correction unit for outputting a pitch contour corrected according to an intonation level designated by the user and a switch for determining whether a base pitch is added to said pitch contour corrected according to said user-designated utterance speed, said switch being controlled not to change the base pitch when the utterance speed exceeds a threshold.
Dependent claims: 4, 5, 7
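One possible reading of claim 3 is sketched below. It assumes that the intonation level acts as a simple scale factor on the contour and that, above the threshold, the switch passes the base pitch through unchanged instead of adding the corrected contour to it. Both points are interpretations rather than statements from the claim, and `pitch_contour` is a hypothetical helper.

```python
def pitch_contour(raw_contour, intonation_level, base_pitch, speed, threshold):
    """Return a pitch value (Hz) for each point of the contour.

    raw_contour      -- pitch offsets (Hz) before intonation correction
    intonation_level -- user-designated scale factor (assumed multiplicative)
    base_pitch       -- base pitch in Hz
    """
    # Pitch contour correction unit: scale the contour by the intonation level.
    corrected = [intonation_level * value for value in raw_contour]

    # Switch: above the threshold the base pitch is output as it is
    # (assumed interpretation of "not to change the base pitch").
    if speed > threshold:
        return [base_pitch for _ in corrected]

    # Otherwise the corrected contour is added to the base pitch.
    return [base_pitch + value for value in corrected]
```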
6. A method of controlling high-speed reading in a text-to-speech conversion system including a text analysis module for generating a phoneme and prosody character string from an input text;
a prosody generation module for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for said phoneme and prosody character string;
a voice segment dictionary in which voice segments as a source of voice are registered; and
a speech generation module for generating a synthetic waveform by waveform superimposition while referring to said voice segment dictionary,
said method comprising the step of providing said speech generation module with signal sound generation means for inserting a signal sound between sentences to indicate an end of a sentence when a user-designated utterance speed exceeds a threshold.
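The signal sound insertion in claim 6 might look like the following sketch. Waveforms are modeled as plain lists of samples, and the beep frequency, length, and sample rate are assumed values; `make_beep` and `join_sentences` are hypothetical names, not part of the patent.

```python
import math


def make_beep(freq_hz=1000.0, duration_s=0.05, sample_rate=16000):
    """Generate a short sine tone to serve as the signal sound (assumed shape)."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def join_sentences(sentence_waveforms, speed, threshold, beep):
    """Concatenate per-sentence waveforms into one output waveform.

    When the utterance speed exceeds the threshold, the signal sound is
    inserted between consecutive sentences to mark each sentence end.
    """
    output = []
    last_index = len(sentence_waveforms) - 1
    for index, waveform in enumerate(sentence_waveforms):
        output.extend(waveform)
        if speed > threshold and index != last_index:
            output.extend(beep)
    return output
```

The sketch inserts the signal only between sentences, following the claim wording, so nothing is appended after the final sentence.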
8. A method of controlling high-speed reading in a text-to-speech conversion system including a text analysis module for generating a phoneme and prosody character string from an input text;
a prosody generation module for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for the phoneme and prosody character string;
a voice segment dictionary in which voice segments as a source of voice are registered; and
a speech generation module for generating a synthetic waveform by waveform superimposition by referring to said voice segment dictionary,
said method comprising the step of providing said prosody generation module with a phoneme duration determination unit for performing a process in which when a user-designated utterance speed exceeds a threshold, an utterance speed of at least a leading word in a sentence is returned to a normal utterance speed.
Dependent claims: 9, 10
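The leading-word behavior of claim 8 can be sketched as follows. The sketch assumes that a sentence is given as a list of words, each holding its phoneme durations at normal speed, and that speed is a multiplier by which durations are divided. These representations and the name `scale_durations` are illustrative assumptions.

```python
def scale_durations(words, speed, threshold, normal_speed=1.0):
    """Scale phoneme durations for one sentence.

    words -- list of words, each a list of phoneme durations (ms) at normal speed
    speed -- user-designated utterance speed as a multiplier (assumed convention)

    When the speed exceeds the threshold, the leading word is kept at the
    normal utterance speed; every other word is sped up as requested.
    """
    result = []
    for index, durations in enumerate(words):
        is_leading = index == 0
        rate = normal_speed if (speed > threshold and is_leading) else speed
        result.append([duration / rate for duration in durations])
    return result
```

For example, with a threshold of 1.5 and a requested speed of 2.0, the first word keeps its original durations while later words are halved.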
11. A method of controlling high-speed reading in a text-to-speech conversion system, comprising:
inputting a text into the text-to-speech conversion system;
generating a phoneme and prosody character string of the text with a text analysis module;
creating a duration rule table containing a first phoneme duration obtained empirically;
creating a duration prediction table containing a second phoneme duration obtained through statistical analysis;
designating an utterance speed;
determining a threshold value;
comparing the utterance speed with the threshold value;
selecting one of the duration rule table and the duration prediction table according to the utterance speed;
determining a third phoneme duration with a phoneme duration determination unit according to the one of the duration rule table and the duration prediction table;
generating a synthesis parameter of at least a voice segment, the third phoneme duration, and a fundamental frequency of the phoneme and prosody character string with a prosody generation module; and
generating a synthetic waveform through waveform superimposition with a speech generation module according to the synthesis parameter and a voice segment dictionary containing a voice segment as a basic source of voice.
Dependent claims: 12, 13
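The table selection in claim 11 can be sketched as below. Following the abstract, the rule table is used above the threshold and the prediction table otherwise. Both tables are reduced to simple per-phoneme lookups with made-up values, and dividing by the speed to get the final duration is an assumed convention; a real prediction table would likely depend on phonetic context.

```python
# Hypothetical duration rule table: empirically chosen durations (ms).
DURATION_RULE_TABLE = {"a": 80.0, "k": 50.0}

# Hypothetical duration prediction table: statistically predicted durations (ms).
DURATION_PREDICTION_TABLE = {"a": 92.5, "k": 47.0}


def phoneme_durations(phonemes, speed, threshold):
    """Determine a duration (ms) for each phoneme at the requested speed."""
    # Compare the utterance speed with the threshold and select a table.
    if speed > threshold:
        table = DURATION_RULE_TABLE
    else:
        table = DURATION_PREDICTION_TABLE
    # Assumed convention: speed is a multiplier, so durations shrink by 1/speed.
    return [table[phoneme] / speed for phoneme in phonemes]
```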
14. A method of controlling high-speed reading in a text-to-speech conversion system, comprising:
inputting a text into the text-to-speech conversion system;
generating a phoneme and prosody character string of the text with a text analysis module;
creating a rule table containing first data of accent and phrase components obtained empirically;
creating a prediction table containing second data of accent and phrase components obtained through statistical analysis;
designating an utterance speed;
determining a threshold value;
comparing the utterance speed with the threshold value;
selecting one of the rule table and the prediction table according to the utterance speed;
determining a pitch contour with a pitch contour determination unit according to the one of the rule table and the prediction table;
generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency of the phoneme and prosody character string with a prosody generation module; and
generating a synthetic waveform through waveform superimposition with a speech generation module according to the synthesis parameter and a voice segment dictionary containing a voice segment as a basic source of voice.
Dependent claims: 15, 16
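For claim 14, a comparable sketch selects accent and phrase components and combines them into a pitch value. The accent type keys, the component magnitudes, the choice of the rule table above the threshold, and the additive combination in the log-frequency domain are all assumptions for illustration; the claim only states that one table is selected according to the utterance speed.

```python
import math

# Hypothetical tables: accent type -> (phrase component, accent component),
# expressed as offsets in the log-frequency domain. Values are made up.
ACCENT_RULE_TABLE = {"flat": (0.30, 0.00), "head-high": (0.30, 0.40)}
ACCENT_PREDICTION_TABLE = {"flat": (0.27, 0.05), "head-high": (0.33, 0.46)}


def select_components(accent_type, speed, threshold):
    """Pick (phrase, accent) components from the table chosen by speed."""
    # Assumed direction: rule table above the threshold, prediction table below.
    if speed > threshold:
        table = ACCENT_RULE_TABLE
    else:
        table = ACCENT_PREDICTION_TABLE
    return table[accent_type]


def peak_pitch_hz(base_hz, accent_type, speed, threshold):
    """Combine base pitch and components additively in the log domain (assumed)."""
    phrase, accent = select_components(accent_type, speed, threshold)
    return math.exp(math.log(base_hz) + phrase + accent)
```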
Specification