Method of controlling high-speed reading in a text-to-speech conversion system
First Claim
1. A method of controlling high-speed reading in a text-to-speech conversion system including a text analysis module for generating a phoneme and prosody character string from an input text;
- a prosody generation module for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for said phoneme and prosody character string;
a voice segment dictionary in which voice segments as a source of voice are registered; and
a speech generation module for generating a synthetic waveform by waveform superimposition by referring to said voice segment dictionary, said method comprising the step of providing said prosody generation module with a phoneme duration determination unit that includes both a duration rule table containing empirically found phoneme durations and a duration prediction table containing phoneme durations predicted by statistical analysis and determines a phoneme duration by using, when a user-designated utterance speed exceeds a threshold, said duration rule table and, when said threshold is not exceeded, said duration prediction table.
3 Assignments
0 Petitions
Accused Products
Abstract
A method of high-speed reading in a text-to-speech conversion system including a text analysis module (101) for generating a phoneme and prosody character string from an input text; a prosody generation module (102) for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for the phoneme and prosody character string; and a speech generation module (103) for generating a synthetic waveform by waveform superimposition by referring to a voice segment dictionary (105). The prosody generation module is provided with both a duration rule table containing empirically found phoneme durations and a duration prediction table containing phoneme durations predicted by statistical analysis and, when the user-designated utterance speed exceeds a threshold, uses the duration rule table and, when the threshold is not exceeded, uses the duration prediction table to determined the phoneme duration.
-
Citations
14 Claims
-
1. A method of controlling high-speed reading in a text-to-speech conversion system including a text analysis module for generating a phoneme and prosody character string from an input text;
- a prosody generation module for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for said phoneme and prosody character string;
a voice segment dictionary in which voice segments as a source of voice are registered; and
a speech generation module for generating a synthetic waveform by waveform superimposition by referring to said voice segment dictionary,said method comprising the step of providing said prosody generation module with a phoneme duration determination unit that includes both a duration rule table containing empirically found phoneme durations and a duration prediction table containing phoneme durations predicted by statistical analysis and determines a phoneme duration by using, when a user-designated utterance speed exceeds a threshold, said duration rule table and, when said threshold is not exceeded, said duration prediction table. - View Dependent Claims (2)
- a prosody generation module for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for said phoneme and prosody character string;
-
3. A method of controlling high-speed reading in a text-to-speech conversion system including a text analysis module for generating a phoneme and prosody character string from an input text;
- a prosody generation module for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for the phoneme and prosody character string;
a voice segment dictionary in which voice segments as a source of voice are registered; and
a speech generation module for generating a synthetic waveform by waveform superimposition while referring to said voice segment dictionary,said method comprising the step of providing said prosody generation module with a pitch contour determination unit that has both an empirically found rule table and a prediction table predicted by statistical analysis and determines a pitch contour by determining both accent and phrase components with, when a user-designated utterance speed exceeds a threshold, said duration rule table and, when said threshold is not exceeded, said duration prediction table. - View Dependent Claims (4)
- a prosody generation module for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for the phoneme and prosody character string;
-
5. A method of controlling high-speed reading in a text-to-speech conversion system including a text analysis module for generating a phoneme and prosody character string from an input text;
- a prosody generation module for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for the phoneme and prosody character string;
a voice segment dictionary in which voice segments as a source of voice are registered; and
a speech generation module for generating a synthetic waveform by waveform superimposition by referring to said voice segment dictionary,said method comprising the step of providing said prosody generation module with a sound quality coefficient determination unit that has a sound quality conversion coefficient table for changing said voice segment to switch sound quality and selects from said sound quality conversion coefficient table such a coefficient that sound quality does not change when a user-designated utterance speed exceeds a threshold. - View Dependent Claims (6)
- a prosody generation module for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for the phoneme and prosody character string;
-
7. A method of controlling high-speed reading in a text-to-speech conversion system including a text analysis module for generating a phoneme and prosody character string from an input text;
- a prosody generation module for generating a synthesis parameter of at least a voice segment, phoneme duration, and fundamental frequency for the phoneme and prosody character string;
a voice segment dictionary in which voice segments as a source of voice are registered; and
a speech generation module for generating a synthetic waveform by waveform superimposition by referring to said voice segment dictionary,said method comprising the step of providing said prosody generation module with both a pitch contour correction unit for outputting a pitch contour corrected according to an intonation level designated by the user and a switch for determining whether a base pitch is added to said pitch contour corrected according to said user-designated utterance speed. - View Dependent Claims (8, 9)
- a prosody generation module for generating a synthesis parameter of at least a voice segment, phoneme duration, and fundamental frequency for the phoneme and prosody character string;
-
10. A method of controlling high-speed reading in a text-to-speech conversion system including a text analysis module for generating a phoneme and prosody character string from an input text;
- a prosody generation module for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for said phoneme and prosody character string;
a voice segment dictionary in which voice segments as a source of voice are registered; and
a speech generation module for generating a synthetic waveform by waveform superimposition while referring to said voice segment dictionary,said method comprising the step of providing said speech generation module with signal sound generation means for inserting a signal sound between sentences to indicate an end of a sentence when a user-designated utterance speed exceeds a threshold. - View Dependent Claims (11, 13, 14)
- a prosody generation module for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for said phoneme and prosody character string;
-
12. A method of controlling high-speed reading in a text-to-speech conversion system including a text analysis module for generating a phoneme and prosody character string from an input text;
- a prosody generation module for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for the phoneme and prosody character string;
a voice segment dictionary in which voice segments as a source of voice are registered; and
a speech generation module for generating a synthetic waveform by waveform superimposition by referring to said voice segment dictionary,said method comprising the step of providing said prosody generation module with a phoneme duration determination unit for performing a process in which when a user-designated utterance speed exceeds a threshold, an utterance speed of at least a leading word in a sentence is returned to a normal utterance speed.
- a prosody generation module for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for the phoneme and prosody character string;
Specification