SPEECH SYNTHESIS SYSTEM, SPEECH SYNTHESIS PROGRAM PRODUCT, AND SPEECH SYNTHESIS METHOD
Abstract
An objective of the present invention is to provide waveform concatenation speech synthesis that exploits a large quantity of speech segments to achieve high sound quality when such a quantity is available, and that still produces accurate accents in other cases. Prosody with both high accuracy and high sound quality is achieved by performing a two-path search consisting of a speech segment search and a prosody modification value search. In the preferred embodiment of the present invention, an accurate accent is secured by evaluating the consistency of the prosody against a statistical model of prosody variations (the slope of the fundamental frequency) in both paths: the speech segment selection and the modification value search. The prosody modification value search looks for the prosody modification value sequence that minimizes a modified prosody cost. This yields a modification value sequence that raises the likelihood of the absolute values or variations of the prosody under the statistical model as far as possible while keeping the modification values to a minimum.
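The two-path search summarized in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: each candidate segment is reduced to a single F0 value, the statistical model is a list of hypothetical Gaussian `(mean, variance)` pairs for the F0 slope between consecutive segments, modification values come from a small discrete `grid`, and the prosody modification cost is a quadratic penalty scaled by a made-up `weight`. The function names (`slope_nll`, `viterbi`, `select_segments`, `search_modifications`) are likewise hypothetical.

```python
import math


def slope_nll(slope, mean, var):
    """Frequency slope likelihood cost: negative log-likelihood of a slope
    under an assumed Gaussian N(mean, var)."""
    return 0.5 * math.log(2 * math.pi * var) + (slope - mean) ** 2 / (2 * var)


def viterbi(states, node_cost, edge_cost):
    """Return the minimum-cost path through per-position state lists."""
    best = [(node_cost(0, s), [s]) for s in states[0]]
    for t in range(1, len(states)):
        step = []
        for s in states[t]:
            # Cheapest way to reach state s at position t from any predecessor.
            cost, path = min(
                ((c + edge_cost(t, p[-1], s), p) for c, p in best),
                key=lambda cp: cp[0],
            )
            step.append((cost + node_cost(t, s), path + [s]))
        best = step
    return min(best, key=lambda cp: cp[0])[1]


def select_segments(candidates, model):
    """Path 1: choose one candidate F0 per position so that the summed
    slope likelihood cost is minimal."""
    return viterbi(
        candidates,
        lambda t, f0: 0.0,
        lambda t, a, b: slope_nll(b - a, *model[t - 1]),
    )


def search_modifications(f0, model, grid, weight):
    """Path 2: choose one modification value per selected segment, trading
    slope likelihood cost against a quadratic modification cost."""
    return viterbi(
        [grid] * len(f0),
        lambda t, d: weight * d * d,
        lambda t, da, db: slope_nll(
            (f0[t] + db) - (f0[t - 1] + da), *model[t - 1]
        ),
    )


# Toy data (hypothetical): two candidate F0 values (Hz) per position and a
# model that expects a rise of about 10 Hz between consecutive segments.
candidates = [[100, 120], [110, 150], [105, 130]]
model = [(10, 25), (10, 25)]

selected = select_segments(candidates, model)
mods = search_modifications(selected, model, grid=[-10, -5, 0, 5, 10], weight=0.02)
final = [f + d for f, d in zip(selected, mods)]
print(selected, mods, final)  # [100, 110, 130] [0, 0, -5] [100, 110, 125]
```

In this toy run the first path picks the candidates whose slopes best match the model, and the second path lowers only the last segment by 5 Hz: a larger correction would fit the slope model better but would cost more in modification, which is the trade-off the abstract describes.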
15 Claims
1. A speech synthesis system for synthesizing speech from text, comprising:
a speech segment database for storing data of speech segments having prosody information;
means for entering a text to be speech-synthesized;
means for determining a speech segment sequence corresponding to the input text from the speech segment database so as to minimize a cost including at least a frequency slope likelihood cost on the basis of a statistical model of prosody variations;
means for determining prosody modification values so as to minimize a cost including at least the frequency slope likelihood cost and a prosody modification cost on the basis of the statistical model of prosody variations regarding the determined speech segment sequence; and
means for applying the determined prosody modification values to the determined speech segment sequence.
Dependent claims: 2, 3, 4, 5
6. A speech synthesis program product which causes a system for synthesizing speech from text, the system storing a speech segment database which holds data of speech segments having prosody information, to perform the steps of:
entering the text to be speech-synthesized;
determining a speech segment sequence corresponding to the input text from the speech segment database so as to minimize a cost including at least a frequency slope likelihood cost on the basis of a statistical model of prosody variations;
determining prosody modification values so as to minimize a cost including at least the frequency slope likelihood cost and a prosody modification cost on the basis of the statistical model of prosody variations regarding the determined speech segment sequence; and
applying the determined prosody modification values to the determined speech segment sequence.
Dependent claims: 7, 8, 9, 10
11. A speech synthesis method for synthesizing speech from text by computer processing, comprising the steps of:
entering the text to be speech-synthesized;
determining a speech segment sequence corresponding to the input text from a speech segment database including speech segment data having prosody information so as to minimize a cost including at least a frequency slope likelihood cost on the basis of a statistical model of prosody variations;
determining prosody modification values so as to minimize a cost including at least the frequency slope likelihood cost and a prosody modification cost on the basis of the statistical model of prosody variations regarding the determined speech segment sequence; and
applying the determined prosody modification values to the determined speech segment sequence.
Dependent claims: 12, 13, 14, 15
Specification