SPEECH SYNTHESIS SYSTEM, SPEECH SYNTHESIS PROGRAM PRODUCT, AND SPEECH SYNTHESIS METHOD
First Claim
1. At least one computer-readable storage device encoded with a speech synthesis program which causes a system for synthesizing speech from text to perform:
determining a first speech segment sequence corresponding to an input text, by selecting speech segments from a speech segment database according to a first cost calculated based at least in part on a statistical model of prosody variations;
determining prosody modification values for the first speech segment sequence, after the first speech segment sequence is selected, by using a second cost calculated based at least in part on the statistical model of prosody variations, wherein the first cost is different from the second cost; and
applying the determined prosody modification values to the first speech segment sequence to produce a second speech segment sequence whose prosodic characteristics are different from prosodic characteristics of the first speech segment sequence.
Abstract
Waveform concatenation speech synthesis with high sound quality is provided. Prosody with both high accuracy and high sound quality is achieved by performing a two-path search consisting of a speech segment search and a prosody modification value search. An accurate accent is secured by evaluating the consistency of the prosody against a statistical model of prosody variations (the slope of the fundamental frequency) in both paths: the speech segment selection and the modification value search. The prosody modification value search looks for the prosody modification value sequence that minimizes a modified prosody cost. This yields a modification value sequence that raises the likelihood of the absolute values or variations of the prosody under the statistical model as far as possible while keeping the modification values to a minimum.
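The two searches described in the abstract can be illustrated with a small sketch. It is not the patented implementation: it assumes each segment is reduced to a single F0 value in Hz plus a precomputed cost, that the statistical model is a Gaussian over the F0 change between adjacent segments, that modification values are F0 shifts drawn from a small discrete grid, and that both searches are run as a Viterbi search. All names (`nll`, `viterbi`, `select_segments`, `search_modifications`, `synthesize_prosody`, `shifts`, `weight`) are hypothetical.

```python
import math

def nll(x, mean, std):
    """Negative log-likelihood of x under a Gaussian (the assumed slope model)."""
    return 0.5 * ((x - mean) / std) ** 2 + math.log(std * math.sqrt(2 * math.pi))

def viterbi(lattice, node_cost, edge_cost):
    """Return the minimum-cost path through a lattice.

    lattice[i] lists the options at position i; node_cost(i, a) scores one
    option and edge_cost(i, a, b) scores moving from a (at i-1) to b (at i).
    """
    best = [node_cost(0, a) for a in lattice[0]]
    back = []
    for i in range(1, len(lattice)):
        new_best, ptrs = [], []
        for b in lattice[i]:
            costs = [best[k] + edge_cost(i, a, b)
                     for k, a in enumerate(lattice[i - 1])]
            k = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[k] + node_cost(i, b))
            ptrs.append(k)
        best = new_best
        back.append(ptrs)
    # Trace the best final option back to the first position.
    k = min(range(len(best)), key=best.__getitem__)
    path = [k]
    for ptrs in reversed(back):
        k = ptrs[k]
        path.append(k)
    path.reverse()
    return [lattice[i][k] for i, k in enumerate(path)]

def select_segments(candidates, slope_model):
    """First search: pick one (f0_hz, base_cost) candidate per position.

    First cost (assumed form) = base cost + slope likelihood under the model.
    """
    def node(i, seg):
        return seg[1]
    def edge(i, a, b):
        mean, std = slope_model[i]
        return nll(b[0] - a[0], mean, std)
    return viterbi(candidates, node, edge)

def search_modifications(segments, slope_model,
                         shifts=(-20, -10, 0, 10, 20), weight=0.05):
    """Second search: pick one F0 shift per already-selected segment.

    Second cost (assumed form) = slope likelihood after shifting
    + weight * |shift|, so small modifications are preferred.
    """
    lattice = [list(shifts)] * len(segments)
    def node(i, d):
        return weight * abs(d)
    def edge(i, da, db):
        mean, std = slope_model[i]
        slope = (segments[i][0] + db) - (segments[i - 1][0] + da)
        return nll(slope, mean, std)
    return viterbi(lattice, node, edge)

def synthesize_prosody(candidates, slope_model):
    """Run both searches, then apply the shifts to the selected segments."""
    first = select_segments(candidates, slope_model)
    mods = search_modifications(first, slope_model)
    second = [(f0 + d, cost) for (f0, cost), d in zip(first, mods)]
    return first, mods, second

# Toy input: two candidates per position; slope_model[i] is the assumed
# (mean, std) of the F0 change from position i-1 to i (entry 0 is unused).
candidates = [[(100, 0.0), (120, 0.0)],
              [(110, 0.0), (150, 0.0)],
              [(100, 0.0), (130, 0.0)]]
slope_model = [None, (20, 5), (-10, 5)]
first, mods, second = synthesize_prosody(candidates, slope_model)
```

In this sketch the two costs share the same slope model but differ in their other terms: the first adds a per-segment base cost, the second a penalty on the size of the shift. The `weight` parameter controls how strongly the second search prefers leaving the selected segments unmodified.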
18 Claims
7. A speech synthesis method for synthesizing speech from text by computer processing, the method comprising:
determining a first speech segment sequence corresponding to an input text by selecting speech segments from a speech segment database according to a first cost calculated based at least in part on a statistical model of prosody variations;
determining prosody modification values for the first speech segment sequence, after the first speech segment sequence is selected, by using a second cost calculated based at least in part on the statistical model of prosody variations, wherein the first cost is different from the second cost; and
applying the determined prosody modification values to the first speech segment sequence to produce a second speech segment sequence whose prosodic characteristics are different from prosodic characteristics of the first speech segment sequence.
13. A speech synthesis system for synthesizing speech from text, the system comprising:
at least one processor configured to:
select a first speech segment sequence corresponding to an input text from a speech segment database by using a first cost value calculated based at least in part on a statistical model of prosody variations;
determine prosody modification values for the first speech segment sequence, after the first speech segment sequence is selected, by using a second cost value calculated based at least in part on the statistical model of prosody variations, wherein the first cost value is different from the second cost value; and
apply the determined prosody modification values to the first speech segment sequence to produce a second speech segment sequence whose prosodic characteristics are different from prosodic characteristics of the first speech segment sequence.
Specification