Speech synthesis method, speech synthesis system, and speech synthesis program
Abstract
A speech synthesis system stores a group of speech units in a memory. For each segment obtained by segmenting a phoneme string of the target speech, the system selects a plurality of speech units from the group based on prosodic information of the target speech, choosing the units so as to minimize the distortion of the synthetic speech generated from them relative to the target speech. The system then generates a new speech unit for each segment by fusing the selected speech units, and generates synthetic speech by concatenating the resulting new speech units.
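The three steps named in the abstract (select, fuse, concatenate) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the `SpeechUnit` and `Target` types, the `prosodic_cost` formula, and in particular the fusion step, which is replaced by plain sample-wise averaging after linear resampling to a common length.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SpeechUnit:
    phoneme: str          # segment label this unit can realize
    f0: float             # mean fundamental frequency in Hz
    duration: float       # duration in seconds
    samples: List[float]  # toy waveform


@dataclass
class Target:
    phoneme: str
    f0: float
    duration: float


def prosodic_cost(unit: SpeechUnit, target: Target) -> float:
    # Hypothetical cost: relative f0 mismatch plus relative duration mismatch.
    return (abs(unit.f0 - target.f0) / target.f0
            + abs(unit.duration - target.duration) / target.duration)


def select_units(inventory: Sequence[SpeechUnit], target: Target, m: int) -> List[SpeechUnit]:
    # Keep the m candidates for this segment with the lowest prosodic cost.
    candidates = [u for u in inventory if u.phoneme == target.phoneme]
    return sorted(candidates, key=lambda u: prosodic_cost(u, target))[:m]


def resample(samples: Sequence[float], length: int) -> List[float]:
    # Linear interpolation to a fixed number of samples.
    if length == 1 or len(samples) == 1:
        return [samples[0]] * length
    scale = (len(samples) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def fuse(units: Sequence[SpeechUnit]) -> List[float]:
    # Assumed fusion: average the waveforms after aligning their lengths.
    length = round(sum(len(u.samples) for u in units) / len(units))
    aligned = [resample(u.samples, length) for u in units]
    return [sum(column) / len(aligned) for column in zip(*aligned)]


def synthesize(inventory: Sequence[SpeechUnit], targets: Sequence[Target], m: int = 2) -> List[float]:
    speech: List[float] = []
    for target in targets:
        selected = select_units(inventory, target, m)
        if not selected:
            raise ValueError(f"no unit available for segment {target.phoneme!r}")
        speech.extend(fuse(selected))  # concatenate the fused units in order
    return speech
```

Calling `synthesize(inventory, targets, m=2)` returns one waveform built from one fused unit per segment. A real system would fuse in a way that preserves pitch and spectral structure; the averaging above only shows where fusion sits in the pipeline.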
19 Claims
1. A speech synthesis method comprising:
selecting a plurality of speech units from a group of speech units, based on prosodic information of target speech, the speech units selected corresponding to each of segments which are obtained by segmenting a phoneme string of the target speech;
generating a new speech unit corresponding to the each of segments, by fusing speech units selected, to obtain a plurality of new speech units corresponding to the segments respectively; and
generating synthetic speech by concatenating the new speech units.
Dependent claims: 2, 3, 5, 7, 8, 9, 10, 11
4. The speech synthesis method for generating synthetic speech by concatenating speech units selected from a first group of speech units based on a phoneme string and prosodic information of target speech, the method comprising:
storing a second group of speech units and environmental information items corresponding to the second group respectively in a memory;
selecting a plurality of speech units from the second group based on each of training environmental information items corresponding to training speech units respectively, the speech units selected whose environmental information items being similar to the each of the training environmental information items; and
generating each of speech units of the first group, by fusing the speech units selected.
Dependent claims: 6, 12, 13
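Claim 4 describes building the first group offline: for each training environmental information item, units with similar environmental information are selected from the second group and fused into one unit. A minimal sketch of that loop follows. The environment representation (`Env` as an f0 and duration pair, both assumed positive), the `env_distance` measure, the choice of the `k` nearest units, and fusion by averaging equal-length waveforms are all hypothetical simplifications, not details given in the claim.

```python
from typing import List, Sequence, Tuple

Env = Tuple[float, float]       # hypothetical environment: (f0 in Hz, duration in s)
Unit = Tuple[Env, List[float]]  # (environmental information, equal-length waveform)


def env_distance(a: Env, b: Env) -> float:
    # Assumed similarity measure: sum of relative differences per component.
    return (abs(a[0] - b[0]) / max(a[0], b[0])
            + abs(a[1] - b[1]) / max(a[1], b[1]))


def build_first_group(second_group: Sequence[Unit],
                      training_envs: Sequence[Env],
                      k: int = 2) -> List[Unit]:
    first_group: List[Unit] = []
    for env in training_envs:
        # Select the k units whose environments are most similar to this one.
        nearest = sorted(second_group, key=lambda u: env_distance(u[0], env))[:k]
        # Assumed fusion: sample-wise mean of the selected waveforms.
        waveforms = [waveform for _, waveform in nearest]
        fused = [sum(column) / len(waveforms) for column in zip(*waveforms)]
        first_group.append((env, fused))
    return first_group
```

The resulting `first_group` holds one fused unit per training environment, which a synthesizer could then search at run time in place of the larger second group.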
14. A speech synthesis system comprising:
a memory to store a group of speech units;
a selecting unit configured to select a plurality of speech units from the group of speech units, based on prosodic information of target speech, the speech units selected corresponding to each of segments which are obtained by segmenting a phoneme string of the target speech;
a first generating unit configured to generate a new speech unit corresponding to the each of segments, by fusing the speech units selected, to obtain a plurality of new speech units corresponding to the segments respectively; and
a second generating unit configured to generate synthetic speech by concatenating the new speech units.
15. A speech synthesis system comprising:
a memory to store a first group of speech units, each of the speech units of the first group being generated by fusing a plurality of speech units whose environmental information items being similar to one of training environmental information items and are selected from a second group of speech units; and
a generating unit configured to generate synthetic speech by concatenating a plurality of speech units selected from the first group based on a phoneme string and prosodic information of target speech.
16. A computer program stored on a computer readable medium, the computer program comprising:
first program instruction means for instructing a computer processor to select a plurality of speech units from a first group of speech units stored in a first memory based on prosodic information of target speech, the speech units selected corresponding to each of segments which are obtained by segmenting a phoneme string of the target speech;
second program instruction means for instructing a computer processor to generate a new speech unit corresponding to the each of segments, by fusing the speech units selected, to obtain a plurality of new speech units corresponding to the segments respectively; and
third program instruction means for instructing a computer processor to generate synthetic speech by concatenating the new speech units.
Dependent claims: 17
18. A speech synthesis system comprising:
a memory to store a group of speech units;
a selecting unit configured to select a plurality of speech units from the group based on prosodic information of target speech, the speech units selected corresponding to each of segments which are obtained by segmenting a phoneme string of the target speech and minimizing distortion of synthetic speech generated from the speech units selected to the target speech;
a first generating unit configured to generate a new speech unit corresponding to the each of the segments, by fusing the speech units selected, to obtain a plurality of new speech units corresponding to the segments respectively; and
a second generating unit configured to generate synthetic speech by concatenating the new speech units.
Dependent claims: 19
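Claim 18 requires that the selected units minimize the distortion of the synthetic speech relative to the target speech, but the claim itself does not define the distortion measure. One common way to approximate such a criterion, shown here purely as an assumption, is a Viterbi-style dynamic programming search that minimizes a target cost plus a join cost over one candidate per segment. The `Candidate` layout and both cost functions are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical candidate: (f0 in Hz, level at unit start, level at unit end)
Candidate = Tuple[float, float, float]


def target_cost(candidate: Candidate, target_f0: float) -> float:
    # Assumed distortion against the target prosody.
    return abs(candidate[0] - target_f0)


def join_cost(prev: Candidate, cur: Candidate) -> float:
    # Assumed distortion at the concatenation boundary.
    return abs(prev[2] - cur[1])


def best_sequence(candidates: Sequence[Sequence[Candidate]],
                  target_f0: Sequence[float]) -> Tuple[float, List[int]]:
    # costs[j] is the cheapest total cost of any path ending in candidate j.
    costs = [target_cost(c, target_f0[0]) for c in candidates[0]]
    back: List[List[int]] = []
    for s in range(1, len(candidates)):
        new_costs, pointers = [], []
        for cur in candidates[s]:
            best_prev = min(
                range(len(costs)),
                key=lambda j: costs[j] + join_cost(candidates[s - 1][j], cur),
            )
            new_costs.append(costs[best_prev]
                             + join_cost(candidates[s - 1][best_prev], cur)
                             + target_cost(cur, target_f0[s]))
            pointers.append(best_prev)
        costs = new_costs
        back.append(pointers)
    # Trace the cheapest path back from the last segment to the first.
    last = min(range(len(costs)), key=costs.__getitem__)
    total = costs[last]
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return total, path
```

This sketch returns only the single lowest-cost sequence, as a total cost and one candidate index per segment. Selecting a plurality of units per segment, as the claim requires, would need an extension such as keeping the several lowest-cost candidates for each segment.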
Specification