Speech synthesis method, speech synthesis system, and speech synthesis program
First Claim
Patent Images
1. A speech synthesis method comprising:
- storing a group of speech units in a memory;
segmenting a phoneme string of a target speech, to obtain a plurality of segments;
selecting, from the group in the memory, a speech unit for each of the segments based on prosodic information of the target speech, to obtain an optimal speech unit sequence including speech units selected for the respective segments;
selecting M (M represents a positive integer greater than one) speech units for each of the segments from the group in the memory, based on the optimal speech unit sequence;
generating a new speech unit corresponding to each of the segments, by fusing the M speech units selected for said each of the segments, to obtain a plurality of new speech units corresponding to the segments respectively; and
generating synthetic speech by concatenating the new speech units.
0 Assignments
0 Petitions
Accused Products
Abstract
A speech synthesis system stores a group of speech units in a memory, selects a plurality of speech units from the group based on prosodic information of target speech, the speech units selected corresponding to each of segments which are obtained by segmenting a phoneme string of the target speech and minimizing distortion of synthetic speech generated from the speech units selected to the target speech, generates a new speech unit corresponding to the each of the segments, by fusing the speech units selected, to obtain a plurality of new speech units corresponding to the segments respectively, and generates synthetic speech by concatenating the new speech units.
-
Citations
10 Claims
-
1. A speech synthesis method comprising:
-
storing a group of speech units in a memory; segmenting a phoneme string of a target speech, to obtain a plurality of segments; selecting, from the group in the memory, a speech unit for each of the segments based on prosodic information of the target speech, to obtain an optimal speech unit sequence including speech units selected for the respective segments; selecting M (M represents a positive integer greater than one) speech units for each of the segments from the group in the memory, based on the optimal speech unit sequence; generating a new speech unit corresponding to each of the segments, by fusing the M speech units selected for said each of the segments, to obtain a plurality of new speech units corresponding to the segments respectively; and generating synthetic speech by concatenating the new speech units. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A speech synthesis system comprising:
-
a memory to store a group of speech units; a first selecting unit configured to select, from the group in the memory, a speech unit for each of segments which are obtained by segmenting a phoneme string of a target speech, based on prosodic information of the target speech, to obtain an optimal speech unit sequence including speech units selected for the respective segments; a second selecting unit configured to select, based on the optimal speech unit sequence, M (M represents a positive integer greater than one) speech units for each segment of the segments from the group in the memory; a first generating unit configured to generate a new speech unit corresponding to each segment of the segments, by fusing the M speech units selected for the segment, to obtain a plurality of new speech units corresponding to the segments respectively; and a second generating unit configured to generate synthetic speech by concatenating the new speech units.
-
-
9. A non-transitory computer readable medium storing program instructions which when executed by a computer results in performance of steps comprising:
-
selecting from a first group of speech units in a first memory, a speech unit per each of segments which are obtained by segmenting a phoneme string of a target speech, based on prosodic information of the target speech, to obtain an optimal speech unit sequence including speech units selected for the respective segments; selecting M (M represents a positive integer greater than one) speech units for each of the segments from the first group in the first memory, based on the optimal speech unit sequence; generating a new speech unit corresponding to each segment of the segments, by fusing the M speech units selected for the segment, to obtain a plurality of new speech units corresponding to the segments respectively; and generating synthetic speech by concatenating the new speech units. - View Dependent Claims (10)
-
Specification