SPEECH SYNTHESIS APPARATUS AND METHOD
Abstract
A speech unit corpus stores a group of speech units. A selection unit divides a phoneme sequence of target speech into a plurality of segments, and selects a combination of speech units for each segment from the speech unit corpus. An estimation unit estimates a distortion between the target speech and synthesized speech generated by fusing each speech unit of the combination for each segment. The selection unit recursively selects the combination of speech units for each segment based on the distortion. A fusion unit generates a new speech unit for each segment by fusing each speech unit of the combination selected for each segment. A concatenation unit generates synthesized speech by concatenating the new speech unit for each segment.
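The flow described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: speech units are stood in for by short lists of floats, fusion is taken to be element-wise averaging, distortion is taken to be squared error, and the recursive selection is modeled as growing the combination one unit at a time while the estimated distortion of the fused result keeps falling. The names `fuse`, `distortion`, `select_combination`, `synthesize`, and `max_units` are hypothetical and do not come from the source.

```python
from typing import Dict, List, Sequence

Unit = List[float]  # toy stand-in for a speech unit's feature vector (assumption)


def fuse(units: Sequence[Unit]) -> Unit:
    """Fuse units by element-wise averaging (an assumed fusion rule)."""
    n = len(units)
    return [sum(values) / n for values in zip(*units)]


def distortion(target: Unit, synthesized: Unit) -> float:
    """Squared error between the target and the fused unit (an assumed measure)."""
    return sum((t - s) ** 2 for t, s in zip(target, synthesized))


def select_combination(target: Unit, candidates: Sequence[Unit],
                       max_units: int) -> List[Unit]:
    """Grow the combination one unit at a time while distortion decreases."""
    chosen: List[Unit] = []
    remaining = list(candidates)
    best = float("inf")
    while remaining and len(chosen) < max_units:
        # Score each remaining unit by the distortion of the fusion it would yield.
        scored = [(distortion(target, fuse(chosen + [unit])), index)
                  for index, unit in enumerate(remaining)]
        candidate_distortion, index = min(scored)
        if candidate_distortion >= best:
            break  # adding another unit no longer improves the estimate
        best = candidate_distortion
        chosen.append(remaining.pop(index))
    return chosen


def synthesize(segments: Sequence[str], targets: Sequence[Unit],
               corpus: Dict[str, List[Unit]], max_units: int = 3) -> List[float]:
    """Select, fuse, and concatenate one new unit per segment."""
    output: List[float] = []
    for segment, target in zip(segments, targets):
        combination = select_combination(target, corpus[segment], max_units)
        output.extend(fuse(combination))  # concatenation of the fused units
    return output
```

With a toy corpus such as `{"a": [[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]], "b": [[1.0, 3.0]]}` and a target of `[1.0, 1.0]` for segment `"a"`, the sketch picks the first two units, because their average matches the target exactly, and rejects the third because adding it would raise the estimated distortion.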
20 Claims
1. An apparatus for synthesizing speech, comprising:
a speech unit corpus configured to store a group of speech units;
a selection unit configured to divide a phoneme sequence of target speech into a plurality of segments, and to select a combination of speech units for each segment from the speech unit corpus;
an estimation unit configured to estimate a distortion between the target speech and synthesized speech generated by fusing each speech unit of the combination for each segment;
wherein the selection unit recursively selects the combination of speech units for each segment based on the distortion;
a fusion unit configured to generate a new speech unit for each segment by fusing each speech unit of the combination selected for each segment; and
a concatenation unit configured to generate synthesized speech by concatenating the new speech unit for each segment.
19. A method for synthesizing speech, comprising:
storing a group of speech units;
dividing a phoneme sequence of target speech into a plurality of segments;
selecting a combination of speech units for each segment from the group of speech units;
estimating a distortion between the target speech and synthesized speech generated by fusing each speech unit of the combination for each segment;
recursively selecting the combination of speech units for each segment based on the distortion;
generating a new speech unit for each segment by fusing each speech unit of the combination selected for each segment; and
generating synthesized speech by concatenating the new speech unit for each segment.
20. A computer program product, comprising:
a computer readable program code embodied in said product for causing a computer to synthesize speech, said computer readable program code comprising:
a first program code to store a group of speech units;
a second program code to divide a phoneme sequence of target speech into a plurality of segments;
a third program code to select a combination of speech units for each segment from the group of speech units;
a fourth program code to estimate a distortion between the target speech and synthesized speech generated by fusing each speech unit of the combination for each segment;
a fifth program code to recursively select the combination of speech units for each segment based on the distortion;
a sixth program code to generate a new speech unit for each segment by fusing each speech unit of the combination selected for each segment; and
a seventh program code to generate synthesized speech by concatenating the new speech unit for each segment.
Specification