Method of speech segment selection for concatenative synthesis based on prosody-aligned distance measure
First Claim
1. A method of speech segment selection for concatenative synthesis based on prosody-aligned distance measure, comprising the steps of:
(A) segmenting speech stored in a speech corpus into at least one speech segment according to a unit type, wherein each speech segment has its prosody information;
(B) locating pitch marks for each speech segment;
(C) selecting one of the speech segments according to the unit type as a source segment and the other speech segments as target segments, and performing a prosody alignment between the source segment and each target segment to obtain a prosody-aligned source segment, wherein the pitch marks of the prosody-aligned source segment are aligned with the pitch marks of the target segment;
(D) measuring distortion between the prosody-aligned source segment and each target segment to obtain a distance between the prosody-aligned source segment and each target segment, and averaging these distances to obtain an average distance of the source segment to the target segments; and
(E) selecting at least one speech segment with a relatively small average distance.
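Steps (C) to (E) amount to a leave-one-out search: each candidate of a unit type takes a turn as the source, is prosody-aligned to every other candidate, and the candidates whose aligned versions stay closest to the rest on average are kept. The following is a minimal Python sketch under stated assumptions, not the patented implementation: it assumes each segment has already been reduced to one feature vector per pitch period (for example cepstral coefficients), it replaces full PSOLA alignment with a simple period duplication/deletion mapping, and it uses a plain Euclidean frame distance in place of the MFCC or PSQM measures. The names `align_to`, `frame_distance`, `average_distances` and `select_segments` are hypothetical.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]   # one feature vector per pitch period (assumed representation)
Segment = List[Frame]     # a speech segment as a pitch-synchronous frame sequence


def align_to(source: Segment, target: Segment) -> Segment:
    """Hypothetical stand-in for PSOLA prosody alignment (step C).

    Maps each target pitch period to the source period at the same relative
    position, duplicating or dropping source periods so that the aligned
    source has exactly as many periods as the target.
    """
    n_src, n_tgt = len(source), len(target)
    return [source[(k * n_src) // n_tgt] for k in range(n_tgt)]


def frame_distance(a: Segment, b: Segment) -> float:
    """Mean Euclidean distance between corresponding frames (step D)."""
    return sum(math.dist(x, y) for x, y in zip(a, b)) / len(b)


def average_distances(segments: List[Segment]) -> List[float]:
    """Average distance of each prosody-aligned source to all other segments."""
    averages = []
    for i, source in enumerate(segments):
        dists = [frame_distance(align_to(source, target), target)
                 for j, target in enumerate(segments) if j != i]
        averages.append(sum(dists) / len(dists))
    return averages


def select_segments(segments: List[Segment], keep: int = 1) -> List[int]:
    """Indices of the `keep` segments with the smallest average distance (step E)."""
    averages = average_distances(segments)
    return sorted(range(len(segments)), key=lambda i: averages[i])[:keep]


# Three candidates of one unit type with different numbers of pitch periods.
candidates = [
    [[1.0, 0.0]] * 2,
    [[1.2, 0.0]] * 3,
    [[1.4, 0.0]] * 4,
]
print(select_segments(candidates))  # -> [1]
```

The middle candidate is selected because, after alignment, it is on average closest to the other two (average distance 0.2 versus 0.3). The sketch needs at least two candidates per unit type; in a real system `frame_distance` would be swapped for a perceptually motivated measure.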
Abstract
A method of speech segment selection for concatenative synthesis based on a prosody-aligned distance measure is disclosed. The method compares speech segments segmented from a speech corpus, and the segments are fully prosody-aligned to each other before distortion is measured. Because prosody alignment is embedded in the selection process, the distortion that prosody modification may introduce during synthesis can be taken into account objectively in the selection phase. To achieve this, automatic segmentation, pitch marking, and the PSOLA (pitch-synchronous overlap-add) method work together to perform the prosody alignment. Two distortion measures, a mel-frequency cepstral coefficient (MFCC) distance and the Perceptual Speech Quality Measure (PSQM), are used to compare two prosody-aligned speech segments, because both take human perception into account.
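The PSOLA-based prosody alignment can be illustrated with a bare-bones time-domain PSOLA (TD-PSOLA) resynthesis. This is an illustrative sketch with its own assumptions, not the method's actual implementation: it takes a signal with known analysis pitch marks and a set of target pitch marks (for instance those of a target segment, so that both pitch and duration follow the target), maps each target mark to the analysis mark at the same relative position, and overlap-adds a Hann-windowed grain spanning one analysis period on each side of that mark. The helper names `hann`, `local_period` and `td_psola`, and the relative-position mapping, are hypothetical choices.

```python
import math
from typing import List, Sequence


def hann(length: int) -> List[float]:
    """Periodic Hann window: copies shifted by length // 2 sum to one."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def local_period(marks: Sequence[int], i: int) -> int:
    """Distance from mark i to the next mark (to the previous one at the last mark)."""
    if i + 1 < len(marks):
        return marks[i + 1] - marks[i]
    return marks[i] - marks[i - 1]


def td_psola(signal: Sequence[float], src_marks: Sequence[int],
             tgt_marks: Sequence[int], out_len: int) -> List[float]:
    """Overlap-add two-period grains from `signal` onto `tgt_marks`.

    src_marks: analysis pitch marks in `signal` (ascending, at least two).
    tgt_marks: desired pitch marks in the output (ascending).
    """
    out = [0.0] * out_len
    for k, t in enumerate(tgt_marks):
        # Relative-position mapping: analysis periods are repeated or skipped
        # so that len(tgt_marks) grains are produced (duration alignment).
        i = (k * len(src_marks)) // len(tgt_marks)
        s, p = src_marks[i], local_period(src_marks, i)
        window = hann(2 * p)  # grain spans one analysis period on each side
        for n in range(2 * p):
            src_idx, out_idx = s - p + n, t - p + n
            if 0 <= src_idx < len(signal) and 0 <= out_idx < out_len:
                out[out_idx] += signal[src_idx] * window[n]
    return out


x = [math.cos(2 * math.pi * n / 50) for n in range(400)]  # 50-sample period
marks = list(range(50, 351, 50))                          # pitch marks on the peaks
same = td_psola(x, marks, marks, len(x))                  # identity alignment
higher = td_psola(x, marks, list(range(50, 351, 40)), len(x))  # 40-sample spacing
```

With identical source and target marks and a constant pitch period, the output reproduces the input exactly between the first and last pitch marks, because periodic Hann windows offset by half their length sum to one. Moving the target marks to a 40-sample spacing raises the pitch by a factor of 1.25 while keeping the duration; a production system would also handle unvoiced regions and window normalization.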
11 Citations
11 Claims
Claim 1 is reproduced above; claims 2 to 11 are dependent claims.
Specification