Automatic segmentation in speech synthesis
Abstract
Systems and methods for automatically segmenting speech inventories. A set of Hidden Markov Models (HMMs) are initialized using bootstrap data. The HMMs are next re-estimated and aligned to produce phone labels. The phone boundaries of the phone labels are then corrected using spectral boundary correction. Optionally, this process of using the spectral-boundary-corrected phone labels as input instead of the bootstrap data is performed iteratively in order to further reduce mismatches between manual labels and phone labels assigned by the HMM approach.
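The optional iteration described in the abstract can be pictured as a simple loop in which each pass feeds the boundary-corrected labels back in as training input. The following is only a sketch: the function `iterate_segmentation`, the `Label` layout, and the three callables `train`, `align`, and `correct` are hypothetical placeholders, not names taken from the patent.

```python
from typing import Callable, List, Tuple

# (phone, start, end); this label layout is an assumption for illustration.
Label = Tuple[str, float, float]


def iterate_segmentation(
    seed_labels: List[Label],
    train: Callable[[List[Label]], object],
    align: Callable[[object], List[Label]],
    correct: Callable[[List[Label]], List[Label]],
    n_iters: int = 3,
) -> List[Label]:
    """Run train -> align -> correct repeatedly, reusing corrected labels."""
    labels = seed_labels
    for _ in range(n_iters):
        models = train(labels)     # (re-)estimate the HMMs from the current labels
        aligned = align(models)    # alignment produces new phone labels
        labels = correct(aligned)  # spectral boundary correction of those labels
    return labels
```

Only the first pass uses the bootstrap data; every later pass starts from the corrected labels of the previous one. The number of passes (`n_iters`) is an assumed parameter.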
24 Claims
1. In a system that concatenates speech units to produce synthetic speech, a method for automatically segmenting unit labels, the method comprising:
training a set of Hidden Markov Models (HMMs) using seed data in a first iteration;
aligning the set of HMMs using a Viterbi alignment to produce segmented unit labels; and
adjusting boundaries of the unit labels using spectral boundary correction.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
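The Viterbi alignment step in claim 1 can be illustrated with a small forced-alignment sketch. It makes strong simplifying assumptions that are not from the patent: one state per unit, a strict left-to-right topology, no transition probabilities, and per-frame log-likelihoods supplied directly as `loglik[t][s]`. The names `forced_align` and `boundaries_from_path` are hypothetical.

```python
from typing import List

NEG_INF = float("-inf")


def forced_align(loglik: List[List[float]]) -> List[int]:
    """Return the best left-to-right state index for every frame.

    loglik[t][s] is the log-likelihood of frame t under state s.
    The path must start in state 0 and end in the last state.
    """
    n_frames, n_states = len(loglik), len(loglik[0])
    if n_frames < n_states:
        raise ValueError("need at least one frame per state")

    score = [[NEG_INF] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = loglik[0][0]

    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else NEG_INF
            best, prev = (move, s - 1) if move > stay else (stay, s)
            if best > NEG_INF:  # skip states that are not reachable yet
                score[t][s] = best + loglik[t][s]
                back[t][s] = prev

    # Trace back from the final state at the final frame.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def boundaries_from_path(path: List[int]) -> List[int]:
    """Frame indices at which the aligned state changes."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]
```

The frame indices returned by `boundaries_from_path` play the role of the segmented unit-label boundaries that the later correction step adjusts.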
9. In a system having a speech inventory that includes phone labels that are concatenated to form synthetic speech, a method for segmenting the phone labels, the method comprising:
performing a first alignment on a trained set of HMMs to produce phone labels that are segmented, wherein each phone label has a spectral boundary; and
performing spectral boundary correction on the phone labels, wherein spectral boundary correction re-aligns each spectral boundary using bending points of spectral transitions.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17, 18
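One way to picture the re-alignment in claim 9 is to move each boundary onto the nearest bending point, provided one lies close enough. The sketch below assumes boundaries and bending points are given as frame indices and that a search window (`max_shift`) limits how far a boundary may move; the window and the function name `snap_boundaries` are illustrative assumptions, not details stated in the claim.

```python
from bisect import bisect_left
from typing import List, Sequence


def snap_boundaries(
    boundaries: Sequence[int],
    bending_points: Sequence[int],
    max_shift: int,
) -> List[int]:
    """Move each boundary to the nearest bending point within max_shift frames."""
    points = sorted(bending_points)
    corrected = []
    for b in boundaries:
        i = bisect_left(points, b)
        # Only the neighbours on either side of b can be the nearest point.
        candidates = points[max(i - 1, 0): i + 1]
        nearest = min(candidates, key=lambda p: abs(p - b), default=None)
        if nearest is not None and abs(nearest - b) <= max_shift:
            corrected.append(nearest)
        else:
            corrected.append(b)  # no bending point nearby: keep the boundary
    return corrected
```

A boundary that already coincides with a bending point is returned unchanged. This sketch does not check that corrected boundaries stay in their original order, which a fuller implementation would likely need to enforce.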
19. A method for segmenting phone labels to reduce misalignments in order to improve synthetic speech when the phone labels are concatenated, the method comprising:
training a set of HMMs using one of a specific speaker's hand-labeled speech data and speaker-independent speech data;
segmenting the trained set of HMMs using a first alignment to produce phone labels, wherein each phone label has a spectral boundary; and
using a weighted slope metric to identify bending points of spectral transitions, wherein each bending point corresponds to a spectral boundary; and
correcting a particular spectral boundary of a particular phone label if the particular spectral boundary does not coincide with a particular bending point.
Dependent claims: 20, 21, 22, 23, 24
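Claim 19 names a weighted slope metric without defining it in this section, so the following is a simplified stand-in rather than the patent's formula. It assumes each frame is a short log-magnitude spectrum, compares the band-to-band slopes of adjacent frames, weights bands near each spectrum's peak more heavily, and treats local maxima of the resulting distance curve as candidate bending points. The weighting scheme, the `threshold` parameter, and the local-maximum rule are all assumptions.

```python
from typing import List, Sequence


def slopes(spectrum: Sequence[float]) -> List[float]:
    """Band-to-band slope of one log-magnitude spectrum."""
    return [spectrum[i + 1] - spectrum[i] for i in range(len(spectrum) - 1)]


def weighted_slope_distance(spec_a: Sequence[float], spec_b: Sequence[float]) -> float:
    """Slope-difference distance that emphasises bands near spectral peaks."""
    slope_a, slope_b = slopes(spec_a), slopes(spec_b)
    peak_a, peak_b = max(spec_a), max(spec_b)
    total = 0.0
    for i in range(len(slope_a)):
        # Weight is 1 at the peak band and decays as the level drops below it.
        w_a = 1.0 / (1.0 + peak_a - spec_a[i])
        w_b = 1.0 / (1.0 + peak_b - spec_b[i])
        total += 0.5 * (w_a + w_b) * (slope_a[i] - slope_b[i]) ** 2
    return total


def find_bending_points(frames: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Frame indices where the frame-to-frame distance peaks above threshold."""
    dist = [weighted_slope_distance(frames[t - 1], frames[t])
            for t in range(1, len(frames))]
    points = []
    for k, d in enumerate(dist):
        left = dist[k - 1] if k > 0 else float("-inf")
        right = dist[k + 1] if k + 1 < len(dist) else float("-inf")
        if d > threshold and d >= left and d > right:
            points.append(k + 1)  # dist[k] compares frame k with frame k + 1
    return points
```

Each returned index is the first frame after a detected spectral change, so it can be compared directly with a frame-indexed phone boundary to decide whether that boundary needs correcting.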
Specification