Automatic segmentation in speech synthesis
Abstract
Systems and methods for automatically segmenting speech inventories. A set of Hidden Markov Models (HMMs) is initialized using bootstrap data. The HMMs are then re-estimated and aligned to produce phone labels. The phone boundaries of the phone labels are then corrected using spectral boundary correction. Optionally, the process is repeated iteratively, with the spectral-boundary-corrected phone labels used as input in place of the bootstrap data, in order to further reduce mismatches between manual labels and the phone labels assigned by the HMM approach.
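The alignment step mentioned in the abstract can be illustrated with a minimal sketch of Viterbi forced alignment. This is not the patented implementation: it assumes each phone of a known transcript is reduced to a single state, that per-frame log-likelihoods have already been computed by some trained model, and that transition probabilities are ignored. The function name `forced_align` and the input layout are hypothetical.

```python
import math

def forced_align(loglik):
    """Sketch of Viterbi forced alignment (assumed single state per phone).

    loglik[t][k] is the log-likelihood of frame t under the k-th phone of
    the known transcript. Returns the start frame of each phone in the
    best left-to-right alignment, with at least one frame per phone.
    """
    T, K = len(loglik), len(loglik[0])
    if T < K:
        raise ValueError("need at least one frame per phone")
    NEG = -math.inf
    score = [[NEG] * K for _ in range(T)]
    # entered[t][k] is True when the best path enters phone k at frame t.
    entered = [[False] * K for _ in range(T)]
    score[0][0] = loglik[0][0]
    for t in range(1, T):
        for k in range(K):
            stay = score[t - 1][k]
            move = score[t - 1][k - 1] if k > 0 else NEG
            if move > stay:
                score[t][k] = move + loglik[t][k]
                entered[t][k] = True
            else:
                score[t][k] = stay + loglik[t][k]
    # Backtrace from the last frame of the last phone.
    starts = [0] * K
    k = K - 1
    for t in range(T - 1, 0, -1):
        if entered[t][k]:
            starts[k] = t
            k -= 1
    return starts

# Toy input: frames 0-1 favor phone 0, frames 2-4 phone 1, frame 5 phone 2.
toy = [[0, -5, -5], [0, -5, -5], [-5, 0, -5],
       [-5, 0, -5], [-5, 0, -5], [-5, -5, 0]]
print(forced_align(toy))
```

The returned start frames play the role of the initial phone boundaries that a later correction step would adjust.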
24 Claims
1. In a system that concatenates speech units to produce synthetic speech, a method for automatically segmenting unit labels, the method comprising:
training a set of hidden Markov Models (HMMs) using seed data in a first iteration;
aligning the set of HMMs using a Viterbi alignment to produce segmented unit labels; and
adjusting boundaries of the unit labels using spectral boundary correction, wherein the unit labels having adjusted boundaries are used to concatenate speech units to synthesize speech.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. In a system having a speech inventory that includes phone labels that are concatenated to form synthetic speech, a method for segmenting the phone labels, the method comprising:
performing a first alignment on a trained set of HMMs to produce phone labels that are segmented, wherein each phone label has a spectral boundary; and
performing spectral boundary correction on the phone labels, wherein spectral boundary correction re-aligns each boundary using bending points of spectral transitions, wherein the phone labels having spectral boundary correction are used for speech synthesis.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17, 18
19. A method for segmenting phone labels to reduce misalignments in order to improve synthetic speech when the phone labels are concatenated, the method comprising:
training a set of HMMs using one of a specific speaker's hand-labeled speech data and speaker-independent speech data;
segmenting the trained set of HMMs using a first alignment to produce phone labels, wherein each phone label has a spectral boundary;
using a weighted slope metric to identify bending points of spectral transitions, where each bending point corresponds to a spectral boundary; and
correcting a particular spectral boundary of a particular phone label if the particular spectral boundary does not coincide with a particular bending point, wherein the phone labels with corrected spectral boundaries are used for speech synthesis.
Dependent claims: 20, 21, 22, 23, 24
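The boundary-correction idea in the claims above can be sketched as follows. The weighted slope metric and the exact definition of a bending point are not given in this section, so the sketch substitutes a simpler stand-in: the Euclidean distance between adjacent feature frames, with each boundary moved to the frame of largest change inside a small search window. The names `spectral_change` and `correct_boundaries`, the `window` parameter, and the feature layout are all assumptions made for illustration.

```python
import math

def spectral_change(frames):
    """change[t] = Euclidean distance between frame t-1 and frame t.

    This is a stand-in for the weighted slope metric, which is not
    defined in this section. change[0] is set to 0.
    """
    change = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        change.append(math.dist(prev, cur))
    return change

def correct_boundaries(boundaries, frames, window=2):
    """Move each interior boundary to the largest spectral change nearby.

    boundaries: frame indices where a new phone starts (interior only).
    frames: list of feature vectors, one per frame (at least two frames).
    Ties go to the candidate closest to the original boundary. No check
    is made that corrected boundaries stay ordered when windows overlap.
    """
    change = spectral_change(frames)
    corrected = []
    for b in boundaries:
        lo = max(1, b - window)
        hi = min(len(frames) - 1, b + window)
        best = max(range(lo, hi + 1),
                   key=lambda t: (change[t], -abs(t - b)))
        corrected.append(best)
    return corrected

# Toy one-dimensional "spectrum" with jumps at frames 3 and 6.
feats = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2], [9.0], [9.1]]
print(correct_boundaries([2, 7], feats))
```

In this toy input the initial boundaries at frames 2 and 7 are each one frame away from an abrupt change, so the search window is wide enough to reach the change and the boundary is moved onto it.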
Specification