AUTOMATIC SEGMENTATION IN SPEECH SYNTHESIS
Abstract
Systems and methods for automatically segmenting speech inventories. A set of Hidden Markov Models (HMMs) is initialized using bootstrap data. The HMMs are then re-estimated and aligned to produce phone labels. The phone boundaries of the phone labels are then corrected using spectral boundary correction. Optionally, the process is performed iteratively, with the spectral-boundary-corrected phone labels used as input instead of the bootstrap data, in order to further reduce mismatches between manual labels and the phone labels assigned by the HMM approach.
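The iterative loop described in the abstract can be sketched as follows. This is a minimal illustration only, not the patented implementation: the `Label` tuple layout, the function name `iterative_segmentation`, and the two callables `train_and_align` and `correct_boundaries` are hypothetical placeholders for the HMM training/alignment step and the spectral boundary correction step, whose details are not given in this excerpt.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical label layout: (phone, start_time, end_time).
Label = Tuple[str, float, float]


def iterative_segmentation(
    seed_labels: Sequence[Label],
    train_and_align: Callable[[Sequence[Label]], List[Label]],
    correct_boundaries: Callable[[Sequence[Label]], List[Label]],
    iterations: int = 3,
) -> List[Label]:
    """Repeat train/align and boundary correction, feeding each pass's output to the next."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    labels = list(seed_labels)  # first pass starts from the bootstrap (seed) labels
    for _ in range(iterations):
        aligned = train_and_align(labels)     # HMM (re-)estimation plus alignment
        labels = correct_boundaries(aligned)  # spectral boundary correction
    return labels
```

The two steps are passed in as callables so that the loop structure stays visible without committing to any particular HMM toolkit or correction rule; the iteration count of 3 is an arbitrary default, not a value taken from the patent.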
21 Claims
1. A system that concatenates speech units to produce synthetic speech, the system comprising:
a module configured to train a set of Hidden Markov Models (HMMs) using seed data in a first iteration;
a module configured to align the set of HMMs to produce segmented unit labels; and
a module configured to adjust boundaries of the unit labels using spectral boundary correction, wherein the unit labels having adjusted boundaries are used to concatenate speech units to synthesize speech.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system having a speech inventory that includes phone labels that are concatenated to form synthetic speech, the system comprising:
a module configured to perform a first alignment on a trained set of HMMs to produce phone labels that are segmented, wherein each phone label has a spectral boundary;
a module configured to perform spectral boundary correction on the phone labels, wherein spectral boundary correction re-aligns each spectral boundary using bending points of spectral transitions; and
a module configured to synthesize speech using the phone labels having spectral boundary correction.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16
17. A computing device that segments phone labels to reduce misalignments in order to improve synthetic speech when the phone labels are concatenated, the computing device comprising:
a module configured to train a set of HMMs using one of a specific speaker's hand-labeled speech data and speaker-independent speech data;
a module configured to segment the trained set of HMMs using a first alignment to produce phone labels, wherein each phone label has a spectral boundary;
a module configured to use a weighted slope metric to identify bending points of spectral transitions, wherein each bending point corresponds to a spectral boundary;
a module configured to correct a particular spectral boundary of a particular phone label if the particular spectral boundary does not coincide with a particular bending point; and
a module configured to synthesize speech using the phone labels with corrected spectral boundaries.
Dependent claims: 18, 19, 20, 21
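The boundary correction recited in claim 17 (move a boundary when it does not coincide with a bending point) might look roughly like the sketch below. Several assumptions are made loudly here because this excerpt does not define them: a plain Euclidean distance between consecutive frames stands in for the weighted slope metric, a "bending point" is approximated as the frame with the largest spectral change inside a search window, and the names `spectral_change`, `correct_boundary`, and the `window` parameter are hypothetical.

```python
from typing import List, Sequence


def spectral_change(frames: Sequence[Sequence[float]]) -> List[float]:
    """Per-frame spectral change: change[i] is the distance from frame i-1 to frame i.

    Euclidean distance is an assumed stand-in for the weighted slope metric.
    """
    if not frames:
        return []
    change = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(frames, frames[1:]):
        change.append(sum((a - b) ** 2 for a, b in zip(prev, cur)) ** 0.5)
    return change


def correct_boundary(boundary: int, change: Sequence[float], window: int = 3) -> int:
    """Move a frame-index boundary to the largest spectral change within +/- window frames.

    The boundary is left untouched if it is out of range or if no frame in the
    window shows a larger change than the current boundary frame.
    """
    if not 0 <= boundary < len(change):
        return boundary
    lo = max(0, boundary - window)
    hi = min(len(change), boundary + window + 1)
    best = max(range(lo, hi), key=lambda i: change[i])
    return best if change[best] > change[boundary] else boundary
```

With eight two-dimensional frames where frames 0 to 3 are `[0, 0]` and frames 4 to 7 are `[1, 1]`, a boundary initially placed at frame 2 is moved to frame 4, where the only spectral transition occurs, while a boundary already at frame 4 stays put.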
Specification