Trajectory Tiling Approach for Text-to-Speech
First Claim
1. A computer-readable medium storing computer-executable instructions that, when executed, cause one or more processors to perform acts comprising:
obtaining a set of Hidden Markov Models (HMMs) and a set of waveform units from a speech corpus;
refining the set of HMMs via minimum generation error (MGE) training to generate a refined set of HMMs;
generating a speech parameter trajectory by applying the refined set of HMMs to an input text;
constructing a unit lattice of candidate waveform units selected from the set of waveform units based at least on the speech parameter trajectory;
performing a normalized cross-correlation (NCC)-based search on the unit lattice to obtain a minimal concatenation cost sequence of candidate waveform units; and
concatenating the minimal concatenation cost sequence of candidate waveform units into a concatenated waveform sequence.
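The normalized cross-correlation used in the search step can be sketched as follows. This is an illustrative implementation only, not the patent's own code; the representation of a waveform frame as a plain list of samples is an assumption.

```python
import math

def ncc(x, y):
    """Normalized cross-correlation of two equal-length waveform frames.

    Returns a value in [-1, 1]; values near 1 indicate the two segments
    match closely at the concatenation boundary.
    """
    if len(x) != len(y) or not x:
        raise ValueError("frames must be non-empty and of equal length")
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0
```

Because candidates whose boundary frames score near 1 join with the least audible discontinuity, a quantity such as 1 − NCC is a natural concatenation cost for the lattice search.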
Abstract
Hidden Markov Model (HMM) trajectory tiling (HTT)-based approaches may be used to synthesize speech from text. In operation, a set of HMMs and a set of waveform units may be obtained from a speech corpus. The set of HMMs is further refined via minimum generation error (MGE) training to generate a refined set of HMMs. Subsequently, a speech parameter trajectory may be generated by applying the refined set of HMMs to an input text. A unit lattice of candidate waveform units may be selected from the set of waveform units based at least on the speech parameter trajectory. A normalized cross-correlation (NCC)-based search on the unit lattice may be performed to obtain a minimal concatenation cost sequence of candidate waveform units, which are concatenated into a concatenated waveform sequence that is synthesized into speech.
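As a rough illustration of the minimum generation error idea described in the abstract, the sketch below nudges each HMM state mean toward the natural parameter frames aligned to it, shrinking the squared error between the generated and natural trajectories. The one-dimensional state means, fixed durations, and fixed alignment are simplifying assumptions, not the patent's method.

```python
def generate_trajectory(means, durations):
    """Generated parameter trajectory: each state's mean repeated for its duration."""
    traj = []
    for mean, dur in zip(means, durations):
        traj.extend([mean] * dur)
    return traj

def mge_refine(means, durations, natural, lr=0.5, iters=50):
    """MGE-style refinement (hypothetical simplification).

    Each state mean moves toward the average of the natural frames
    aligned to it, reducing the generation error at every iteration.
    """
    means = list(means)
    for _ in range(iters):
        start = 0
        for i, dur in enumerate(durations):
            frames = natural[start:start + dur]
            err = sum(f - means[i] for f in frames) / dur
            means[i] += lr * err
            start += dur
    return means
```

In this toy setting the update converges to the state-wise average of the aligned natural frames, the minimizer of the squared generation error for a mean-repetition trajectory.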
20 Claims
1. A computer-readable medium storing computer-executable instructions that, when executed, cause one or more processors to perform acts comprising:
obtaining a set of Hidden Markov Models (HMMs) and a set of waveform units from a speech corpus;
refining the set of HMMs via minimum generation error (MGE) training to generate a refined set of HMMs;
generating a speech parameter trajectory by applying the refined set of HMMs to an input text;
constructing a unit lattice of candidate waveform units selected from the set of waveform units based at least on the speech parameter trajectory;
performing a normalized cross-correlation (NCC)-based search on the unit lattice to obtain a minimal concatenation cost sequence of candidate waveform units; and
concatenating the minimal concatenation cost sequence of candidate waveform units into a concatenated waveform sequence.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
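The lattice search recited in claim 1 can be pictured as a dynamic-programming (Viterbi-style) pass over candidate units, keeping the cheapest path by accumulated concatenation cost. This is a sketch under stated assumptions: the lattice is a list of slots of candidate units, and the cost function is supplied by the caller (an NCC-based cost in the claimed approach).

```python
def min_cost_path(lattice, join_cost):
    """Minimum-concatenation-cost sequence through a unit lattice.

    lattice: list of slots, each a list of candidate units.
    join_cost: function(prev_unit, unit) -> non-negative cost.
    Returns (total_cost, chosen_units).
    """
    # best holds (accumulated cost, path) for each candidate in the current slot
    best = [(0.0, [u]) for u in lattice[0]]
    for slot in lattice[1:]:
        new_best = []
        for u in slot:
            cost, path = min(
                ((c + join_cost(p[-1], u), p) for c, p in best),
                key=lambda t: t[0],
            )
            new_best.append((cost, path + [u]))
        best = new_best
    return min(best, key=lambda t: t[0])
```

With an NCC-based cost such as 1 − NCC at each boundary, the returned path is the minimal concatenation cost sequence of candidate waveform units.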
10. A computer implemented method, comprising:
under control of one or more computing systems configured with executable instructions,
obtaining a set of Hidden Markov Models (HMMs) and an initial set of waveform units from a speech corpus, each waveform unit in the initial set having a first time length;
generating a speech parameter trajectory by applying the set of HMMs to an input text;
constructing a unit lattice of candidate waveform units selected from the initial set of waveform units based at least on the speech parameter trajectory;
performing a normalized cross-correlation (NCC)-based search on the unit lattice to search for a sequence of candidate waveform units along a minimum concatenation cost path;
concatenating the sequence of candidate waveform units into a concatenated waveform sequence when the sequence of waveform units is found along the minimum concatenation cost path; and
generating a modified set of waveform units from the speech corpus when no sequence of candidate waveform units is found along the minimum concatenation cost path, each waveform unit in the modified set having a second time length that is less than the first time length.
Dependent claims: 11, 12, 13, 14, 15.
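The fallback in claim 10, retrying with shorter units when no acceptable path is found, can be sketched as follows. The fixed-length segmentation helper, the halving policy, and the pluggable search routine are all illustrative assumptions, not the claimed implementation.

```python
def segment_units(samples, unit_len):
    """Cut a corpus waveform into fixed-length units (hypothetical segmentation)."""
    return [samples[i:i + unit_len]
            for i in range(0, len(samples) - unit_len + 1, unit_len)]

def synthesize(samples, trajectory_slots, search, unit_len, min_len=1):
    """Search with units of a first time length; shorten and retry on failure.

    search(units, trajectory_slots) returns a unit sequence along the
    minimum concatenation cost path, or None when no sequence qualifies.
    """
    while unit_len >= min_len:
        units = segment_units(samples, unit_len)
        path = search(units, trajectory_slots)
        if path is not None:
            # concatenate the found units into a waveform sequence
            return [s for unit in path for s in unit]
        unit_len //= 2  # modified set: shorter units give a finer tiling
    raise RuntimeError("no unit sequence found at any unit length")
```

Shorter units enlarge the candidate pool at each slot, so a path that fails at the first time length may succeed at the second, shorter one.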
16. A system, comprising:
one or more processors; and
a memory that includes a plurality of computer-executable components, the plurality of computer-executable components comprising:
a Hidden Markov Model (HMM) component to obtain a set of HMMs from a speech corpus;
a refinement component to refine the set of HMMs via minimum generation error (MGE) training to generate a refined set of HMMs; and
a trajectory generation component to generate a speech parameter trajectory by applying the refined set of HMMs to an input text.
Dependent claims: 17, 18, 19, 20.
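The three components of claim 16 can be wired together in a minimal sketch. The toy corpus format (label to frame list), the single-Gaussian-mean models, and the one-step refinement standing in for MGE training are all assumptions for illustration.

```python
class HMMComponent:
    """Obtains a set of models from a speech corpus (toy stand-in for HMMs)."""
    def obtain(self, corpus):
        # one (mean, duration) model per distinct label in the corpus
        return {label: (sum(frames) / len(frames), len(frames))
                for label, frames in corpus.items()}

class RefinementComponent:
    """Refines the models; a real system would run MGE training here."""
    def refine(self, models, natural):
        # nudge each mean toward the natural frames it should generate
        return {label: (mean + 0.5 * (sum(natural[label]) / len(natural[label]) - mean), dur)
                for label, (mean, dur) in models.items()}

class TrajectoryGenerationComponent:
    """Generates a parameter trajectory by applying refined models to text."""
    def generate(self, models, text):
        traj = []
        for label in text:
            mean, dur = models[label]
            traj.extend([mean] * dur)
        return traj
```

The separation mirrors the claim: model acquisition, refinement, and trajectory generation are independent components sharing only the model set.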
Specification