RICH CONTEXT MODELING FOR TEXT-TO-SPEECH ENGINES
Abstract
Embodiments of rich context modeling for speech synthesis are disclosed. In operation, a text-to-speech engine refines a plurality of rich context models based on decision tree-tied Hidden Markov Models (HMMs) to produce a plurality of refined rich context models. The text-to-speech engine then generates synthesized speech for an input text based at least on some of the plurality of refined rich context models.
23 Claims
1. A computer readable medium storing computer-executable instructions that, when executed, cause one or more processors to perform acts comprising:
  refining a plurality of rich context models based on decision tree-tied Hidden Markov Models (HMMs) to produce a plurality of refined rich context models; and
  generating synthesized speech for an input text based at least on some of the plurality of refined rich context models.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A computer implemented method, comprising:
  under control of one or more computing systems configured with executable instructions,
  refining a plurality of rich context models based on decision tree-tied Hidden Markov Models (HMMs) to produce a plurality of refined rich context models;
  performing pre-selection to compose a rich context model candidate sausage for the input text, the candidate sausage including the plurality of refined rich context model sequences, each sequence including at least some refined rich context models from the plurality of refined rich context models;
  selecting one of the plurality of refined rich context model sequences that has a least divergence from a guiding sequence that is obtained from the decision tree-tied HMMs; and
  generating output speech for the input text based at least on the selected rich context model sequence.
Dependent claims: 13, 14, 15, 16
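The selection step recited in claim 12 can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the claims: each model is represented as a diagonal Gaussian `(means, variances)`, "divergence" is taken to be a symmetric Kullback-Leibler divergence summed over positions, and the names `kl_diag_gauss`, `sym_kl`, `sequence_divergence`, and `select_sequence` are hypothetical. The candidate sausage is modeled as a list of slots, each holding alternative models, and every path through the slots is one candidate sequence.

```python
import math
from itertools import product


def kl_diag_gauss(p, q):
    """KL(p || q) for diagonal Gaussians given as (means, variances)."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def sym_kl(p, q):
    """Symmetric KL divergence (assumed divergence measure)."""
    return kl_diag_gauss(p, q) + kl_diag_gauss(q, p)


def sequence_divergence(seq, guide):
    """Total divergence of a candidate sequence from the guiding sequence."""
    return sum(sym_kl(model, ref) for model, ref in zip(seq, guide))


def select_sequence(sausage, guide):
    """Return the path through the sausage with the least total divergence.

    Exhaustive enumeration is used only to keep the sketch short; it is
    exponential in the number of slots.
    """
    return min(product(*sausage), key=lambda seq: sequence_divergence(seq, guide))


# Hypothetical one-dimensional data: two slots, two candidates per slot.
guide = [([0.0], [1.0]), ([2.0], [1.0])]
sausage = [
    [([0.1], [1.0]), ([1.5], [1.0])],
    [([3.0], [1.0]), ([2.2], [1.0])],
]
best = select_sequence(sausage, guide)
```

Because the assumed cost is a plain sum with no term linking neighboring slots, picking the closest candidate in each slot independently would give the same result; a real system with a cross-slot cost would need a dynamic-programming search instead.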
17. A system, comprising:
  one or more processors;
  a memory that includes a plurality of computer-executable components, the plurality of computer-executable components comprising:
    a training module to refine a plurality of rich context models based on decision tree-tied Hidden Markov Models (HMMs) to produce a plurality of refined rich context models;
    a pre-selection module to perform pre-selection to compose a rich context model candidate sausage for the input text, the candidate sausage including the plurality of refined rich context model sequences, each sequence including at least some refined rich context models from the plurality of refined rich context models;
    a unit pruning module to implement unit pruning along the candidate sausage to select one or more rich context model sequences with less than a predetermined amount of distortion from a guiding sequence, the guiding sequence obtained from the decision tree-tied HMMs;
    a cross correlation search module to conduct a normalized cross correlation-based search to derive a minimal concatenation cost rich context model sequence from the one or more rich context model sequences;
    a waveform concatenation module to concatenate waveform units of an input text along a path of the minimal concatenation cost rich context model sequence to generate a waveform sequence; and
    a synthesis module to generate synthesized speech for the input text based at least on the concatenated waveform sequence.
Dependent claims: 18, 19, 20, 21, 22, 23
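The normalized cross correlation-based search recited in claim 17 can be illustrated with a small sketch of scoring one join between two waveform units. The details are assumptions for illustration only: the correlation is computed without mean subtraction, the concatenation cost is taken as one minus the best correlation, and the names `ncc`, `best_join`, and `concatenation_cost` are hypothetical rather than drawn from the claims.

```python
import math


def ncc(x, y):
    """Normalized cross-correlation of two equal-length frames, in [-1, 1]."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0 else 0.0


def best_join(tail, head, frame_len):
    """Slide a window over `head` and compare it with the end of `tail`.

    Returns (lag, score): the offset into `head` whose frame best matches
    the last `frame_len` samples of `tail`, and the correlation there.
    """
    ref = tail[-frame_len:]
    scores = [
        ncc(ref, head[k:k + frame_len])
        for k in range(len(head) - frame_len + 1)
    ]
    lag = max(range(len(scores)), key=scores.__getitem__)
    return lag, scores[lag]


def concatenation_cost(tail, head, frame_len):
    """Assumed cost: 0 for a perfectly correlated join, up to 2 for an inverted one."""
    return 1.0 - best_join(tail, head, frame_len)[1]


# Hypothetical waveform units: the same periodic signal at different phases.
tail = [0, 1, 0, -1, 0, 1, 0, -1]
head = [1, 0, -1, 0, 1, 0, -1, 0]
lag, score = best_join(tail, head, frame_len=4)
cost = concatenation_cost(tail, head, frame_len=4)
```

In this toy input the best match sits three samples into `head`, where the window is in phase with the end of `tail`. Summing such per-join costs along each surviving sequence and keeping the smallest total is one way a minimal concatenation cost sequence could be derived.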
Specification