Prosody template matching for text-to-speech systems
First Claim
1. A text-to-speech synthesizer system, comprising:
- a text input module receptive of target synthesis text;
a prosody module connected to the text input module for associating prosody information with the target synthesis text, the prosody module employing an n-way tree structure of plural traversal paths to identify the prosody information for the target synthesis text;
the prosody module having a prosody pattern lookup module that traverses said tree structure, the lookup module having a stored matrix of penalty values associated with said plural traversal oaths and being operative to identify the prosody information for the target synthesis text corresponding to the traversal path of lowest penalty value; and
a sound generation module connected to the prosody module for converting the target synthesis text to audible speech using the prosody information.
2 Assignments
0 Petitions
Accused Products
Abstract
A prosody matching template in the form of a tree structure stores indices which point to lookup table and template information prescribing pitch and duration values that are used to add inflection to the output of a text-to-speech synthesizer. The lookup module employs a search algorithm that explores each branch of the tree, assigning penalty scores based on whether the syllable represented by a node of the tree does or does not match the corresponding syllable of the target word. The path with the lowest penalty score is selected as the index into the prosody template table. The system will add nodes by cloning existing nodes in cases where it is not possible to find a one-to-one match between the number of syllables in the target word and the number of nodes in the tree.
-
Citations
22 Claims
-
1. A text-to-speech synthesizer system, comprising:
-
a text input module receptive of target synthesis text;
a prosody module connected to the text input module for associating prosody information with the target synthesis text, the prosody module employing an n-way tree structure of plural traversal paths to identify the prosody information for the target synthesis text;
the prosody module having a prosody pattern lookup module that traverses said tree structure, the lookup module having a stored matrix of penalty values associated with said plural traversal oaths and being operative to identify the prosody information for the target synthesis text corresponding to the traversal path of lowest penalty value; and
a sound generation module connected to the prosody module for converting the target synthesis text to audible speech using the prosody information. - View Dependent Claims (2, 3, 4, 5, 22)
-
-
6. A method for generating synthesized speech, comprising the steps of:
-
receiving an input text string;
employing an n-way tree structure to identify prosody information for the input text string, where the tree structure is based on stress patterns such that each node of the tree structure provides a stress level that may be associated with a syllabic portion of a text string;
the prosody module having a prosody pattern lookup module that traverses said tree structure, the lookup module having a stored matrix of penalty values associated with said plural traversal paths and being operative to identify the prosody information for the target synthesis text corresponding to the traversal Path of lowest penalty value; and
converting the input text string into audible speech using the prosody information. - View Dependent Claims (7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
-
-
17. A method for generating prosody information for use in a text-to-speech synthesizer system, comprising the steps of:
-
receiving an input text string;
determining a pattern of prosodic features associated with the input text string;
identifying a first prosody template from a plurality of prosody templates by traversing an n-way tree structure in order to identify a matching pattern of prosodic features, where the tree structure is based on stress patterns and has plural nodes such that each node provides a stress level that may be associated with a syllabic portion of a text string, and where each prosody template represents a pattern of prosodic features that may be associated with a text string and the first prosody template having a pattern of prosodic features that correlate to the input text string;
cloning a portion of the first prosody template by cloning one of said nodes, when the pattern for the first prosody template is shorter than the pattern for the input text string; and
concatenating the replicated portion of the first prosody template onto the pattern of the first prosody template, thereby constructing a generated prosody template that more closely correlates to the input text string. - View Dependent Claims (18, 19, 20, 21)
-
Specification