Prosody template matching for text-to-speech systems
Abstract
A prosody matching template in the form of a tree structure stores indices which point to lookup table and template information prescribing pitch and duration values that are used to add inflection to the output of a text-to-speech synthesizer. The lookup module employs a search algorithm that explores each branch of the tree, assigning penalty scores based on whether the syllable represented by a node of the tree does or does not match the corresponding syllable of the target word. The path with the lowest penalty score is selected as the index into the prosody template table. The system will add nodes by cloning existing nodes in cases where it is not possible to find a one-to-one match between the number of syllables in the target word and the number of nodes in the tree.
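The penalty-scored search described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Node`, `insert`, and `best_match`, the integer stress levels, and the penalty of 1 per mismatched syllable are all assumptions made for the example. The sketch also leaves out node cloning, so a target whose syllable count matches no stored pattern simply returns `None`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    # Stress level of the syllable this node represents (None for the root).
    stress: Optional[int] = None
    # Index into a prosody template table if a stored pattern ends here.
    template_index: Optional[int] = None
    # Children keyed by stress level, so the tree is n-way.
    children: Dict[int, "Node"] = field(default_factory=dict)


def insert(root: Node, pattern: List[int], template_index: int) -> None:
    """Store a stress pattern in the tree, one node per syllable."""
    node = root
    for stress in pattern:
        node = node.children.setdefault(stress, Node(stress=stress))
    node.template_index = template_index


def best_match(root: Node, target: List[int]) -> Optional[Tuple[int, int]]:
    """Explore every branch and return (penalty, template_index) for the
    lowest-penalty stored pattern with as many syllables as the target."""
    best: Optional[Tuple[int, int]] = None

    def walk(node: Node, depth: int, penalty: int) -> None:
        nonlocal best
        if depth == len(target):
            if node.template_index is not None:
                if best is None or penalty < best[0]:
                    best = (penalty, node.template_index)
            return
        for child in node.children.values():
            # Assumed scoring: 1 penalty point per mismatched syllable.
            mismatch = 0 if child.stress == target[depth] else 1
            walk(child, depth + 1, penalty + mismatch)

    walk(root, 0, 0)
    return best


root = Node()
insert(root, [1, 0, 0], 1)  # hypothetical template 1: stressed, unstressed, unstressed
insert(root, [0, 1, 0], 2)  # hypothetical template 2: unstressed, stressed, unstressed
result = best_match(root, [1, 0, 1])
```

Here `result` holds the penalty and template index of the closest stored pattern. A real system would use that index to fetch pitch and duration values from the lookup table.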
22 Claims
1. A text-to-speech synthesizer system, comprising:
a text input module receptive of target synthesis text;
a prosody module connected to the text input module for associating prosody information with the target synthesis text, the prosody module employing an n-way tree structure to identify the prosody information for the target synthesis text; and
a sound generation module connected to the prosody module for converting the target synthesis text to audible speech using the prosody information.
Dependent claims: 2, 3, 4, 5
6. A method for generating synthesized speech, comprising the steps of:
receiving an input text string;
employing an n-way tree structure to identify prosody information for the input text string, where the tree structure is based on stress patterns such that each node of the tree structure provides a stress level that may be associated with a syllabic portion of a text string; and
converting the input text string into audible speech using the prosody information.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22
17. A method for generating prosody information for use in a text-to-speech synthesizer system, comprising the steps of:
receiving an input text string;
determining a pattern of prosodic features associated with the input text string;
identifying a first prosody template from a plurality of prosody templates, where each prosody template represents a pattern of prosodic features that may be associated with a text string and the first prosody template having a pattern of prosodic features that correlate to the input text string;
replicating a portion of the first prosody template, when the pattern for the first prosody template is shorter than the pattern for the input text string; and
concatenating the replicated portion of the first prosody template onto the pattern of the first prosody template, thereby constructing a generated prosody template that more closely correlates to the input text string.
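The replicate-and-concatenate steps of claim 17 can be illustrated with a short sketch. The claim does not say which portion of the template is replicated, so this example assumes the final syllable entry is repeated until the template is as long as the target pattern. The function name `extend_template` and the `(pitch_hz, duration_s)` tuple layout are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical per-syllable prosody entry: (pitch in Hz, duration in seconds).
Syllable = Tuple[float, float]


def extend_template(template: List[Syllable], target_len: int) -> List[Syllable]:
    """Return a copy of the template lengthened to target_len syllables.

    Assumption: the replicated portion is the last syllable entry,
    repeated once for every missing syllable.
    """
    if not template:
        raise ValueError("template must contain at least one syllable")
    missing = target_len - len(template)
    if missing <= 0:
        # Template is already long enough; nothing to replicate.
        return list(template)
    replicated = template[-1:] * missing  # replicate a portion
    return list(template) + replicated    # concatenate it onto the template


two_syllable = [(120.0, 0.20), (100.0, 0.15)]
generated = extend_template(two_syllable, 4)
```

Repeating the trailing entry is only one possible choice. Replicating an interior syllable, or adjusting the copied pitch and duration values, would fit the claim language equally well.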
Specification