Method and apparatus for speech synthesis without prosody modification
First Claim
1. A method of selecting speech segments for concatenative speech synthesis, the method comprising:
parsing an input text into speech units;
identifying context information for each speech unit based on its location in the input text and at least one neighboring speech unit;
identifying a set of candidate speech segments for each speech unit based on the context information through steps comprising applying the context information for a speech unit to a decision tree to identify a leaf node containing candidate speech segments for the speech unit; and
identifying a sequence of speech segments from the candidate speech segments based in part on a smoothness cost between the speech segments.
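The selection steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation. The `Segment`, `Node`, and `Leaf` types, the context keys, and the pitch-based `smoothness_cost` are all hypothetical stand-ins, and the search uses only a smoothness cost, whereas the claim says the sequence is chosen "based in part" on one. The sketch walks a decision tree to a leaf of candidate segments for each unit, then uses a Viterbi-style dynamic program to pick the sequence with the lowest total join cost.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass(frozen=True)
class Segment:
    """A stored speech sample (hypothetical fields, for illustration only)."""
    unit: str
    start_pitch: float
    end_pitch: float


@dataclass
class Leaf:
    """Leaf node holding the candidate segments for one context class."""
    segments: List[Segment]


@dataclass
class Node:
    """Internal node: a yes/no question about the unit's context."""
    question: Callable[[Dict[str, str]], bool]
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]


def find_candidates(tree: Union[Node, Leaf], context: Dict[str, str]) -> List[Segment]:
    """Apply the context to the tree and return the segments in the leaf reached."""
    node = tree
    while isinstance(node, Node):
        node = node.yes if node.question(context) else node.no
    return node.segments


def smoothness_cost(left: Segment, right: Segment) -> float:
    """Assumed join cost: pitch mismatch at the concatenation point."""
    return abs(left.end_pitch - right.start_pitch)


def select_sequence(candidates: List[List[Segment]]) -> List[Segment]:
    """Pick one segment per unit so the summed smoothness cost is minimal."""
    if not candidates or any(not c for c in candidates):
        raise ValueError("every speech unit needs at least one candidate")
    best = [0.0] * len(candidates[0])   # best cost of a path ending at each candidate
    back: List[List[int]] = []          # back-pointers, one list per transition
    for prev, cur in zip(candidates, candidates[1:]):
        new_best, pointers = [], []
        for seg in cur:
            costs = [best[i] + smoothness_cost(p, seg) for i, p in enumerate(prev)]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[i_min])
            pointers.append(i_min)
        best = new_best
        back.append(pointers)
    # Trace the cheapest path backwards from the last unit.
    idx = min(range(len(best)), key=best.__getitem__)
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [candidates[t][i] for t, i in enumerate(path)]
```

In use, `find_candidates` would be called once per speech unit with that unit's context, and the resulting lists passed together to `select_sequence`.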
Abstract
A speech synthesizer is provided that concatenates stored samples of speech units without modifying the prosody of the samples. The present invention is able to achieve a high level of naturalness in synthesized speech with a carefully designed training speech corpus by storing samples based on the prosodic and phonetic context in which they occur. In particular, some embodiments of the present invention limit the training text to those sentences that will produce the most frequent sets of prosodic contexts for each speech unit. Further embodiments of the present invention also provide a multi-tier selection mechanism for selecting a set of samples that will produce the most natural sounding speech.
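One way to picture the corpus-limiting idea in the abstract is a greedy coverage pass, sketched below. This is an assumption about how such a selection could work, not the method described in the specification. The function names, the `top_n` parameter, and the representation of a context as a hashable `(unit, context)` pair are hypothetical, and the sketch ranks contexts globally rather than per speech unit. It counts how often each context occurs, keeps the most frequent ones as targets, and repeatedly picks the sentence that covers the most targets not yet covered.

```python
from collections import Counter
from typing import Dict, Hashable, List, Set


def frequent_contexts(sentences: Dict[str, List[Hashable]], top_n: int) -> Set[Hashable]:
    """Return the top_n most frequent contexts across all sentences."""
    counts = Counter(ctx for ctxs in sentences.values() for ctx in ctxs)
    return {ctx for ctx, _ in counts.most_common(top_n)}


def select_training_sentences(sentences: Dict[str, List[Hashable]], top_n: int) -> List[str]:
    """Greedily choose sentence ids until every frequent context is covered."""
    uncovered = frequent_contexts(sentences, top_n)
    remaining = dict(sentences)
    chosen: List[str] = []
    while uncovered and remaining:
        # Sentence that covers the largest number of still-uncovered contexts.
        best_id = max(remaining, key=lambda sid: len(uncovered & set(remaining[sid])))
        gain = uncovered & set(remaining[best_id])
        if not gain:
            break
        chosen.append(best_id)
        uncovered -= gain
        del remaining[best_id]
    return chosen
```

Greedy coverage does not guarantee the smallest possible sentence set, but it is simple and keeps the training text focused on the contexts that occur most often.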
4 Claims
Claim 1 (independent) is given above; claims 2, 3, and 4 depend on it.
Specification