Method and apparatus for controlling a speech synthesis system to provide multiple styles of speech
Abstract
A method and apparatus for synthesizing speech from text whereby the speech may be generated in a manner so as to effectively convey a particular, selectable style. Repeated patterns of one or more prosodic features—such as, for example, pitch, amplitude, spectral tilt, and/or duration—occurring at characteristic locations in the synthesized speech, are advantageously used to convey a particular chosen style. For example, one or more of such feature patterns may be used to define a particular speaking style, and an illustrative text-to-speech system then makes use of such a defined style to adjust the specified parameter or parameters of the synthesized speech in a non-uniform manner (i.e., in accordance with the defined feature pattern or patterns).
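The non-uniform adjustment described in the abstract can be illustrated with a small sketch. It assumes a hypothetical `Frame` record holding pitch, amplitude, and duration, and a hypothetical `apply_pattern` helper that scales pitch frame by frame at one characteristic location. None of these names or values come from the patent; they only show how a feature pattern differs from a uniform, whole-utterance change.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical unit of synthesized speech (not the patent's own data format)."""
    pitch_hz: float
    amplitude: float
    duration_ms: float


def apply_pattern(frames, start, pitch_scale):
    """Return a copy of frames with pitch scaled per frame, beginning at `start`.

    `pitch_scale` is the feature pattern: one multiplier per frame, so the
    change is non-uniform across the utterance. Frames outside the pattern
    are left unchanged.
    """
    out = [Frame(f.pitch_hz, f.amplitude, f.duration_ms) for f in frames]
    for offset, scale in enumerate(pitch_scale):
        i = start + offset
        if i >= len(out):
            break  # pattern runs past the end of the utterance
        out[i].pitch_hz *= scale
    return out


# Five flat frames; the illustrative pattern reshapes only the last three.
frames = [Frame(100.0, 1.0, 80.0) for _ in range(5)]
styled = apply_pattern(frames, start=2, pitch_scale=[1.1, 1.2, 0.9])
pitches = [round(f.pitch_hz, 1) for f in styled]
print(pitches)
```

A style in this sketch would be a collection of such patterns, each tied to a kind of location (for example, the end of a phrase); the abstract also names amplitude, spectral tilt, and duration as features that could be patterned the same way.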
20 Claims
1. A method for synthesizing a voice signal based on a predetermined voice control information stream, the voice signal selectively synthesized to have a particular prosodic style, the method comprising the steps of:
analyzing said predetermined voice control information stream to identify one or more portions thereof for prosody control;
selecting one or more prosody control templates based on the particular prosodic style selected for said voice signal synthesis;
applying said one or more selected prosody control templates to said one or more identified portions of said predetermined voice control information stream, thereby generating a stylized voice control information stream; and
synthesizing said voice signal based on said stylized voice control information stream so that said synthesized voice signal has said particular prosodic style.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
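The four steps of claim 1 (analyze, select, apply, synthesize) can be sketched as a small pipeline. Everything named here is an assumption made for illustration: the `Unit` record standing in for an entry of the voice control information stream, the `TEMPLATES` table, the "phrase_final" portion kind, and the scale factors. The `synthesize` function is only a placeholder that returns a textual trace, not an actual synthesizer.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Unit:
    """Hypothetical entry in a voice control information stream."""
    phoneme: str
    pitch_hz: float
    duration_ms: float
    phrase_final: bool = False


# Hypothetical template library: style -> {portion kind: (pitch scale, duration scale)}
TEMPLATES = {
    "emphatic": {"phrase_final": (1.25, 1.5)},
    "neutral": {},
}


def analyze(stream):
    """Step 1: identify portions for prosody control (here, phrase-final units)."""
    return [(i, "phrase_final") for i, u in enumerate(stream) if u.phrase_final]


def select_templates(style):
    """Step 2: pick the templates that belong to the requested style."""
    return TEMPLATES[style]


def apply_templates(stream, portions, templates):
    """Step 3: produce a stylized stream; unidentified units pass through unchanged."""
    styled = list(stream)
    for i, kind in portions:
        if kind in templates:
            pitch_scale, duration_scale = templates[kind]
            u = styled[i]
            styled[i] = replace(
                u,
                pitch_hz=u.pitch_hz * pitch_scale,
                duration_ms=u.duration_ms * duration_scale,
            )
    return styled


def synthesize(stream):
    """Step 4 placeholder: a real system would render audio from the stream."""
    return [f"{u.phoneme}:{u.pitch_hz:.0f}Hz/{u.duration_ms:.0f}ms" for u in stream]


stream = [Unit("h", 120.0, 60.0), Unit("ay", 120.0, 100.0, phrase_final=True)]
portions = analyze(stream)
styled = apply_templates(stream, portions, select_templates("emphatic"))
trace = synthesize(styled)
print(trace)
```

Keeping the style out of the analysis step means the same identified portions can be restyled by swapping the template set, which mirrors the claim's separation of "analyzing" from "selecting".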
11. An apparatus for synthesizing a voice signal based on a predetermined voice control information stream, the voice signal selectively synthesized to have a particular prosodic style, the apparatus comprising:
means for analyzing said predetermined voice control information stream to identify one or more portions thereof for prosody control;
means for selecting one or more prosody control templates based on the particular prosodic style selected for said voice signal synthesis;
means for applying said one or more selected prosody control templates to said one or more identified portions of said predetermined voice control information stream, thereby generating a stylized voice control information stream; and
means for synthesizing said voice signal based on said stylized voice control information stream so that said synthesized voice signal has said particular prosodic style.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
Specification