Training and Applying Prosody Models
Abstract
Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
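The abstract describes two phases. First, text that a speech recognizer has annotated with prosody trains a model. Later, that model is applied to new input text. The Python sketch below illustrates this flow under simplifying assumptions that do not come from the source. The `Prosody` record (pitch, duration, and trailing pause per word) and the `ProsodyModel` class are hypothetical. "Training" here is only a per-word average, with a neutral fallback for unseen words. Each model instance stands for one prosody style. A practical system would use a statistical model conditioned on much richer linguistic context.

```python
from dataclasses import dataclass, field
from statistics import mean


# Hypothetical per-word annotation, e.g. as produced by a speech recognizer.
@dataclass(frozen=True)
class Prosody:
    pitch_hz: float    # mean fundamental frequency of the word
    duration_s: float  # spoken duration of the word
    pause_s: float     # silence after the word


@dataclass
class ProsodyModel:
    """Toy prosody model for one style: per-word averages of annotated features."""
    style: str
    default: Prosody = Prosody(120.0, 0.25, 0.0)  # assumed neutral fallback
    table: dict = field(default_factory=dict)

    def train(self, annotated):
        """annotated: iterable of (word, Prosody) pairs from prosody-annotated text.

        Each call replaces the stored averages for the words it sees.
        """
        samples = {}
        for word, p in annotated:
            samples.setdefault(word.lower(), []).append(p)
        for word, ps in samples.items():
            self.table[word] = Prosody(
                pitch_hz=mean(p.pitch_hz for p in ps),
                duration_s=mean(p.duration_s for p in ps),
                pause_s=mean(p.pause_s for p in ps),
            )

    def apply(self, text):
        """Return (word, Prosody) for each word; unseen words get the default."""
        return [(w, self.table.get(w.lower(), self.default)) for w in text.split()]


# One style trained on a small, made-up annotated sample.
excited = ProsodyModel("excited")
excited.train([("great", Prosody(220.0, 0.40, 0.1)), ("Great", Prosody(240.0, 0.36, 0.1))])
print(excited.apply("Great day"))
```

A second `ProsodyModel` trained on differently annotated material would represent another style, and the caller would choose which one to apply.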
27 Claims
1-14. (canceled)
15. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising:
building an inventory of prosody models with designated characteristics;
selecting a target prosody model for training based on input parameters and first keywords related to a first text segment of a first text with prosody annotations;
training the target prosody model based on the prosody annotations of the first text segment;
maintaining associations between the first keywords, the designated characteristics, and the input characteristics;
selecting multiple prosody models for application for a second text segment of a second text based on second keywords related to the second text and the associations;
applying the multiple application prosody models to the second text segment;
reconciling conflicts from the application of the multiple application prosody models to generate reconciled prosody information; and
generating audible speech for the second text segment based on the reconciled prosody information using a text-to-speech synthesis engine.
Dependent claims: 16, 17, 18, 19, 20.
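Claim 15 describes three selection steps. It picks a model to train from input parameters and keywords. It records which keywords led to which model. It later uses those associations to pick several models for a new text segment. The hypothetical `ProsodyInventory` class below is one minimal way to picture that bookkeeping. Scoring candidates by set overlap and voting with keyword counts are assumptions chosen for brevity, because the claim does not say how selection is scored. The sketch also simplifies the claimed associations by linking keywords only to model names, not to the input parameters.

```python
from collections import Counter, defaultdict


class ProsodyInventory:
    """Hypothetical inventory of named prosody models and keyword associations."""

    def __init__(self):
        self.characteristics = {}                 # model name -> designated characteristics
        self.associations = defaultdict(Counter)  # keyword -> Counter of model names

    def add_model(self, name, characteristics):
        self.characteristics[name] = set(characteristics)

    def select_for_training(self, input_params, keywords):
        """Pick the model whose characteristics best overlap the params and keywords."""
        wanted = set(input_params) | set(keywords)
        # max() keeps the first model added among equally scored candidates.
        target = max(self.characteristics,
                     key=lambda name: len(self.characteristics[name] & wanted))
        # Remember which keywords led to this model for application-time selection.
        for kw in keywords:
            self.associations[kw][target] += 1
        return target

    def select_for_application(self, keywords, k=2):
        """Return up to k model names most often associated with the given keywords."""
        votes = Counter()
        for kw in keywords:
            votes.update(self.associations.get(kw, Counter()))
        return [name for name, _ in votes.most_common(k)]


inv = ProsodyInventory()
inv.add_model("sports", {"excited", "fast"})
inv.add_model("news", {"neutral", "formal"})
inv.add_model("bedtime", {"calm", "slow"})
inv.select_for_training({"excited"}, ["goal", "match"])     # -> "sports"
inv.select_for_training({"formal"}, ["election", "match"])  # -> "news"
print(inv.select_for_application(["match"]))                # both models share "match"
```

Because "match" appeared in training text for two different models, a new segment containing it receives both models for application. This is the situation in which the claimed conflict reconciliation becomes necessary.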
21. A system for synthesizing audible speech, with varying prosody, from textual content, the system operable to:
build an inventory of prosody models with designated characteristics;
select a target prosody model for training based on input parameters and first keywords related to a first text segment of a first text with prosody annotations;
train the target prosody model based on the prosody annotations of the first text segment;
maintain associations between the first keywords, the designated characteristics, and the input characteristics;
select multiple prosody models for application for a second text segment of a second text based on second keywords related to the second text segment and the associations;
apply the multiple application prosody models to the second text;
reconcile conflicts from the application of the multiple application prosody models to generate reconciled prosody information; and
generate audible speech for the second text segment based on the reconciled prosody information using a text-to-speech synthesis engine.
Dependent claims: 22, 23, 24, 25, 26, 27.
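Both independent claims require reconciling conflicts when several prosody models are applied to the same text segment, but neither says how. The sketch below assumes one simple strategy: a weighted average of each prosodic feature, word by word. The per-model weights are hypothetical; they could, for example, come from keyword vote counts. Priority rules or per-feature maximums would be equally plausible readings of the claim language. Weights are assumed to be positive.

```python
from typing import Dict, List

# Per-word prosody as a feature dict, e.g. {"pitch_hz": 180.0, "duration_s": 0.3}.
Features = Dict[str, float]


def reconcile(predictions: Dict[str, List[Features]],
              weights: Dict[str, float]) -> List[Features]:
    """Merge per-word predictions from several prosody models by weighted averaging.

    predictions maps a model name to one feature dict per word of the same text
    segment; weights maps each model name to a positive weight (assumed scheme).
    """
    lengths = {len(words) for words in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("every model must predict the same number of words")
    merged: List[Features] = []
    for i in range(lengths.pop()):
        feature_names = set().union(*(words[i] for words in predictions.values()))
        reconciled: Features = {}
        for feature in sorted(feature_names):
            # Only models that actually predicted this feature contribute to it.
            contrib = [(weights[name], words[i][feature])
                       for name, words in predictions.items() if feature in words[i]]
            total_weight = sum(w for w, _ in contrib)
            reconciled[feature] = sum(w * v for w, v in contrib) / total_weight
        merged.append(reconciled)
    return merged


# Two models disagree on pitch for a one-word segment; only one predicts duration.
merged = reconcile(
    {"sports": [{"pitch_hz": 200.0, "duration_s": 0.2}],
     "news": [{"pitch_hz": 100.0}]},
    {"sports": 3.0, "news": 1.0},
)
print(merged)
```

The reconciled pitch lands at the weighted mean (3 × 200 + 1 × 100) / 4 = 175 Hz. The duration comes from the only model that predicted it. The merged features would then be handed to the text-to-speech engine.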
Specification