Training and applying prosody models
Abstract
Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
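As a rough illustration of the train-then-apply flow described in the abstract, the following Python sketch treats a prosody model as nothing more than averaged pitch and speaking rate. The names `AnnotatedWord`, `ProsodyModel`, `pitch_hz`, and `duration_s` are hypothetical and do not come from the patent; a real system would use a statistical model and a real recognition engine to produce the annotations.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class AnnotatedWord:
    """One recognized word plus hypothetical prosody annotations."""
    text: str
    pitch_hz: float    # average fundamental frequency while the word was spoken
    duration_s: float  # how long the word took to say


@dataclass
class ProsodyModel:
    """A deliberately tiny stand-in model: mean pitch and speaking rate only."""
    style: str
    mean_pitch_hz: float = 0.0
    words_per_second: float = 0.0

    def train(self, words: list[AnnotatedWord]) -> None:
        # "Training" here is plain averaging (an assumption made for brevity).
        self.mean_pitch_hz = mean(w.pitch_hz for w in words)
        self.words_per_second = len(words) / sum(w.duration_s for w in words)

    def apply(self, text: str) -> list[dict]:
        # Attach the learned prosody targets to each word of new input text.
        seconds_per_word = 1.0 / self.words_per_second
        return [
            {"word": w, "pitch_hz": self.mean_pitch_hz, "duration_s": seconds_per_word}
            for w in text.split()
        ]


# Annotated text as a speech recognition engine might emit it (made-up values).
annotated = [
    AnnotatedWord("breaking", 220.0, 0.5),
    AnnotatedWord("news", 180.0, 0.5),
]
model = ProsodyModel(style="newscast")
model.train(annotated)
plan = model.apply("markets rallied today")
```

The resulting `plan` is the kind of per-word prosody information a synthesis engine could consume; keeping one `ProsodyModel` per style is one simple way to represent several prosody styles.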
13 Claims
1. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising:
building an inventory of prosody models with designated characteristics;
selecting a target prosody model for training based on input parameters and first keywords related to a first text segment of a first text with prosody annotations;
training the target prosody models based on the prosody annotations of the first text segment;
maintaining associations between the first keywords, the designated characteristics, and the input characteristics;
selecting multiple prosody models for application for a second text segment of a second text based on second keywords related to the second text and the associations;
applying the multiple application prosody models to the second text segment;
reconciling conflicts from the application of the multiple application prosody models to generate reconciled prosody information; and
generating audible speech for the second text segment based on the reconciled prosody information using a text-to-speech synthesis engine.
Dependent claims: 2, 3, 4, 5, 6.
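The selection and reconciliation steps recited in claim 1 can be pictured with a small Python sketch. This is only one possible reading, under stated assumptions: the keyword-overlap score, the overlap-weighted average, and the names `ModelEntry`, `select_models`, `reconcile`, `pitch_shift`, and `rate` are all hypothetical and are not specified by the claim.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ModelEntry:
    """One inventory entry with its stored keyword associations (hypothetical)."""
    name: str
    keywords: set[str]
    pitch_shift: float  # prosody the model proposes, in semitones
    rate: float         # proposed speaking-rate multiplier


def select_models(inventory: list[ModelEntry], text_keywords: set[str]):
    """Return (entry, overlap) for every model sharing a keyword with the text."""
    scored = [(m, len(m.keywords & text_keywords)) for m in inventory]
    return [(m, n) for m, n in scored if n > 0]


def reconcile(selected) -> dict:
    """Resolve conflicting proposals with an overlap-weighted average.

    Averaging is an assumed strategy; the claim does not say how
    conflicts are reconciled.
    """
    if not selected:
        raise ValueError("no prosody model matched the text keywords")
    total = sum(n for _, n in selected)
    return {
        "pitch_shift": sum(m.pitch_shift * n for m, n in selected) / total,
        "rate": sum(m.rate * n for m, n in selected) / total,
    }


inventory = [
    ModelEntry("excited", {"goal", "win", "crowd"}, pitch_shift=5.0, rate=1.25),
    ModelEntry("sports", {"goal", "match"}, pitch_shift=2.0, rate=1.0),
    ModelEntry("somber", {"loss", "memorial"}, pitch_shift=-2.0, rate=0.5),
]
selected = select_models(inventory, {"goal", "crowd", "stadium"})
reconciled = reconcile(selected)
```

Here two models match the second text's keywords and disagree on pitch and rate; weighting by keyword overlap lets the better-matching model dominate the reconciled values that would be handed to the text-to-speech engine.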
7. A system for synthesizing audible speech, with varying prosody, from textual content, the system comprising:
a non-transitory memory for storing instructions;
instructions stored on the non-transitory memory, the instructions executable on a processor to:
build an inventory of prosody models with designated characteristics;
select a target prosody model for training based on input parameters and first keywords related to a first text segment of a first text with prosody annotations;
train the target prosody models based on the prosody annotations of the first text segment;
use associations between the first keywords, the designated characteristics, and the input characteristics;
select multiple prosody models for application for a second text segment of a second text based on second keywords related to the second text segment and the associations;
apply the multiple application prosody models to the second text;
reconcile conflicts from the application of the multiple application prosody models to generate reconciled prosody information; and
a speech synthesis engine that generates audible speech for the second text segment based on the reconciled prosody information using a text-to-speech synthesis engine.
Dependent claims: 8, 9, 10, 11, 12, 13.
Specification