Training and applying prosody models
Abstract
Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
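The train-then-apply flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say which prosody features are used or how a model is represented, so the `AnnotatedWord` fields (pitch and duration), the `ProsodyModel` class, and the per-word averaging are hypothetical stand-ins rather than the patented method. The recognizer output is represented directly as a list of annotated words.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class AnnotatedWord:
    """One word of recognizer output with prosody information (hypothetical features)."""
    word: str
    pitch_hz: float
    duration_ms: float


class ProsodyModel:
    """Toy prosody model: remembers the mean pitch and duration seen per word."""

    def __init__(self, style: str):
        self.style = style                  # e.g. "excited"; a label only
        self._table = {}                    # word -> (mean pitch, mean duration)
        self._default = (120.0, 200.0)      # illustrative fallback, not from the source

    def train(self, annotated: list) -> None:
        """Learn per-word averages from text annotated with prosody information."""
        samples = defaultdict(list)
        for w in annotated:
            samples[w.word.lower()].append((w.pitch_hz, w.duration_ms))
        for word, obs in samples.items():
            self._table[word] = (mean(p for p, _ in obs), mean(d for _, d in obs))
        if annotated:
            # Unseen words fall back to the overall averages of the training data.
            self._default = (
                mean(w.pitch_hz for w in annotated),
                mean(w.duration_ms for w in annotated),
            )

    def annotate(self, text: str) -> list:
        """Produce prosody annotations for new input text."""
        result = []
        for word in text.split():
            pitch, duration = self._table.get(word.lower(), self._default)
            result.append(AnnotatedWord(word, pitch, duration))
        return result
```

In this sketch, one `ProsodyModel` instance would be trained per prosody style, matching the abstract's note that multiple models can represent different styles. A synthesizer would then consume the annotations returned by `annotate`; that stage is omitted here.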
12 Claims
1. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising:
- generating first texts annotated with prosody information, the first texts and prosody information generated by a speech recognition engine applied to speech inputs and parameters;
- training an inventory of prosody models with the first texts annotated with prosody information, wherein the prosody models are associated with the parameters;
- selecting a subset of multiple prosody models from the inventory of prosody models;
- associating prosody models in the subset of multiple prosody models with different segments of a second text;
- applying the associated prosody models to the different segments of the second text to produce prosody annotations for the second text;
- reconciling conflicting prosody annotations from multiple prosody models associated with a segment of the second text; and
- synthesizing audible speech from the second text and the reconciled prosody annotations.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising:
- selecting multiple prosody models from a prosody model inventory based on keywords in a first text annotated with prosody information and parameters;
- training the multiple prosody models with the first text annotated with prosody information;
- choosing a subset of multiple prosody models from the prosody model inventory;
- associating prosody models in the subset of multiple prosody models with different segments of a second text;
- applying the associated prosody models to the different segments of the second text to produce prosody annotations for the second text;
- reconciling conflicting prosody annotations for a segment of the second text based on confidence levels and prosody model weights; and
- synthesizing audible speech from the second text and the reconciled prosody annotations.
Dependent claims: 10, 11, 12
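Claim 9 recites reconciling conflicting prosody annotations "based on confidence levels and prosody model weights" but does not give a formula. One plausible reading, sketched below purely as an assumption, is a weighted average in which each model's annotation counts in proportion to its confidence multiplied by its model weight. The `Annotation` type, the single `pitch_hz` feature, and the `reconcile` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """A prosody annotation proposed by one model for one text segment (hypothetical)."""
    model: str          # name of the prosody model that produced it
    pitch_hz: float     # illustrative prosody feature
    confidence: float   # model's confidence in this annotation, assumed in [0, 1]


def reconcile(annotations: list, model_weights: dict) -> float:
    """Merge conflicting pitch annotations for one segment.

    Assumed rule: weighted average with score = confidence * model weight.
    Models missing from model_weights get a weight of 1.0.
    """
    scores = [a.confidence * model_weights.get(a.model, 1.0) for a in annotations]
    total = sum(scores)
    if total == 0:
        raise ValueError("no annotation has a positive combined score")
    return sum(s * a.pitch_hz for s, a in zip(scores, annotations)) / total
```

For example, a "news" model proposing 100 Hz with confidence 0.5 and weight 2.0, and an "excited" model proposing 200 Hz with confidence 1.0 and weight 1.0, have equal combined scores, so this rule settles on the midpoint. Other readings of the claim, such as picking the single highest-scoring annotation, would be equally consistent with its wording.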
Specification