Training and applying prosody models
Abstract
Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
71 Citations
16 Claims
1. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising:
generating texts annotated with prosody information generated from audio using a speech recognition engine that performs the annotation during its operation;
training prosody models with lexicons based on first segments of the texts with the prosody information;
maintaining an inventory of the prosody models with lexicons, selecting a subset of multiple prosody models from the inventory of prosody models;
associating prosody models in the subset of multiple prosody models with second segments of a text based on phrases in the text statistically associated with the lexicons of the prosody models;
applying the associated prosody models to one of the second segments of the text to produce prosody annotations for the text;
updating the associated prosody models' lexicons based on the phrases in the second segments of text;
analyzing annotations of the prosody annotations to reconcile conflicting prosody annotations previously produced by multiple prosody models associated with the second segments of text; and
synthesizing audible speech from the second segments of text and the reconciled prosody annotations.
Dependent claims: 2, 3, 4, 5, 6
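The association and lexicon-update steps of claim 1 can be illustrated with a minimal sketch. The claim does not say how a phrase is "statistically associated" with a lexicon, so everything here is an assumption for illustration: the `ProsodyModel` class, the per-phrase weights, the additive `score`, the `threshold`, and the fixed `step` used when updating are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class ProsodyModel:
    """Hypothetical stand-in for a prosody model and its lexicon."""
    name: str
    # Assumed representation: lowercase phrase -> non-negative association weight.
    lexicon: dict[str, float] = field(default_factory=dict)


def score(model: ProsodyModel, segment: str) -> float:
    """Sum the weights of lexicon phrases that occur in the segment."""
    text = segment.lower()
    return sum(w for phrase, w in model.lexicon.items() if phrase in text)


def associate(models, segments, threshold=1.0):
    """Map each segment index to the names of models scoring at or above threshold."""
    return {
        i: [m.name for m in models if score(m, seg) >= threshold]
        for i, seg in enumerate(segments)
    }


def update_lexicon(model, segment, phrases, step=0.1):
    """Raise the weight of each candidate phrase found in a segment the model handled."""
    text = segment.lower()
    for phrase in phrases:
        if phrase in text:
            model.lexicon[phrase] = model.lexicon.get(phrase, 0.0) + step


# Illustrative data only; the model names and phrases are made up.
excited = ProsodyModel("excited", {"can't wait": 1.5, "amazing": 1.0})
somber = ProsodyModel("somber", {"sorry": 1.2, "loss": 1.0})
segments = ["I can't wait, this is amazing!", "We are sorry for your loss."]
assignments = associate([excited, somber], segments)
```

With this data, `assignments` maps segment 0 to the "excited" model and segment 1 to the "somber" model. Matching is by plain substring, so a phrase such as "loss" would also match inside "glossy"; a real system would presumably tokenize first.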
7. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising:
selecting multiple prosody models for first segments of text based on input parameters;
training the prosody models based on prosody annotations of the first segments of text;
building lexicons of phrases statistically associated with the selection of the prosody models;
analyzing prosody model applicability for second segments of text based on the lexicons and the second segments of text;
applying applicable multiple prosody models to one of the second segments of text;
reconciling conflicts from the application of multiple prosody models to one of the second segments of text to generate reconciled prosody information;
generating audible speech for the one of the second segments of text based on the reconciled prosody information using a text-to-speech synthesis engine.
Dependent claims: 8, 9, 10, 11, 12, 13, 14
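As a rough illustration of the lexicon-building step in claim 7, the sketch below scores two-word phrases by how much more often they appear in the text for which a model was selected than in other text. The claim does not define the statistic, so the add-one smoothed log frequency ratio, the restriction to bigrams, and the `min_score` cutoff are all assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def bigrams(text: str) -> list[str]:
    """Split on whitespace and return adjacent word pairs as phrases."""
    words = text.lower().split()
    return [" ".join(pair) for pair in zip(words, words[1:])]


def build_lexicon(model_texts, background_texts, min_score=0.5):
    """Return phrase -> smoothed log ratio of in-model to background frequency."""
    in_model = Counter(p for t in model_texts for p in bigrams(t))
    background = Counter(p for t in background_texts for p in bigrams(t))
    n_model = sum(in_model.values())
    n_back = sum(background.values())
    vocab = len(set(in_model) | set(background))
    lexicon = {}
    for phrase, count in in_model.items():
        # Add-one smoothing keeps unseen background phrases from dividing by zero.
        p_model = (count + 1) / (n_model + vocab)
        p_back = (background[phrase] + 1) / (n_back + vocab)
        s = math.log(p_model / p_back)
        if s >= min_score:
            lexicon[phrase] = s
    return lexicon


# Illustrative data only: segments for which a "newsreader" model was selected,
# against segments for which it was not.
model_texts = ["breaking news tonight", "breaking news from the capital"]
background_texts = ["the news from the capital was quiet", "tonight the capital sleeps"]
lexicon = build_lexicon(model_texts, background_texts)
```

Phrases common to both collections, such as "the capital", score near or below zero and are dropped, while phrases concentrated in the model's own text are kept with a positive weight.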
15. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising:
generating first texts annotated with prosody information, the first texts and prosody information generated by a speech recognition engine applied to speech inputs;
training an inventory of prosody models with the first texts annotated with prosody information, wherein the prosody models are associated with the speech inputs;
selecting a subset of multiple prosody models from the inventory of prosody models;
associating prosody models in the subset of multiple prosody models with different segments of a second text;
applying the associated prosody models to the different segments of the second text to produce prosody annotations for the second text; and
reconciling conflicting prosody annotations from multiple prosody models associated with a segment of the second text.
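The reconciliation step that closes claim 15 can be sketched as follows. The claims do not fix a reconciliation rule, so this example assumes each prosody annotation carries a confidence value and that the most confident model wins for each word and attribute. The `Annotation` fields, the word-index spans, and the `reconcile` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    model: str         # which prosody model produced the annotation
    start: int         # index of the first word covered (inclusive)
    end: int           # index one past the last word covered (exclusive)
    attribute: str     # e.g. "pitch" or "rate"
    value: float       # relative setting for the attribute
    confidence: float  # assumed annotation-on-the-annotation, in [0, 1]


def reconcile(annotations):
    """For each (word, attribute), keep the value from the most confident model."""
    best = {}
    for ann in annotations:
        for word in range(ann.start, ann.end):
            key = (word, ann.attribute)
            # On a tie, the annotation seen first is kept.
            if key not in best or ann.confidence > best[key].confidence:
                best[key] = ann
    return {key: ann.value for key, ann in best.items()}


# Illustrative data only: two models disagree about pitch on word 2.
annotations = [
    Annotation("excited", 0, 3, "pitch", 1.3, 0.6),
    Annotation("somber", 2, 5, "pitch", 0.8, 0.9),
    Annotation("excited", 0, 3, "rate", 1.2, 0.6),
]
resolved = reconcile(annotations)
```

Here the two models overlap only on word 2 for pitch, where the more confident "somber" annotation is kept; the rate annotation has no competitor and passes through unchanged. Other policies, such as averaging weighted by confidence, would fit the same interface.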
16. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising:
selecting a subset of multiple prosody models from an inventory of prosody models;
associating prosody models in the subset of multiple prosody models with different segments of a text based on phrases in the text statistically associated with lexicons of the prosody models;
applying the associated prosody models to the different segments of the text to produce prosody annotations of the text; and
considering annotations of the prosody annotations to reconcile conflicting prosody annotations of the text previously produced by multiple prosody models associated with a segment of the text; and
synthesizing audible speech from the text and the reconciled prosody annotations.
Specification