Training and Applying Prosody Models
Abstract
Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
50 Citations
34 Claims
1-14. (canceled)
15. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising:
generating texts annotated with prosody information generated from audio using a speech recognition engine that performs the annotation during its operation;
training prosody models with lexicons based on first segments of the texts with the prosody information;
maintaining an inventory of the prosody models with lexicons, selecting a subset of multiple prosody models from the inventory of prosody models;
associating prosody models in the subset of multiple prosody models with second segments of a text based on phrases in the text statistically associated with the lexicons of the prosody models;
applying the associated prosody models to one of the second segments of the text to produce prosody annotations for the text;
updating the associated prosody models' lexicons based on the phrases in the second segments of text;
analyzing annotations of the prosody annotations to reconcile conflicting prosody annotations previously produced by multiple prosody models associated with the second segments of text; and
synthesizing audible speech from the second segments of text and the reconciled prosody annotations.
(Dependent claims: 16, 17, 18, 19, 20)
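The association and lexicon-update steps of claim 15 can be illustrated with a minimal sketch. Everything named here is hypothetical: the claim does not define a scoring rule, so `ProsodyModel`, `score`, `update_lexicon`, `associate`, the substring matching, and the `threshold` value are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field


@dataclass
class ProsodyModel:
    """Hypothetical prosody model carrying a lexicon of phrase weights."""
    name: str
    # phrase -> strength of statistical association (assumed representation)
    lexicon: dict = field(default_factory=dict)

    def score(self, segment: str) -> float:
        # Sum the weights of all lexicon phrases that occur in the segment.
        text = segment.lower()
        return sum(w for phrase, w in self.lexicon.items() if phrase in text)

    def update_lexicon(self, phrases, weight: float = 0.1) -> None:
        # Strengthen (or create) the association for each observed phrase.
        for phrase in phrases:
            key = phrase.lower()
            self.lexicon[key] = self.lexicon.get(key, 0.0) + weight


def associate(models, segments, threshold: float = 0.5):
    """Map each segment index to the names of models whose lexicon matches it."""
    return {
        i: [m.name for m in models if m.score(seg) >= threshold]
        for i, seg in enumerate(segments)
    }


models = [
    ProsodyModel("excited", {"wow": 1.0, "amazing": 0.8}),
    ProsodyModel("somber", {"sadly": 1.0}),
]
segments = ["Wow, that is amazing!", "Sadly, it rained.", "The meeting is at noon."]
matches = associate(models, segments)
```

In this sketch a segment may match several models or none; a segment matched by more than one model is the case in which the claimed reconciliation step becomes necessary.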
21. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising:
selecting multiple prosody models for first segments of text based on input parameters;
training the prosody models based on prosody annotations of the first segments of text;
building lexicons of phrases statistically associated with the selection of the prosody models;
analyzing prosody model applicability for second segments of text based on the lexicons and the second segments of text;
applying applicable multiple prosody models to one of the second segments of text;
reconciling conflicts from the application of multiple prosody models to one of the second segments of text to generate reconciled prosody information;
generating audible speech for the one of the second segments of text based on the reconciled prosody information using a text-to-speech synthesis engine.
(Dependent claims: 22, 23, 24, 25, 26, 27, 28)
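The reconciliation step can be sketched as follows. The claim does not say how conflicts are resolved, so the weighted average used here is an assumption, as are the function name `reconcile`, the parameter names `pitch` and `rate`, and the idea that each model's output carries a confidence weight.

```python
def reconcile(annotations):
    """Merge prosody annotations from several models for one text segment.

    annotations: list of (weight, {parameter: value}) pairs, one per model.
    Returns one value per parameter: the weight-averaged value over the
    models that supplied that parameter (an assumed reconciliation rule).
    """
    totals = {}
    weights = {}
    for weight, params in annotations:
        if weight <= 0:
            continue  # ignore models with no confidence in this segment
        for key, value in params.items():
            totals[key] = totals.get(key, 0.0) + weight * value
            weights[key] = weights.get(key, 0.0) + weight
    return {key: totals[key] / weights[key] for key in totals}


# Two hypothetical models disagree on pitch; only one specifies rate.
reconciled = reconcile([
    (1.0, {"pitch": 1.2, "rate": 0.9}),
    (3.0, {"pitch": 0.8}),
])
```

Here the higher-weighted model pulls the pitch toward its own value, while the rate passes through unchanged because only one model specified it. Other rules, such as letting the highest-weighted model win outright, would fit the claim language equally well.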
29. A system for synthesizing audible speech, with varying prosody, from textual content, the system operable to:
generate texts annotated with prosody information generated from audio using a speech recognition engine that performs the annotation during its operation;
train prosody models with lexicons based on first segments of the texts with the prosody information;
maintain an inventory of the prosody models with lexicons, select a subset of multiple prosody models from the inventory of prosody models;
associate prosody models in the subset of multiple prosody models with second segments of a text based on phrases in the text statistically associated with the lexicons of the prosody models;
apply the associated prosody models to the second segments of the text to produce prosody annotations for the text;
update the associated prosody models' lexicons based on the phrases in the second segments of text;
analyze annotations of the prosody annotations to reconcile conflicting prosody annotations previously produced by multiple prosody models associated with the second segments of the text; and
synthesize audible speech from the text and the reconciled prosody annotations.
(Dependent claims: 30, 31, 32, 33, 34)
Specification