Training And Applying Prosody Models
First Claim
1. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising:
generating a first text annotated with prosody information using a speech recognition engine;
training a prosody model with the first text annotated with prosody information;
applying the prosody model to textual content to produce a second text annotated with prosody information; and
synthesizing audible speech from the second text annotated with prosody information.
Abstract
Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
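The flow described in the abstract (recognize, train, apply, synthesize) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `AnnotatedWord` type, the "stressed"/"neutral" labels, the word-frequency `ProsodyModel`, and the stand-in `recognize_with_prosody` and `synthesize` functions are hypothetical and are not taken from the patent, which does not specify an annotation format or model type here.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedWord:
    word: str
    prosody: str  # hypothetical label, e.g. "stressed" or "neutral"


def recognize_with_prosody(utterances):
    """Stand-in for a speech recognition engine.

    A real engine would derive words and prosody from audio. Here each
    utterance is already a list of (word, label) pairs.
    """
    return [AnnotatedWord(w.lower(), p) for utt in utterances for w, p in utt]


class ProsodyModel:
    """Toy model: remembers the most frequent prosody label seen per word."""

    def __init__(self, default="neutral"):
        self.default = default
        self._counts = defaultdict(Counter)

    def train(self, annotated):
        # Count how often each word was observed with each label.
        for item in annotated:
            self._counts[item.word][item.prosody] += 1

    def apply(self, text):
        # Annotate new text; unseen words get the default label.
        out = []
        for word in text.lower().split():
            counts = self._counts.get(word)
            label = counts.most_common(1)[0][0] if counts else self.default
            out.append(AnnotatedWord(word, label))
        return out


def synthesize(annotated):
    """Stand-in for a synthesizer: returns marked-up text, not audio."""
    return " ".join(
        w.word.upper() if w.prosody == "stressed" else w.word for w in annotated
    )


if __name__ == "__main__":
    first_text = recognize_with_prosody([[("Never", "stressed"), ("again", "neutral")]])
    model = ProsodyModel()
    model.train(first_text)
    second_text = model.apply("Never say never")
    print(synthesize(second_text))  # prints "NEVER say NEVER"
```

The upper-casing in `synthesize` only makes the applied prosody visible in text form; an actual system would pass the second annotated text to a speech synthesizer.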
50 Citations
14 Claims
1. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising:
generating a first text annotated with prosody information using a speech recognition engine;
training a prosody model with the first text annotated with prosody information;
applying the prosody model to textual content to produce a second text annotated with prosody information; and
synthesizing audible speech from the second text annotated with prosody information.
Dependent claims: 2, 3, 4
5. A method for synthesizing audible speech with varying prosody from textual content, the method comprising:
generating first text annotated with prosody information using a speech recognition engine;
training multiple prosody models with the first text annotated with prosody information;
selecting a prosody model based on textual content;
applying the prosody model to textual content to produce a second text annotated with prosody information; and
synthesizing audible speech from said second text annotated with prosody information.
6. A system for generating audible speech from content, the system comprising a processor, a data bus coupled to the processor, and computer-usable medium embodying computer program code and coupled to the data bus, the computer program code operable to
generate a first text annotated with prosody information using a speech recognition engine;
train a prosody model with the first text annotated with prosody information;
apply the prosody model to textual content to produce a second text annotated with prosody information; and
synthesize audible speech from the second text annotated with prosody information.
Dependent claims: 7, 8, 9
10. A computer-usable medium embodying computer program code, the computer program code comprising computer executable instructions configured to
generate a first text annotated with prosody information using a speech recognition engine;
train a prosody model with the first text annotated with prosody information;
apply the prosody model to textual content to produce a second text annotated with prosody information; and
synthesize audible speech from the second text annotated with prosody information.
Dependent claims: 11, 12, 13
14. A computer-usable medium embodying computer program code, the computer program code comprising computer executable instructions configured to
generate first text annotated with prosody information using a speech recognition engine;
train multiple prosody models with the first text annotated with prosody information;
select a prosody model based on textual content;
apply the prosody model to textual content to produce a second text annotated with prosody information; and
synthesize audible speech from said second text annotated with prosody information.
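Claims 5 and 14 add a step in which one of several prosody models is selected based on the textual content. The Python sketch below shows one way such a selection could look. It is an illustration under stated assumptions, not the claimed method: the style names, the punctuation-based `select_style` heuristic, and the constant-label models built by `make_constant_model` are all hypothetical, and the training of the individual models is omitted.

```python
from typing import Callable, Dict, List, Tuple

# Annotated text as (word, prosody label) pairs; the format is an assumption.
Annotated = List[Tuple[str, str]]
ProsodyModel = Callable[[str], Annotated]


def make_constant_model(label: str) -> ProsodyModel:
    """Build a placeholder model that tags every word with one label."""
    def model(text: str) -> Annotated:
        return [(word, label) for word in text.split()]
    return model


def select_style(text: str) -> str:
    """Pick a style name from the text itself (hypothetical heuristic)."""
    stripped = text.strip()
    if stripped.endswith("?"):
        return "question"
    if stripped.endswith("!"):
        return "excited"
    return "neutral"


def annotate(text: str, models: Dict[str, ProsodyModel]) -> Tuple[str, Annotated]:
    """Select a model based on the text, then apply it to that text."""
    style = select_style(text)
    return style, models[style](text)


if __name__ == "__main__":
    models = {
        name: make_constant_model(name)
        for name in ("neutral", "question", "excited")
    }
    style, second_text = annotate("Is it ready?", models)
    print(style, second_text)
```

Keeping the selector separate from the models means the selection rule can be replaced, for example by a classifier, without touching how each model annotates text.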
Specification