Converting text-to-speech and adjusting corpus
First Claim
1. A method for text to speech conversion, comprising:
a text analysis step for parsing the text to obtain descriptive prosody annotations of the text based on a text to speech model generated from a first corpus;
a prosody parameter prediction step for predicting the prosody parameter of the text according to the result of the text analysis step; and
a speech synthesis step for synthesizing speech of said text based on said predicted prosody parameter of the text;
wherein the descriptive prosody annotations of the text include a prosody structure of the text, and the prosody structure of the text is adjusted according to a target speech speed for the synthesized speech.
Abstract
The present invention provides a method and apparatus for text to speech (TTS) conversion, and a method and apparatus for adjusting a corpus. The method for text to speech conversion comprises: a text analysis step for parsing the text to obtain descriptive prosody annotations of the text based on a TTS model generated from a first corpus; a prosody parameter prediction step for predicting the prosody parameter of the text according to the result of the text analysis step; and a speech synthesis step for synthesizing speech of said text based on said predicted prosody parameter of the text; wherein the descriptive prosody annotations of the text include a prosody structure of the text, and the prosody structure of the text is adjusted according to a target speech speed for the synthesized speech. Adjusting the prosody structure of the text according to the target speech speed is intended to improve the quality of the synthesized speech.
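The three steps named in the abstract can be pictured with a small sketch. It is only an illustration under stated assumptions, not the patented implementation: the `Word` type, the per-word `boundary_prob` values, the threshold rule `0.5 * target_speed`, the assumed direction (faster speech, fewer phrase breaks), and the duration constants are all hypothetical, and `synthesize` merely renders text with `|` as a pause mark instead of producing audio.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Word:
    text: str
    boundary_prob: float  # hypothetical: chance of a prosodic phrase break after this word


def text_analysis(words: List[Word], target_speed: float) -> List[List[str]]:
    """Group words into prosodic phrases, adjusted by the target speech speed.

    Assumption: a faster target speed raises the break threshold, so fewer
    (and therefore longer) prosodic phrases are produced.
    """
    threshold = min(0.95, 0.5 * target_speed)  # hypothetical rule
    phrases: List[List[str]] = []
    current: List[str] = []
    for word in words:
        current.append(word.text)
        if word.boundary_prob >= threshold:
            phrases.append(current)
            current = []
    if current:  # close the final phrase if no break was placed at the end
        phrases.append(current)
    return phrases


def predict_prosody(phrases: List[List[str]], target_speed: float) -> List[Dict]:
    """Attach illustrative prosody parameters (durations in seconds) to each phrase."""
    return [
        {
            "words": phrase,
            "word_duration_s": 0.30 / target_speed,   # hypothetical constant
            "pause_after_s": 0.20 / target_speed,     # hypothetical constant
        }
        for phrase in phrases
    ]


def synthesize(prosody: List[Dict]) -> str:
    """Stand-in for a synthesizer: join phrases with ' | ' as a pause mark."""
    return " | ".join(" ".join(item["words"]) for item in prosody)


def text_to_speech(words: List[Word], target_speed: float) -> str:
    phrases = text_analysis(words, target_speed)
    prosody = predict_prosody(phrases, target_speed)
    return synthesize(prosody)


demo_words = [
    Word("the", 0.1), Word("quick", 0.2), Word("fox", 0.6),
    Word("jumps", 0.3), Word("over", 0.1), Word("fences", 0.9),
]
print(text_to_speech(demo_words, target_speed=1.0))
print(text_to_speech(demo_words, target_speed=1.5))
```

With these made-up values, the normal-speed call places a break after "fox", while the faster call keeps the whole sentence in one phrase.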
36 Claims
1. A method for text to speech conversion, comprising:
a text analysis step for parsing the text to obtain descriptive prosody annotations of the text based on a text to speech model generated from a first corpus;
a prosody parameter prediction step for predicting the prosody parameter of the text according to the result of the text analysis step; and
a speech synthesis step for synthesizing speech of said text based on said predicted prosody parameter of the text;
wherein the descriptive prosody annotations of the text include a prosody structure of the text, and the prosody structure of the text is adjusted according to a target speech speed for the synthesized speech.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 33
16. An apparatus for text to speech conversion, comprising:
text analysis means for parsing the text to obtain descriptive prosody annotations of the text based on a text to speech model generated from a first corpus, said descriptive prosody annotations of the text including a prosody structure of the text;
prosody parameter prediction means for predicting the prosody parameter of the text according to the result of the text analysis;
speech synthesis means for synthesizing speech of said text based on said predicted prosody parameter of the text; and
prosody structure adjusting means for adjusting the prosody structure of the text according to a target speech speed for the synthesized speech.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 35
29. A method for adjusting a text to speech corpus, said corpus being a first corpus, said method comprising:
building a decision tree for prosody structure prediction based on the first corpus;
setting a target speech speed for the corpus;
building a relationship between the distribution of prosody phrase length and the speech speed for the first corpus based on said decision tree; and
adjusting said distribution of prosody phrase length of the first corpus according to the target speech speed based on said decision tree and said relationship.
Dependent claims: 30, 34
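The corpus-adjustment steps recited in claim 29 can be sketched roughly as follows. This is a simplified illustration under loud assumptions, not the claimed method itself: the decision tree is not actually built but is represented only by hypothetical leaf outputs (`leaf_probs`, one phrase-boundary probability per syllable juncture), the relationship between phrase length and speech speed is assumed to be a straight line fitted by least squares, and the corpus values are invented.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical first corpus: measured speed (syllables per second) and the
# lengths, in syllables, of the prosodic phrases in each utterance.
corpus = [
    {"speed": 4.0, "phrase_lengths": [4, 5, 3]},
    {"speed": 5.0, "phrase_lengths": [6, 5, 7]},
    {"speed": 6.0, "phrase_lengths": [8, 7, 9]},
]


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def target_mean_length(utterances: List[Dict], target_speed: float) -> float:
    """Relationship step: predict mean phrase length at the target speed."""
    speeds = [u["speed"] for u in utterances]
    lengths = [mean(u["phrase_lengths"]) for u in utterances]
    slope, intercept = fit_line(speeds, lengths)
    return slope * target_speed + intercept


def mean_phrase_length(leaf_probs: List[float], threshold: float) -> float:
    """Mean phrase length if a break is placed wherever a leaf probability
    reaches the threshold (the final juncture always ends a phrase)."""
    internal_breaks = sum(p >= threshold for p in leaf_probs[:-1])
    return len(leaf_probs) / (internal_breaks + 1)


def pick_threshold(leaf_probs: List[float], wanted_length: float) -> float:
    """Adjustment step: choose the break threshold whose resulting mean
    phrase length is closest to the wanted length."""
    candidates = [i / 10 for i in range(1, 10)]
    return min(
        candidates,
        key=lambda t: abs(mean_phrase_length(leaf_probs, t) - wanted_length),
    )


# Hypothetical decision-tree leaf outputs for a 14-syllable utterance.
leaf_probs = [0.1, 0.3, 0.8, 0.2, 0.6, 0.1, 0.4, 0.9, 0.2, 0.7, 0.1, 0.5, 0.3, 1.0]

wanted = target_mean_length(corpus, target_speed=5.5)
print(wanted, pick_threshold(leaf_probs, wanted))
```

In a fuller version the leaf probabilities would come from a tree trained on the first corpus, and the threshold found here would be used to re-annotate that corpus so its phrase-length distribution matches the target speed.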
31. An apparatus for adjusting a text to speech corpus, said corpus being a first corpus, said apparatus comprising:
means for building a decision tree for prosody structure prediction based on the first corpus;
means for setting a target speech speed for the corpus;
means for building a relationship between the distribution of prosody phrase length and the speech speed for the first corpus based on said decision tree; and
means for adjusting said distribution of prosody phrase length of the first corpus according to the target speech speed based on said decision tree and said relationship.
Dependent claims: 32, 36
Specification