Speech synthesis with fuzzy heteronym prediction using decision trees
First Claim
1. A method for speech synthesis, comprising:
determining data generated by text analysis as fuzzy heteronym data;
performing a fuzzy heteronym prediction on the fuzzy heteronym data to output a plurality of candidate pronunciations of the fuzzy heteronym data and probabilities thereof;
generating fuzzy context feature labels based on the plurality of candidate pronunciations of the fuzzy heteronym data and the probabilities thereof;
determining model parameters for the fuzzy context feature labels based on an acoustic model with a fuzzy decision tree;
generating speech parameters for the model parameters, using a device selected from the group consisting of a computer and a logic circuit; and
synthesizing the speech parameters as speech.
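The first four steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation: the candidate table, the `margin` test for fuzziness, the one-question tree, and the rule of blending leaf parameters by candidate probability are all assumptions made here for illustration, and every name in the code is hypothetical. Speech parameter generation and waveform synthesis are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical candidate table: word -> [(pronunciation, probability)].
HETERONYM_CANDIDATES: Dict[str, List[Tuple[str, float]]] = {
    "read": [("r iy d", 0.6), ("r eh d", 0.4)],
}


def predict_heteronym(word: str) -> List[Tuple[str, float]]:
    """Stand-in predictor: return candidate pronunciations with probabilities."""
    return HETERONYM_CANDIDATES[word]


def is_fuzzy(candidates: List[Tuple[str, float]], margin: float = 0.5) -> bool:
    """Assumed criterion: fuzzy when the top two probabilities are close."""
    probs = sorted((p for _, p in candidates), reverse=True)
    return len(probs) > 1 and probs[0] - probs[1] < margin


def fuzzy_label(candidates: List[Tuple[str, float]]) -> Dict[str, float]:
    """Fuzzy context feature label: each candidate with a normalized weight."""
    total = sum(p for _, p in candidates)
    return {pron: p / total for pron, p in candidates}


@dataclass
class Node:
    """Tree node; leaves have question=None and carry a model parameter."""
    question: Optional[Callable[[str], bool]] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    mean: float = 0.0  # one-dimensional leaf parameter, for brevity


def model_parameter(tree: Node, label: Dict[str, float]) -> float:
    """Descend the tree once per candidate and blend the leaf parameters
    by the candidate weights (an assumed combination rule)."""
    total = 0.0
    for pron, weight in label.items():
        node = tree
        while node.question is not None:
            node = node.yes if node.question(pron) else node.no
        total += weight * node.mean
    return total


# Hypothetical tree with a single question about the vowel.
TREE = Node(
    question=lambda pron: "iy" in pron.split(),
    yes=Node(mean=2.0),
    no=Node(mean=1.0),
)

candidates = predict_heteronym("read")
if is_fuzzy(candidates):
    label = fuzzy_label(candidates)
    print(model_parameter(TREE, label))  # blend of both leaves
```

With a hard (non-fuzzy) label the same function reduces to an ordinary decision tree lookup, since a single candidate with weight 1.0 selects exactly one leaf.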
Abstract
According to one embodiment, a method and an apparatus for synthesizing speech, and a method for training an acoustic model used in speech synthesis, are provided. The method for synthesizing speech may include determining data generated by text analysis as fuzzy heteronym data, performing fuzzy heteronym prediction on the fuzzy heteronym data to output a plurality of candidate pronunciations of the fuzzy heteronym data and probabilities thereof, generating fuzzy context feature labels based on the plurality of candidate pronunciations and the probabilities thereof, determining model parameters for the fuzzy context feature labels based on an acoustic model with a fuzzy decision tree, generating speech parameters from the model parameters, and synthesizing the speech parameters as speech via a synthesizer.
209 Citations
10 Claims
3. An apparatus for synthesizing speech, comprising:
a heteronym prediction unit, implemented in a logic circuit, for predicting pronunciation of fuzzy heteronym data to output a plurality of candidate pronunciations of the fuzzy heteronym data and predicting probabilities;
a fuzzy context feature labels generating unit, implemented in a logic circuit, for generating fuzzy context feature labels based on the plurality of candidate pronunciations of the fuzzy heteronym data and the probabilities thereof;
a determining unit, implemented in a logic circuit, for determining model parameters for the fuzzy context feature labels based on an acoustic model with a fuzzy decision tree;
a parameter generator, implemented in a logic circuit, for generating speech parameters for the model parameters; and
a synthesizer, implemented in a logic circuit, for synthesizing the speech parameters as speech.
5. A system for synthesizing speech, comprising:
a logic circuit for determining data generated by text analysis as fuzzy heteronym data;
a logic circuit for performing fuzzy heteronym prediction on the fuzzy heteronym data to output a plurality of candidate pronunciations of the fuzzy heteronym data and probabilities thereof;
a logic circuit for generating fuzzy context feature labels based on the plurality of candidate pronunciations of the fuzzy heteronym data and the probabilities thereof;
a logic circuit for determining model parameters for the fuzzy context feature labels based on an acoustic model with a fuzzy decision tree;
a logic circuit for generating speech parameters for the model parameters; and
a logic circuit for synthesizing the speech parameters as speech.
6. A method for training acoustic model, comprising:
a training respective speech unit in a speech database to generate an acoustic model, the speech unit includes acoustic parameters and context labels;
for context combination, performing a decision tree clustering process to generate the acoustic model with a decision tree;
determining fuzzy data in the speech database based on the acoustic model with the decision tree;
generating the fuzzy context feature labels for the fuzzy data; and
cluster training the speech database based on the fuzzy context feature labels to generate the acoustic model with the fuzzy decision tree, using a device selected from the group consisting of a computer and a logic circuit.
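The training steps of claim 6 can be pictured with a toy sketch. Everything specific in it is an assumption made for illustration, not something the claim states: speech units are reduced to a context label plus one acoustic value, the "decision tree" is a single split by label, fuzzy data are detected with a softmax membership score and a hypothetical `threshold`, and retraining is a weighted mean.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Unit = Tuple[str, float]  # (context label, one-dimensional acoustic parameter)


def cluster_means(units: List[Unit]) -> Dict[str, float]:
    """Stand-in for initial training plus tree clustering:
    one leaf per context label, leaf model = mean of its units."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for label, value in units:
        groups[label].append(value)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}


def memberships(value: float, means: Dict[str, float]) -> Dict[str, float]:
    """Soft assignment of a value to every leaf (softmax of -squared distance)."""
    scores = {lab: math.exp(-((value - m) ** 2)) for lab, m in means.items()}
    total = sum(scores.values())
    return {lab: s / total for lab, s in scores.items()}


def fuzzy_labels(units: List[Unit], means: Dict[str, float],
                 threshold: float = 0.9) -> List[Dict[str, float]]:
    """Assumed criterion: a unit is fuzzy when no leaf claims it with at least
    `threshold` membership; fuzzy units get soft weights, others stay hard."""
    labels = []
    for label, value in units:
        weights = memberships(value, means)
        if max(weights.values()) < threshold:
            labels.append(weights)
        else:
            labels.append({label: 1.0})
    return labels


def retrain(units: List[Unit],
            labels: List[Dict[str, float]]) -> Dict[str, float]:
    """Re-estimate each leaf as a weighted mean under the fuzzy labels."""
    num: Dict[str, float] = defaultdict(float)
    den: Dict[str, float] = defaultdict(float)
    for (_, value), weights in zip(units, labels):
        for lab, w in weights.items():
            num[lab] += w * value
            den[lab] += w
    return {lab: num[lab] / den[lab] for lab in num}


# Hypothetical database: the last unit sits between the two clusters.
units = [("A", 0.0), ("A", 0.2), ("B", 2.0), ("B", 2.2), ("A", 1.1)]
means = cluster_means(units)
labels = fuzzy_labels(units, means)
fuzzy_means = retrain(units, labels)
print(means, fuzzy_means)
```

In this toy data only the ambiguous last unit receives a soft label, so it contributes to both leaves in the second training pass instead of pulling on one leaf alone.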
Specification