Method and system for statistic-based distance definition in text-to-speech conversion
Abstract
A method for distance definition in a text-to-speech conversion system that applies a Gaussian Mixture Model (GMM) to the distance definition. According to an embodiment, the text to be subjected to text-to-speech conversion is analyzed to obtain text with descriptive prosody annotation; clustering is performed for samples in the obtained text; and a GMM is generated for each cluster to determine the distance between a sample and the corresponding GMM.
18 Claims
1. A method comprising the steps of:
analyzing text that is to be subjected to text-to-speech conversion to obtain text with descriptive prosody annotation;
performing clustering for samples in the obtained text through the use of a decision tree, wherein clustering comprises combining two branches of the decision tree for clustering samples if the two branches are similar for further clustering;
generating a Gaussian Mixture Model for each cluster to determine the distance between the sample and the corresponding Gaussian Mixture Model;
using electronic logic circuitry to identify a sample according to the distance; and
transforming the identified sample into synthesized speech.
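The step of generating a Gaussian Mixture Model for each cluster and measuring the distance between a sample and that model can be illustrated with a minimal sketch. The claim does not fix a formula for the distance, so this sketch assumes one plausible choice: the negative log-likelihood of the sample under the cluster's GMM. The one-dimensional feature (a pitch value), the function names `fit_gmm_1d` and `gmm_distance`, the EM settings, and the toy data are all hypothetical and are not taken from the patent.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(samples, n_components=2, n_iter=50, var_floor=1e-3):
    """Fit a 1-D GMM with plain EM; returns a list of (weight, mean, var)."""
    data = sorted(samples)
    n = len(data)
    # Deterministic start: spread the initial means over the sorted data.
    means = [data[(2 * k + 1) * n // (2 * n_components)] for k in range(n_components)]
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, var_floor)
    variances = [overall_var] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means and variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(spread, var_floor)
    return list(zip(weights, means, variances))

def gmm_distance(x, gmm):
    """Assumed distance: negative log-likelihood of x under the GMM."""
    density = sum(w * gaussian_pdf(x, m, v) for w, m, v in gmm)
    return -math.log(max(density, 1e-300))

# Hypothetical pitch values (Hz) for the samples that fell into one cluster.
cluster_samples = [110, 112, 115, 118, 120, 180, 182, 185, 188, 190]
cluster_gmm = fit_gmm_1d(cluster_samples)
```

Under this assumption, a sample lying near one of the mixture components receives a small distance, while a sample lying between components receives a large one, for example `gmm_distance(116.0, cluster_gmm) < gmm_distance(150.0, cluster_gmm)`.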
2. A system comprising:
a text analysis unit for analyzing text that is to be subjected to text-to-speech conversion to obtain text with descriptive prosody annotation;
a prosody prediction unit for performing clustering for samples in the text obtained by the text analysis unit through the use of a decision tree, wherein the prosody prediction unit comprises a combining unit for combining similar branches in the decision tree for further clustering;
a Gaussian Mixture Model base, coupled to the prosody prediction unit, for storing a generated Gaussian Mixture Model; and
a distance calculating unit using electronic logic circuitry for calculating the distance between candidate samples in a cluster and a Gaussian Mixture Model; and
an optimizing unit, for identifying the candidate sample with the smallest distance for subsequent speech synthesizing.
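The role of the combining unit, which combines similar branches of the decision tree for further clustering, can be sketched as follows. The claim does not state how similarity between branches is measured, so this sketch assumes a symmetric Kullback-Leibler divergence between single Gaussians fitted to the samples of each branch, with a greedy merge below a threshold. The function names, the threshold value, and the toy data are hypothetical.

```python
import math

def mean_var(values, var_floor=1e-3):
    """Sample mean and (floored) variance of a list of numbers."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, max(v, var_floor)

def symmetric_kl(a, b):
    """Symmetric KL divergence between Gaussians fitted to two sample lists."""
    (m1, v1), (m2, v2) = mean_var(a), mean_var(b)
    kl_ab = 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)
    kl_ba = 0.5 * (math.log(v1 / v2) + (v2 + (m1 - m2) ** 2) / v1 - 1.0)
    return kl_ab + kl_ba

def merge_similar_branches(branches, threshold=0.5):
    """Greedily merge the closest pair of branches until no pair is within threshold."""
    branches = [list(b) for b in branches]
    while len(branches) > 1:
        best = None
        for i in range(len(branches)):
            for j in range(i + 1, len(branches)):
                d = symmetric_kl(branches[i], branches[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # the closest remaining pair is not similar enough
        merged = branches[i] + branches[j]
        branches = [b for k, b in enumerate(branches) if k not in (i, j)] + [merged]
    return branches

# Hypothetical samples (pitch in Hz) held by three branches of the tree.
leaves = [[110, 112, 115], [111, 113, 116], [180, 185, 190]]
clusters = merge_similar_branches(leaves)
```

In this toy input the first two branches hold nearly identical distributions and are combined, while the third remains a separate cluster. Merging before model generation gives each GMM more samples to be estimated from.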
3. A method comprising the steps of:
determining a cluster for a unit to be subjected to text-to-speech conversion;
determining the Gaussian Mixture Model for the cluster, wherein the Gaussian Mixture Model is generated for a sample clustered through the use of a decision tree which includes combining two branches in the decision tree for clustering samples if the two branches are similar for further clustering;
calculating the distance between candidate samples in the cluster and the determined Gaussian Mixture Model;
using electronic logic circuitry to identify the sample with the smallest distance for subsequent speech synthesizing; and
transforming the identified sample into synthesized speech.
Dependent claims: 4, 5, 6, 7, 8, 9, 10
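The selection steps recited above (look up the GMM for the unit's cluster, compute the distance for each candidate sample, keep the sample with the smallest distance) can be sketched as follows. As before, the distance is assumed to be the negative log-likelihood under the GMM, which the claim itself does not specify. The cluster name, the `pitch` field, the GMM parameters, and the function names are hypothetical illustrations.

```python
import math

def gmm_log_density(x, gmm):
    """Log density of scalar x under a 1-D GMM given as (weight, mean, var) tuples."""
    density = sum(
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in gmm
    )
    return math.log(max(density, 1e-300))

def select_candidate(unit_cluster, candidates, gmm_base):
    """Return the candidate with the smallest assumed distance to the cluster's GMM."""
    gmm = gmm_base[unit_cluster]
    # Assumed distance: negative log-likelihood of the candidate's pitch value.
    return min(candidates, key=lambda c: -gmm_log_density(c["pitch"], gmm))

# Hypothetical store of per-cluster GMMs, keyed by cluster name.
gmm_base = {
    "stressed_vowel": [(0.6, 180.0, 25.0), (0.4, 200.0, 36.0)],
}
# Hypothetical candidate samples for the unit being synthesized.
candidates = [
    {"id": "a", "pitch": 150.0},
    {"id": "b", "pitch": 182.0},
    {"id": "c", "pitch": 230.0},
]
best = select_candidate("stressed_vowel", candidates, gmm_base)
```

Here candidate `b` is selected because its pitch lies close to the first mixture component, whereas the other two candidates fall far from both components.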
11. A system comprising:
a cluster determining unit for determining the cluster for the unit to be subjected to text-to-speech conversion to determine the Gaussian Mixture Model of the cluster, wherein the Gaussian Mixture Model is generated from samples clustered through the use of a decision tree which includes combining two branches in the decision tree for clustering samples if the two branches are similar for further clustering;
a distance calculating unit using electronic logic circuitry for calculating the distance between the candidate samples in the cluster and the determined Gaussian Mixture Model; and
an optimizing unit, for identifying the sample with the smallest distance for subsequent speech synthesizing.
Dependent claims: 12, 13, 14, 15, 16, 17, 18
Specification