Method and system for statistic-based distance definition in text-to-speech conversion
Abstract
A method for distance definition in a text-to-speech conversion system that applies a Gaussian Mixture Model (GMM) to the distance definition. According to an embodiment, the text that is to be subjected to text-to-speech conversion is analyzed to obtain text with descriptive prosody annotation; clustering is performed for samples in the obtained text; and a GMM is generated for each cluster, to determine the distance between a sample and the corresponding GMM.
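As an illustration of how a GMM can serve as a distance definition, the following minimal sketch scores a sample against a diagonal-covariance GMM and takes the negative log-likelihood as the distance. The abstract does not state the exact formula, so the negative log-likelihood, the function names, and the two-dimensional feature vector (pitch, duration) with its parameter values are all assumptions made for illustration only.

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log-density of feature vector x under a diagonal-covariance GMM."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            # Log of a one-dimensional Gaussian density, summed over dimensions.
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(log_p)
    # Log-sum-exp over components for numerical stability.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

def gmm_distance(x, weights, means, variances):
    """Assumed distance: negative log-likelihood (smaller means closer)."""
    return -gmm_log_likelihood(x, weights, means, variances)

# Hypothetical GMM for one cluster over (pitch in Hz, duration in ms).
weights = [0.6, 0.4]
means = [[120.0, 80.0], [200.0, 150.0]]
variances = [[100.0, 25.0], [400.0, 100.0]]

near = gmm_distance([121.0, 82.0], weights, means, variances)
far = gmm_distance([300.0, 10.0], weights, means, variances)
```

Under this assumed definition, a sample lying close to a component mean (`near`) receives a smaller distance than an outlying sample (`far`). In practice the GMM parameters for each cluster would be estimated from that cluster's samples rather than written by hand.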
22 Claims
1. A method comprising the steps of:
analyzing text that is to be subjected to text-to-speech conversion to obtain text with descriptive prosody annotation;
performing clustering for samples in the obtained text; and
generating a Gaussian Mixture Model for each cluster to determine the distance between the sample and the corresponding Gaussian Mixture Model.
Dependent claims: 2, 3
4. A system comprising:
a text analysis unit for analyzing text that is to be subjected to text-to-speech conversion to obtain text with descriptive prosody annotation;
a prosody prediction unit for performing clustering for samples in the text obtained by the text analysis unit; and
a Gaussian Mixture Model base, coupled to the prosody prediction unit, for storing a generated Gaussian Mixture Model.
Dependent claims: 5, 6
7. A method comprising the steps of:
determining a cluster for a unit to be subjected to text-to-speech conversion;
determining the Gaussian Mixture Model of the cluster;
calculating the distance between candidate samples in the cluster and the determined Gaussian Mixture Model; and
identifying the sample with the smallest distance for subsequent speech synthesizing.
Dependent claims: 8, 9, 10, 11, 12, 13, 14
15. A system comprising:
a cluster determining unit for determining the cluster for the unit to be subjected to text-to-speech conversion to determine the Gaussian Mixture Model of the cluster;
a distance calculating unit, for calculating the distance between the candidate samples in the cluster and the determined Gaussian Mixture Model; and
an optimizing unit, for identifying the sample with the smallest distance for subsequent speech synthesizing.
Dependent claims: 16, 17, 18, 19, 20, 21, 22
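The selection steps recited in claims 7 and 15 (determine the cluster, look up its GMM, compute the distance for each candidate, keep the smallest) can be sketched as follows. This is a hedged illustration, not the patented implementation: the cluster label is assumed to be already known for the unit, the distance is assumed to be the negative log-likelihood, each sample is reduced to a single hypothetical pitch feature `f0`, and the names `gmm_base`, `candidates`, and `select_sample` are invented for the example.

```python
import math

def neg_log_likelihood(x, gmm):
    """Assumed distance: negative log-density of scalar x under a 1-D GMM.

    gmm is a list of (weight, mean, variance) tuples.
    """
    density = sum(
        w * math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
        for w, mu, var in gmm
    )
    # Floor the density so that log() never receives zero.
    return -math.log(max(density, 1e-300))

def select_sample(unit_cluster, gmm_base, candidates):
    """Return the candidate in the unit's cluster with the smallest distance."""
    gmm = gmm_base[unit_cluster]  # GMM stored for this cluster
    return min(
        candidates[unit_cluster],
        key=lambda sample: neg_log_likelihood(sample["f0"], gmm),
    )

# Hypothetical GMM base: one 1-D GMM per cluster.
gmm_base = {
    "cluster_a": [(0.7, 120.0, 100.0), (0.3, 180.0, 225.0)],
    "cluster_b": [(1.0, 220.0, 400.0)],
}

# Hypothetical candidate samples per cluster, each with a pitch value in Hz.
candidates = {
    "cluster_a": [
        {"id": "a1", "f0": 95.0},
        {"id": "a2", "f0": 123.0},
        {"id": "a3", "f0": 160.0},
    ],
    "cluster_b": [
        {"id": "b1", "f0": 210.0},
        {"id": "b2", "f0": 260.0},
    ],
}

best_a = select_sample("cluster_a", gmm_base, candidates)
best_b = select_sample("cluster_b", gmm_base, candidates)
```

With these made-up values, the candidate nearest a high-weight component mean is chosen in each cluster. A fuller system would score multi-dimensional prosody features and would determine the cluster from the annotated text rather than receive it as an argument.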
Specification