Method and system for statistic-based distance definition in text-to-speech conversion

US 7,590,540 B2
Filed: 09/29/2005
Issued: 09/15/2009
Est. Priority Date: 09/30/2004
Status: Active Grant


Abstract

A method for distance definition in a text-to-speech conversion system by applying a Gaussian Mixture Model (GMM) to the distance definition. According to an embodiment, the text that is to be subjected to text-to-speech conversion is analyzed to obtain text with descriptive prosody annotation; clustering is performed for samples in the obtained text; and a GMM is generated for each cluster to determine the distance between a sample and the corresponding GMM.
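As an illustration of what a GMM-based distance might look like, the sketch below scores a sample by its negative log-likelihood under a diagonal-covariance mixture. This is an assumption made for illustration: the abstract does not state the exact distance formula, and the function name `gmm_distance`, the diagonal covariance, and the two prosody features in the usage note (pitch in Hz, duration in seconds) are all hypothetical.

```python
import math

def gmm_distance(x, weights, means, variances):
    """Negative log-likelihood of feature vector x under a diagonal GMM.

    Assumed distance definition (not taken from the patent text):
    a smaller value means the sample fits the cluster's model better.
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # Log of (mixture weight * diagonal Gaussian density).
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(log_p)
    # Log-sum-exp over components for numerical stability.
    m = max(log_terms)
    log_likelihood = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return -log_likelihood
```

For example, with a hypothetical two-component model over (pitch, duration), `gmm_distance([198.0, 0.12], [0.6, 0.4], [[200.0, 0.12], [150.0, 0.20]], [[25.0, 0.001], [25.0, 0.001]])` returns a smaller value than the same call with `[170.0, 0.30]`, because the first sample lies close to a component mean.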

18 Claims

1. A method comprising the steps of:
- analyzing text that is to be subjected to text-to-speech conversion to obtain text with descriptive prosody annotation;
  
  performing clustering for samples in the obtained text through the use of a decision tree, wherein clustering comprises combining two branches of the decision tree for clustering samples if the two branches are similar for further clustering;
  
  generating a Gaussian Mixture Model for each cluster to determine the distance between the sample and the corresponding Gaussian Mixture Model;
  
  using electronic logic circuitry to identify a sample according to the distance; and
  
  transforming the identified sample into synthesized speech.
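The branch-combining step in claim 1 can be pictured with a small sketch. The claim does not define when two branches count as "similar", so the criterion here (difference of leaf means relative to the pooled standard deviation, compared with a threshold) is purely an assumption, as are the names `branches_similar` and `merge_similar_leaves` and the use of one-dimensional samples.

```python
from statistics import mean, pstdev

def branches_similar(a, b, threshold=0.5):
    """Hypothetical similarity test between the samples of two leaves."""
    spread = pstdev(a + b)
    if spread == 0.0:
        return True  # all samples identical, so the leaves are trivially similar
    return abs(mean(a) - mean(b)) / spread < threshold

def merge_similar_leaves(leaves, threshold=0.5):
    """Greedily merge pairs of leaves until no remaining pair is similar."""
    leaves = [list(leaf) for leaf in leaves]
    merged = True
    while merged:
        merged = False
        for i in range(len(leaves)):
            for j in range(i + 1, len(leaves)):
                if branches_similar(leaves[i], leaves[j], threshold):
                    leaves[i] = leaves[i] + leaves[j]  # combine the two branches
                    del leaves[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return leaves
```

With three hypothetical leaves of pitch values, `merge_similar_leaves([[100, 102, 101], [101, 103, 100], [200, 205, 198]])` combines the first two leaves and leaves the third untouched, so each resulting cluster could then receive its own Gaussian Mixture Model.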

2. A system comprising:
- a text analysis unit for analyzing text that is to be subjected to text-to-speech conversion to obtain text with descriptive prosody annotation;
  
  a prosody prediction unit for performing clustering for samples in the text obtained by the text analysis unit through the use of a decision tree, wherein the prosody prediction unit comprises a combining unit for combining similar branches in the decision tree for further clustering;
  
  a Gaussian Mixture Model base, coupled to the prosody prediction unit, for storing a generated Gaussian Mixture Model; and
  
  a distance calculating unit using electronic logic circuitry for calculating the distance between candidate samples in a cluster and a Gaussian Mixture Model; and
  
  an optimizing unit, for identifying the candidate sample with the smallest distance for subsequent speech synthesizing.

3. A method comprising the steps of:
- determining a cluster for a unit to be subjected to text-to-speech conversion;
  
  determining the Gaussian Mixture Model for the cluster, wherein the Gaussian Mixture Model is generated for a sample clustered through the use of a decision tree which includes combining two branches in the decision tree for clustering samples if the two branches are similar for further clustering;
  
  calculating the distance between candidate samples in the cluster and the determined Gaussian Mixture Model;
  
  using electronic logic circuitry to identify the sample with the smallest distance for subsequent speech synthesizing; and
  
  transforming the identified sample into synthesized speech.
  - 4. The method according to claim 3, wherein the step of identifying the sample with the smallest distance comprises identifying the sample with the smallest target cost plus transition cost.
  - 5. The method according to claim 3, wherein the step of identifying the sample with the smallest distance comprises identifying the sample with the smallest target cost.
  - 6. The method according to claim 3, wherein the calculating step comprises calculating the target cost and the transition cost.
  - 7. The method according to claim 6, wherein the step of identifying the sample with the smallest distance comprises identifying the sample with the smallest target cost.
  - 8. The method according to claim 6, wherein the step of identifying the sample with the smallest distance comprises identifying the sample with the smallest target cost plus transition cost.
  - 9. The method according to claim 3, wherein the step of determining the cluster for the unit to be subjected to text-to-speech conversion comprises:
    - obtaining descriptive prosody annotation information of each unit to be subjected to text-to-speech conversion;
      
      finding the context equivalent cluster of each unit to be subjected to text-to-speech conversion, the cluster corresponding to a Gaussian Mixture Model; and
      
      in the space of the Gaussian Mixture Model mixture model sequence, searching for the best values based on the distance definition and criteria of overall optimization.
  - 10. The method according to claim 3, wherein the steps of calculating the distance between the candidate samples in the cluster and the determined Gaussian Mixture Model and identifying the sample with the smallest distance for subsequent speech synthesizing comprises:
    - obtaining descriptive prosody annotation information of each unit to be subjected to text-to-speech conversion;
      
      finding the context equivalent cluster of each unit to be subjected to text-to-speech conversion, the cluster corresponding to a Gaussian Mixture Model;
      
      evaluating all the candidates of the unit to be text-to-speech conversion synthesized through the Gaussian Mixture Model-based distance definition; and
      
      finding the overall optimal candidate series, for subsequent speech synthesizing, based on the distance given in the evaluating step and criteria of overall optimization.
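Claims 4 through 10 describe choosing, for a sequence of units, the candidate series with the smallest target cost plus transition cost under a criterion of overall optimization. One common way to perform such a search is dynamic programming (a Viterbi-style pass), sketched below. The claims do not name a search algorithm, so this choice is an assumption, and `select_units`, `target_costs`, and `transition_cost` are hypothetical names. The target costs would be supplied by whatever GMM-based distance is in use.

```python
def select_units(target_costs, transition_cost):
    """Find the candidate series minimizing target cost plus transition cost.

    target_costs[t][k]: cost of candidate k for unit t.
    transition_cost(t, j, k): cost of joining candidate j of unit t-1
    to candidate k of unit t.
    Returns (total_cost, path), where path[t] is the chosen candidate index.
    """
    best = list(target_costs[0])  # best cost of any path ending in each candidate
    back = []                     # back-pointers, one list per unit after the first
    for t in range(1, len(target_costs)):
        new_best, pointers = [], []
        for k, tc in enumerate(target_costs[t]):
            j_best = min(range(len(best)),
                         key=lambda j: best[j] + transition_cost(t, j, k))
            new_best.append(best[j_best] + transition_cost(t, j_best, k) + tc)
            pointers.append(j_best)
        best = new_best
        back.append(pointers)
    # Trace back from the cheapest final candidate.
    k = min(range(len(best)), key=best.__getitem__)
    total = best[k]
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return total, path
```

Passing a `transition_cost` that always returns zero reduces the search to picking the smallest target cost per unit, which corresponds to the variant in claims 5 and 7; a nonzero join cost gives the variant in claims 4 and 8.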

11. A system comprising:
- a cluster determining unit for determining the cluster for the unit to be subjected to text-to-speech conversion to determine the Gaussian Mixture Model of the cluster, wherein the Gaussian Mixture Model is generated from samples clustered through the use of a decision tree which includes combining two branches in the decision tree for clustering samples if the two branches are similar for further clustering;
  
  a distance calculating unit using electronic logic circuitry for calculating the distance between the candidate samples in the cluster and the determined Gaussian Mixture Model; and
  
  an optimizing unit, for identifying the sample with the smallest distance for subsequent speech synthesizing.
  - 12. The system according to claim 11, wherein the optimizing unit is configured to identify the sample with the smallest target cost plus transition cost.
  - 13. The system according to claim 11, wherein the optimizing unit is configured to identify the sample with the smallest target cost.
  - 14. The system according to claim 11, wherein the distance calculating unit further comprises a unit for calculating a target cost and a unit for calculating a transition cost.
  - 15. The system according to claim 14, wherein the optimizing unit is configured to identify the sample with the smallest target cost plus transition cost.
  - 16. The system according to claim 14, wherein the optimizing unit is configured to identify the sample with the smallest target cost.
  - 17. The system according to claim 11, wherein the cluster determining unit further comprises:
    - means for getting descriptive prosody annotation information of each unit to be subjected to text-to-speech conversion;
      
      means for finding the context equivalent cluster of each unit to be subjected to text-to-speech conversion, the cluster corresponding to a Gaussian Mixture Model; and
      
      means for, in the space of the mixture model sequence, searching for the best values, to be used as the explicit prediction, based on the distance definition and criteria of overall optimization.
  - 18. The system according to claim 11, wherein the calculating unit further comprises:
    - means for obtaining descriptive prosody annotation information of each unit to be subjected to text-to-speech conversion;
      
      means for finding the context equivalent cluster of each unit to be subjected to text-to-speech conversion, which corresponds to a mixture model;
      
      means for evaluating all the candidates of the unit to be text-to-speech conversion synthesized through the Gaussian Mixture Model-based distance definition; and
      
      wherein the optimizing unit further comprises means for finding the overall optimal candidate series, for subsequent speech synthesizing, based on the distance from the means for evaluating and criteria of overall optimization.

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Zhang, Wei Z W; Chai, Hai Xin; Ma, Xi Jun; Jin, Ling
Primary Examiner(s): Vo; Huyen X.
Application Number: US11/239,500
Publication Number: US 20060074674A1
Time in Patent Office: 1,447 Days
Field of Search: 704/258, 704/260, 704/268, 704/267, 704/256.6, 704/266, 704/257, 704/243, 704/270, 704/200
US Class Current: 704/260
CPC Class Codes: G10L 13/04 Details of speech synthesis...; G10L 13/10 Prosody rules derived from ...
