METHOD, APPARATUS FOR SYNTHESIZING SPEECH AND ACOUSTIC MODEL TRAINING METHOD FOR SPEECH SYNTHESIS

US 20120221339A1
Filed: 02/22/2012
Published: 08/30/2012
Est. Priority Date: 02/25/2011
Status: Active Grant

1 Assignment

0 Petitions

Abstract

According to one embodiment, a method and an apparatus for synthesizing speech, and a method for training an acoustic model used in speech synthesis, are provided. The method for synthesizing speech may include: determining data generated by text analysis as fuzzy heteronym data; performing fuzzy heteronym prediction on the fuzzy heteronym data to output a plurality of candidate pronunciations of the fuzzy heteronym data and probabilities thereof; generating fuzzy context feature labels based on the plurality of candidate pronunciations and probabilities thereof; determining model parameters for the fuzzy context feature labels based on an acoustic model with a fuzzy decision tree; generating speech parameters from the model parameters; and synthesizing the speech parameters as speech via a synthesizer.
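The abstract names the steps but not the formulas behind them. The Python sketch below illustrates two of them under explicit assumptions: building a fuzzy context feature label from candidate pronunciations and their probabilities (pooling probability mass per context category, then quantizing as the "scaling" step), and reading a model parameter from a decision tree whose questions are answered with a degree rather than a plain yes or no. The single categorical context feature, the quantization grid, the soft traversal that weights both branches, the example numbers, and all names (`fuzzy_context_label`, `fuzzy_lookup`, `Leaf`, `Node`) are hypothetical; the patent text here does not disclose these details.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical representation: context category -> degree of membership.
FuzzyLabel = Dict[str, float]


def fuzzy_context_label(candidates: List[Tuple[str, float]],
                        category_of: Dict[str, str],
                        levels: int = 10) -> FuzzyLabel:
    """Turn candidate pronunciations and their probabilities into a fuzzy label.

    candidates:  (pronunciation, probability) pairs from heteronym prediction.
    category_of: pronunciation -> category of one context feature (assumed).
    levels:      assumed quantization grid for the "scaling" step.
    """
    total = sum(p for _, p in candidates)
    degree: FuzzyLabel = {}
    for pron, p in candidates:
        cat = category_of[pron]
        # Candidates that share a category pool their probability mass.
        degree[cat] = degree.get(cat, 0.0) + p / total
    # Scaling (assumption): snap each degree onto a grid of 1/levels steps.
    return {cat: round(d * levels) / levels for cat, d in degree.items()}


@dataclass
class Leaf:
    mean: float  # one dimension of a clustered state's mean (illustrative)


@dataclass
class Node:
    categories: frozenset  # question: "does the context fall into these categories?"
    yes: "Tree"
    no: "Tree"


Tree = Union[Leaf, Node]


def fuzzy_lookup(tree: Tree, label: FuzzyLabel, weight: float = 1.0) -> float:
    """Soft traversal (assumption): descend both branches, weighted by degree."""
    if isinstance(tree, Leaf):
        return weight * tree.mean
    total = sum(label.values())  # renormalize in case quantization broke the sum
    yes_degree = sum(d for c, d in label.items() if c in tree.categories) / total
    return (fuzzy_lookup(tree.yes, label, weight * yes_degree)
            + fuzzy_lookup(tree.no, label, weight * (1.0 - yes_degree)))


# Usage with made-up numbers: the heteronym "read" as /r iy d/ or /r eh d/.
label = fuzzy_context_label([("r iy d", 0.7), ("r eh d", 0.3)],
                            {"r iy d": "iy", "r eh d": "eh"})
tree = Node(frozenset({"iy"}), yes=Leaf(5.0), no=Leaf(3.0))
print(label)                      # {'iy': 0.7, 'eh': 0.3}
print(fuzzy_lookup(tree, label))  # approximately 0.7 * 5.0 + 0.3 * 3.0 = 4.4
```

With a crisp label such as `{"iy": 1.0}` the soft traversal reduces to an ordinary decision-tree lookup and returns the yes-leaf value; a fuzzy label instead blends the leaves reached by the competing pronunciations rather than committing to one of them.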

10 Claims

1. A method for speech synthesis, comprising:
- determining data generated by text analysis as fuzzy heteronym data;

  performing fuzzy heteronym prediction on the fuzzy heteronym data to output a plurality of candidate pronunciations of the fuzzy heteronym data and probabilities thereof;

  generating fuzzy context feature labels based on the plurality of candidate pronunciations and probabilities thereof;

  determining model parameters for the fuzzy context feature labels based on an acoustic model with a fuzzy decision tree;

  generating speech parameters from the model parameters; and

  synthesizing the speech parameters as speech.

2. The method according to claim 1, wherein the step of generating fuzzy context feature labels further comprises:
- determining, based on the probabilities, the degree to which context labels of the candidate pronunciations of the fuzzy heteronym data fall into a category; and

  transforming the degree by scaling to generate the fuzzy context feature labels, wherein the fuzzy context feature labels are a joint representation of the context labels of the candidate pronunciations.

3. An apparatus for synthesizing speech, comprising:
- a heteronym prediction unit for predicting pronunciations of fuzzy heteronym data to output a plurality of candidate pronunciations of the fuzzy heteronym data and predicted probabilities thereof;

  a fuzzy context feature labels generating unit for generating fuzzy context feature labels based on the plurality of candidate pronunciations and probabilities thereof;

  a determining unit for determining model parameters for the fuzzy context feature labels based on an acoustic model with a fuzzy decision tree;

  a parameter generator for generating speech parameters from the model parameters; and

  a synthesizer for synthesizing the speech parameters as speech.

4. The apparatus according to claim 3, wherein the fuzzy context feature labels generating unit is further configured to:
- determine, based on the probabilities, the degree to which context labels of the candidate pronunciations of the fuzzy heteronym data fall into a category; and

  transform the degree by scaling to generate the fuzzy context feature labels, wherein the fuzzy context feature labels are a joint representation of the context labels of the candidate pronunciations.

5. A system for synthesizing speech, comprising:
- means for determining data generated by text analysis as fuzzy heteronym data;

  means for performing fuzzy heteronym prediction on the fuzzy heteronym data to output a plurality of candidate pronunciations of the fuzzy heteronym data and probabilities thereof;

  means for generating fuzzy context feature labels based on the plurality of candidate pronunciations and probabilities thereof;

  means for determining model parameters for the fuzzy context feature labels based on an acoustic model with a fuzzy decision tree;

  means for generating speech parameters from the model parameters; and

  means for synthesizing the speech parameters as speech.

6. A method for training an acoustic model, comprising:
- training respective speech units in a speech database to generate an acoustic model, each speech unit including acoustic parameters and context labels;

  performing, for context combinations, a decision tree clustering process to generate an acoustic model with a decision tree;

  determining fuzzy data in the speech database based on the acoustic model with the decision tree;

  generating fuzzy context feature labels for the fuzzy data; and

  cluster training the speech database based on the fuzzy context feature labels to generate an acoustic model with a fuzzy decision tree.

7. The method according to claim 6, wherein the step of determining fuzzy data further comprises:
- estimating a speech unit;

  determining the degree to which candidate context labels of the speech unit fall into a category; and

  determining the speech unit as fuzzy data if the degree satisfies a predetermined threshold.

8. The method according to claim 7, wherein the step of estimating a speech unit further comprises:
- estimating scores of context feature labels of candidate pronunciations of the speech unit by a model posterior probability or by a distance between the parameters generated by the model and the parameters of the speech unit.

9. The method according to claim 6, wherein the step of generating fuzzy context feature labels further comprises:
- determining scores of context feature labels of candidate pronunciations of the speech unit by estimating the speech unit;

  determining the degree to which candidate context labels of the speech unit fall into a category; and

  transforming the degree by scaling to generate the fuzzy context feature labels, wherein the fuzzy context feature labels are a joint representation of the context labels of the candidate pronunciations.

10. The method according to claim 6, wherein the step of cluster training based on the fuzzy context feature labels further comprises one of:
- training a training set including the fuzzy data, based on the fuzzy context feature labels and a predefined fuzzy question set, to generate the acoustic model with the fuzzy decision tree; and

  re-training respective speech units in the speech database based on a question set and context feature labels, wherein the question set further includes a predefined fuzzy question set, and the context feature labels of the fuzzy data in the speech database are the fuzzy context feature labels.
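Claims 6 to 10 name the training steps but leave the scoring function, the threshold rule, and the scaling unspecified. The Python sketch below illustrates claims 7 to 9 under explicit assumptions: it takes claim 8's distance option (Euclidean distance between the parameters the model generates for each candidate context label and the observed parameters of the speech unit), maps those distances to membership degrees with a softmax, treats a unit as fuzzy data when no candidate's degree reaches a threshold, and quantizes the degrees to form the fuzzy label. The function names, the softmax mapping, the 0.8 threshold, the ten-level quantization, and the example numbers are hypothetical choices for illustration, not the patent's disclosed method; the fuzzy-question-set clustering of claim 10 is not shown.

```python
import math
from typing import Dict, Optional, Sequence


def membership_degrees(unit_params: Sequence[float],
                       generated: Dict[str, Sequence[float]],
                       temperature: float = 1.0) -> Dict[str, float]:
    """Degree to which one speech unit falls into each candidate category.

    unit_params: acoustic parameters observed for the speech unit.
    generated:   candidate category -> parameters the current acoustic model
                 generates for that context label (claim 8's distance option).
    The softmax over negative Euclidean distance is an assumed mapping.
    """
    dist = {c: math.dist(unit_params, g) for c, g in generated.items()}
    closest = min(dist.values())
    # Shift by the smallest distance so the nearest candidate gets exp(0) = 1
    # and the normalizer can never underflow to zero.
    w = {c: math.exp(-(d - closest) / temperature) for c, d in dist.items()}
    z = sum(w.values())
    return {c: v / z for c, v in w.items()}


def fuzzy_label_if_ambiguous(degrees: Dict[str, float],
                             threshold: float = 0.8,
                             levels: int = 10) -> Optional[Dict[str, float]]:
    """Return a scaled fuzzy label for an ambiguous unit, or None.

    Assumed rule: the unit is fuzzy data when no category reaches `threshold`;
    otherwise it keeps its ordinary (crisp) context label.
    """
    if max(degrees.values()) >= threshold:
        return None
    # Assumed scaling: quantize each degree to steps of 1/levels.
    return {c: round(d * levels) / levels for c, d in degrees.items()}


# Usage with made-up two-dimensional parameters for two candidate vowels.
generated = {"iy": [1.0, 2.0], "eh": [5.0, 2.0]}
clear = membership_degrees([1.0, 2.0], generated)      # sits on the "iy" model
ambiguous = membership_degrees([3.0, 2.0], generated)  # equidistant from both
print(fuzzy_label_if_ambiguous(clear))      # None
print(fuzzy_label_if_ambiguous(ambiguous))  # {'iy': 0.5, 'eh': 0.5}
```

Raising `threshold` flags more units as fuzzy data. Under claim 10, the flagged units would then carry their fuzzy labels into the re-clustering with the fuzzy question set.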

Current Assignee
Kabushiki Kaisha Toshiba (Toshiba Corporation)
Original Assignee
Kabushiki Kaisha Toshiba (Toshiba Corporation)
Inventors
Wang, Xi; Li, Jian; Lou, Xiaoyan

Granted Patent

US 9,058,811 B2
Field of Search
US Class Current

704/260
CPC Class Codes

G10L 13/08 Text analysis or generation...
