Speech synthesis with fuzzy heteronym prediction using decision trees

US 9,058,811 B2
Filed: 02/22/2012
Issued: 06/16/2015
Est. Priority Date: 02/25/2011
Status: Expired due to Fees

Abstract

According to one embodiment, a method and an apparatus for synthesizing speech, and a method for training an acoustic model used in speech synthesis, are provided. The method for synthesizing speech may include determining data generated by text analysis as fuzzy heteronym data, performing fuzzy heteronym prediction on the fuzzy heteronym data to output a plurality of candidate pronunciations of the fuzzy heteronym data and probabilities thereof, generating fuzzy context feature labels based on the plurality of candidate pronunciations and the probabilities thereof, determining model parameters for the fuzzy context feature labels based on an acoustic model with a fuzzy decision tree, generating speech parameters from the model parameters, and synthesizing the speech parameters as speech via a synthesizer.
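
The prediction step in the abstract outputs several candidate pronunciations together with their probabilities. The following is a minimal sketch of that data flow, not the patented implementation: the `Candidate` type, the function names, the score normalization, the `margin` criterion for calling data "fuzzy", and the example pronunciations `wei2`/`wei4` are all assumptions made for illustration, since the record does not state how the probabilities are computed or how fuzzy data is identified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate pronunciation and its predicted probability."""
    pronunciation: str
    probability: float


def predict_candidates(scores: dict[str, float]) -> list[Candidate]:
    """Normalize non-negative predictor scores into probabilities, best first.

    Plain normalization is an assumption; any predictor that yields
    probabilities over the candidates would fit the same interface.
    """
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    candidates = [Candidate(p, s / total) for p, s in scores.items()]
    return sorted(candidates, key=lambda c: c.probability, reverse=True)


def is_fuzzy(candidates: list[Candidate], margin: float = 0.2) -> bool:
    """Assumed criterion: the two best candidates are closer than `margin`."""
    if len(candidates) < 2:
        return False
    return candidates[0].probability - candidates[1].probability < margin


# Hypothetical scores for a heteronym with two readings.
ambiguous = predict_candidates({"wei2": 55.0, "wei4": 45.0})
clear = predict_candidates({"wei2": 9.0, "wei4": 1.0})
print(is_fuzzy(ambiguous), is_fuzzy(clear))
```

Under this assumed criterion, only the ambiguous case would be passed on to fuzzy context feature label generation; the clear case could use its single best pronunciation.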

209 Citations

10 Claims

1. A method for speech synthesis, comprising:
- determining data generated by text analysis as fuzzy heteronym data;
- performing a fuzzy heteronym prediction on the fuzzy heteronym data to output a plurality of candidate pronunciations of the fuzzy heteronym data and probabilities thereof;
- generating fuzzy context feature labels based on the plurality of candidate pronunciations of the fuzzy heteronym data and the probabilities thereof;
- determining model parameters for the fuzzy context feature labels based on an acoustic model with a fuzzy decision tree;
- generating speech parameters for the model parameters, using a device selected from the group consisting of a computer and a logic circuit; and
- synthesizing the speech parameters as speech.
- Dependent claims:
  - 2. The method according to claim 1, wherein the step of generating fuzzy context feature labels further comprises:
    - determining a degree to which context labels of candidate pronunciations of the fuzzy heteronym data fall into category based on the probabilities; and
    - transforming the degree by scaling to generate the fuzzy context feature labels, wherein the fuzzy context feature labels are joint representation of context labels of the candidate pronunciations.
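
Claim 2 describes two operations: computing the degree to which the candidates' context labels fall into a category, and scaling that degree into a joint label. One possible reading can be sketched as follows. The function name, the category names (`tone1` to `tone4`), the integer scale `levels`, and the choice to sum probabilities per category are all hypothetical; the claim does not fix any of them.

```python
def fuzzy_context_label(
    candidates: list[tuple[str, float]],
    categories: list[str],
    levels: int = 10,
) -> dict[str, int]:
    """Build a joint label from (context_label, probability) candidates.

    Degree of membership (assumed): the summed probability of all
    candidates whose context label equals the category.
    Scaling (assumed): map the degree from [0, 1] to an integer in
    [0, levels] by rounding.
    """
    degree = {category: 0.0 for category in categories}
    for label, probability in candidates:
        if label not in degree:
            raise ValueError(f"unknown context label: {label}")
        degree[label] += probability
    return {category: round(d * levels) for category, d in degree.items()}


# Two candidate pronunciations that differ only in tone.
joint = fuzzy_context_label(
    [("tone2", 0.7), ("tone4", 0.3)],
    ["tone1", "tone2", "tone3", "tone4"],
)
print(joint)
```

The result is a single label that represents both candidates at once, which is what lets one set of decision-tree questions address the ambiguous context instead of forcing an early choice between pronunciations.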

3. An apparatus for synthesizing speech, comprising:
- a heteronym prediction unit, implemented in a logic circuit, for predicting pronunciation of fuzzy heteronym data to output a plurality of candidate pronunciations of the fuzzy heteronym data and predicting probabilities;
- a fuzzy context feature labels generating unit, implemented in a logic circuit, for generating fuzzy context feature labels based on the plurality of candidate pronunciations of the fuzzy heteronym data and the probabilities thereof;
- a determining unit, implemented in a logic circuit, for determining model parameters for the fuzzy context feature labels based on an acoustic model with a fuzzy decision tree;
- a parameter generator, implemented in a logic circuit, for generating speech parameters for the model parameters; and
- a synthesizer, implemented in a logic circuit, for synthesizing the speech parameters as speech.
- Dependent claims:
  - 4. The apparatus according to claim 3, wherein the fuzzy context feature labels generating unit is further configured to:
    - determine a degree to which context labels of candidate pronunciations of the fuzzy heteronym data fall into category based on the probabilities; and
    - transform the degree by scaling to generate the fuzzy context feature labels, wherein the fuzzy context feature labels are joint representation of context labels of the candidate pronunciations.

5. A system for synthesizing speech, comprising:
- a logic circuit for determining data generated by text analysis as fuzzy heteronym data;
- a logic circuit for performing fuzzy heteronym prediction on the fuzzy heteronym data to output a plurality of candidate pronunciations of the fuzzy heteronym data and probabilities thereof;
- a logic circuit for generating fuzzy context feature labels based on the plurality of candidate pronunciations of the fuzzy heteronym data and the probabilities thereof;
- a logic circuit for determining model parameters for the fuzzy context feature labels based on an acoustic model with a fuzzy decision tree;
- a logic circuit for generating speech parameters for the model parameters; and
- a logic circuit for synthesizing the speech parameters as speech.

6. A method for training acoustic model, comprising:
- a training respective speech unit in a speech database to generate an acoustic model, the speech unit includes acoustic parameters and context labels;
- for context combination, performing a decision tree clustering process to generate the acoustic model with a decision tree;
- determining fuzzy data in the speech database based on the acoustic model with the decision tree;
- generating the fuzzy context feature labels for the fuzzy data; and
- cluster training the speech database based on the fuzzy context feature labels to generate the acoustic model with the fuzzy decision tree, using a device selected from the group consisting of a computer and a logic circuit.
- Dependent claims:
  - 7. The method according to claim 6, wherein the step of determining the fuzzy data further comprises:
    - estimating the speech unit;
    - determining a degree to which candidate context labels of the speech unit fall into a category; and
    - determining the speech unit as the fuzzy data if the degree satisfies a predetermined threshold.
  - 8. The method according to claim 7, wherein the step of estimating the speech unit further comprises:
    - estimating scores of the context feature labels of candidate pronunciations of the speech unit by model posterior probability or distance between model generating parameters and speech unit parameters.
  - 9. The method according to claim 6, wherein the step of generating the fuzzy context feature labels further comprises:
    - determining scores of the context feature labels of candidate pronunciations of the speech unit by estimating the speech unit;
    - determining a degree to which the candidate context labels of the speech unit fall into the category; and
    - transforming the degree by scaling to generate the fuzzy context feature labels, wherein the fuzzy context feature labels are joint representation of context labels of the candidate pronunciations.
  - 10. The method according to claim 6, wherein the step of cluster training based on the fuzzy context feature labels further comprises one of:
    - training a training set including the fuzzy data based on the fuzzy context feature labels and a predefined fuzzy question set to generate the acoustic model with the fuzzy decision tree; and
    - re-training the respective speech unit in the speech database based on a question set and context feature labels, wherein the question set further includes a predefined fuzzy question set, and the context feature labels of the fuzzy data in the speech database are the fuzzy context feature labels.
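
Claims 7 and 8 describe selecting fuzzy data during training by scoring each speech unit against its candidate context labels, for example by model posterior probability, and then applying a threshold. The sketch below is one way this could look, under stated assumptions: the per-label log-likelihoods are taken as given inputs, priors over labels are assumed equal, and "the degree satisfies a predetermined threshold" is interpreted as the best posterior falling below `threshold`. The function names and the unit identifiers are hypothetical.

```python
import math


def posteriors(log_likelihoods: dict[str, float]) -> dict[str, float]:
    """Posterior over candidate labels from log-likelihoods, equal priors assumed."""
    peak = max(log_likelihoods.values())
    # Subtract the maximum before exponentiating for numerical stability.
    weights = {k: math.exp(v - peak) for k, v in log_likelihoods.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def select_fuzzy_units(
    units: dict[str, dict[str, float]],
    threshold: float = 0.8,
) -> dict[str, dict[str, float]]:
    """Return posteriors for units whose best label is not confident enough.

    Assumed reading of the threshold test: a unit is fuzzy when its
    highest posterior is below `threshold`.
    """
    fuzzy = {}
    for unit_id, log_likelihoods in units.items():
        posterior = posteriors(log_likelihoods)
        if max(posterior.values()) < threshold:
            fuzzy[unit_id] = posterior
    return fuzzy


# Hypothetical log-likelihoods of two speech units under two candidate labels.
units = {
    "unit_1": {"tone2": -10.0, "tone4": -10.5},  # labels nearly tied
    "unit_2": {"tone2": -10.0, "tone4": -20.0},  # one label dominates
}
fuzzy_units = select_fuzzy_units(units)
print(sorted(fuzzy_units))
```

The posteriors kept for each selected unit could then serve as the scores from which fuzzy context feature labels are generated before cluster training, in the manner recited in claim 9.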

Current Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Original Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Inventors: Wang, Xi; Lou, Xiaoyan; Li, Jian
Primary Examiner(s): Lerner, Martin
Application Number: US13/402,602
Publication Number: US 20120221339A1
Time in Patent Office: 1,210 Days
Field of Search: 704/258, 704/259, 704/260, 704/266, 706/1, 706/8
US Class Current: 1/1
CPC Class Codes: G10L 13/08 Text analysis or generation...
