Clustered patterns for text-to-speech synthesis

US 6,529,874 B2
Filed: 09/08/1998
Issued: 03/04/2003
Est. Priority Date: 09/16/1997
Status: Expired due to Term

Abstract

A representative pattern memory stores a plurality of initial representative patterns, each of which is a noise pattern. A different attribute is affixed to each initial representative pattern. A pitch pattern memory stores a large number of natural pitch patterns, each corresponding to an accent phrase. A clustering unit classifies each natural pitch pattern to an initial representative pattern based on the attribute of the accent phrase. A transformation parameter generation unit calculates an error between a transformed representative pattern and each natural pitch pattern classified to the initial representative pattern. A representative pattern generation unit calculates an evaluation function of the sum of the errors between the transformed representative pattern and each natural pitch pattern classified to the initial representative pattern, and updates each initial representative pattern accordingly. The representative pattern memory stores each updated representative pattern as a clustered pattern of the attribute affixed to the corresponding initial representative pattern.
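
As an illustration only, one learning pass of the kind summarized in the abstract could be sketched in Python as follows. The sketch makes several assumptions that are not stated in the source: pitch patterns are equal-length lists of log-frequency values, the only transformation parameter is an additive frequency shift, the error is a squared error, and the attribute is reduced to an accent type number. The names `fit_shift`, `squared_error`, `update_pass` and `initial` are hypothetical.

```python
import random


def fit_shift(pattern, representative):
    """Least-squares additive shift b that minimizes sum((p - (r + b))**2)."""
    return sum(p - r for p, r in zip(pattern, representative)) / len(pattern)


def squared_error(pattern, representative, shift):
    """Squared error between a pattern and the shifted representative."""
    return sum((p - (r + shift)) ** 2 for p, r in zip(pattern, representative))


def update_pass(patterns, attributes, representatives):
    """One pass: classify by attribute, fit shifts, update representatives.

    patterns:        list of equal-length lists (assumed log-F0 contours)
    attributes:      attribute (assumed: accent type) of each pattern
    representatives: dict mapping attribute -> representative pattern
    Returns (updated representatives, shifts, evaluation function value).
    """
    updated = {}
    shifts = [0.0] * len(patterns)
    for attr, rep in representatives.items():
        # Classification: a pattern goes to the representative whose
        # affixed attribute equals the pattern's own attribute.
        members = [i for i, a in enumerate(attributes) if a == attr]
        if not members:
            updated[attr] = list(rep)
            continue
        for i in members:
            shifts[i] = fit_shift(patterns[i], rep)
        # With the shifts held fixed, the summed squared error is minimized
        # by the mean of the shift-compensated member patterns.
        updated[attr] = [
            sum(patterns[i][t] - shifts[i] for i in members) / len(members)
            for t in range(len(rep))
        ]
    evaluation = sum(
        squared_error(p, updated[a], b)
        for p, a, b in zip(patterns, attributes, shifts)
    )
    return updated, shifts, evaluation


# Noise initialization: one representative per accent type (0 and 1 here).
rng = random.Random(0)
initial = {accent: [rng.gauss(0.0, 1.0) for _ in range(8)] for accent in (0, 1)}
```

Under these assumptions, patterns that differ only by a constant offset within one accent type are reproduced exactly after a single pass, because the fitted shifts absorb the offsets and the mean recovers the shared contour shape.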

35 Citations

26 Claims

1. An apparatus for generating clustered patterns for text-to-speech synthesis, comprising:
- representative pattern memory configured to store a plurality of initial representative patterns, each initial representative pattern being a noise pattern, an attribute being differently affixed to each initial representative pattern, the attribute including at least accent type;
  
  pitch pattern memory configured to store a large number of natural pitch patterns for learning, each natural pitch pattern being an accent phrase in a sentence and including the attribute of the accent phrase;
  
  clustering unit configured to classify each natural pitch pattern to the initial representative pattern, the natural pitch patterns of same attribute being classified to one initial representative pattern of the same attribute;
  
  transformation parameter generation unit configured to respectively generate a transformation parameter for each natural pitch pattern by evaluating an error between a transformed representative pattern and each natural pitch pattern classified to the initial representative pattern from which the transformed representative pattern is generated;
  
  representative pattern generation unit configured to update each initial representative pattern by calculating an evaluation function of the sum of the error between the transformed representative pattern and each natural pitch pattern classified to the initial representative pattern; and
  
  wherein said representative pattern memory stores each updated representative pattern as a clustered pattern of the attribute affixed to the initial representative pattern from which the updated representative pattern is generated.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
- 2. The apparatus according to claim 1,
- 3. The apparatus according to claim 2, wherein the transformation parameter represents one of a change of duration along a time axis, and a shift of frequency along a frequency axis.
- 4. The apparatus according to claim 1, wherein the attribute of the accent phrase includes accent type, number of mora, part of speech, and phoneme.
- 5. The apparatus according to claim 1, wherein said representative pattern memory stores a plurality of clustered patterns each corresponding to a different attribute affixed to each initial representative pattern.
- 6. The apparatus according to claim 1, wherein said transformation parameter generation unit repeats generation of the transformation parameter, and said representative pattern generation unit repeats update of the representative pattern, until the evaluation function satisfies a predetermined condition.
- 7. The apparatus according to claim 6, wherein said representative pattern memory stores the updated representative pattern, when the evaluation function satisfies the predetermined condition.
- 8. The apparatus according to claim 7, further comprising:
  - a transformation parameter generation rule memory being configured to store the transformation parameter and the attribute of the natural pitch pattern of which the error is evaluated, when the evaluation function satisfies the predetermined condition.
- 9. The apparatus according to claim 6, wherein said transformation parameter generation unit generates the transformation parameters for all combinations of each natural pitch pattern and each initial representative pattern.
- 10. The apparatus according to claim 9, further comprising:
  - an error evaluation unit being configured to respectively calculate an error between each natural pitch pattern and each transformed representative pattern; and
    
    wherein said clustering unit classifies each natural pitch pattern to one initial representative pattern of which the error between the natural pitch pattern and the one initial representative pattern is the smallest among errors between the natural pitch pattern and all transformed representative patterns.
- 11. The apparatus according to claim 10, wherein said error evaluation unit repeats calculation of the error, and said clustering unit repeats classification of each natural pitch pattern, whenever said transformation parameter generation unit generates the transformation parameters for all combinations of each natural pitch pattern and each updated representative pattern, until the evaluation function satisfies the predetermined condition.
- 12. The apparatus according to claim 11, further comprising:
  - a representative pattern selection rule memory being configured to correspondingly store the attribute of the natural pitch patterns classified to each updated representative pattern and an address of the updated representative pattern in said representative pattern memory, when the evaluation function satisfies the predetermined condition.
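
As a hedged illustration of the transformation and error-based classification recited in claims 3, 9 and 10, the following Python sketch transforms a representative pattern by a change of duration along the time axis and a shift along the frequency axis, evaluates every combination of natural pattern and representative, and assigns each natural pattern to the representative with the smallest error. Linear interpolation for the duration change, a least-squares additive shift, and a squared error measure are assumptions of this sketch, and the names `stretch`, `transform`, `best_fit` and `classify` are hypothetical.

```python
def stretch(pattern, length):
    """Change duration: resample `pattern` to `length` points (linear interpolation)."""
    if length == 1:
        return [pattern[0]]
    out = []
    for k in range(length):
        pos = k * (len(pattern) - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(pattern) - 1)
        frac = pos - lo
        out.append(pattern[lo] * (1 - frac) + pattern[hi] * frac)
    return out


def transform(representative, length, shift):
    """Apply both assumed transformation parameters: duration and frequency shift."""
    return [v + shift for v in stretch(representative, length)]


def best_fit(natural, representative):
    """Return (error, shift) for the best-fitting transformation of one representative."""
    stretched = stretch(representative, len(natural))
    shift = sum(n - s for n, s in zip(natural, stretched)) / len(natural)
    error = sum((n - (s + shift)) ** 2 for n, s in zip(natural, stretched))
    return error, shift


def classify(naturals, representatives):
    """Evaluate all combinations and keep, per natural pattern, the index
    of the representative whose transformed version has the smallest error."""
    return [
        min(range(len(representatives)),
            key=lambda j: best_fit(nat, representatives[j])[0])
        for nat in naturals
    ]
```

For example, `classify([transform(r, 9, 10.0) for r in reps], reps)` should map each transformed pattern back to the representative it was derived from, provided the representatives in `reps` have distinct shapes.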

13. A method for generating clustered patterns for text-to-speech synthesis, comprising the steps of:
- storing the plurality of initial representative patterns, each initial representative pattern being a noise pattern, an attribute being differently affixed to each initial representative pattern, the attribute including at least accent type;
  
  storing a large number of natural pitch patterns for learning, each natural pitch pattern being an accent phrase in a sentence and including the attribute of the accent phrase;
  
  classifying each natural pitch pattern to the initial representative pattern, the natural pitch patterns of same attribute being classified to one initial representative pattern of the same attribute;
  
  respectively generating a transformation parameter for each natural pitch pattern by evaluating an error between a transformed representative pattern and each natural pitch pattern classified to the initial representative pattern from which the transformed representative pattern is generated;
  
  updating each initial representative pattern by calculating an evaluation function of the sum of the error between the transformed representative pattern and each natural pitch pattern classified to the initial representative pattern; and
  
  storing each updated representative pattern as a clustered pattern of the attribute affixed to the initial representative pattern from which the updated representative pattern is generated.
- Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)
- 14. The method according to claim 13,
- 15. The method according to claim 14, wherein the transformation parameter represents one of a change of duration along a time axis, and a shift of frequency along a frequency axis.
- 16. The method according to claim 13, wherein the attribute of the accent phrase includes accent type, number of mora, part of speech, and phoneme.
- 17. The method according to claim 13, further comprising the step of:
  - storing a plurality of the clustered patterns each corresponding to a different attribute affixed to each initial representative pattern.
- 18. The method according to claim 13, further comprising the steps of:
  - repeating generation of the transformation parameter and update of the representative pattern, until the evaluation function satisfies a predetermined condition.
- 19. The method according to claim 18, further comprising the step of:
  - storing the updated representative pattern, when the evaluation function satisfies the predetermined condition.
- 20. The method according to claim 19, further comprising the step of:
  - storing the transformation parameter and the attribute of the natural pitch pattern of which the error is evaluated, when the evaluation function satisfies the predetermined condition.
- 21. The method according to claim 18, further comprising the step of:
  - generating the transformation parameters for all combinations of each natural pitch pattern and each initial representative pattern.
- 22. The method according to claim 21, further comprising the steps of:
  - respectively calculating an error between each natural pitch pattern and each transformed representative pattern; and
    
    classifying each natural pitch pattern to one initial representative pattern of which the error between the natural pitch pattern and the one initial representative pattern is the smallest among errors between the natural pitch pattern and all transformed representative patterns.
- 23. The method according to claim 22, further comprising the step of:
  - repeating calculation of the error and classification of each natural pitch pattern, whenever the transformation parameters for all combinations of each natural pitch pattern and each updated representative pattern are generated, until the evaluation function satisfies the predetermined condition.
- 24. The method according to claim 23, further comprising the step of:
  - correspondingly storing the attribute of the natural pitch patterns classified to each updated representative pattern and an address of the updated representative pattern, when the evaluation function satisfies the predetermined condition.
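
The repetition recited in claims 18 to 24 could be sketched, purely as an illustration, as the loop below. It assumes equal-length patterns, an additive frequency shift as the only transformation parameter, a squared error, and a "predetermined condition" of the form "the evaluation function improved by no more than `tol`"; the claims do not specify any of these. The function names `fit` and `learn`, the parameters `tol` and `max_iter`, and the layout of the two returned rule tables are hypothetical. The representative's list index stands in for its address in memory.

```python
def fit(pattern, rep):
    """Return (error, shift) for the least-squares additive shift of `rep`."""
    shift = sum(p - r for p, r in zip(pattern, rep)) / len(pattern)
    error = sum((p - (r + shift)) ** 2 for p, r in zip(pattern, rep))
    return error, shift


def learn(patterns, attributes, reps, tol=1e-9, max_iter=100):
    """Repeat parameter generation, classification and update until the
    evaluation function stops improving (assumed stopping condition)."""
    reps = [list(r) for r in reps]
    previous = float("inf")
    for _ in range(max_iter):
        # Transformation parameters for all pattern/representative combinations.
        fits = [[fit(p, r) for r in reps] for p in patterns]
        # Classify each pattern to the representative with the smallest error.
        labels = [min(range(len(reps)), key=lambda j: row[j][0]) for row in fits]
        shifts = [row[j][1] for row, j in zip(fits, labels)]
        for j in range(len(reps)):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if members:  # an empty cluster keeps its previous representative
                reps[j] = [
                    sum(patterns[i][t] - shifts[i] for i in members) / len(members)
                    for t in range(len(reps[j]))
                ]
        evaluation = sum(
            sum((p - (r + shifts[i])) ** 2
                for p, r in zip(patterns[i], reps[labels[i]]))
            for i in range(len(patterns))
        )
        if previous - evaluation <= tol:
            break
        previous = evaluation
    # Selection rules: representative index -> attributes classified to it.
    selection_rules = {}
    for attr, j in zip(attributes, labels):
        selection_rules.setdefault(j, set()).add(attr)
    # Transformation rules: (attribute, fitted shift) for each natural pattern.
    transformation_rules = list(zip(attributes, shifts))
    return reps, selection_rules, transformation_rules, evaluation
```

With a squared error and least-squares updates, each iteration cannot increase the evaluation function, so the loop terminates either on the tolerance test or after `max_iter` passes. The initial `reps` may be noise patterns, as in the claims, in which case the clusters reached depend on the noise.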

25. A computer readable memory containing computer readable instructions to generate clustered patterns for text-to-speech synthesis, comprising:
- instruction means for causing a computer to store a plurality of initial representative patterns, each initial representative pattern being a noise pattern, an attribute being differently affixed to each initial representative pattern, the attribute including at least accent type;
  
  instruction means for causing a computer to store a large number of natural pitch patterns for learning, each natural pitch pattern being an accent phrase in a sentence and including the attribute of the accent phrase;
  
  instruction means for causing a computer to classify each natural pitch pattern to the initial representative pattern, the natural pitch patterns of same attribute being classified to one initial representative pattern of the same attribute;
  
  instruction means for causing a computer to respectively generate a transformation parameter for each natural pitch pattern by evaluating an error between a transformed representative pattern and each natural pitch pattern classified to the initial representative pattern from which the transformed representative pattern is generated;
  
  instruction means for causing a computer to update each initial representative pattern by calculating an evaluation function of the sum of the error between the transformed representative pattern and each natural pitch pattern classified to the initial representative pattern; and
  
  instruction means for causing a computer to store each updated representative pattern as a clustered pattern of the attribute affixed to the initial representative pattern from which the updated representative pattern is generated.

26. A learning apparatus for generating a representative pattern as a typical pitch pattern used for text-to-speech synthesis, comprising:
- representative pattern memory means for storing a plurality of representative patterns and attribute data corresponding to each representative pattern, the representative pattern being variously transformed as a pitch pattern of a prosody unit by a transformation parameter, the attribute data being characteristic of the prosody unit to affect the pitch pattern;
  
  clustering means for classifying each of a plurality of prosody units in a text for learning to one of the plurality of representative patterns in said representative pattern memory means according to attribute data of each prosody unit;
  
  extraction means for extracting a natural pitch pattern corresponding to each prosody unit classified to the representative pattern from a plurality of natural pitch patterns corresponding to the text;
  
  transformation parameter generation means for generating the transformation parameter for evaluating an error between the natural pitch pattern and a transformed representative pattern for each prosody unit classified to the representative pattern; and
  
  representative pattern generation means for recursively generating the representative pattern by calculating an evaluation function of the sum of the error between the natural pitch pattern and the transformed representative pattern for all prosody units classified to the representative pattern.

Current Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Original Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Inventors: Seto, Shigenobu; Morita, Masahiro; Akamine, Masami; Kagoshima, Takehiko; Shiga, Yoshinori; Nii, Takaaki
Primary Examiner(s): Knepper, David D.
Application Number: US09/149,036
Publication Number: US 20010051872A1
Time in Patent Office: 1,638 Days
Field of Search: 704/258-269, 704/254, 704/207, 704/245, 704/220
US Class Current: 704/269
CPC Class Codes: G10L 13/10 Prosody rules derived from ...
