Generation and synthesis of prosody templates

US 6,185,533 B1
Filed: 03/15/1999
Issued: 02/06/2001
Est. Priority Date: 03/15/1999
Status: Expired due to Term

Abstract

A method of separating high-level prosodic behavior from purely articulatory constraints so that timing information can be extracted from human speech is presented. The extracted timing information is used to construct duration templates that are employed for speech synthesis. The duration templates are constructed so that words exhibiting the same stress pattern will be assigned the same duration template. Initially, the words of the input text are segmented into phonemes and syllables, and the associated stress pattern is assigned. The stress-assigned words are then assigned grouping features by a text grouping module. A phoneme cluster module groups the phonemes into phoneme pairs and single phonemes. A static duration associated with each phoneme pair and single phoneme is retrieved from a global static table. A normalization module generates a normalized syllable duration value based upon the retrieved static durations associated with the phonemes that comprise the syllable. The normalized syllable duration value is stored in a duration template based upon the grouping features associated with that syllable. To produce natural human-sounding prosody in synthesized speech, the duration information is then extracted from the selected template, de-normalized and applied to the phonemic information.
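
The normalization step described in the abstract and in claim 1 (syllable duration divided by the combined static duration of its phoneme pairs and single phonemes) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the table values, the example words, the use of a stress-pattern string such as "10" as the grouping feature, and the position-by-position averaging of words that share a stress pattern. The patent text above does not specify any of these details.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical global static table: mean duration in milliseconds for each
# stored phoneme pair or single phoneme. All values are invented.
STATIC_TABLE = {
    ("b",): 60.0,
    ("t",): 70.0,
    ("ey",): 140.0,
    ("k", "er"): 160.0,
    ("ax", "l"): 120.0,
}


def combined_static_duration(clusters, table=STATIC_TABLE):
    """Sum the static durations of the clusters that make up one syllable."""
    return sum(table[cluster] for cluster in clusters)


def normalize(syllable_duration, clusters, table=STATIC_TABLE):
    """Normalized value = measured syllable duration / combined static duration."""
    return syllable_duration / combined_static_duration(clusters, table)


def build_templates(words, table=STATIC_TABLE):
    """Build one duration template per grouping feature (here: stress pattern).

    `words` is an iterable of (stress_pattern, syllables), where each syllable
    is a (measured_duration_ms, clusters) tuple.
    """
    collected = defaultdict(list)
    for stress_pattern, syllables in words:
        row = [normalize(duration, clusters, table) for duration, clusters in syllables]
        collected[stress_pattern].append(row)
    # Assumption: words sharing a stress pattern are averaged per syllable position.
    return {
        pattern: [mean(column) for column in zip(*rows)]
        for pattern, rows in collected.items()
    }


# Two invented two-syllable words with stress on the first syllable ("10").
example_words = [
    ("10", [(220.0, [("b",), ("ey",)]), (120.0, [("k", "er")])]),
    ("10", [(189.0, [("t",), ("ey",)]), (153.0, [("b",), ("ax", "l")])]),
]

if __name__ == "__main__":
    print(build_templates(example_words))
```

Because the stored value is a ratio, it describes how much a syllable in a given prosodic context is stretched or compressed relative to its phonemes' typical durations, which is what allows one template to be shared by different words.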

220 Citations

18 Claims

1. A template generation system for generating a duration template from a plurality of input words, comprising:
- a phonetic processor operable to segment each of said input words into input phonemes and group said input phonemes into constituent syllables, each of said constituent syllables having an associated syllable duration;
  
  a phoneme clustering module to cluster said input phonemes comprising a constituent syllable into input phoneme pairs and input single phonemes;
  
  a global static table containing a plurality of stored phonemes comprising stored phoneme pairs and stored single phonemes, each of said stored phonemes having associated static duration information;
  
  a normalization module to generate a normalized duration value for each of said constituent syllables, wherein said normalized duration value is generated by dividing the syllable duration by the combined static duration of the corresponding stored phonemes that comprise said constituent syllable;
  
  the duration template for storing the normalized duration value, said template being specified by text grouping feature, such that the normalized duration value for each constituent syllable having a specific grouping feature is contained in the associated duration template.
  - 2. The template generation system of claim 1 further including a text grouping module operable to identify text grouping features associated with each of the constituent syllables.
  - 3. The template generation system of claim 2 wherein said text grouping features are selected from the group of:
    - word stress pattern, phonemic representation, syntactic boundary, sentence position, sentence type, phrase position, and grammatical category.
  - 4. The template generation system of claim 1 further including a text grouping module operable to assign a stress level to each of the constituent syllables, wherein the stress level defines the text grouping feature for the constituent syllable.
  - 5. The template generation system of claim 1 further comprising a word database for storing the input words with associated word and sentence grouping features.
  - 6. The template generation system of claim 5 wherein the associated word grouping features are selected from the group of:
    - phonemic representation, word syllable boundaries, syllable stress assignment, and the duration of each constituent syllable.
  - 7. The template generation system of claim 5 wherein the associated sentence grouping features are selected from the group of:
    - sentence position, sentence type, phrase position, syntactic boundary, and grammatical category.
  - 8. The template generation system of claim 1 wherein the associated static duration information is selected from the group of:
    - mean duration, standard deviation of the duration, maximum duration, minimum duration, and covariance.
  - 9. The template generation system of claim 1 wherein the phoneme clustering module further includes a targeted combination criteria to determine which input phonemes to group into an input phoneme pair, wherein each of the input phoneme pairs complies with the targeted combination criteria.
  - 10. The template generation system of claim 9 wherein the targeted combination criteria is selected from the group of:

11. A method of generating a duration template from a plurality of input words, the method comprising the steps of:
- segmenting each of said input words into input phonemes;
  
  grouping the input phonemes into constituent syllables having an associated syllable duration;
  
  clustering the input phonemes into input phoneme pairs and input single phonemes;
  
  retrieving static duration information associated with stored phonemes in a global static table, wherein the stored phonemes correspond to the input phonemes that constitute the constituent syllable;
  
  generating a normalized duration value by dividing the syllable duration by the combined static duration of the stored phonemes corresponding to the input phonemes that constitute the constituent syllable; and
  
  storing the normalized duration value in the duration template.
  - 12. The method of claim 11 further comprising the steps of:
  - 13. The method of claim 11 further comprising the steps of:
    - assigning grouping features to the constituent syllables; and
      
      storing the input words and constituent syllables with associated grouping features in a word database.
  - 14. The method of claim 11 wherein the step of clustering the input phonemes into input phoneme pairs and input single phonemes further comprises the steps of:
    - searching the constituent syllable from left to right;
      
      selecting the input phonemes in the constituent syllable that equate to a targeted combination; and
      
      clustering the selected input phonemes into an input phoneme pair.
  - 15. The method of claim 14 further including the steps of:
    - searching the constituent syllable from right to left;
      
      selecting the input phonemes in the constituent syllable that equate to the targeted combination; and
      
      clustering the selected input phonemes into an input phoneme pair.
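
The two search passes in claims 14 and 15 can be sketched as follows. This is one possible reading, not the patented procedure: each pass is assumed to stop at its first match, a phoneme already placed in a pair is assumed to be unavailable to the other pass, and the targeted combinations are placeholders, since the list in claim 10 is not reproduced in this text. The function name `cluster_syllable` and the phoneme symbols are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

Cluster = Tuple[str, ...]


def _first_match(
    phonemes: Sequence[str],
    used: List[bool],
    targeted: Set[Tuple[str, str]],
    indices: range,
) -> Optional[int]:
    """Return the first index i (in scan order) where phonemes i and i+1 are
    both unclustered and form a targeted combination, or None."""
    for i in indices:
        if not used[i] and not used[i + 1] and (phonemes[i], phonemes[i + 1]) in targeted:
            return i
    return None


def cluster_syllable(
    phonemes: Sequence[str], targeted: Set[Tuple[str, str]]
) -> List[Cluster]:
    """Cluster one syllable into phoneme pairs and single phonemes."""
    n = len(phonemes)
    used = [False] * n
    pair_starts = set()

    # Pass 1 scans left to right, pass 2 scans right to left.
    for indices in (range(n - 1), range(n - 2, -1, -1)):
        i = _first_match(phonemes, used, targeted, indices)
        if i is not None:
            used[i] = used[i + 1] = True
            pair_starts.add(i)

    # Emit clusters in the original phoneme order.
    clusters: List[Cluster] = []
    i = 0
    while i < n:
        if i in pair_starts:
            clusters.append((phonemes[i], phonemes[i + 1]))
            i += 2
        else:
            clusters.append((phonemes[i],))
            i += 1
    return clusters


if __name__ == "__main__":
    # Placeholder targeted combinations, chosen only to exercise both passes.
    targets = {("t", "r"), ("n", "d")}
    print(cluster_syllable(["s", "t", "r", "ae", "n", "d"], targets))
```

In this reading, the left-to-right pass picks up a combination near the start of the syllable and the right-to-left pass picks up one near the end, so the two passes can yield different pairs within the same syllable.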

16. A method of de-normalizing duration data contained in a duration template, the method comprising the steps of:
- providing a target word to be synthesized by a text-to-speech system;
  
  segmenting each of said input words into input phonemes;
  
  grouping the input phonemes into constituent syllables having an associated syllable duration;
  
  clustering the input phonemes into input phoneme pairs and input single phonemes;
  
  retrieving static duration information associated with stored phonemes in a global static table, wherein the stored phonemes correspond to the input phonemes that constitute each of the constituent syllables;
  
  retrieving a normalized duration value for each of the constituent syllables from an associated duration template; and
  
  generating a de-normalized syllable duration by multiplying the normalized duration value for each constituent syllable by the combined static duration of the stored phonemes corresponding to the input phonemes that constitute that constituent syllable.
  - 17. The method of claim 16 further comprising the step of:
  - 18. The method of claim 16 further comprising the step of:
    - retrieving grouping features associated with the target word from a word dictionary.
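
The de-normalization in claim 16 is the inverse of the normalization in claims 1 and 11: each stored ratio is multiplied by the combined static duration of the target syllable's phonemes. The sketch below is illustrative only. The word dictionary layout, the stress-pattern key, the pre-clustered syllables, and all numeric values are assumptions, and `denormalize_word` is a hypothetical name.

```python
# Hypothetical global static table (mean durations in milliseconds, invented).
STATIC_TABLE = {
    ("p",): 65.0,
    ("ey",): 140.0,
    ("p", "er"): 150.0,
}

# Hypothetical word dictionary: grouping feature (stress pattern) plus the
# syllables of the word, already clustered into pairs and single phonemes.
WORD_DICTIONARY = {
    "paper": {
        "stress_pattern": "10",
        "syllables": [[("p",), ("ey",)], [("p", "er")]],
    },
}

# Hypothetical duration templates: one normalized value per syllable position.
DURATION_TEMPLATES = {"10": [1.0, 0.8]}


def denormalize_word(
    word,
    dictionary=WORD_DICTIONARY,
    templates=DURATION_TEMPLATES,
    table=STATIC_TABLE,
):
    """Return a de-normalized duration in milliseconds for each syllable."""
    entry = dictionary[word]
    template = templates[entry["stress_pattern"]]
    durations = []
    for normalized_value, clusters in zip(template, entry["syllables"]):
        # Combined static duration of the stored phonemes in this syllable.
        static_total = sum(table[cluster] for cluster in clusters)
        durations.append(normalized_value * static_total)
    return durations


if __name__ == "__main__":
    print(denormalize_word("paper"))
```

Here the template value for the second syllable is below 1, so that syllable is synthesized shorter than the sum of its phonemes' typical durations, while the first syllable keeps its typical length.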

Current Assignee: Sovereign Peak Ventures, LLC (Dominion Harbor Enterprises, LLC)
Original Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Inventors: Holm, Frode; Hata, Kazue
Primary Examiner(s): Šmits, Tālivaldis I.
Assistant Examiner(s): Nolan, Daniel A.
Application Number: US09/268,229
Time in Patent Office: 694 Days
Field of Search: 704/200-260, 704/267, 704/264, 434/157
US Class Current: 704/267
CPC Class Codes:
G10L 13/08 Text analysis or generation...
G10L 13/10 Prosody rules derived from ...
