Speech synthesis employing prosody templates

US 6,260,016 B1
Filed: 11/25/1998
Issued: 07/10/2001
Est. Priority Date: 11/25/1998
Status: Expired due to Term

Abstract

Prosody templates, constructed during system design, store intonation (F0) and duration information based on syllabic stress patterns for the target word. The prosody templates are constructed so that words exhibiting the same stress pattern will be assigned the same prosody template. The prosody template information is preferably stored in a normalized form to reduce noise level in the statistical measures. The synthesizer uses a word dictionary that specifies the stress patterns associated with each stored word. These stress patterns are used to access the prosody template database. F0 and duration information is then extracted from the selected template, de-normalized and applied to the phonemic information to produce a natural human-sounding prosody in the synthesized output.
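The synthesis-side flow described in the abstract (dictionary lookup of a stress pattern, template selection, de-normalization) might be sketched as follows. This is only an illustration under stated assumptions: the names `WORD_STRESS`, `TEMPLATES`, `lookup_template` and `denormalize_f0` are hypothetical, the template values are made up, and the sentence frame is reduced to a single reference pitch in Hz, which is simpler than what the patent describes.

```python
import math

# Hypothetical word dictionary: word -> stress pattern
# (1 = stressed syllable, 0 = unstressed). Not taken from the patent.
WORD_STRESS = {"table": "10", "about": "01"}

# Hypothetical template database keyed by (stress pattern, syllable count).
# Each template holds normalized log-F0 offsets; the numbers are invented.
TEMPLATES = {
    ("10", 2): [0.10, 0.05, -0.05, -0.10],
    ("01", 2): [-0.05, 0.00, 0.10, 0.02],
}


def lookup_template(word):
    """Select the template shared by all words with this stress pattern."""
    stress = WORD_STRESS[word.lower()]
    return TEMPLATES[(stress, len(stress))]


def denormalize_f0(template, frame_f0_hz):
    """Shift normalized log-F0 offsets to the frame pitch and return Hz.

    Assumption: the frame is a single reference value, so the shift is
    one additive constant in the log domain.
    """
    offset = math.log(frame_f0_hz)
    return [math.exp(point + offset) for point in template]


contour_hz = denormalize_f0(lookup_template("table"), 120.0)
```

Because the key is the stress pattern rather than the word itself, any two-syllable word with initial stress would receive the same contour in this sketch, which mirrors the abstract's statement that words with the same stress pattern share a template.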

12 Claims

1. An apparatus for generating synthesized speech from a text of input words, comprising:
- a word dictionary containing information about a plurality of stored words, wherein said information identifies a stress pattern associated with each of said stored words;
- a text processor that generates phonemic representations of said input words using said word dictionary to identify the stress pattern of said input words;
- a prosody module having a database of standardized templates containing prosody information accessed via a stress pattern and a number of syllables, wherein said prosody information is normalized and parameterized;
- a sound generation module that denormalizes and converts said standardized templates for applying to said phonemic representation; and
- denormalizing said template via a sound generation module, said denormalizing shifts said template to a height that fits said frame sentence pitch contour.

2. A method for training a prosody template using human speech, comprising:
- segmenting words of a sentence into phonemes associated with syllables of said words;
- assigning stress levels to said syllables;
- grouping said words according to said stress levels thereby forming stress pattern groups;
- adjusting intonation data associated with each one of said stress pattern groups thereby providing normalized data;
- adjusting a pitch shift of said normalized data thereby providing transformed data; and
- storing said transformed data in a prosody database as a template.
3. The method of claim 2 wherein said normalized data is based on resampling said intonation data for a plurality of intonation points.
4. The method of claim 2 wherein said pitch shift constant is accomplished for said sentence via transformation of said intonation points into a log domain.
5. The method of claim 2 wherein said prosody template is populated with averaged transformed data of said stress pattern group.
6. The method of claim 2 further comprises the step of:
7. The method of claim 4 wherein said elevation point is adjusted as a common reference point.
8. The method of claim 7 producing a constant representing said denormalizing based on the regression-line coefficient of said frame sentence pitch contour.
9. The method of claim 7 further comprises the step of:
- accessing a duration template operably permitting denormalization of said duration information thereby associating a time with each of said syllables.
10. The method of claim 8 further comprises the step of:
- transforming log-domain values of said duration template into linear values.
11. The method of claim 9 further comprises the step of:
- resampling each of said syllable segments of the template for a fixed duration such that the total duration of (each) corresponds to the denormalized time values, whereby the intonation contour is associated with a physical timeline.
12. The method of claim 10 further comprises the steps of:
- storing duration information as ratios of phoneme values to globally determined duration values, said globally determined duration values are based on mean values across the entire training corpus;
- per-syllable values based on a sum of the observed phoneme; and
- said prosody template populated with said per-syllable versus global ratios operable permitting computation of an actual duration of said each syllable.
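
The training steps recited in claims 2 through 5 (grouping by stress pattern, resampling, a log-domain pitch shift, averaging) and the duration ratios of claim 12 could look roughly like the sketch below. It is one possible reading, not the patented implementation: the function names, the input tuple layout, the choice of linear interpolation, the five-point default, and the use of a single per-sentence reference pitch are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def resample(points, n):
    """Linearly resample a contour to n evenly spaced points (n >= 2)."""
    if len(points) == 1:
        return [float(points[0])] * n
    out = []
    for i in range(n):
        pos = i * (len(points) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(points) - 1)
        frac = pos - lo
        out.append(points[lo] * (1 - frac) + points[hi] * frac)
    return out


def train_f0_templates(words, n_points=5):
    """Build one averaged template per stress pattern.

    words: iterable of (stress_pattern, f0_points_hz, sentence_ref_hz).
    The tuple layout and the scalar sentence reference are assumptions.
    """
    groups = defaultdict(list)
    for stress, f0_hz, ref_hz in words:
        contour = resample(f0_hz, n_points)  # same length for every word
        # Pitch shift in the log domain relative to the sentence reference.
        shifted = [math.log(f) - math.log(ref_hz) for f in contour]
        groups[stress].append(shifted)
    # Average point by point across all words in the same group.
    return {
        stress: [sum(col) / len(col) for col in zip(*contours)]
        for stress, contours in groups.items()
    }


def duration_ratios(syllables, global_mean):
    """Per-syllable ratio of observed duration to corpus-mean duration.

    syllables: list of syllables, each a list of (phoneme, seconds).
    global_mean: phoneme -> mean duration over the training corpus.
    """
    ratios = []
    for syllable in syllables:
        observed = sum(seconds for _, seconds in syllable)
        expected = sum(global_mean[phoneme] for phoneme, _ in syllable)
        ratios.append(observed / expected)
    return ratios
```

Under this reading, an actual syllable duration at synthesis time would be recovered by multiplying the stored ratio by the sum of the global mean durations of that syllable's phonemes.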

Current Assignee: Panasonic Intellectual Property Corporation of America (Panasonic Holdings Corporation)
Original Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Inventors: Holm, Frode; Hata, Kazue
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Nolan, Daniel A
Application Number: US09/200,027
Time in Patent Office: 958 Days
Field of Search: 704/200, 704/258, 704/260, 704/264, 704/268, 704/269
US Class Current: 704/260
CPC Class Codes: G10L 13/10 Prosody rules derived from ...
