Processing device for speech synthesis by addition overlapping of wave forms

US 5,327,498 A
Filed: 11/15/1990
Issued: 07/05/1994
Est. Priority Date: 09/02/1988
Status: Expired due to Term

Abstract

A process of speech synthesis from diphones stored in a dictionary as waveforms, for text-to-speech conversion, comprises supplying a sequence of phoneme codes and respective prosodic information, and, for each phoneme, analyzing and synthesizing each phoneme, and then concatenating the synthesized phonemes. For each phoneme, two diphones are selected among the stored diphones and the presence of voicing is determined. For voiced phonemes, the respective waveforms of the two diphones constituting the phoneme are filtered by a window which is centered on a point of the selected waveform representative of the beginning of a pulse response of vocal cords to excitation thereof. The window has a width substantially equal to twice the lesser of the original fundamental period and the fundamental synthesis period and has an amplitude progressively decreasing from the center of the window. The signals resulting from the filtering and obtained for each diphone are time shifted so as to be spaced apart by a time equal to the fundamental synthesis period. Synthesis is achieved by adding the displaced overlapping signals.

262 Citations

8 Claims

1. Process of speech synthesis from diphones stored in a dictionary as waveforms, for text-to-speech conversion, comprising:
- supplying a sequence of phoneme codes and respective prosodic information including the original fundamental period at the beginning and at the end of the phoneme and the duration thereof, and, for each phoneme, analysing and synthesizing each phoneme; and

  then concatenating the synthesized phonemes;

  wherein said analysis comprises, for each phoneme, selecting two diphones among the stored diphones and determining the presence of voicing,

  characterized in that said analysis further includes, for voiced phonemes, subjecting the respective waveforms of the two diphones constituting the phoneme to filtering by a window having a predetermined position with respect to the waveform so selected that the window be centered on a point of the waveform representative of the beginning of a pulse response of vocal cords to excitation thereof, said window having a width substantially equal to twice the lesser of said original fundamental period and the fundamental synthesis period and having an amplitude progressively decreasing from the center of the window to zero at the edges thereof, and

  displacing the signals resulting from said filtering and obtained for each diphone with such a time shift that they are spaced apart by a time equal to the fundamental synthesis period,

  and characterized in that synthesis is achieved by adding the displaced overlapping signals.
- Dependent claims (4, 5):
  - 4. Speech synthesis process according to claim 1, characterized in that the window is a Hanning window.
  - 5. Speech synthesis process according to claim 1, wherein the width of said window does not exceed three times the synthesized period.
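
The windowing and overlap-add steps recited in claims 1 and 4 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the use of sample counts for periods, the odd window length (`2 * half + 1`, chosen so the window center falls exactly on a sample), and the assumption that pitch marks are already known are all choices made for this example only.

```python
import math


def hann(width):
    """Symmetric Hann (Hanning) window: zero at both edges, one at the center."""
    if width < 2:
        return [1.0] * width
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (width - 1))
            for n in range(width)]


def overlap_add(signal, marks, original_period, synthesis_period):
    """Window `signal` around each pitch mark, re-space the pieces, and add them.

    signal            -- waveform samples of a voiced diphone
    marks             -- sample indices taken as the beginning of each
                         vocal-cord pulse response (assumed precomputed)
    original_period   -- original fundamental period, in samples
    synthesis_period  -- fundamental synthesis period, in samples
    """
    if not marks:
        return []
    # Window width is about twice the lesser of the two periods (claim 1).
    half = min(original_period, synthesis_period)
    width = 2 * half + 1  # assumption: odd length so the center is a sample
    window = hann(width)

    out = [0.0] * (synthesis_period * (len(marks) - 1) + width)
    for k, mark in enumerate(marks):
        start = k * synthesis_period  # windowed pieces are spaced by the synthesis period
        for n in range(width):
            src = mark - half + n  # window is centered on the pitch mark
            if 0 <= src < len(signal):
                out[start + n] += signal[src] * window[n]  # add overlapping signals
    return out
```

When the synthesis period equals the original period, adjacent Hann windows in this sketch sum to one in the overlapped region, so a constant input is reproduced unchanged there. Choosing a shorter or longer synthesis period moves the windowed pieces closer together or further apart, which raises or lowers the pitch without resampling the waveform.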

2. Process of speech synthesis from diphones stored in a dictionary as waveforms, for text-to-speech conversion, comprising:
- supplying a sequence of phoneme codes and respective prosodic information, including the original fundamental period at the beginning and at the end of the phoneme and the duration thereof;

  for each phoneme, analysing said phoneme and synthesizing said phoneme with fundamental synthesis periods as indicated by said prosodic information; and

  then concatenating the synthesized phonemes;

  wherein said analysis comprises, for each phoneme, using a diphone descriptor for selecting two diphones among the stored diphones and determining the presence of voicing,

  characterized in that said analysis further includes, for voiced phonemes, subjecting the respective waveforms of the two diphones constituting the respective phoneme to filtering by a window having a predetermined position with respect to the waveform so selected that the window be centered on a point of the waveform representative of the beginning of the pulse response of vocal cords to excitation, said window having a width substantially equal to twice the lesser of said original fundamental period and the fundamental synthesis period and having an amplitude progressively decreasing from the center of the window to zero at the edges thereof, and

  redistributing the mutually overlapping signals resulting from said filtering and obtained for each diphone with such a time spacing that they are spaced by a time equal to the fundamental synthesis period,

  and characterized in that synthesis is achieved by adding the displaced overlapping signals.
- Dependent claims (3, 6, 7):
  - 3. Process according to claim 2, comprising the further preliminary step of fractionating the text to be synthesized into digital microframes each identified by the serial number of a corresponding phoneme in a diphone dictionary storing said waveforms.
  - 6. Speech synthesis process according to claim 2, wherein the descriptor is arranged for determining the address of each diphone for a first and a second phoneme as: number of the diphone descriptor = number of the first phoneme + (number of the second phoneme - 1) * number of diphones.
  - 7. Speech synthesis process according to claim 2, characterized in that transition between successive diphones is achieved by computing the average of two elementary wave signals extracted from each side of the diphone.
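
The descriptor formula of claim 6 and the averaging of claim 7 can be sketched as follows. This is an illustration under assumptions, not the patent's own code: the function names are hypothetical, phoneme numbers are assumed to start at 1, the multiplier `count` stands for the quantity the claim calls "number of diphones", and the average is taken sample by sample over two equal-length elementary waveforms, which is one plausible reading of claim 7.

```python
def descriptor_number(first_phoneme, second_phoneme, count):
    """Diphone descriptor number following the formula stated in claim 6.

    first_phoneme, second_phoneme -- phoneme numbers, assumed 1-based
    count -- the multiplier the claim calls "number of diphones"
    """
    return first_phoneme + (second_phoneme - 1) * count


def average_transition(left_wave, right_wave):
    """Sample-by-sample average of two elementary waveforms (one reading of claim 7)."""
    if len(left_wave) != len(right_wave):
        raise ValueError("elementary waveforms must have the same length")
    return [(a + b) / 2.0 for a, b in zip(left_wave, right_wave)]


# Usage: with count = 40, the pair (first = 3, second = 2) maps to 3 + 1 * 40.
number = descriptor_number(3, 2, 40)
```

In this sketch the mapping gives a distinct number to every (first, second) pair as long as the first phoneme number never exceeds `count`, so a descriptor table can be indexed directly without a search.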

8. A digital speech synthesis device for text-to-speech conversion, comprising, connected to data and address buses:
- main RAM memory means containing:

  a diphone dictionary containing waveforms each stored as a plurality of samples, and each representing one of a plurality of diphones,

  a dictionary descriptor table including for each diphone and at a respective address, data identifying the beginning of the diphone, the length of the diphone, the middle of the diphone and voicing marks, said waveforms being stored in said dictionary in the order of the respective addresses in the dictionary descriptor table,

  a filtering Hanning window in sampled form,

  a computation micro-program, and

  a table space reserved for receiving successive microframes each representative of a phoneme and each including serial numbers of a diphone in said dictionary and prosodic information relating to said phoneme comprising at least the fundamental periods at the beginning and at the end of the phoneme to be synthesized;

  a local computing unit operating responsive to said micro-program and arranged for reading out, from said descriptor table, the identifying data of the two respective voiced diphones of each phoneme identified in turn by one of said microframes, for subjecting the respective waveforms to filtering by said Hanning window sampled for giving it a width substantially equal to twice the synthesized period as given by the respective microframe, for redistributing signals resulting from filtering of the respective waveforms with a period equal to the fundamental synthesis period and for adding the redistributed signals;

  a buffer memory;

  a routing circuit for alternatively connecting an input of said buffer memory to an output of the computing unit and an output of said buffer memory to an output digital/analog converter through a controller; and

  a speech amplifier driven by said digital/analog converter.
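
The memory layout described in claim 8 can be pictured with a small Python sketch of the descriptor table entry and the microframe. All field names are hypothetical, `middle` is assumed here to be an offset from `begin`, and the linear interpolation between the fundamental periods at the beginning and end of the phoneme is an assumption of this example; the claim only states that both periods are supplied.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DiphoneDescriptor:
    """One descriptor table entry (field names are illustrative)."""
    begin: int                # index of the first sample in the dictionary
    length: int               # number of samples in the diphone
    middle: int               # assumed: offset of the diphone middle from `begin`
    voicing_marks: List[int]  # offsets of the voicing marks from `begin`


@dataclass
class Microframe:
    """One phoneme to synthesize (field names are illustrative)."""
    diphone_numbers: Tuple[int, int]  # serial numbers of the two diphones
    period_start: int                 # fundamental period at phoneme start, in samples
    period_end: int                   # fundamental period at phoneme end, in samples


def read_diphone(samples, descriptor):
    """Slice one diphone waveform out of the flat dictionary sample store."""
    return samples[descriptor.begin:descriptor.begin + descriptor.length]


def interpolate_periods(frame, count):
    """Assumed linear interpolation of the synthesis period across `count` pulses."""
    if count <= 0:
        return []
    if count == 1:
        return [frame.period_start]
    step = (frame.period_end - frame.period_start) / (count - 1)
    return [round(frame.period_start + k * step) for k in range(count)]
```

Storing only begin, length, middle and voicing marks per diphone keeps the table compact, and the waveform samples themselves stay in one contiguous store that the computing unit can address directly.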

Current Assignee: French State Represented By The Minister of The Post Telecommunications and Space
Original Assignee: French State, Ministry of Posts, Telecommunications & Space
Inventors: Hamon, Christian
Primary Examiner(s): Kemeny, Emanuel S.
Application Number: US07/487,942
Time in Patent Office: 1,328 Days
Field of Search: 381/51-53, 395/2
US Class Current: 704/268
CPC Class Codes: G10L 13/07 Concatenation rules
