Processing device for speech synthesis by addition overlapping of wave forms
First Claim
1. Process of speech synthesis from diphones stored in a dictionary as waveforms, for text-to-speech conversion, comprising:
- supplying a sequence of phoneme codes and respective prosodic information including the original fundamental period at the beginning and at the end of the phoneme and the duration thereof, and, for each phoneme, analysing and synthesizing each phoneme; and
then concatenating the synthesized phonemes;
wherein said analysis comprises, for each phoneme, selecting two diphones among the stored diphones and determining the presence of voicing, characterized in that said analysis further includes, for voiced phonemes, subjecting the respective waveforms of the two diphones constituting the phoneme to filtering by a window having a predetermined position with respect to the waveform so selected that the window be centered on a point of the waveform representative of the beginning of a pulse response of vocal cords to excitation thereof, said window having a width substantially equal to twice the lesser of said original fundamental period and the fundamental synthesis period and having an amplitude progressively decreasing from the center of the window to zero at the edges thereof, and displacing the signals resulting from said filtering and obtained for each diphone with such a time shift that they are spaced apart by a time equal to the fundamental synthesis period, and characterized in that synthesis is achieved by adding the displaced overlapping signals.
Abstract
A process of speech synthesis from diphones stored in a dictionary as waveforms, for text-to-speech conversion, comprises supplying a sequence of phoneme codes and respective prosodic information, and, for each phoneme, analyzing and synthesizing each phoneme, and then concatenating the synthesized phonemes. For each phoneme, two diphones are selected among the stored diphones and the presence of voicing is determined. For voiced phonemes, the respective waveforms of the two diphones constituting the phoneme are filtered by a window which is centered on a point of the selected waveform representative of the beginning of a pulse response of vocal cords to excitation thereof. The window has a width substantially equal to twice the greater of the original fundamental period and the fundamental synthesis period and has an amplitude progressively decreasing from the center of the window. The signals resulting from the filtering and obtained for each diphone are time shifted so as to be spaced apart by a time equal to the fundamental synthesis period. Synthesis is achieved by adding the displaced overlapping signals.
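The windowing and overlap-add steps described above can be illustrated with a minimal sketch. It assumes the pitch marks (the points taken to represent the beginning of the vocal-cord pulse response) are already known, and it leaves the choice of window width to the caller through `half_width`. The function names `hann`, `extract_pulse` and `overlap_add` are hypothetical and do not come from the patent.

```python
import math

def hann(width):
    """Symmetric Hann window: 1 at the center, 0 at both edges."""
    if width < 2:
        return [1.0] * width
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (width - 1))
            for n in range(width)]

def extract_pulse(waveform, mark, half_width):
    """Window the waveform around `mark`, an assumed pitch-mark sample index.

    The window spans 2 * half_width + 1 samples; samples that fall outside
    the waveform are treated as zero.
    """
    win = hann(2 * half_width + 1)
    pulse = []
    for k in range(-half_width, half_width + 1):
        i = mark + k
        sample = waveform[i] if 0 <= i < len(waveform) else 0.0
        pulse.append(sample * win[k + half_width])
    return pulse

def overlap_add(pulses, synth_period, half_width):
    """Shift pulse j by j * synth_period samples and sum the overlaps."""
    if not pulses:
        return []
    length = 2 * half_width + 1 + synth_period * (len(pulses) - 1)
    out = [0.0] * length
    for j, pulse in enumerate(pulses):
        start = j * synth_period
        for k, value in enumerate(pulse):
            out[start + k] += value
    return out

# Usage: three pulses taken at marks 20, 30 and 40 of a constant signal,
# re-spaced with a synthesis period equal to the window half-width.
wave = [1.0] * 100
pulses = [extract_pulse(wave, m, 10) for m in (20, 30, 40)]
out = overlap_add(pulses, synth_period=10, half_width=10)
```

When the synthesis period equals the window half-width, as in the usage lines, adjacent Hann windows sum to one, so a constant input is reproduced unchanged in the overlapped region. Other spacings change the pitch of the re-synthesized signal.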
262 Citations
8 Claims
1. Process of speech synthesis from diphones stored in a dictionary as waveforms, for text-to-speech conversion, comprising:
supplying a sequence of phoneme codes and respective prosodic information including the original fundamental period at the beginning and at the end of the phoneme and the duration thereof, and, for each phoneme, analysing and synthesizing each phoneme; and
then concatenating the synthesized phonemes; wherein said analysis comprises, for each phoneme, selecting two diphones among the stored diphones and determining the presence of voicing, characterized in that said analysis further includes, for voiced phonemes, subjecting the respective waveforms of the two diphones constituting the phoneme to filtering by a window having a predetermined position with respect to the waveform so selected that the window be centered on a point of the waveform representative of the beginning of a pulse response of vocal cords to excitation thereof, said window having a width substantially equal to twice the lesser of said original fundamental period and the fundamental synthesis period and having an amplitude progressively decreasing from the center of the window to zero at the edges thereof, and displacing the signals resulting from said filtering and obtained for each diphone with such a time shift that they are spaced apart by a time equal to the fundamental synthesis period, and characterized in that synthesis is achieved by adding the displaced overlapping signals.
(Dependent claims: 4, 5)
2. Process of speech synthesis from diphones stored in a dictionary as waveforms, for text-to-speech conversion, comprising:
- supplying a sequence of phoneme codes and respective prosodic information, including the original fundamental period at the beginning and at the end of the phoneme and the duration thereof;
for each phoneme, analysing said phoneme and synthesizing said phoneme with fundamental synthesis periods as indicated by said prosodic information; and
then concatenating the synthesized phonemes; wherein said analysis comprises, for each phoneme, using a diphone descriptor for selecting two diphones among the stored diphones and determining the presence of voicing, characterized in that said analysis further includes, for voiced phonemes, subjecting the respective waveforms of the two diphones constituting the respective phoneme to filtering by a window having a predetermined position with respect to the waveform so selected that the window be centered on a point of the waveform representative of the beginning of the pulse response of vocal cords to excitation, said window having a width substantially equal to twice the lesser of said original fundamental period and the fundamental synthesis period and having an amplitude progressively decreasing from the center of the window to zero at the edges thereof, and redistributing the mutually overlapping signals resulting from said filtering and obtained for each diphone with such a time spacing that they are spaced by a time equal to the fundamental synthesis period, and characterized in that synthesis is achieved by adding the displaced overlapping signals.
(Dependent claims: 3, 6, 7)
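As an illustration of how the prosodic information in this claim could drive synthesis, the sketch below picks the window half-width as the lesser of the original and synthesis periods, and places pulse centers across a phoneme of a given duration. The linear interpolation of the period between its value at the beginning and at the end of the phoneme is an assumption made for this example; the claim only states that both values and the duration are supplied. The names `window_half_width` and `synthesis_marks` are hypothetical, and all quantities are in samples.

```python
def window_half_width(orig_period, synth_period):
    """Half of a window whose full width is about twice the lesser period."""
    return min(orig_period, synth_period)

def synthesis_marks(period_start, period_end, duration):
    """Return sample positions of successive pulse centers in one phoneme.

    Assumption: the synthesis period varies linearly from `period_start`
    to `period_end` over `duration` samples.
    """
    marks = []
    t = 0
    while t < duration:
        marks.append(t)
        frac = t / duration
        period = period_start + (period_end - period_start) * frac
        t += max(1, round(period))  # always advance by at least one sample
    return marks

# Usage: a phoneme of 50 samples at a constant period of 10 samples.
flat = synthesis_marks(10, 10, 50)
# A phoneme of 100 samples whose period lengthens from 10 to 20 samples.
falling = synthesis_marks(10, 20, 100)
```

With a constant period the marks are evenly spaced; with a lengthening period the spacing between marks grows, which corresponds to a falling pitch.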
8. A digital speech synthesis device for text-to-speech conversion, comprising, connected to data and address buses:
main RAM memory means containing: a diphone dictionary containing waveforms each stored as a plurality of samples, and each representing one of a plurality of diphones, a dictionary descriptor table including for each diphone and at a respective address, data identifying the beginning of the diphone, the length of the diphone, the middle of the diphone and voicing marks, said waveforms being stored in said dictionary in the order of the respective addresses in the dictionary descriptor table, a filtering Hanning window in sampled form, a computation micro-program, and a table space reserved for receiving successive microframes each representative of a phoneme and each including serial numbers of a diphone in said dictionary and prosodic information relating to said phoneme comprising at least the fundamental periods at the beginning and at the end of the phoneme to be synthesized;
a local computing unit operating responsive to said micro-program and arranged for reading out, from said descriptor table, the identifying data of the two respective voiced diphones of each phoneme identified in turn by one of said microframes, for subjecting the respective waveforms to filtering by said Hanning window sampled for giving it a width substantially equal to twice the synthesized period as given by the respective micro-frame, for redistributing signals resulting from filtering of the respective waveforms with a period equal to the fundamental synthesis period and for adding the redistributed signals; a buffer memory; a routing circuit for alternatively connecting an input of said buffer memory to an output of the computing unit and an output of said buffer memory to an output digital/analog converter through a controller; and a speech amplifier driven by said digital/analog converter.
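The data held in memory by the claimed device can be pictured with a small sketch. It models one entry of the descriptor table and shows one possible way to derive a window of the required width from a Hanning window stored in sampled form, by picking evenly spaced samples. The class `DiphoneDescriptor`, its field names, and the nearest-index sampling strategy are assumptions made for illustration; the claim does not specify a layout or a resampling method.

```python
import math
from dataclasses import dataclass, field

@dataclass
class DiphoneDescriptor:
    """One hypothetical entry of the dictionary descriptor table."""
    start: int                # index of the diphone's first sample
    length: int               # number of samples in the diphone
    middle: int               # offset of the diphone middle, in samples
    voicing_marks: list = field(default_factory=list)  # offsets of marks

def diphone_samples(dictionary, desc):
    """Read one diphone waveform out of the flat sample dictionary."""
    return dictionary[desc.start:desc.start + desc.length]

def make_stored_window(size):
    """Hann window kept in memory at a fixed, fine resolution."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1))
            for n in range(size)]

def sample_window(stored, width):
    """Pick `width` evenly spaced samples from the stored window.

    Assumption: nearest-index selection, with no interpolation.
    """
    if width == 1:
        return [stored[len(stored) // 2]]
    last = len(stored) - 1
    return [stored[round(n * last / (width - 1))] for n in range(width)]

# Usage: a toy dictionary of 20 samples holding two diphones back to back.
dictionary = [float(i) for i in range(20)]
table = [
    DiphoneDescriptor(start=0, length=12, middle=6, voicing_marks=[2, 7]),
    DiphoneDescriptor(start=12, length=8, middle=4),
]
second = diphone_samples(dictionary, table[1])
stored = make_stored_window(1025)
window = sample_window(stored, 5)
```

Storing one finely sampled window and reading it at a stride avoids recomputing cosines for every synthesis period, which is a plausible reason to keep the window in memory, though the source does not state the motivation.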
Specification