SPEECH SYNTHESIS BY CONCATENATION OF FORMANT ENCODED WORDS
Abstract
Audio response units that select speech sounds, stored in analog or coded digital form, as the excitation for a speech synthesizer are widely used, for example in telephone audio announcement terminals. The speech produced by most units is noticeably artificial and mechanical sounding. According to this invention, human speech is analyzed in terms of formant structure and coded for storage in the unit. As the individual words are called for, a stored program assembles them into a complete utterance, taking into account the durations of the words in the context of the complete utterance, pitch variations common to the language, and transitions between voiced portions of the speech. The result is a more natural-sounding synthetic utterance.
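The assembly step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes each stored word is a list of parameter frames (for example formant frequencies in Hz), that the timing rules have already been reduced to a target frame count per word, and that the pitch contour is a short time-normalized list of values. The names `resample_track` and `assemble_message` are hypothetical, and the smoothing of transitions between voiced regions is left out.

```python
from typing import List, Sequence

Frame = List[float]  # one frame of control parameters, e.g. [F1, F2, F3] in Hz


def resample_track(frames: Sequence[Frame], n_out: int) -> List[Frame]:
    """Linearly resample a parameter track to n_out frames (duration change)."""
    n_in = len(frames)
    if n_in == 0 or n_out < 1:
        raise ValueError("need at least one input frame and one output frame")
    if n_in == 1 or n_out == 1:
        return [list(frames[0]) for _ in range(n_out)]
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)  # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def assemble_message(
    words: Sequence[Sequence[Frame]],
    target_lengths: Sequence[int],
    pitch_contour: Sequence[float],
) -> List[Frame]:
    """Stretch each word, concatenate, and append one pitch value per frame."""
    if len(words) != len(target_lengths):
        raise ValueError("one target length is required per word")
    message: List[Frame] = []
    for frames, n in zip(words, target_lengths):
        message.extend(resample_track(frames, n))
    # Stretch the time-normalized contour over the whole message.
    pitch = resample_track([[p] for p in pitch_contour], len(message))
    return [frame + p for frame, p in zip(message, pitch)]
```

Each output frame carries the word parameters followed by a pitch value, which is the kind of continuous description a synthesizer could be driven from. Linear interpolation is used here only because it is the simplest choice; the source does not state how durations are altered.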
13 Claims
-
1. A system for composing speech messages from sequences of prerecorded words, which comprises:
- means for analyzing each word of a vocabulary of spoken words to produce a separate parametric description of each;
means for storing said parametric descriptions;
means under control of an applied command signal for sequentially withdrawing from storage those descriptions required to assemble a desired spoken message;
means for individually altering the duration of the description of each word of said message in accordance with prescribed timing rules;
means for merging consecutive word descriptions together on the basis of the respective, voiced-unvoiced character of the merged word descriptions;
means for altering the pitch characteristic of said continuous message description in accordance with a prescribed contour; and
means for utilizing said continuous description to control a speech synthesizer.
-
2. A system for composing speech messages as defined in claim 1, wherein, said parametric description of each word in said vocabulary comprises:
- a representation of the formants, voiced and unvoiced amplitudes, and fricative pole-zero characteristics of said spoken word.
-
3. A system for composing speech messages as defined in claim 2, wherein, said representations are in a coded digital format.
-
4. Apparatus for processing parametric descriptions of selected prerecorded spoken words to form a continuous description of a prescribed message suitable for actuating a speech synthesizer, which comprises:
- means for deriving a spectral derivative function for each word description of said message;
means for individually altering the durations of selected word descriptions in accordance with stored timing information;
means operative in response to said spectral derivative functions for developing parametric descriptions of transitions between voiced word regions scheduled to be merged to form said message;
means for concatenating said altered word descriptions with said transition descriptions in accordance with said prescribed message to form a continuous parametric message description; and
means for altering the pitch characteristic of said message description in accordance with prescribed rules.
-
5. Apparatus for processing parametric descriptions as defined in claim 4, wherein:
- said stored timing information comprises a schedule of word durations as a function of position in an input string of words, and of the number of phonemes per word.
-
6. Apparatus for processing parametric descriptions as defined in claim 4, wherein, said stored timing information comprises:
- a schedule of word durations derived from rules based on common language usage.
-
7. Apparatus for processing parametric descriptions as defined in claim 4, wherein, said stored timing information comprises:
- a schedule of word durations assembled from measurements of a naturally spoken version of said prescribed message.
-
8. Apparatus for processing parametric descriptions of selected prerecorded words, as defined in claim 4, wherein, said parametric descriptions of transitions are developed for the last 100 msec of the first of two words to be merged and the first 100 msec of the second of said two words to be merged.
-
9. Apparatus as defined in claim 8, wherein, the rate of transition between said two words is proportional to the average of said spectral derivatives for said two words.
-
10. Apparatus for processing parametric descriptions of selected words as defined in claim 4, wherein said means for altering the pitch characteristic of said message description comprises:
- a stored, time-normalized pitch contour for a selected number of different messages; and
means for modifying said contour in accordance with said altered word description durations.
-
11. Apparatus for developing control signals for a speech synthesizer, which comprises:
- means supplied with word length segmental and prosodic functions of each individual word of a desired message for deriving the spectral derivatives of each of said functions;
means responsive to said spectral derivatives for interpolating said segmental functions to establish contours which define smooth transitions between the words of said message;
means for concatenating said segmental functions in accordance with said transition contours; and
means for utilizing said prosodic functions to alter said concatenated segmental functions to develop control waveform signals which approximate the waveform of said desired message.
-
12. Apparatus as defined in claim 11, wherein, said segmental functions include the formant frequencies, unvoiced pole and zero frequencies and amplitudes of each of said words.
-
13. Apparatus as defined in claim 11, wherein, said prosodic functions include timing and pitch variations for said words as a function of message syntax.
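The transition merging recited in claims 8 and 9 can be sketched as follows. This is an illustration under stated assumptions, not the claimed apparatus: the frame period (10 msec), the definition of the spectral derivative (mean absolute frame-to-frame parameter change per msec), the `gain` constant, and the function names `spectral_derivative` and `merge_voiced` are all hypothetical. Only the 100 msec regions on either side of the word boundary, and a transition rate proportional to the average of the two words' spectral derivatives, come from the claims.

```python
from typing import List, Sequence

Frame = List[float]  # one frame of control parameters, e.g. formants in Hz


def spectral_derivative(frames: Sequence[Frame], frame_ms: float) -> float:
    """Assumed measure: mean absolute parameter change per msec over a word."""
    if len(frames) < 2:
        return 0.0
    total = 0.0
    for prev, cur in zip(frames, frames[1:]):
        total += sum(abs(c - p) for p, c in zip(prev, cur))
    return total / ((len(frames) - 1) * frame_ms)


def merge_voiced(
    w1: Sequence[Frame],
    w2: Sequence[Frame],
    frame_ms: float = 10.0,    # assumed frame period
    region_ms: float = 100.0,  # region on each side of the boundary (claim 8)
    gain: float = 0.001,       # assumed proportionality constant (claim 9)
) -> List[Frame]:
    """Blend the end of w1 into the start of w2 across the boundary region."""
    n = int(round(region_ms / frame_ms))
    if n < 1 or len(w1) < n or len(w2) < n:
        raise ValueError("each word must cover at least one transition region")
    tail, head = list(w1[-n:]), list(w2[:n])
    # Hold each word's boundary frame so both tracks span the full 2n frames.
    from_track = tail + [tail[-1]] * n
    to_track = [head[0]] * n + head
    avg = 0.5 * (spectral_derivative(w1, frame_ms) + spectral_derivative(w2, frame_ms))
    span = 2 * n
    # Rate proportional to the average derivative; the floor makes the blend
    # start fully on w1 and end fully on w2 even for very slow transitions.
    slope = max(gain * avg, 1.0 / (span - 1))
    centre = (span - 1) / 2.0
    transition = []
    for k in range(span):
        w = min(1.0, max(0.0, 0.5 + slope * (k - centre)))
        transition.append(
            [(1 - w) * a + w * b for a, b in zip(from_track[k], to_track[k])]
        )
    return list(w1[:-n]) + transition + list(w2[n:])
```

The merged track has the same number of frames as the two words combined, so duration changes made earlier are preserved. A larger average spectral derivative gives a steeper blend centred on the word boundary; boundaries involving unvoiced regions are not handled by this sketch.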
Specification