SPEECH SYNTHESIS BY CONCATENATION OF FORMANT ENCODED WORDS

US 3,828,132 A
Filed: 10/30/1970
Issued: 08/06/1974
Est. Priority Date: 10/30/1970
Status: Expired due to Term

Abstract

Audio response units, which select speech sounds stored in analog or coded digital form as the excitation for a speech synthesizer, are widely used, for example in telephone audio announcement terminals. The speech produced by most such units sounds noticeably artificial and mechanical. According to this invention, human speech is analyzed in terms of its formant structure and coded for storage in the unit. As individual words are called for, a stored program assembles them into a complete utterance, taking into account the durations of the words in the context of the complete utterance, pitch variations common to the language, and transitions between voiced portions of the speech. The result is a more natural-sounding synthetic utterance.

13 Claims

1. A system for composing speech messages from sequences of prerecorded words, which comprises:
- means for analyzing each word of a vocabulary of spoken words to produce a separate parametric description of each;
- means for storing said parametric descriptions;
- means under control of an applied command signal for sequentially withdrawing from storage those descriptions required to assemble a desired spoken message;
- means for individually altering the duration of the description of each word of said message in accordance with prescribed timing rules;
- means for merging consecutive word descriptions together on the basis of the respective, voice-unvoiced character of the merged word descriptions;
- means for altering the pitch characteristic of said continuous message description in accordance with a prescribed contour; and
- means for utilizing said continuous description to control a speech synthesizer.
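The merging step of claim 1 depends on whether the junction between two consecutive words is voiced. A minimal Python sketch of that decision follows, assuming each word description is a list of frames carrying a voicing flag. The `Frame` class, the `boundary_modes` function and the "smooth"/"abut" labels are hypothetical names chosen for illustration, not terms from the patent.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Frame:
    """Hypothetical analysis frame; only the voicing flag is needed here."""
    voiced: bool  # True if the frame is excited by the voiced (buzz) source


def boundary_modes(words: Sequence[Sequence[Frame]]) -> List[str]:
    """Classify each word-to-word boundary in a message (assumed scheme).

    'smooth' -> the last frame of the earlier word and the first frame of the
                later word are both voiced, so a smoothed transition is built;
    'abut'   -> anything else (unvoiced sound or silence at the junction),
                so the two descriptions are simply placed end to end.
    """
    modes = []
    for first, second in zip(words, words[1:]):
        both_voiced = bool(first) and bool(second) and first[-1].voiced and second[0].voiced
        modes.append("smooth" if both_voiced else "abut")
    return modes
```

For example, "nine" followed by "one" meets at a voiced /n/-/w/ junction and would be smoothed, while "nine" followed by "six" meets an unvoiced /s/ and would simply be abutted.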

2. A system for composing speech messages as defined in claim 1, wherein, said parametric description of each word in said vocabulary comprises:
- a representation of the formants, voiced and unvoiced amplitudes, and fricative pole-zero characteristics of said spoken word.
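Claim 2 lists what each stored word description contains. One way to hold that description in memory is sketched below, assuming a fixed analysis frame period of 10 ms and three formants; the frame period, the field names (`f1`, `av`, `fp`, and so on) and the units are illustrative assumptions, not values taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List

FRAME_MS = 10  # assumed analysis frame period in ms (hypothetical)


@dataclass
class FormantFrame:
    """One frame of a parametric word description (illustrative fields)."""
    f1: float  # first formant frequency, Hz
    f2: float  # second formant frequency, Hz
    f3: float  # third formant frequency, Hz
    av: float  # voiced amplitude, dB
    an: float  # unvoiced (noise) amplitude, dB
    fp: float  # fricative pole frequency, Hz
    fz: float  # fricative zero frequency, Hz


@dataclass
class WordDescription:
    """A vocabulary word stored as a sequence of formant frames."""
    text: str
    frames: List[FormantFrame] = field(default_factory=list)

    def duration_ms(self) -> int:
        # Duration follows directly from the frame count and frame period.
        return len(self.frames) * FRAME_MS
```

Pitch is deliberately absent from the per-word frame here, since the claims apply pitch afterwards from a message-level contour.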

3. A system for composing speech messages as defined in claim 2, wherein, said representations are in a coded digital format.
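Claim 3 does not specify the coding scheme. As a hedged illustration only, the sketch below assumes simple uniform quantization of each parameter to a fixed number of bits over a fixed range; the ranges and bit counts passed in are placeholders the caller must choose, not figures from the patent.

```python
def quantize(value: float, lo: float, hi: float, bits: int) -> int:
    """Map a parameter value in [lo, hi] to an integer code of `bits` bits.

    Uniform quantization is an assumption made for this sketch; values
    outside the range are clipped to the nearest end.
    """
    levels = (1 << bits) - 1
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * levels)


def dequantize(code: int, lo: float, hi: float, bits: int) -> float:
    """Recover the approximate parameter value from its integer code."""
    levels = (1 << bits) - 1
    return lo + code / levels * (hi - lo)
```

For instance, coding a formant frequency between 200 Hz and 1000 Hz with 6 bits gives 64 levels roughly 12.7 Hz apart, so the reconstruction error is at most about half a step.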

4. Apparatus for processing parametric descriptions of selected prerecorded spoken words to form a continuous description of a prescribed message suitable for actuating a speech synthesizer, which comprises:
- means for deriving a spectral derivative function for each word description of said message;
- means for individually altering the durations of selected word descriptions in accordance with stored timing information;
- means operative in response to said spectral derivative functions for developing parametric descriptions of transitions between voiced word regions scheduled to be merged to form said message;
- means for concatenating said altered word descriptions with said transition descriptions in accordance with said prescribed message to form a continuous parametric message description; and
- means for altering the pitch characteristic of said message description in accordance with prescribed rules.
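The claims do not define the spectral derivative function in this excerpt. A minimal sketch follows, assuming it is measured as the summed absolute frame-to-frame change of the formant frequencies; that definition is an assumption chosen for illustration, and the function name is hypothetical.

```python
from typing import List, Sequence


def spectral_derivative(formants: Sequence[Sequence[float]]) -> List[float]:
    """Frame-to-frame rate of change of the formant tracks (assumed measure).

    formants[n] holds (F1, F2, F3) in Hz for frame n. Returns one value per
    frame; frame 0 has no predecessor and is assigned 0.0.
    """
    if not formants:
        return []
    sd = [0.0]
    for prev, cur in zip(formants, formants[1:]):
        sd.append(float(sum(abs(c - p) for p, c in zip(prev, cur))))
    return sd
```

Large values mark frames where the spectrum is moving quickly (for example, a consonant-vowel transition); small values mark steady vowel portions.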

5. Apparatus for processing parametric descriptions as defined in claim 4, wherein:
- said stored timing information comprises a schedule of word durations as a function of position in an input string of words, and of the number of phonemes per word.
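A lookup against the schedule described in claim 5 might look like the sketch below. It assumes word position is reduced to three classes (initial, medial, final) and that phoneme counts above some maximum share one entry; both are assumptions, and the schedule itself is supplied by the caller rather than invented here.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical schedule format: (position class, phoneme count) -> duration in ms.
Schedule = Dict[Tuple[str, int], int]


def position_class(index: int, n_words: int) -> str:
    """Coarse position of a word in the input string (assumed three classes)."""
    if index == n_words - 1:
        return "final"
    return "initial" if index == 0 else "medial"


def scheduled_durations(phonemes_per_word: Sequence[int],
                        schedule: Schedule,
                        max_phonemes: int) -> List[int]:
    """Target duration for each word from its position and phoneme count."""
    n = len(phonemes_per_word)
    out = []
    for i, k in enumerate(phonemes_per_word):
        key = (position_class(i, n), min(k, max_phonemes))
        out.append(schedule[key])
    return out
```

Keying on position reflects the common observation that words at the end of an utterance are lengthened; the actual durations in any real schedule would come from the timing rules or measurements the claims describe.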

6. Apparatus for processing parametric descriptions as defined in claim 4, wherein, said stored timing information comprises:
- a schedule of word durations derived from rules based on common language usage.

7. Apparatus for processing parametric descriptions as defined in claim 4, wherein, said stored timing information comprises:
- a schedule of word durations assembled from measurements of a naturally spoken version of said prescribed message.
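Whichever source supplies the target durations (claims 5 to 7), each word description must then be lengthened or shortened to match. The patent text here does not say how; the sketch below assumes uniform linear resampling of every parameter track to the new frame count, with a 10 ms frame period. Both are illustrative assumptions.

```python
from typing import Dict, List, Sequence


def resample_track(track: Sequence[float], n_out: int) -> List[float]:
    """Linearly resample one parameter track to n_out frames."""
    n_in = len(track)
    if n_in == 0 or n_out <= 0:
        return []
    if n_in == 1 or n_out == 1:
        return [float(track[0])] * n_out
    out = []
    for j in range(n_out):
        x = j * (n_in - 1) / (n_out - 1)  # position on the input frame axis
        i = min(int(x), n_in - 2)          # left neighbour (keep i + 1 valid)
        frac = x - i
        out.append(track[i] * (1 - frac) + track[i + 1] * frac)
    return out


def stretch_word(tracks: Dict[str, Sequence[float]], target_ms: float,
                 frame_ms: float = 10.0) -> Dict[str, List[float]]:
    """Resample every parameter track of a word to the target duration."""
    n_out = max(1, round(target_ms / frame_ms))
    return {name: resample_track(t, n_out) for name, t in tracks.items()}
```

Uniform stretching is the simplest choice; a more careful system might stretch steady vowel regions more than consonant transitions.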

8. Apparatus for processing parametric descriptions of selected prerecorded words, as defined in claim 4, wherein, said parametric descriptions of transitions are developed for the last 100 msec of the first of two words to be merged and the first 100 msec of the second of said two words to be merged.

9. Apparatus as defined in claim 8, wherein, the rate of transition between said two words is proportional to the average of said spectral derivatives for said two words.
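Claims 8 and 9 together describe a transition spanning the last 100 msec of the first word and the first 100 msec of the second, with a rate proportional to the average spectral derivative. The sketch below is one possible reading, not the patent's procedure: it assumes 10 ms frames, a hypothetical proportionality constant `k`, and a centered linear ramp (clipped to the region) whose slope is the transition rate.

```python
from statistics import mean
from typing import List, Sequence

FRAME_MS = 10                   # assumed analysis frame period, ms
MERGE_FRAMES = 100 // FRAME_MS  # 100 ms on each side of the junction


def transition_rate(sd_a: Sequence[float], sd_b: Sequence[float], k: float) -> float:
    """Rate proportional to the average spectral derivative over the merge
    region (end of word A, start of word B). k is a hypothetical constant."""
    avg = (mean(sd_a[-MERGE_FRAMES:]) + mean(sd_b[:MERGE_FRAMES])) / 2
    return k * avg


def merge_tracks(track_a: Sequence[float], track_b: Sequence[float],
                 rate: float) -> List[float]:
    """Replace the last 100 ms of A and the first 100 ms of B (one parameter
    track) with a ramp from A's value at the start of the region to B's value
    at its end. Both tracks are assumed non-empty."""
    tail, head = track_a[-MERGE_FRAMES:], track_b[:MERGE_FRAMES]
    n = len(tail) + len(head)
    start, end = tail[0], head[-1]
    # Never ramp so slowly that the region edges would be discontinuous.
    rate = max(rate, 1.0 / (n - 1))
    center = (n - 1) / 2
    ramp = []
    for i in range(n):
        w = min(1.0, max(0.0, 0.5 + rate * (i - center)))  # fraction of change done
        ramp.append(start + (end - start) * w)
    return list(track_a[:-MERGE_FRAMES]) + ramp + list(track_b[MERGE_FRAMES:])
```

With this reading, a junction where both words are changing rapidly gets a short, steep transition, while steady sounds on both sides are blended gradually across the full 200 ms.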

10. Apparatus for processing parametric descriptions of selected words as defined in claim 4, wherein said means for altering the pitch characteristic of said message description comprises:
- a stored, time-normalized pitch contour for a selected number of different messages; and
- means for modifying said contour in accordance with said altered word description durations.
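Claim 10 stores a time-normalized pitch contour and adapts it to the altered word durations. A minimal sketch follows, assuming the contour is stored as breakpoints normalized separately within each word, so that each word's segment can be stretched to that word's new frame count. The per-word normalization and the breakpoint format are assumptions for illustration.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# (normalized time in [0, 1], F0 in Hz); times strictly increasing, from 0.0 to 1.0.
Breakpoints = Sequence[Tuple[float, float]]


def sample_contour(bp: Breakpoints, n_frames: int) -> List[float]:
    """Stretch one word's time-normalized F0 breakpoints over n_frames frames."""
    times = [t for t, _ in bp]
    out = []
    for j in range(n_frames):
        t = j / (n_frames - 1) if n_frames > 1 else 0.0
        i = max(bisect_right(times, t) - 1, 0)
        if i >= len(bp) - 1:
            out.append(float(bp[-1][1]))  # at or past the last breakpoint
            continue
        (t0, f0), (t1, f1) = bp[i], bp[i + 1]
        frac = (t - t0) / (t1 - t0)
        out.append(f0 + (f1 - f0) * frac)  # linear interpolation between breakpoints
    return out


def message_pitch(contours: Sequence[Breakpoints],
                  word_frames: Sequence[int]) -> List[float]:
    """Concatenate each word's contour segment, stretched to its altered length."""
    pitch: List[float] = []
    for bp, n in zip(contours, word_frames):
        pitch.extend(sample_contour(bp, n))
    return pitch
```

Because each word's segment is resampled to that word's frame count, lengthening one word by the timing rules stretches only its own part of the pitch contour.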

11. Apparatus for developing control signals for a speech synthesizer, which comprises:
- means supplied with word length segmental and prosodic functions of each individual word of a desired message for deriving the spectral derivatives of each of said functions;
- means responsive to said spectral derivatives for interpolating said segmental functions to establish contours which define smooth transitions between the words of said message;
- means for concatenating said segmental functions in accordance with said transition contours; and
- means for utilizing said prosodic functions to alter said concatenated segmental functions to develop control waveform signals which approximate the waveform of said desired message.

12. Apparatus as defined in claim 11, wherein, said segmental functions include the formant frequencies, unvoiced pole and zero frequencies and amplitudes of each of said words.

13. Apparatus as defined in claim 11, wherein, said prosodic functions include timing and pitch variations for said words as a function of message syntax.

Current Assignee: James Loton Flanagan, Lawrence Richard Rabiner, Ronald William Schafer
Original Assignee: James Loton Flanagan, Lawrence Richard Rabiner, Ronald William Schafer
Inventors: Flanagan, James Loton; Rabiner, Lawrence Richard; Schafer, Ronald William
Primary Examiner(s): Claffy, Kathleen H.
Assistant Examiner(s): Leaheey, Jon Bradford
Application Number: US05/085,660
Time in Patent Office: 1,376 days
Field of Search: 179/1S, 1A, 1SB, 15.55R, 15.55T; 324/77; 340/148
US Class Current: 704/268
CPC Class Codes: G10L 13/07 (Concatenation rules)
