Voice synthesizer

US 4,128,737 A
Filed: 08/16/1976
Issued: 12/05/1978
Est. Priority Date: 08/16/1976
Status: Expired due to Term

Abstract

A voice synthesizer that is responsive to sequences of digital input command words to phonetically synthesize human speech. The system includes control circuits that are responsive to the input command words to introduce an articulated silent phoneme into the speech pattern, vary the duration of each phoneme produced, as well as to vary the overall rate and volume of the speech generated. In addition, the design utilizes inflection assignment for individual phonemes, derived from the control signals controlling phoneme articulation, and also employs a glottal waveform which is more representative of human glottis action. The invention also incorporates resonant suppression into the vocal tract to simulate the dampening effect due to the opening of the glottis, and provides closer simulation of human energy content at higher frequencies to improve the quality of the speech generated.

66 Claims

1. In an electronic device for phonetically synthesizing human speech by synthetically generating and combining the basic phonetic sounds in speech including input means responsive to successive input data identifying a desired sequence of phonemes for producing control signals comprising the parameters electronically defining the articulation patterns of said desired sequence of phonemes, a vocal source adapted to produce a voiced excitation signal having associated therewith a fundamental frequency, and output means responsive to said control signals for electronically forming the articulation patterns of said desired sequence of phonemes and further responsive to said voiced excitation signal for producing said desired sequence of phonemes;
- the improvement comprising;
  
  inflection control means connected to said vocal source for automatically varying the fundamental frequency of said voiced excitation signal in accordance with certain of said control signals produced by said input means.
- - 2. The speech synthesizer of claim 1 wherein said inflection control means is further adapted to vary the fundamental frequency of said voiced excitation signal by an amount related to the magnitudes of said certain of said control signals.
  - 3. The speech synthesizer of claim 1 wherein said inflection control means is further responsive to said input data to vary the fundamental frequency of said voiced excitation signal.
  - 4. The speech synthesizer of claim 3 wherein said input data comprises a plurality of 12-bit digital command words wherein three of the input bits from each of said command words are applied to said inflection control means to vary the fundamental frequency of said voiced excitation signal.
  - 5. The speech synthesizer of claim 1 further including a fricative source adapted to produce an unvoiced excitation signal.
  - 6. The speech synthesizer of claim 5 wherein one of said control signals is produced by said input means whenever a phoneme requiring fricative energy is to be generated, and said inflection control means is adapted to increase the fundamental frequency of said voiced excitation signal whenever said one control signal is produced.
  - 7. The speech synthesizer of claim 1 wherein one of said control signals is produced by said input means whenever a nasal phoneme is to be generated, and said inflection control means is adapted to decrease the fundamental frequency of said voiced excitation signal whenever said one control signal is produced.
  - 8. The speech synthesizer of claim 1 wherein said output means includes vocal tract means comprising a plurality of resonant filters that are adapted to substantially produce the frequency spectrum of each phoneme in said desired sequence of phonemes, said plurality of resonant filters including at least one variable resonant filter that is tunable under the control of one of said control signals and is adapted to produce the first resonant formant in the frequency spectrums of said desired sequence of phonemes.
  - 9. The speech synthesizer of claim 8 wherein said inflection control means is adapted to decrease the fundamental frequency of said voiced excitation signal whenever said one control signal is produced.
  - 10. The speech synthesizer of claim 1 wherein a first of said control signals is produced by said input means whenever a phoneme requiring vocal energy is to be generated and a second of said control signals is produced by said input means whenever a plosive phoneme is to be generated, and said inflection control means is adapted to decrease the fundamental frequency of said voiced excitation signal whenever said first control signal and said second control signal are produced for the same phoneme.
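
As a rough illustration of claims 1 to 10, the sketch below models the inflection control as a function from the control signals to a fundamental frequency. The claims give only the direction of each shift; every number here (the 120 Hz base pitch, the step sizes), the position of the three inflection bits inside the 12-bit command word, and the `ControlSignals` field names are assumptions made for the example, not values from the patent.

```python
from dataclasses import dataclass


@dataclass
class ControlSignals:
    """Hypothetical per-phoneme control signals (field names are illustrative)."""
    vocal: bool = True        # phoneme requires voiced excitation
    fricative: bool = False   # phoneme requires fricative energy
    nasal: bool = False       # nasal phoneme
    plosive: bool = False     # plosive phoneme
    f1_level: float = 0.0     # magnitude of the first-formant control, 0..1


BASE_F0_HZ = 120.0   # assumed resting pitch
BIT_STEP_HZ = 4.0    # assumed Hz per inflection level
NUDGE_HZ = 3.0       # assumed size of each articulation-driven shift
F1_PULL_HZ = 5.0     # assumed maximum lowering from the F1 control


def inflection_bits(command_word: int) -> int:
    """Return a 3-bit inflection field from a 12-bit command word.

    Assumption: the field occupies the top three bits.
    """
    if not 0 <= command_word < (1 << 12):
        raise ValueError("command word must fit in 12 bits")
    return (command_word >> 9) & 0b111


def fundamental_frequency(command_word: int, signals: ControlSignals) -> float:
    """Combine explicit inflection bits with articulation-derived shifts."""
    f0 = BASE_F0_HZ + BIT_STEP_HZ * inflection_bits(command_word)  # claims 3, 4
    if signals.fricative:
        f0 += NUDGE_HZ                       # claim 6: raise for fricatives
    if signals.nasal:
        f0 -= NUDGE_HZ                       # claim 7: lower for nasals
    if signals.vocal and signals.plosive:
        f0 -= NUDGE_HZ                       # claim 10: lower for voiced plosives
    f0 -= F1_PULL_HZ * signals.f1_level      # claims 2, 9: scaled by magnitude
    return f0
```

For example, `fundamental_frequency(0, ControlSignals(nasal=True))` returns a pitch below the assumed base, while a fricative phoneme returns one above it.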

11. An electronic device for phonetically synthesizing human speech comprising:
- input means responsive to input data identifying a desired sequence of phonemes to produce control signals representing the parameters defining said desired sequence of phonemes;
  
  a vocal source adapted to produce a voiced excitation signal having a waveform of varying magnitude;
  
  vocal tract means responsive to said voiced excitation signal and said control signals to produce said desired sequence of phonemes including a plurality of resonant filters having predetermined bandwidths associated therewith adapted to produce the resonant formants in the frequency spectrums of said phonemes; and
  
  suppression means for simulating the suppression of formant resonances in the human vocal tract due to the opening of the glottis by varying the bandwidths of at least some of said plurality of resonant filters in accordance with the magnitude of said voiced excitation signal.
- - 12. The speech synthesizer of claim 11 wherein said suppression means increases said bandwidths as the magnitude of said voiced excitation signal increases.
  - 13. The speech synthesizer of claim 12 wherein said suppression means is adapted to produce a variable pulse width square wave signal whose duty cycle is proportional to the magnitude of said voiced excitation signal.
  - 14. The speech synthesizer of claim 13 wherein each of said resonant filters affected by said suppression means has a bandpass section thereof having connected in shunt therewith an electronic control device that is adapted to conduct a current across said bandpass section under the control of said suppression signal such that the percentage of time during which said electronic control device conducts current is related to the percentage duty cycle of said suppression signal.
  - 15. The speech synthesizer of claim 14 wherein said suppression signal is applied to the three resonant filters in said vocal tract means adapted to produce the first three resonant formants in the frequency spectrums of said phonemes.
  - 16. The speech synthesizer of claim 12 wherein said vocal source is adapted to produce a voiced excitation signal having a waveform comprised of a first segment that increases in magnitude, a second segment that decreases in magnitude, and a third segment that remains at a constant magnitude.
  - 17. The speech synthesizer of claim 16 wherein said first segment increases relatively gradually in magnitude from an original level to a maximum level, said second segment declines relatively rapidly in magnitude from said maximum level to said original level, and said third segment remains constant at said original magnitude level.
  - 18. The speech synthesizer of claim 17 wherein said voiced excitation signal comprises substantially a truncated sawtooth waveform.
  - 19. The speech synthesizer of claim 16 wherein said suppression means increases said predetermined bandwidths of said resonant filters during said first segment of said voiced excitation signal, decreases said bandwidths of said resonant filters from said increased levels during said second segment of said voiced excitation signal, and has no effect on said predetermined bandwidths of said resonant filters during said third segment of said voiced excitation signal.
  - 20. The speech synthesizer of claim 19 wherein the duration of said third segment of said voiced excitation signal is at least as great as the duration of said first and second segments combined.
  - 21. The speech synthesizer of claim 11 wherein said suppression means is adapted to vary said bandwidths in accordance with the magnitude of said voiced excitation signal only during the production of phonemes requiring voiced excitation energy.
  - 22. The speech synthesizer of claim 21 wherein one of said control signals is produced by said input means whenever a phoneme requiring vocal energy is to be generated, and said suppression means is adapted to affect the bandwidths of said resonant filters only when said one control signal is produced.
  - 23. The speech synthesizer of claim 22 wherein said one control signal comprises a vocal amplitude control signal.
  - 24. The speech synthesizer of claim 11 further including circuit means for adding a relatively high fixed frequency formant to said voiced excitation signal to increase the excitation energy of said voiced excitation signal at high frequencies.
  - 25. The speech synthesizer of claim 24 wherein said circuit means comprises a fixed-pole resonant filter.
  - 26. The speech synthesizer of claim 25 wherein said resonant filter is adapted to resonate at a frequency of approximately 4000 Hz.
  - 27. The speech synthesizer of claim 26 wherein said plurality of resonant filters in said vocal tract means includes a fixed-pole resonant filter adapted to resonate at a frequency greater than 4000 Hz.
  - 28. The speech synthesizer of claim 27 wherein said fixed-pole resonant filter in said vocal tract means is adapted to resonate at a frequency of approximately 4400 Hz.
  - 29. The speech synthesizer of claim 24 wherein said plurality of resonant filters in said vocal tract means are connected in cascaded form.
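
The glottal waveform of claims 16 to 20 and the bandwidth suppression of claims 12, 13 and 21 can be pictured with a small numerical sketch. It is not the patented circuit: the claims describe a variable-duty-cycle square wave switching a shunt element across each filter, whereas this example simply computes an effective bandwidth. The segment fractions (`rise`, `fall`) and the widening factor are assumed values.

```python
def glottal_sample(phase: float, rise: float = 0.35, fall: float = 0.10) -> float:
    """Truncated sawtooth over one pitch period, phase in [0, 1).

    Segment 1 rises gradually, segment 2 falls rapidly, segment 3 stays
    at the original level. With the assumed defaults the flat segment
    (0.55 of the period) is longer than the other two combined (claim 20).
    """
    if not 0.0 <= phase < 1.0:
        raise ValueError("phase must be in [0, 1)")
    if phase < rise:
        return phase / rise                   # gradual rise to the maximum
    if phase < rise + fall:
        return 1.0 - (phase - rise) / fall    # rapid fall back to the baseline
    return 0.0                                # closed-glottis segment


def suppression_duty_cycle(magnitude: float) -> float:
    """Duty cycle proportional to excitation magnitude (claim 13), clamped to 0..1."""
    return min(max(magnitude, 0.0), 1.0)


def effective_bandwidth(base_bw_hz: float, magnitude: float,
                        voiced: bool = True, widen: float = 1.0) -> float:
    """Widen a formant bandwidth as the glottis opens (claim 12).

    `widen` is an assumed scale: at full magnitude the bandwidth grows
    by base_bw_hz * widen. Unvoiced phonemes are left untouched (claim 21).
    """
    if not voiced:
        return base_bw_hz
    return base_bw_hz * (1.0 + widen * suppression_duty_cycle(magnitude))
```

During the flat third segment `glottal_sample` returns zero, so `effective_bandwidth` returns the predetermined bandwidth unchanged, which matches the behavior recited in claim 19.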

30. An electronic device for phonetically synthesizing human speech comprising:
- input means responsive to input data identifying a desired sequence of phonemes to produce control signals representing the parameters defining said sequence of phonemes; and
  
  vocal tract means responsive to said control signals to produce said desired sequence of phonemes including a plurality of resonant filters that produce the resonant formants in the frequency spectrums of said desired sequence of phonemes, said plurality of resonant filters including three variable resonant filters each tunable under the control of one of said control signals to produce the first three formants in said frequency spectrums and a fourth variable resonant filter tunable under the control of one of said control signals that tunes one of said first three variable resonant filters to produce the fourth formant in said frequency spectrums.
- - 31. The speech synthesizer of claim 30 wherein said fourth resonant filter is tunable under the control of the same control signal that tunes said third resonant filter.
  - 32. The speech synthesizer of claim 30 further including vocal source means for providing voiced excitation energy to said vocal tract means by producing a voiced excitation signal that contains a relatively wide distribution of both odd and even harmonics and additionally contains a relatively high fixed frequency formant that maintains the energy content of said excitation signal above a predetermined level at relatively high frequencies.
  - 33. The speech synthesizer of claim 32 wherein said vocal tract means includes a fifth resonant filter adapted to resonate at a frequency higher than said relatively high fixed frequency formant in said voiced excitation signal.
  - 34. The speech synthesizer of claim 33 wherein said fixed frequency formant in said voiced excitation signal is located at approximately 4000 Hz and said fifth resonant filter of said vocal tract means is adapted to resonate at approximately 4400 Hz.
  - 35. The speech synthesizer of claim 30 wherein said plurality of resonant filters in said vocal tract means are connected in cascaded form.
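
One way to read claims 30 and 31 is that the fourth formant needs no control signal of its own: it follows the signal that already tunes the third formant. The sketch below shows that sharing. The frequency ranges and the normalized 0 to 1 control values are assumptions for illustration only; the claims do not specify them.

```python
# Assumed tuning ranges in Hz for each variable filter (not from the patent).
F1_RANGE = (250.0, 800.0)
F2_RANGE = (800.0, 2400.0)
F3_RANGE = (1700.0, 3200.0)
F4_RANGE = (3200.0, 3800.0)


def _tune(freq_range, control: float) -> float:
    """Map a normalized control value (0..1) linearly onto a frequency range."""
    if not 0.0 <= control <= 1.0:
        raise ValueError("control value must be in [0, 1]")
    lo, hi = freq_range
    return lo + (hi - lo) * control


def formant_targets(c1: float, c2: float, c3: float):
    """Return (F1, F2, F3, F4) for three control signals.

    F4 is driven by c3, the same control that tunes F3 (claim 31),
    so four variable filters are steered by only three signals.
    """
    return (
        _tune(F1_RANGE, c1),
        _tune(F2_RANGE, c2),
        _tune(F3_RANGE, c3),
        _tune(F4_RANGE, c3),   # shared control signal
    )
```

Changing `c3` moves both the third and fourth values of the returned tuple, while changing `c1` or `c2` leaves the fourth untouched.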

36. An electronic device for phonetically synthesizing human speech comprising:
- a vocal source adapted to produce a voiced excitation signal;
  
  a fricative source adapted to produce an unvoiced excitation signal;
  
  input means responsive to the receipt of input data identifying a desired sequence of phonemes to produce a plurality of control signals representing the parameters defining the phonemes identified by said input data including a first control signal for controlling the amplitude of said voiced excitation signal and a second control signal for controlling the amplitude of said unvoiced excitation signal;
  
  vocal tract means responsive to said voiced and unvoiced excitation signals and said control signals to produce an audio output comprised of said desired sequence of phonemes integrated into intelligible human speech; and
  
  amplitude control means for varying the relative overall amplitude of said audio output by modulating a preselected signal characteristic of said first and second control signals.
- - 37. The speech synthesizer of claim 36 wherein said amplitude control means is responsive to predetermined input data to vary the relative overall amplitude of said audio output while preserving the relative amplitude variations in said voiced and unvoiced excitation signals that occur from phoneme to phoneme under the control of said first and second control signals respectively, by continuously modulating said preselected signal characteristic of said first and second control signals by a certain percentage.
  - 38. The speech synthesizer of claim 37 wherein said input data comprises a plurality of digital command words, each comprised of a plurality of input bits, and said amplitude control means is responsive to predetermined digital command words to modulate said preselected signal characteristic of said first and second control signals in accordance with the value of certain of the input bits in said predetermined digital command words.
  - 39. The speech synthesizer of claim 38 wherein said certain percentage of modulation is determined by the value of said certain input bits in said predetermined digital command words.
  - 40. The speech synthesizer of claim 39 wherein said preselected signal characteristic corresponds to the amplitude of said first and second control signals.
  - 41. The speech synthesizer of claim 40 wherein said amplitude control means includes means for producing a d.c. signal whose magnitude is determined by the value of said certain input bits, and control means adapted to modulate the amplitude of said first and second control signals in accordance with the magnitude of said d.c. signal.
  - 42. The speech synthesizer of claim 41 wherein said control means includes a first electronic control device adapted to conduct said d.c. signal under the control of said first control signal and a second electronic control device adapted to conduct said d.c. signal under the control of said second control signal.
  - 43. The speech synthesizer of claim 42 wherein said first electronic control device comprises an analog gate having its input connected to receive said d.c. signal and its control terminal connected to receive said first control signal, and said second electronic control device comprises an analog gate having its input connected to receive said d.c. signal and its control terminal connected to receive said second control signal.
  - 44. The speech synthesizer of claim 36 further including circuit means responsive to said input data for producing a silent phoneme by preventing said voiced and unvoiced excitation signals from exciting said vocal tract means.
  - 45. The speech synthesizer of claim 44 further including first modulating means for modulating the amplitude of said voiced excitation signal in accordance with said first control signal and second modulating means for modulating the amplitude of said unvoiced excitation signal in accordance with said second control signal.
  - 46. The speech synthesizer of claim 45 wherein said circuit means is adapted to exclude said first and second control signals from said first and second modulating means respectively in response to the receipt of predetermined input data.
  - 47. The speech synthesizer of claim 46 wherein said circuit means includes means for producing an enabling signal until said predetermined input data is received, and control means connected between said input means and said first and second modulating means and adapted to prevent said first control signal from being transmitted to said first modulating means and said second control signal from being transmitted to said second modulating means upon the termination of said enabling signal.
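
The overall volume control of claims 36 to 41 can be sketched as a single gain applied equally to the voiced and unvoiced amplitude control signals, so the phoneme-to-phoneme balance between them is preserved. The 3-bit width of the volume field is an assumption; the claims say only that "certain of the input bits" set the level.

```python
VOLUME_BITS = 3  # assumed width of the volume field in the command word


def volume_fraction(bits: int, n_bits: int = VOLUME_BITS) -> float:
    """Convert volume bits to a 0..1 level, standing in for the d.c. signal of claim 41."""
    top = (1 << n_bits) - 1
    if not 0 <= bits <= top:
        raise ValueError("volume bits out of range")
    return bits / top


def scaled_amplitudes(vocal_amp: float, fricative_amp: float, volume_bits: int):
    """Scale both amplitude control signals by the same percentage (claim 37)."""
    gain = volume_fraction(volume_bits)
    return vocal_amp * gain, fricative_amp * gain
```

Because both signals are multiplied by the same factor, the ratio of voiced to unvoiced amplitude for a given phoneme is the same at every volume setting above zero.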

48. An electronic device for phonetically synthesizing human speech including:
- input means responsive to input data identifying a desired sequence of phonemes to produce control signals representing the parameters defining said phonemes;
  
  timing means responsive to one of said control signals to produce a timing signal that determines the duration of production of each of said phonemes;
  
  vocal tract means responsive to said control signals to produce an audio output comprised of said desired sequence of phonemes; and
  
  first rate control means responsive to said input data for varying phoneme timing by producing a speech rate signal in accordance with said input data that is provided to said timing means to vary said timing signal, said first rate control means including second rate control means responsive to predetermined input data to vary the relative overall speech rate of said audio output while preserving the relative variations in the intervals of phoneme production that occur from phoneme to phoneme under the control of said one control signal by uniformly varying a preselected signal characteristic of said speech rate signal.
- - 49. The speech synthesizer of claim 48 wherein said first rate control means is adapted to produce a speech rate signal comprising a variable pulse width square wave whose duty cycle is determined by said input data.
  - 50. The speech synthesizer of claim 49 wherein said second rate control means is adapted to produce an output signal in accordance with said predetermined input data whose magnitude also determines the duty cycle of said speech rate signal.
  - 51. The speech synthesizer of claim 50 wherein said timing signal comprises a ramp signal that varies between two predetermined magnitude levels in a time interval that determines the duration of phoneme production, and the slope of said timing signal is determined by the duty cycle of said speech rate signal.
  - 52. The speech synthesizer of claim 50 wherein said input data comprises a plurality of digital command words each comprising a plurality of input bits, and the duty cycle of said speech rate signal is determined by the value of certain of said input bits in each of said digital command words.
  - 53. The speech synthesizer of claim 52 wherein said second rate control means is responsive to predetermined digital command words to vary the magnitude of said output signal in accordance with the value of certain of said input bits in said predetermined digital command words.
  - 66. The speech synthesizer of claim 48 further including variable rate transition means connected between said input means and said vocal tract means for smoothing the abrupt variations that occur in said control signals between successive phonemes, said variable rate transition means having associated therewith response times which are adapted to be varied in accordance with variations in said preselected signal characteristic of said speech rate signal.
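
Claims 48 to 53 describe timing set by a ramp whose slope follows the duty cycle of a speech rate signal. A minimal numerical reading, under stated assumptions, is that phoneme duration is inversely proportional to that duty cycle, and that the duty cycle is the product of a per-phoneme rate and an overall rate. The product form, the normalized 0 to 1 ranges, and the function names are assumptions, not details from the patent.

```python
def speech_rate_duty(phoneme_rate: float, overall_rate: float) -> float:
    """Duty cycle of the speech rate signal.

    Assumption: the per-phoneme rate (first rate control) and the overall
    rate (second rate control) combine by multiplication, both in (0, 1].
    """
    for value in (phoneme_rate, overall_rate):
        if not 0.0 < value <= 1.0:
            raise ValueError("rates must be in (0, 1]")
    return phoneme_rate * overall_rate


def phoneme_duration_ms(nominal_ms: float, duty: float) -> float:
    """Time for the timing ramp to cross its span.

    The ramp slope is taken as proportional to the duty cycle (claim 51),
    so halving the duty cycle doubles the duration.
    """
    if not 0.0 < duty <= 1.0:
        raise ValueError("duty cycle must be in (0, 1]")
    return nominal_ms / duty
```

Under this model, lowering `overall_rate` stretches every phoneme by the same factor, which is one way to preserve the relative phoneme-to-phoneme timing recited in claim 48.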

54. An electronic device for phonetically synthesizing human speech including:
- input means responsive to input data identifying a desired sequence of phonemes to produce a plurality of control signals representing the parameters defining said desired sequence of phonemes;
  
  a vocal source adapted to produce a voiced excitation signal;
  
  a fricative source adapted to produce an unvoiced excitation signal;
  
  vocal tract means responsive to said voiced and unvoiced excitation signals to produce an audio output comprised of said sequence of phonemes in accordance with said control signals; and
  
  circuit means responsive to said input data for causing said vocal tract means to produce a silent phoneme by preventing said voiced and unvoiced excitation signals from exciting said vocal tract means, said vocal tract means being adapted to form in accordance with said control signals the articulation pattern of the succeeding phoneme identified by said input data during production of said silent phoneme.
- - 55. The speech synthesizer of claim 54 further including first modulating means for modulating the amplitude of said voiced excitation signal in accordance with a first of said control signals produced by said input means whenever a phoneme requiring vocal energy is to be generated, and second modulating means for modulating the amplitude of said unvoiced excitation signal in accordance with a second of said control signals produced by said input means whenever a phoneme requiring fricative energy is to be produced.
  - 56. The speech synthesizer of claim 55 wherein said circuit means is adapted to exclude said first and second control signals from said first and second modulating means respectively in response to the receipt of predetermined input data.
  - 57. The speech synthesizer of claim 56 wherein said circuit means includes means for producing an enabling signal until said predetermined input data is received, and control means connected between said input means and said first and second modulating means that is adapted to prevent both said first control signal from being transmitted to said first modulating means and said second control signal from being transmitted to said second modulating means upon the termination of said enabling signal.
  - 58. The speech synthesizer of claim 57 wherein said control means includes a first electronic control device adapted to conduct said first control signal whenever said enabling signal is produced and a second electronic control device adapted to conduct said second control signal whenever said enabling signal is produced.
  - 59. The speech synthesizer of claim 55 further including amplitude control means responsive to said input data to vary the relative overall amplitude of said audio output by continuously modulating a preselected signal characteristic of said first and second control signals by a certain percentage determined by said input data.
  - 60. The speech synthesizer of claim 59 wherein said circuit means is adapted to preserve said certain percentage of modulation that existed prior to the silent phoneme so that the relative overall amplitude level of the audio output that existed prior to the silent phoneme will continue to exist after the silent phoneme.
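
The silent phoneme of claims 54 to 60 can be sketched as a gate that blocks both excitation gains while still passing the articulation targets through, so the vocal tract can move toward the next phoneme during the silence. The `Frame` and `SilentPhonemeGate` names and the tuple of tract targets are hypothetical constructs for this example.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One synthesis step: excitation gains plus vocal tract targets."""
    vocal_gain: float
    fricative_gain: float
    tract_targets: tuple


@dataclass
class SilentPhonemeGate:
    # Overall level; kept across silent phonemes, in the spirit of claim 60.
    volume: float = 1.0

    def frame(self, vocal_amp: float, fricative_amp: float,
              tract_targets, silent: bool = False) -> Frame:
        """Block excitation for a silent phoneme but keep articulating."""
        targets = tuple(tract_targets)
        if silent:
            # Enabling signal withdrawn: neither source excites the tract.
            return Frame(0.0, 0.0, targets)
        return Frame(vocal_amp * self.volume, fricative_amp * self.volume, targets)
```

A gate created with `SilentPhonemeGate(volume=0.5)` returns zero gains for a silent frame and resumes at the same half-level volume on the next voiced frame, without the volume having to be sent again.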

61. For an electronic device for phonetically synthesizing human speech including a vocal source adapted to produce a voiced excitation signal and a vocal tract responsive to said voiced excitation signal to substantially produce the frequency spectrums of a desired sequence of phonemes;
- high pole compensation means adapted to add a relatively high fixed-frequency formant to said voiced excitation signal to increase the energy content of said voiced excitation signal at relatively high frequencies.
- - 62. The speech synthesizer of claim 61 wherein said vocal tract includes a plurality of resonant filters including at least one resonant filter that is adapted to resonate at a frequency higher than the high frequency formant added to said voiced excitation signal.
  - 63. The speech synthesizer of claim 62 wherein said plurality of resonant filters are connected in cascaded form.
  - 64. The speech synthesizer of claim 62 wherein said one resonant filter is adapted to resonate at 4400 Hz and said high frequency formant is located at 4000 Hz.
  - 65. The speech synthesizer of claim 61 wherein said vocal source is adapted to produce a voiced excitation signal comprising a truncated sawtooth waveform.
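
The high-pole compensation of claims 61 to 64 amounts to passing the excitation through a fixed resonance near 4000 Hz before a cascade of vocal tract resonances that includes one near 4400 Hz. The patent describes analog filters; the sketch below substitutes a standard two-pole digital resonator as a stand-in. The 16 kHz sample rate and the 300 Hz bandwidths are assumptions chosen only so the example runs.

```python
import math


def resonator_coeffs(freq_hz: float, bw_hz: float, fs_hz: float):
    """Coefficients of a two-pole digital resonator with unity gain at DC.

    y[n] = a*x[n] + b*y[n-1] + c*y[n-2]
    """
    if freq_hz >= fs_hz / 2.0:
        raise ValueError("resonant frequency must be below half the sample rate")
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(signal, freq_hz: float, bw_hz: float, fs_hz: float):
    """Run a sequence of samples through one resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(signal, stages, fs_hz: float):
    """Apply resonators in series (cascaded form, claim 63).

    `stages` is a list of (frequency_hz, bandwidth_hz) pairs.
    """
    for freq_hz, bw_hz in stages:
        signal = resonate(signal, freq_hz, bw_hz, fs_hz)
    return signal


FS_HZ = 16000.0                  # assumed sample rate
HIGH_POLE = (4000.0, 300.0)      # compensation formant added to the excitation
FIFTH_FORMANT = (4400.0, 300.0)  # fixed vocal tract resonance above it
```

A call such as `cascade(excitation, [HIGH_POLE, FIFTH_FORMANT], FS_HZ)` applies the compensation pole first and the fixed vocal tract pole second; the variable formant filters would sit in the same cascade.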

Current Assignee
Federal Screw Works
Original Assignee
Federal Screw Works
Inventors
Dorais, Mark V.
Primary Examiner(s)
Claffy, Kathleen H.
Assistant Examiner(s)
Kemeny, E. S.

Application Number

US05/714,495
Time in Patent Office

841 Days
Field of Search

179/1 SA, 179/1 SF, 179/1 SM, 179/1 SG
US Class Current

704/265
CPC Class Codes

G10L 13/04 Details of speech synthesis...
