Method and apparatus for speech synthesizing

US 4,214,125 A
Filed: 01/21/1977
Issued: 07/22/1980
Est. Priority Date: 01/21/1977
Status: Expired due to Term


3 Assignments


0 Petitions


Abstract

A method and apparatus for analyzing and synthesizing speech information in which a predetermined vocabulary is spoken into a microphone, the resulting electrical signals are differentiated with respect to time, digitized, and the digitized waveform is appropriately expanded or contracted by linear interpolation so that the pitch periods of all such waveforms have a uniform number of digitizations and the amplitudes are normalized with respect to a reference signal. These "standardized" speech information digital signals are then compressed in the computer by subjectively removing and discarding redundant speech information such as redundant pitch periods, portions of pitch periods, redundant phonemes and portions of phonemes, redundant amplitude information (delta modulation) and phase information (Fourier transformation). The compression techniques are selectively applied to certain of the speech information signals by listening to the reproduced, compressed information. The resulting compressed digital information and associated compression instruction signals produced in the computer are thereafter injected into the digital memories of a digital speech synthesizer where they can be selectively retrieved and audibly reproduced to recreate the original vocabulary words and sentences from them.
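
The standardization step described in the abstract (stretching or shrinking each pitch period to a uniform number of digitizations by linear interpolation, then normalizing amplitude against a reference) can be illustrated with a minimal Python sketch. The function names `resample_linear` and `normalize_peak`, and the choice of peak amplitude as the normalization measure, are assumptions made for illustration and are not taken from the patent.

```python
def resample_linear(period, n_out):
    """Stretch or shrink one pitch period to n_out samples by linear interpolation."""
    if n_out < 2 or len(period) < 2:
        raise ValueError("need at least two samples in and out")
    # Map output index k onto a fractional position in the input period.
    scale = (len(period) - 1) / (n_out - 1)
    out = []
    for k in range(n_out):
        pos = k * scale
        i = min(int(pos), len(period) - 2)  # left neighbour, clamped at the end
        frac = pos - i
        out.append(period[i] * (1 - frac) + period[i + 1] * frac)
    return out


def normalize_peak(samples, reference_peak):
    """Scale samples so their peak magnitude equals reference_peak (assumed measure)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)
    return [s * reference_peak / peak for s in samples]
```

With every pitch period brought to the same length and level, later steps can treat periods as interchangeable blocks.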

99 Citations


100 Claims

1. A method of analyzing speech information comprising the steps of time quantizing the amplitude of electrical signals representative of selected speech information into digital form, selectively compressing the time quantized signals by discarding selected portions thereof while substantially simultaneously generating instruction signals as to which portions have been discarded, and storing both the compressed signals and instruction signals, wherein said method further includes:
- (a) time differentiating the electrical signals prior to the time quantizing step and the signal compressing and storing steps include the steps,
  
  (b) selecting signals representative of certain phonemes and phoneme groups from the time quantized signals and replacing portions of these selected signals corresponding to parts of the pitch periods of the certain phonemes and phoneme groups by a constant amplitude signal while generating instruction signals as to which phonemes and phoneme groups have been so selected,
  
  (c) selecting signals representative of certain phonemes and phoneme groups from the time quantized signals and storing only portions of these selected time quantized signals corresponding to every nth pitch period of the waveform of the original speech information electrical signal, and storing instruction signals as to which phonemes and phoneme groups have been so selected and storing instruction signals as to the values of n,
  
  (d) separating and storing the time quantized signals representative of spoken words into two or more parts, with such parts of later words that are identical to parts of earlier words being deleted from storage while instruction signals as to which parts are deleted are stored,
  
  (e) storing portions of the time quantized signals corresponding to selected phonemes and phoneme groups according to their ability to blend naturally with any other phoneme, the selected phonemes and phoneme groups including voiced and unvoiced fricatives, voiced and unvoiced stop consonants, and nasal consonants,
  
  (f) delta-modulating the time quantized signals, and
  
  (g) Mozer phase-adjusting a selected periodic waveform by Fourier transforming the time quantized signals to generate a set of discrete amplitudes and phase angles, adjusting these phase angles so that the inverse Fourier transformation of the amplitudes and new phases is symmetric, inverse Fourier transforming the phase adjusted amplitudes and phases, storing one-half of a selected waveform as representative of each discrete set of phase adjusted amplitudes and phases and discarding the other half of the selected waveform.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 11, 13)
- - 2. A method of analyzing speech as recited in claim 1, wherein the step of delta modulating the digital signals prior to storage comprises setting the value of the ith digitization of the sampled signal equal to the value of the (i-1)th digitization of the sampled signal plus $f(\Delta_{i-1}, \Delta_i)$, where $f(\Delta_{i-1}, \Delta_i)$ is an arbitrary function having the property that changes of waveform of less than two levels from one digitization to the next are reproduced exactly while greater changes in either direction are accommodated by slewing in either direction by three levels per digitization.
  - 3. A method of analyzing speech as recited in claim 1, further comprising the steps of producing and storing speech waveforms having a constant pitch frequency.
  - 4. A method of analyzing speech as recited in claim 1 further comprising the steps of producing and storing speech waveforms having a constant amplitude.
  - 5. A method of analyzing speech as recited in claim 1 wherein the Mozer phase adjusting step comprises adjusting for a representative symmetric waveform to have a minimum amount of power in portions of the waveform totalling half of the period being analyzed and such that the difference between amplitudes of successive digitizations during the other half period of the selected waveform are consistent with possible values obtainable from the delta modulation step.
  - 6. A method of analyzing speech as recited in claim 1, further including the step of separately selected portions of the digital signals representative of at least five of the following phonemes and phoneme groups:
    - Sound
      - "elve" as in "twelve"
      - "ou" as in "hour"
      - "ir" as in "thirteen"
      - "one"
      - "we" as in "twenty"
      - "h" as in "hot"
      - "p" as in "plus"
      - "t" as in "two"
      - "l" as in "plus"
      - "sh" as in "she"
      - "m" as in "minus"
      - "oo" as in "two"
      - "n" as in "one"
      - "th" as in "three"
      - "u" as in "minus"
      - "ree" as in "three"
      - "im" as in "times"
      - "f" as in "four"
      - "ver" as in "over"
      - "our" as in "four"
      - "ua" as in "equals"
      - "ive" as in "five"
      - "oi" as in "point"
      - "s" as in "six"
      - "vol" as in "volts"
      - "v" as in "volt"
      - "o" as in "ohms"
      - "i" as in "six"
      - "a" as in "and"
      - "k" as in "six"
      - "d" as in "and"
      - "ev" as in "seven"
      - "u" as in "up"
      - "eigh" as in "eight"
      - "il" as in "miles"
      - "i" as in "nine"
      - "ou" as in "pounds"
      - "el" as in "eleven"
      - "th" as in "the"
      - "we" as in "twelve"
      - "z" as in "zero"
  - 7. A method of analyzing speech as recited in claim 1, further comprising the step of storing digital signals representative of diphthongs as individual phoneme groups.
  - 11. A method of analyzing speech as recited in claim 1, further comprising the steps of selectively retrieving certain of both the stored, compressed signals and the instruction signals, and utilizing the retrieved compressed signals and the instruction signals to reproduce selected speech information.
  - 13. A method of analyzing speech as recited in claim 11, further comprising the step of retrieving the digital signals from storage at a variable clock rate such that the pitch frequency of the reproduced speech sound is set at different levels and is made to rise or fall over the duration of speech sound whereby accenting of syllables, elimination of the monotone quality, inflection, and other pitch period variations of the speech synthesized can be reproduced.
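
Step (g) of claim 1 (Fourier transform, adjust the phase angles so the inverse transform is symmetric, inverse transform, keep one half) can be sketched as follows. This is a minimal illustration under a stated assumption: every phase angle is simply set to zero, which is one choice that yields a symmetric waveform with the same amplitude spectrum. The claims do not say that this is the choice actually made (claim 5 describes further selection criteria), and the helper names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def symmetric_equivalent(period):
    """Keep each amplitude, set each phase to zero (assumed choice), and invert."""
    n = len(period)
    amps = [abs(c) for c in dft(period)]
    # With zero phases and amps[k] == amps[n - k], the inverse transform is a
    # real cosine sum, so the result satisfies y[t] == y[n - t].
    return [
        sum(amps[k] * math.cos(2 * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def store_half(sym):
    """Keep samples 0..n//2; the rest are mirror images and can be discarded."""
    return sym[: len(sym) // 2 + 1]


def expand_half(half, n):
    """Rebuild the full symmetric period from the stored half."""
    return [half[t] if t <= n // 2 else half[n - t] for t in range(n)]
```

The rebuilt period has the same amplitude at every harmonic as the original but a different shape in time, which is the sense in which the claim treats the symmetric waveform as representative of the original.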

8. A method of analyzing speech comprising the steps of generating electrical signals representative of the spoken vocabulary words and portions of spoken vocabulary words of a predetermined finite vocabulary with the vocabulary words being included into units containing a plurality of phonemes or phoneme groups, time quantizing the amplitude of the electrical signals into digital form, selectively compressing the time quantized signals by discarding selected portions of them while substantially simultaneously generating instruction signals as to which portions have been discarded, and storing selected portions of the digital signals representative of phonemes and phoneme groups in a first, addressable memory, storing the instruction signals in a second, addressable memory including instruction signals as to the sequence of addresses of the stored phonemes and phoneme groups necessary to reproduce words and sentences of the vocabulary, wherein the signal compressing and storing steps include the following steps:
- (a) selecting signals representative of certain phonemes and phoneme groups from the time quantized signals and replacing portions of these selected signals corresponding to parts of the pitch periods of the certain phonemes and phoneme groups by a constant amplitude signal while generating instruction signals as to which phonemes and phoneme groups have been so selected, and
  
  (b) Fourier transforming the time quantized signals to generate a set of discrete amplitudes and phase angles, adjusting the phase angles so that the inverse Fourier transformation of the amplitudes and new phases is symmetric, inverse Fourier transforming the phase adjusted amplitudes and phases, storing one-half of a selected waveform as representative of each discrete set of phase adjusted amplitudes and phases and discarding the other half of the selected waveform.
- View Dependent Claims (9, 10, 12)
- - 9. A method of analyzing speech as recited in claim 8 wherein the method further comprises differentiating the electrical signals with respect to time prior to the time quantization step.
  - 10. A method of analyzing speech as recited in claim 8, wherein the signal compressing and storing steps further comprise the steps of selecting and storing in the first memory portions of the digital signals over a repetition period with the sum of the repetition periods having a duration which is less than the duration of the original speech waveform, setting the repetition period equal to the pitch period of the voiced speech to be synthesized and storing every nth pitch period of the waveform.
  - 12. A method of analyzing speech as recited in claim 8, further comprising the steps of selectively reproducing certain words of the vocabulary by retrieving selected instruction signals from the second memory and using the instruction signals to sequentially extract selected portions of the stored digital signals from the first memory, and electromechanically reproducing the selected portions of the digital signals extracted from the first memory as selected audible, spoken words of the vocabulary.
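
Claim 10 recites storing only every nth pitch period, so that the stored repetition periods add up to less than the original duration. A minimal sketch of that idea, together with the matching playback step, is given below. The list-of-lists representation and the names `keep_every_nth` and `replay` are assumptions for illustration only.

```python
def keep_every_nth(periods, n):
    """Store only every nth pitch period, plus n as the instruction value."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return periods[::n], n


def replay(stored, n, total_periods):
    """Repeat each stored period n times to fill the original number of periods."""
    out = []
    for period in stored:
        out.extend([period] * n)
    return out[:total_periods]
```

Storage shrinks by roughly a factor of n, at the cost of replacing the skipped periods with copies of their stored neighbour.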

14. An improved speech synthesizer of the type having first addressable memory means for storing digital signal representations of analog electrical signals which represent portions of spoken words of a predetermined vocabulary, second addressable memory means for storing first instruction signals as to the addresses in the first memory means of signals representing portions of the vocabulary words, third addressable memory means for storing second instruction signals as to the addresses in the second memory means of the sequences of the first instruction signals necessary to form selected words of the vocabulary, reproduction means responsive to a digital signal output from the first memory means for reproducing these digital signals in audible form, and control logic means wherein the improvement comprises:
- the first addressable memory means stores digital signal representations of the spoken vocabulary words after having been reduced by predetermined compression techniques and the second addressable memory means further stores compression instruction signals for controlling the operation of the control logic means, the compression instruction signals corresponding to the predetermined compression techniques used to reduce the digital signal representations stored in the first addressable memory means, the control logic means being responsive to the compression instruction signals and modifying the output of first memory means in accordance with the compression instruction signals, and wherein the digital signal representations stored in the first addressable memory means and the corresponding compression instruction signals stored in the second addressable memory means are derived from the following predetermined compression techniques;
  
  (a) the digital signals stored in the first addressable memory means are the time quantization of the derivative with respect to time of analog electrical signals representing the phonemes and phoneme groups which are the constituents of the predetermined vocabulary,
  
  (b) the digital signals stored in the first addressable memory means are only selected portions of the digital signals representative of the spoken vocabulary words, with the portions being selected over a repetition period equal to the pitch period of the voiced speech to be synthesized and only those digital signals corresponding to every nth pitch being stored, and the compression instruction signals stored in the second memory means include instruction signals to the control logic means as to the number of times, n, that each such selected portion of data is to be repeatedly extracted from the first addressable memory means before a different signal portion is to be extracted,
  
  (c) the compression instruction signals stored by the second addressable memory means include instructions as to the addresses in the first addressable memory means of digital signals corresponding to phonemes and phoneme groups which naturally blend with any other phoneme and phoneme group, including voiced and unvoiced fricatives, voiced and unvoiced stop consonants, and nasal consonants,
  
  (d) selected ones of the digital signals are representative of a predetermined fraction x of the latter part of the analog electrical signal within each pitch period of the spoken word, the compression instruction signals stored in the second memory means including x-period zeroing instruction signals as to the addresses of the selected ones of the digital signals in the first memory means and the control logic means includes means responsive to the x-period zeroing instruction signals for supplying to the reproduction means constant amplitude signals having durations equal to the remaining portions of the waveforms of the voiced phonemes and phoneme groups which are constituents of the predetermined vocabulary,
  
  (e) the digital signals are representative of the amplitude of the analog electrical signal over a regular, sampling time interval, the digital signals further being delta modulated by setting the value of the ith digitization of the sampled analog signal equal to the value of the (i-1)th digitization of the sampled analog signal plus $f(\Delta_{i-1}, \Delta_i)$, where $f(\Delta_{i-1}, \Delta_i)$ is an arbitrary function having the property that changes of waveform of less than two levels from one digitization to the next are reproduced exactly while greater changes in either direction are accommodated by slewing in either direction by three levels per digitization,
  
  (f) the stored digital signals representative of spoken words are separated into two or more parts, and
  
  (g) the stored digital signals represent only one symmetric half of one selected waveform obtained by Mozer phase adjusting the waveform by Fourier transforming the digital signals to generate a set of discrete amplitudes and phase angles, adjusting the phase angles so that the inverse Fourier transform waveforms are symmetric, and selecting the one waveform as representative of the set of symmetric waveforms, said control logic means including means responsive to receipt of instruction signals specifying digital signals stored in said first addressable memory means as Mozer phase adjusted signals for causing said reproduction means to expand said Mozer phase adjusted signals in audible form.
- View Dependent Claims (15, 16)
- - 15. An improved speech synthesizer as recited in claim 14 fabricated on a large scale integrated circuit (L.S.I.) chip.
  - 16. A speech synthesizer as recited in claim 14 wherein the control logic means further comprises means for retrieving the digital signals from the first memory at a variable clock rate such that the pitch frequency of the reproduced speech sound is set at different levels and is made to rise or fall over the duration of speech sound whereby accenting of syllables, elimination of the monotone quality, inflection, and other pitch period variations of the speech synthesized can be reproduced.
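
The delta modulation property recited in claim 14(e), and in similar words in claims 2 and 19, is that changes of less than two levels are reproduced exactly while larger changes are followed by slewing three levels per digitization. The sketch below is a simplified, memoryless reading of that property: the step depends only on the current difference, whereas the claimed function also takes the previous difference as an argument. Integer sample levels and the function names are assumptions.

```python
def delta_encode(samples, start=0):
    """Encode integer samples as per-digitization steps of -3, -1, 0, +1, or +3."""
    value = start
    steps = []
    for target in samples:
        diff = target - value
        if abs(diff) < 2:
            step = diff  # small change: reproduced exactly
        else:
            step = 3 if diff > 0 else -3  # large change: slew by three levels
        steps.append(step)
        value += step
    return steps


def delta_decode(steps, start=0):
    """Rebuild the (possibly slew-limited) waveform by accumulating the steps."""
    value = start
    out = []
    for step in steps:
        value += step
        out.append(value)
    return out
```

For the input `[0, 1, 1, 0, 5, 6, 6]` the decoder returns `[0, 1, 1, 0, 3, 6, 6]`: the jump from 0 to 5 is tracked over two digitizations rather than one.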

17. An improved speech synthesizer of the type having first addressable memory means for storing digital signal representations of analog electrical signals which represent portions of spoken words of a predetermined vocabulary, second addressable memory means for storing first instruction signals as to the addresses in the first memory means of signals representing portions of the vocabulary words, third addressable memory means for storing second instruction signals as to the addresses in the second memory means of the sequences of the first instruction signals necessary to form selected words of the vocabulary, reproduction means responsive to a digital signal output from the first memory means for reproducing these digital signals in audible form, and control logic means for selectively, sequentially extracting the second instruction signals from the third memory means and using these extracted second instruction signals for sequentially extracting selected first instruction signals from the second memory means, and using these extracted first instruction signals to sequentially extract selected digital signals from the first memory means to audibly reproduce selected words of the vocabulary through the reproduction means, wherein the improvement comprises:
- the first addressable memory means stores digital signal representations of the spoken vocabulary words after having been reduced by predetermined compression techniques and the second addressable memory means further stores compression instruction signals for controlling the operation of the control logic means, the compression instruction signals corresponding to the predetermined compression techniques used to reduce the digital signal representations stored in the first addressable memory means, the control logic means being responsive to the compression instruction signals and modifying the output of first memory means in accordance with the compression instruction signals, and wherein the digital signal representations stored in the first addressable memory means and the corresponding compression instruction signals stored in the second addressable memory means are derived from the following predetermined compression techniques;
  
  (a) selected ones of the digital signals are representative of a predetermined fraction x of the latter part of the analog electrical signal within each pitch period of the spoken word, the compression instruction signals stored in the second memory means including x-period zeroing instruction signals as to the addresses of the selected ones of the digital signals in the first memory means and the control logic means includes means responsive to the x-period zeroing instruction signals for supplying to the reproduction means constant amplitude signals having durations equal to the remaining portions of the waveforms of the voiced phonemes and phoneme groups which are constituents of the predetermined vocabulary, and
  
  (b) the stored digital signals represent only one symmetric half of one selected waveform obtained by Fourier transforming the digital signals to generate a set of discrete amplitudes and phase angles, adjusting the phase angles so that the inverse Fourier transform waveforms are symmetric, and selecting the one waveform as representative of the set of symmetric waveforms.
- View Dependent Claims (18, 19)
- - 18. A speech synthesizer as recited in claim 17 wherein the compression instruction signals stored by the second addressable memory means include instructions as to the addresses in the first addressable memory means of digital signals corresponding to phonemes and phoneme groups which naturally blend with any other phoneme and phoneme group, including voiced and unvoiced fricatives, voiced and unvoiced stop consonants, and nasal consonants.
  - 19. A speech synthesizer as recited in claim 17 wherein the digital signals stored in the first addressable memory means have been delta modulated by setting the value of the ith digitization of the sampled analog electrical signals equal to the value of the (i-1)th digitization of the sampled analog electric signals plus $f(\Delta_{i-1}, \Delta_i)$, where $f(\Delta_{i-1}, \Delta_i)$ is an arbitrary function having the property that changes of waveform of less than two levels from one digitization to the next are reproduced exactly while greater changes in either direction are accommodated by slewing in either direction by three levels per digitization.
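
The x-period zeroing of claim 17(a) stores only part of each pitch period and has the control logic supply a constant amplitude signal for the remainder on playback. The sketch below assumes that the trailing fraction x of each period is the part replaced by the constant, and that the constant is zero. Both are assumptions for illustration, as are the function names.

```python
def x_period_zero(period, x):
    """Drop the trailing fraction x of a pitch period (assumed reading of the claim)."""
    if not 0 <= x <= 1:
        raise ValueError("x must be between 0 and 1")
    dropped = int(round(len(period) * x))
    return period[: len(period) - dropped]


def refill(stored, full_length, constant=0):
    """On playback, pad the stored samples with a constant amplitude signal."""
    return list(stored) + [constant] * (full_length - len(stored))
```

With x = 0.5 only half the samples of each selected pitch period need to be stored.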

20. A speech synthesizer comprising
- first addressable memory means for storing digital signal representations of electrical signals which represent portions of spoken words of a predetermined vocabulary, all of the digital signals stored in the first memory means being the delta modulated, time quantization of the derivative with respect to time of analog electrical signals representing the phonemes and phoneme groups which are the constituents of the predetermined vocabulary, and the stored digital signals further representing only one symmetric half of one selected waveform obtained by Fourier transforming the delta modulated, time quantized derivative of the analog signals to generate a set of discrete amplitudes and phase angles, adjusting the phase angles so that on inverse Fourier transformation the waveforms are symmetric, and selecting the one waveform as representative of the set of symmetric waveforms,
  
  second addressable memory means for storing first instruction signals as to the addresses in the first addressable memory means of signals representing portions of the vocabulary words,
  
  third addressable memory means for storing second instruction signals as to the addresses in the second memory means of the sequences of the first instruction signals necessary to form selected words of the vocabulary,
  
  reproduction means responsive to the digital signal output of the first memory means for reproducing these digital signals in audible form, and
  
  control logic means for selectively, sequentially extracting the second instruction signals from the third memory means and using these extracted second instruction signals for sequentially extracting selected first instruction signals from the second memory means, and using these extracted first instruction signals to sequentially extract selected digital signals from the first memory means to audibly reproduce selected words of the vocabulary through the reproduction means.
- View Dependent Claims (21)
- - 21. A speech synthesizer as recited in claim 20 wherein selected ones of the digital signals stored in the first memory means represent only a portion corresponding to part of the pitch period of the waveforms of certain of the voiced phonemes and phoneme groups which are constituents of the predetermined vocabulary;
    - the compression signals stored in the second addressable memory means include x-period zeroing instruction signals as to the addresses of the selected ones of such digital signals in the first addressable memory means and wherein the control logic means includes means responsive to the x-period zeroing instruction signals for supplying to the reproduction means constant amplitude signals having durations equal to the remaining portions of the waveforms of the voiced phonemes and phoneme groups which are constituents of the predetermined vocabulary.
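
The three addressable memories of claim 20 form a chain of lookups: the third memory points into the second, and the second points into the first. The toy tables below are invented purely to illustrate that addressing scheme. The contents, the (start, count) encoding of the second instruction signals, and the name `synthesize` are all assumptions.

```python
# First memory: stored waveform segments (hypothetical sample data).
FIRST = [[1, 2, 1], [0, -1, 0], [3, 3, 3]]

# Second memory: first instruction signals, each an address into FIRST.
SECOND = [0, 1, 2, 1]

# Third memory: second instruction signals, here a (start, count) pair that
# selects a sequence of entries in SECOND for each vocabulary word.
THIRD = {"one": (0, 2), "two": (2, 2)}


def synthesize(word):
    """Follow third -> second -> first memory to assemble a word's samples."""
    start, count = THIRD[word]
    samples = []
    for address in SECOND[start : start + count]:
        samples.extend(FIRST[address])
    return samples
```

Here the segment at address 1 of the first memory is shared by both words, which is how the indirection lets common sounds be stored once.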

22. A method of compressing information bearing signals such as speech to reduce the information content thereof without destroying the intelligibility thereof, said method comprising the steps of Mozer phase adjusting said signals to produce equivalent signals having symmetric portions, and deleting selected redundant portions of said equivalent signals.
- View Dependent Claims (23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 96)
- - 23. The method of claim 22 wherein said step of phase adjusting includes the step of transforming said signals to the frequency domain to produce a set of discrete amplitudes and phase angles, adjusting said phase angles so that the inverse transformation of the amplitudes and adjusted phases is at least partially symmetric, and inversely transforming said amplitudes and adjusted phases to the time domain, and wherein said step of deleting includes the step of deleting redundant portions of those partially symmetric portions of said signals resulting from said step of inversely transforming.
  - 24. The method of claim 23 wherein said waveform resulting from said step of adjusting is substantially symmetric;
    - and wherein said step of deleting includes the step of deleting a symmetric half of said symmetric waveform.
  - 25. The method of claim 22 further including the step of time quantizing said signals prior to said step of phase adjusting.
  - 26. The method of claim 22 further including the step of time quantizing said signals after said step of phase adjusting.
  - 27. The method of claim 22 further including the step of time differentiating said signals prior to said step of phase adjusting.
  - 28. The method of claim 22 further including the step of time differentiating said signals after said step of phase adjusting.
  - 29. The method of claim 22 wherein said information bearing signals are speech signals containing portions corresponding to phonemes and phoneme groups, and wherein said method further includes the step of selecting signals representative of particular phonemes and phoneme groups, deleting preselected parts of the phonemes and phoneme groups so selected, and generating first instruction signals identifying the phonemes and phoneme groups so selected.
  - 30. The method of claim 22 further including the steps of separating said signals into at least two parts, deleting parts occurring later in time which are substantially identical to parts occurring earlier in time, and generating instruction signals specifying those parts so deleted.
  - 31. The method of claim 22 further including the step of delta-modulating said equivalent signals.
  - 32. The method of claim 22 further including the step of storing in a memory device the signals resulting from said step of deleting.
  - 33. The method of claim 32 wherein said step of storing is preceded by the step of converting to digital signals said signals resulting from said step of deleting.
  - 34. The method of claim 32 wherein said information bearing signals are speech signals and wherein said step of storing includes the step of storing portions of said signals corresponding to selected phonemes and phoneme groups according to their ability to blend naturally with any other phoneme.
  - 96. The method of claim 22 wherein said information bearing signals are speech signals containing portions corresponding to phonemes and phoneme groups, and wherein said method further includes the step of selecting signals representative of portions of particular phonemes and phoneme groups lying between every nth pitch period, deleting the signals so selected, and generating second instruction signals specifying the particular portions of said phonemes and phoneme groups so selected for deletion and identifying the values of n.
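
Claim 30 separates the signals into parts, deletes later parts that are substantially identical to earlier ones, and generates instruction signals naming what was deleted. A minimal sketch of that bookkeeping is shown below. It uses exact equality where the claim says "substantially identical", and represents the instruction signals as a list of indices into the stored parts. Both choices, and the function names, are simplifying assumptions.

```python
def dedupe_parts(parts):
    """Store each distinct part once; record which stored part each position uses."""
    store = []
    instructions = []
    seen = {}
    for part in parts:
        key = tuple(part)
        if key not in seen:
            seen[key] = len(store)
            store.append(list(part))
        instructions.append(seen[key])  # later duplicates only add an index
    return store, instructions


def rebuild(store, instructions):
    """Reassemble the original sequence of parts from the store and instructions."""
    return [store[i] for i in instructions]
```

A real system would need a similarity measure and a threshold in place of the tuple comparison.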

35. A method of synthesizing signals from information signals previously compressed by the technique of phase adjusting original signals to produce equivalent signals having symmetric portions, deleting selected fractional portions of said symmetric portions of said equivalent signals and generating instruction signals identifying the selected fractional portions so deleted, and from said instruction signals, said method comprising the steps of:
- (a) reproducing said compressed information signals;
  
  (b) expanding the reproduced signals to supply said fractional portions in accordance with said instruction signals; and
  
  (c) converting the expanded reproduced signals to audible form.
- View Dependent Claims (36, 37, 38, 39, 40)
- - 36. The method of claim 35 wherein said compressed information signals are stored in a memory device and wherein said step (a) of reproducing includes the step of reading said compressed information signals from said memory device.
  - 37. The method of claim 36 wherein said compressed information signals are stored in said memory device in digital form and wherein said step (a) of reproducing includes the further step of converting said digital signals to analog signals prior to said step (c) of converting.
  - 38. The method of claim 35 wherein said compressed information signals are delta-modulated signals and wherein said step (a) of reproducing includes the step of delta-modulation decoding said compressed information signals.
  - 39. The method of claim 35 wherein said original signals are audio signals having phonemes and phoneme groups and wherein said information signals are of a type previously compressed by the additional technique of deleting preselected signals representative of portions of particular phonemes and phoneme groups from said audio signals, said preselected signals corresponding to the portions lying between every nth pitch period of said particular phonemes and phoneme groups, and generating additional instruction signals specifying said particular phonemes and phoneme groups and identifying the corresponding values of n, and wherein said step (a) of reproducing includes the step of sequentially repeating each non-deleted signal representative of said particular phonemes and phoneme groups a number of times equal to the corresponding value of n specified by the identifying instruction signal.
  - 40. The method of claim 35 wherein said information signals are of a type previously compressed by the additional technique of separating said original signals into at least two parts and deleting parts occurring later in time which are substantially identical to parts occurring earlier in time, said instruction signals specifying those parts so deleted, and wherein said step (a) of reproducing includes the step of repeating the non-deleted parts specified by said instruction signals.
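
The expansion in step (b) of claim 35 and the repetition in claim 39 can be sketched as follows. This is a minimal illustration, not the patented circuit: the function names, and the convention that samples 0 through N//2 of each N-sample symmetric period are the ones kept in memory, are assumptions made for the example.

```python
def expand_symmetric_half(stored_half, full_length):
    """Rebuild a full period x[0..N-1] from the stored x[0..N//2].

    Assumes (hypothetically) that the period is symmetric in the sense
    x[N - t] == x[t], so the deleted samples are mirror images of kept ones.
    """
    if len(stored_half) != full_length // 2 + 1:
        raise ValueError("stored half must hold samples 0..N//2")
    mirrored = [stored_half[full_length - t]
                for t in range(len(stored_half), full_length)]
    return list(stored_half) + mirrored


def repeat_periods(periods, n):
    """Play each stored pitch period n times in a row (the repetition of claim 39)."""
    out = []
    for period in periods:
        for _ in range(n):
            out.extend(period)
    return out


full = expand_symmetric_half([4, 3, 2, 1, 0], 8)
stream = repeat_periods([full], 2)
```

Here `full` is the restored eight-sample period and `stream` plays it twice, standing in for a value of n equal to 2 read from the instruction signal.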

41. A system for compressing information bearing input signals such as speech to reduce the information content thereof without destroying the intelligibility thereof, said system comprising:
- input means adapted to receive said input signals;
  
  means for Mozer phase adjusting said signals to produce equivalent signals having symmetric portions; and
  
  means for deleting selected redundant portions of said equivalent signals.
- View Dependent Claims (42, 43, 44, 45, 46, 47, 48, 49, 50)
- - 42. The combination of claim 41 wherein said input signals are time domain signals and wherein said phase adjusting means includes means for transforming said input signals to the frequency domain to produce a set of discrete amplitudes and phase angles, means for adjusting said phase angles to produce a modified set of discrete amplitudes and phase angles capable of being inversely transformed to modified time domain signals having at least partially symmetric portions, and means for inverse transforming said phase adjusted set of discrete amplitudes and phase angles to the time domain to generate said modified time domain signals;
    - and wherein said deleting means includes means for deleting redundant portions of those partially symmetric portions of said modified time domain signals output from said inverse transforming means.
  - 43. The combination of claim 42 wherein said signals output from said inverse transforming means are substantially symmetric, and wherein said means for deleting includes means for deleting a symmetric half of said symmetric signals.
  - 44. The combination of claim 41 further including means coupled to said input means for time quantizing the amplitude of said input signals.
  - 45. The combination of claim 41 further including means coupled to said phase adjusting means for time quantizing the amplitude of signals output therefrom.
  - 46. The combination of claim 41 further including means coupled to said input means for time differentiating said input signals.
  - 47. The combination of claim 41 further including means coupled to said phase adjusting means for time differentiating said equivalent signals.
  - 48. The combination of claim 41 further including means coupled to said input means for deleting parts of said input signals occurring later in time which are substantially identical to parts occurring earlier in time, and means for generating instruction signals specifying those parts so deleted.
  - 49. The combination of claim 41 wherein said input signals are speech signals containing portions corresponding to phonemes and phoneme groups, and further including means coupled to said input means for selecting signals representative of particular phonemes and phoneme groups, means for deleting preselected parts of the phonemes and phoneme groups so selected, and means for generating first instruction signals identifying the phonemes and phoneme groups so selected.
  - 50. The combination of claim 41 wherein said input signals are audio signals having phonemes and phoneme groups and further including means for deleting preselected signals representative of portions of particular phonemes and phoneme groups from said audio signals, said preselected signals corresponding to those portions lying between every nth pitch period, and wherein said generating means includes means for generating second instruction signals specifying said particular phonemes and phoneme groups so selected and identifying the corresponding values of n.
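
The transform, phase adjustment, and inverse transform of claims 42 and 43 can be illustrated with a small sketch. It assumes one simple phase choice, setting every phase angle to zero, which is only one of the adjustments that yields a symmetric waveform; the claims do not fix the choice. The function names and the naive O(N²) transform are likewise assumptions for the example.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform of one period (fine for a short sketch)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def inverse_dft_real(spectrum):
    """Inverse transform, keeping the real part (the imaginary part is ~0 here)."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def phase_adjust(period):
    """Keep every harmonic amplitude and set every phase angle to zero (assumed choice)."""
    amplitudes = [abs(c) for c in dft(period)]
    return inverse_dft_real(amplitudes)


def keep_half(symmetric_period):
    """Delete the redundant mirror half: keep samples 0..N//2 only."""
    return symmetric_period[: len(symmetric_period) // 2 + 1]


period = [0.0, 0.9, 0.4, -0.3, -0.8, -0.2, 0.5, 0.1]
adjusted = phase_adjust(period)
stored = keep_half(adjusted)
```

Because only the phases change, `adjusted` has the same amplitude spectrum as `period`, while satisfying `adjusted[N - t] == adjusted[t]` up to rounding, so `stored` needs only five of the eight samples.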

51. A system for synthesizing signals from compressed information signals having the form of an inverse transformation of a partially symmetric phase adjusted transform of the original signals, said compressed information signals being devoid of selected portions corresponding to a fraction of the partially symmetric portions of said phase adjusted transform, and instruction signals identifying the selected portions, said system comprising:
- means for reproducing said compressed information signals;
  
  means coupled to said reproducing means for expanding the reproduced signals to supply said fractional portions in accordance with said instruction signals; and
  
  means for converting the expanded reproduced signals to audible form.
- View Dependent Claims (52, 53, 54, 55, 56)
- - 52. The combination of claim 51 further including memory means for storing said compressed signals and wherein said reproducing means includes means for reading said compressed signals from said memory means.
  - 53. The combination of claim 52 wherein said memory means comprises a digital storage device for storing said compressed signals in digital form, and wherein said reproducing means includes means for converting the digital signals stored therein to analog signals.
  - 54. The combination of claim 51 wherein said compressed information signals are delta-modulated signals, and wherein said reproducing means includes means for delta-modulation decoding said compressed information signals.
  - 55. The combination of claim 51 wherein said information signals are of a type previously compressed by the additional technique of deleting predetermined portions of said original signals corresponding to particular phonemes and phoneme groups, said predetermined portions lying between every nth pitch period of the corresponding phonemes and phoneme groups, said instruction signals further identifying the particular phonemes and phoneme groups and the corresponding values of n, and wherein said reproducing means includes means for sequentially repeating each of said predetermined portions of said compressed information signals corresponding to said particular phonemes and phoneme groups a number of times equal to the corresponding value of n specified by the identifying instruction signal.
  - 56. The combination of claim 51 wherein said information signals are of a type previously compressed by the additional technique of separating said original signals into at least two parts and deleting parts occurring later in time which are substantially identical to parts occurring earlier in time, said instruction signals specifying those parts so deleted, and wherein said reproducing means includes means for repeating the non-deleted parts specified by said instruction signals.

57. A method of processing information bearing signals to initially reduce the information content thereof without destroying the intelligibility of the information contained therein and to synthesize signals from the processed signals, said method comprising the steps of:
- (a) Mozer phase adjusting said information bearing signals to produce equivalent signals having substantially symmetric portions;
  
  (b) deleting selected redundant portions of said equivalent signals;
  
  (c) X period zeroing said information bearing signals by deleting preselected relatively low power portions of the signals resulting from steps (a) and (b);
  
  (d) generating instruction signals specifying those portions of said signals deleted in steps (b) and (c);
  
  (e) reproducing the signals resulting from said steps of (a) Mozer phase adjusting, (b) deleting and (c) X period zeroing;
  
  (f) expanding said reproduced signals to supply said deleted redundant portions in accordance with said instruction signals;
  
  (g) inserting substantially constant amplitude signals between the non-deleted portions of the signals resulting from step (f) in accordance with said instruction signals so that said deleted relatively low power signal portions are replaced by said signals of substantially constant amplitude; and
  
  (h) converting the signals resulting from step (g) to perceivable form.
- View Dependent Claims (58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69)
- - 58. The method of claim 57 wherein said information bearing signals are essentially periodic and wherein said preselected relatively low power portions lie in the range from 1/4 to 3/4 of the period.
  - 59. The method of claim 58 wherein said information bearing signals are speech signals and wherein said period comprises the pitch period of said speech signals.
  - 60. The method of claim 58 wherein said preselected portion is substantially 1/2.
  - 61. The method of claim 57 wherein said step of Mozer phase adjusting includes the step of transforming said information bearing signals to the frequency domain to produce a set of discrete amplitudes and phase angles, adjusting said phase angles so that the inverse transformation of the amplitudes and adjusted phases is at least partially symmetric, and inversely transforming said amplitudes and adjusted phases to the time domain;
    - and wherein said step (b) of deleting includes the step of deleting fractional portions of those partially symmetric portions of said signals resulting from said step of inversely transforming.
  - 62. The method of claim 61 wherein the signals resulting from said step of inversely transforming are substantially symmetric;
    - and wherein said step (b) of deleting includes the step of deleting a symmetric half of said symmetric signals.
  - 63. The method of claim 57 further including the step of storing in a memory device signals resulting from said steps of (b) deleting, (c) X period zeroing, and (d) generating.
  - 64. The method of claim 63 wherein said step of storing is preceded by the step of converting said signals resulting from said steps of (b) deleting, (c) X period zeroing, and (d) generating to digital signals.
  - 65. The method of claim 57 wherein said information bearing signals comprise audio electrical signals.
  - 66. The method of claim 57 wherein said signals resulting from said steps of (b) deleting, (c) X period zeroing, and (d) generating are stored in a memory device, and wherein said step (e) of reproducing includes the step of reading the stored signals from said memory device.
  - 67. The method of claim 66 wherein said stored signals are stored in said memory device in digital form, and wherein said step (e) of reproducing includes the step of converting said digital signals to analog signals.
  - 68. The method of claim 57 wherein said signals resulting from said step (b) deleting, (c) X period zeroing, and (d) generating are delta-modulated signals, and wherein said step (e) of reproducing includes the step of delta-modulation decoding said resulting signals.
  - 69. The method of claim 61 wherein said step of (f) expanding the reproduced signals includes the step of supplying said fractional portions in accordance with said instruction signals.
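
Steps (c), (d), and (g) of claim 57 can be sketched as a pair of functions. The claim only says that "preselected relatively low power portions" are deleted; picking the contiguous run with the least power, and the shape of the instruction record, are assumptions made here for illustration.

```python
def x_period_zero(period, x=0.5):
    """Delete the circular run covering fraction x of the period with least power.

    Returns the kept samples plus a hypothetical instruction record saying
    where the deleted run started and how long it was.
    """
    n = len(period)
    run = round(x * n)
    if not 0 < run < n:
        raise ValueError("x must leave some samples kept and some deleted")

    def power(start):
        return sum(period[(start + i) % n] ** 2 for i in range(run))

    start = min(range(n), key=power)
    kept = [period[(start + run + i) % n] for i in range(n - run)]
    return kept, {"start": start, "run": run, "length": n}


def restore_with_constant(kept, instruction, level=0.0):
    """Rebuild the period, filling the deleted run with a constant amplitude."""
    n, start, run = instruction["length"], instruction["start"], instruction["run"]
    out = [level] * n
    for i, value in enumerate(kept):
        out[(start + run + i) % n] = value
    return out


kept, instruction = x_period_zero([9, 5, 1, 0, 0, 0, 1, 5], x=0.5)
restored = restore_with_constant(kept, instruction)
```

With x equal to 1/2, as in claim 60, half of the samples in each period are dropped from storage and replaced at playback by the constant level.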

70. A system for processing information bearing input signals to initially compress said input signals by reducing the information content thereof without destroying the intelligibility thereof and subsequently synthesizing signals from said compressed signals, said system comprising:
- input means adapted to receive said input signals;
  
  means coupled to said input means for Mozer phase adjusting said input signals to produce equivalent signals having substantially symmetric portions;
  
  means for deleting selected redundant portions of said equivalent signals;
  
  means for X period zeroing the signals processed by said Mozer phase adjusting means and said deleting means by deleting preselected relatively low power portions of the processed signals;
  
  means for generating instruction signals specifying those portions of said input signals deleted by said deleting means and said X period zeroing means;
  
  means for reproducing the signals processed by said X period zeroing means;
  
  means for expanding the reproduced signals to supply said deleted redundant portions in accordance with said instruction signals;
  
  means for inserting substantially constant amplitude signals between the non-deleted portions of the signals generated by said expanding means in accordance with said instruction signals so that said deleted relatively low power signal portions are replaced by said signals of substantially constant amplitude; and
  
  means for converting the signals output from said inserting means to perceivable form.
- View Dependent Claims (71, 72, 73, 74, 75, 76, 77, 78, 79, 80)
- - 71. The combination of claim 70 wherein said input signals are essentially periodic and wherein said preselected portions lie in the range from 1/4 to 3/4 of the period.
  - 72. The combination of claim 71 wherein said predetermined portion is substantially 1/2.
  - 73. The combination of claim 71 wherein said input signals are speech signals and wherein said period comprises the pitch period of said speech signals.
  - 74. The combination of claim 70 further including means coupled to said deleting means for delta modulating the signals output therefrom.
  - 75. The combination of claim 70 further including means coupled to said deleting means and said generating means for storing the signals output therefrom.
  - 76. The combination of claim 75 further including means coupled to said deleting means and said generating means for converting the signals output therefrom to digital form.
  - 77. The combination of claim 70 wherein said input signals are time domain signals and wherein said Mozer phase adjusting means includes means for transforming said input signals to the frequency domain to produce a set of discrete amplitudes and phase angles, means for adjusting said phase angles to produce a modified set of discrete amplitudes and phase angles capable of being inversely transformed to modified time domain signals having at least partially symmetric portions, and means for inverse transforming said phase adjusted set of discrete amplitudes and phase angles to the time domain to generate said modified time domain signals;
    - and wherein said deleting means includes means for deleting fractional portions of those partially symmetric portions of said modified time domain signals output from said inverse transforming means.
  - 78. The combination of claim 77 wherein said signals output from said inverse transforming means are substantially symmetric, and wherein said deleting means includes means for deleting a symmetric half of said symmetric signals.
  - 79. The combination of claim 74 wherein said reproducing means includes means for delta-modulation decoding said compressed information signals.
  - 80. The combination of claim 77 wherein said means for expanding includes means for supplying said deleted fractional portions in accordance with said instruction signals.

81. In a synthesizer of original information bearing time domain signals from compressed information time domain signals produced by predetermined different signal compression techniques, said compressed information time domain signals comprising an inverse transformation of a Mozer phase adjusted transform of said original time domain signals, a memory device comprising:
- means for storing said compressed information time domain signals and instruction signals specifying the particular compression technique applied to said original information bearing time domain signals to produce corresponding portions of said compressed information time domain signals, said compressed information time domain signals comprising a plurality of samples resulting from said predetermined signal compression techniques, the number of said different signal compression techniques applied to said original signal being greater than 2, the ratio of said plurality of samples to the minimum number of samples required to uniquely and intelligibly identify said original information bearing signals being no greater than about 0.2, and means for expanding said compressed signals comprising said inverse transform.
- View Dependent Claims (82, 83, 84, 85, 86, 87, 88, 89)
- - 82. The combination of claim 81 wherein said ratio is no greater than about 0.05.
  - 83. The combination of claim 81 wherein said ratio is no greater than about 0.0125.
  - 84. The combination of claim 81 wherein said storing means comprises a digital storage device and wherein said compressed information time domain signal samples are digital characters.
  - 85. The combination of claim 81 wherein said compressed information time domain signals and said instruction signals comprise X period zeroed representations of said original time domain signals, wherein X is a fraction in the range from 1/4 to 3/4.
  - 86. The combination of claim 85 wherein X is 1/2.
  - 87. The combination of claim 81 wherein said compressed information time domain signals and said instruction signals comprise an inverse transformation of a partially symmetric Mozer phase adjusted transform of said original time domain signals.
  - 88. The combination of claim 81 wherein said compressed information time domain signals comprise delta modulated representations of said original time domain signals.
  - 89. The combination of claim 88 wherein said compressed information time domain signals comprise floating-zero, two-bit delta modulated representations of said original time domain signals.
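
The sample ratios in claims 81 through 83 can be made concrete with a little arithmetic. The kept fractions below are hypothetical and are not figures from the patent: they assume that each technique keeps a fixed fraction of the samples and that the fractions multiply.

```python
from math import prod

# Upper bounds on the stored-to-required sample ratio in claims 81, 82, and 83.
CLAIMED_BOUNDS = (0.2, 0.05, 0.0125)


def sample_ratio(kept_fractions):
    """Overall ratio when each technique keeps the given fraction of samples."""
    return prod(kept_fractions)


def bounds_met(ratio):
    """List the claimed bounds that a given ratio satisfies."""
    return [bound for bound in CLAIMED_BOUNDS if ratio <= bound]


# Hypothetical mix: symmetric half kept, half-period zeroing, every 3rd pitch period.
ratio = sample_ratio([0.5, 0.5, 1 / 3])
```

Under these assumed fractions the ratio is 1/12, about 0.083, which satisfies the 0.2 bound of claim 81 but not the tighter bounds of claims 82 and 83.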

90. A method of compressing information bearing signals comprising the steps of:
- (a) phase adjusting said information bearing signals to produce equivalent signals having substantially symmetric portions;
  
  (b) deleting selected redundant portions of said equivalent signals; and
  
  (c) processing said equivalent signals by the additional signal compression technique of X period zeroing said information bearing signals.
- View Dependent Claims (91, 92, 93, 94, 95)
- - 91. The method of claim 90 further including the step of delta modulating the signals resulting from said step (b) of deleting.
  - 92. The method of claim 90 wherein said step (a) of phase adjusting includes the step of transforming said information bearing signals to the frequency domain to produce a set of discrete amplitudes and phase angles, adjusting said phase angles, and inversely transforming said amplitudes and adjusted phases to the time domain.
  - 93. The method of claim 92 wherein said step of adjusting includes the step of adjusting said phase angles so that the inverse transformation of the amplitudes and adjusted phases contains a minimum amount of power in said preselected portions.
  - 94. The combination of claim 93 wherein said step (c) of processing includes the step of delta modulating said equivalent signals and wherein said step of adjusting includes the step of adjusting said phase angles so that the inverse transformation of the amplitudes and adjusted phases is such that the difference between amplitudes of successive digitizations thereof are consistent with possible values obtainable from said step of delta modulating.
  - 95. The method of claim 91 wherein said step of delta modulating includes the steps of time quantizing successive amplitude points of said equivalent signals, forming a first difference by subtracting the (n-1)st time quantized amplitude point from the nth time quantized amplitude point and a second difference by subtracting the nth time quantized amplitude point from the (n+1)st time quantized amplitude point, and generating a signal representative of said second difference and restricted to one of a predetermined confined number of values when said first difference is within the most positive 1/2 of said confined number of values and generating a signal representative of said second difference and restricted to the negative of said one of a predetermined confined number of values when said first difference is within the most negative half of said confined number of values.
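
The delta modulation of claim 95 restricts each new difference to a small set of values and negates that set depending on the previous difference. A rough sketch follows. The four-entry step table, the use of the previously emitted step as the "first difference", and the rule that a zero step counts as positive are all assumptions made so that encoder and decoder stay in lockstep; the claim does not specify them.

```python
BASE_STEPS = (-1, 0, 1, 3)  # hypothetical 2-bit step table, skewed positive


def allowed_steps(previous_step):
    """Use the table as-is after a non-negative step, negated after a negative one."""
    if previous_step >= 0:
        return BASE_STEPS
    return tuple(-step for step in BASE_STEPS)


def encode(samples, start=0):
    """Emit one 2-bit code (0..3) per sample, tracking the decoder's value."""
    codes, value, previous = [], start, 0
    for target in samples:
        steps = allowed_steps(previous)
        code = min(range(len(steps)), key=lambda i: abs(value + steps[i] - target))
        previous = steps[code]
        value += previous
        codes.append(code)
    return codes


def decode(codes, start=0):
    """Rebuild the approximated samples from the 2-bit codes."""
    out, value, previous = [], start, 0
    for code in codes:
        previous = allowed_steps(previous)[code]
        value += previous
        out.append(value)
    return out


codes = encode([1, 4, 6, 5, 2, 0])
approx = decode(codes)
```

The decoded values only approximate the input when the signal changes faster than the table allows, which is why claim 94 ties the phase adjustment to differences that the delta modulator can actually represent.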

97. For use with a memory element containing compressed information time domain signals produced by predetermined signal compression techniques and instruction signals specifying the particular compression techniques applied to original information bearing time domain signals to produce corresponding portions of said compressed information time domain signals, said predetermined signal compression techniques including Mozer phase adjusting of said original information bearing time domain signals, a controller device for synthesizing said original information bearing time domain signals, said controller device comprising:
- controller storage means having an input adapted to be coupled to said memory element for sequentially receiving ordered ones of said compressed information time domain signals;
  
  means adapted to be coupled to said controller storage means for generating control signals enabling said ordered ones of said compressed information time domain signals to be coupled to said controller storage means, said control signal generator means including means for receiving corresponding ones of said instruction signals identifying the type of compression technique applied to said ordered ones of said compressed information time domain signals associated with said control signals;
  
  converter means coupled to said controller storage means for converting said ordered ones of said compressed information time domain signals to synthetic analog signals corresponding to said original information bearing time domain signals; and
  
  means responsive to receipt of a Mozer phase adjust instruction signal from said memory element for causing compressed information time domain signals stored in said controller storage means to be sequentially coupled to said converter means in a first ordered manner and subsequently causing the same signals stored in said controller storage means to be sequentially coupled to said converter means in a reverse manner from said first ordered manner.
- View Dependent Claims (98, 99, 100)
- - 98. The combination of claim 97 wherein said compressed signals and said instruction signals are digital characters, said controller storage means comprises a digital storage device, and said converter means includes digital-to-analog converter means for converting ordered ones of said compressed information time domain digital characters to said synthetic analog signals.
  - 99. The combination of claim 97 wherein said predetermined signal compression techniques include X period zeroing of said original information bearing time domain signals, and wherein said controller device further includes means responsive to receipt of an X period zero instruction signal from said memory element for causing said converter means to output a signal of substantially constant amplitude as a portion of the synthetic analog signal generated thereby.
  - 100. The combination of claim 97 wherein said predetermined signal compression techniques include delta modulation of said original information bearing time domain signals, and wherein said controller device further includes means coupled to said controller storage means for delta demodulating signals appearing at the output thereof, when enabled, and means coupled to said delta demodulating means and responsive to the receipt by said control means of a delta modulation instruction signal from said memory element for enabling said delta demodulating means to delta demodulate the ordered ones of said compressed information signals corresponding to said delta demodulation instruction signal.
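
The controller behavior in claims 97 and 99 amounts to a small instruction-driven playback loop. The sketch below is an assumption-laden illustration: the opcode names `MIRROR` and `ZERO`, the tuple layout, and the use of a Python list as the controller storage are invented for the example and do not come from the patent.

```python
def synthesize(instructions):
    """Turn (opcode, payload) records into a flat list of output samples."""
    out = []
    for opcode, payload in instructions:
        if opcode == "MIRROR":            # stands in for a Mozer phase adjust instruction
            buffer = list(payload)        # samples held in controller storage
            out.extend(buffer)            # first pass, in stored order
            out.extend(reversed(buffer))  # second pass, same samples reversed
        elif opcode == "ZERO":            # stands in for an X period zero instruction
            count, level = payload
            out.extend([level] * count)   # constant-amplitude stretch
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return out


samples = synthesize([("MIRROR", [1, 2, 3]), ("ZERO", (2, 0))])
```

In hardware the resulting sample stream would feed the converter means of claim 97; here it is simply returned as a list.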

Current Assignee: Electronic Speech Systems, ESS Technology Incorporated (Semiconductor Holding Corp.)
Original Assignee: Forrest S. Mozer
Inventors: Mozer, Forrest S.; Stauduhar, Richard P.
Primary Examiner(s): Morrison, Malcolm A.
Assistant Examiner(s): Kemeny, E. S.
Application Number: US05/761,210
Time in Patent Office: 1,278 Days
Field of Search: 179/1 SM, 179/1 SA, 179/15 A, 179/15 AC, 179/15 PC, 179/15.55 T
US Class Current: 704/268
CPC Class Codes: G10L 13/047 Architecture of speech synt...; G10L 19/00 Speech or audio signals ana...

Method and apparatus for speech synthesizing

First Claim

3 Assignments

0 Petitions

Accused Products

Abstract

99 Citations

100 Claims
