Method and apparatus for speech synthesizing
Abstract
A method and apparatus for analyzing and synthesizing speech information in which a predetermined vocabulary is spoken into a microphone, the resulting electrical signals are differentiated with respect to time, digitized, and the digitized waveform is appropriately expanded or contracted by linear interpolation so that the pitch periods of all such waveforms have a uniform number of digitizations and the amplitudes are normalized with respect to a reference signal. These "standardized" speech information digital signals are then compressed in the computer by subjectively removing and discarding redundant speech information such as redundant pitch periods, portions of pitch periods, redundant phonemes and portions of phonemes, redundant amplitude information (delta modulation) and phase information (Fourier transformation). The compression techniques are selectively applied to certain of the speech information signals by listening to the reproduced, compressed information. The resulting compressed digital information and associated compression instruction signals produced in the computer are thereafter injected into the digital memories of a digital speech synthesizer where they can be selectively retrieved and audibly reproduced to recreate the original vocabulary words and sentences from them.
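The standardization step described in the abstract can be illustrated with a minimal sketch. It assumes each pitch period has already been isolated as a list of samples; the function names, and the choice of the peak value as the normalization reference, are illustrative assumptions rather than details taken from the patent.

```python
def resample_period(period, target_len):
    """Linearly interpolate one pitch period to exactly target_len samples."""
    if target_len < 2 or len(period) < 2:
        raise ValueError("need at least two samples on both sides")
    scale = (len(period) - 1) / (target_len - 1)
    out = []
    for k in range(target_len):
        pos = k * scale                      # fractional index into the source
        i = min(int(pos), len(period) - 2)   # left neighbour, clamped at the end
        frac = pos - i
        out.append(period[i] * (1.0 - frac) + period[i + 1] * frac)
    return out


def normalize_amplitude(period, reference_peak):
    """Scale a period so its peak magnitude equals reference_peak (assumed rule)."""
    peak = max(abs(s) for s in period)
    if peak == 0:
        return list(period)
    gain = reference_peak / peak
    return [s * gain for s in period]
```

For example, `resample_period([0, 2, 4], 5)` stretches a three-sample period to five samples, so that every period ends up with the same number of digitizations.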
100 Claims
-
1. A method of analyzing speech information comprising the steps of time quantizing the amplitude of electrical signals representative of selected speech information into digital form, selectively compressing the time quantized signals by discarding selected portions thereof while substantially simultaneously generating instruction signals as to which portions have been discarded, and storing both the compressed signals and instruction signals, wherein said method further includes:
-
(a) time differentiating the electrical signals prior to the time quantizing step and the signal compressing and storing steps include the steps,
(b) selecting signals representative of certain phonemes and phoneme groups from the time quantized signals and replacing portions of these selected signals corresponding to parts of the pitch periods of the certain phonemes and phoneme groups by a constant amplitude signal while generating instruction signals as to which phonemes and phoneme groups have been so selected,
(c) selecting signals representative of certain phonemes and phoneme groups from the time quantized signals and storing only portions of these selected time quantized signals corresponding to every nth pitch period of the waveform of the original speech information electrical signal, and storing instruction signals as to which phonemes and phoneme groups have been so selected and storing instruction signals as to the values of n,
(d) separating and storing the time quantized signals representative of spoken words into two or more parts, with such parts of later words that are identical to parts of earlier words being deleted from storage while instruction signals as to which parts are deleted are stored,
(e) storing portions of the time quantized signals corresponding to selected phonemes and phoneme groups according to their ability to blend naturally with any other phoneme, the selected phonemes and phoneme groups including voiced and unvoiced fricatives, voiced and unvoiced stop consonants, and nasal consonants,
(f) delta-modulating the time quantized signals, and
(g) Mozer phase-adjusting a selected periodic waveform by Fourier transforming the time quantized signals to generate a set of discrete amplitudes and phase angles, adjusting these phase angles so that the inverse Fourier transformation of the amplitudes and new phases is symmetric, inverse Fourier transforming the phase adjusted amplitudes and phases, storing one-half of a selected waveform as representative of each discrete set of phase adjusted amplitudes and phases and discarding the other half of the selected waveform.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 11, 13)
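Step (c) of claim 1, storing only every nth pitch period together with an instruction giving the value of n, can be sketched as follows. This is a simplified illustration: the function names and the list-of-periods representation are assumptions, and the sketch ignores how the patent selects which phonemes receive this treatment.

```python
def keep_every_nth_period(periods, n):
    """Keep periods 0, n, 2n, ... and return them with the repeat instruction n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return periods[::n], n


def expand_periods(kept, n, total):
    """Rebuild a sequence of `total` periods by repeating each kept period n times."""
    out = []
    for period in kept:
        out.extend([period] * n)
    return out[:total]
```

With n = 2, five original periods shrink to three stored periods, and the synthesizer side repeats each stored period twice (truncating at the original count) to fill the gaps.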
-
-
8. A method of analyzing speech comprising the steps of generating electrical signals representative of the spoken vocabulary words and portions of spoken vocabulary words of a predetermined finite vocabulary with the vocabulary words being included into units containing a plurality of phonemes or phoneme groups, time quantizing the amplitude of the electrical signals into digital form, selectively compressing the time quantized signals by discarding selected portions of them while substantially simultaneously generating instruction signals as to which portions have been discarded, and storing selected portions of the digital signals representative of phonemes and phoneme groups in a first, addressable memory, storing the instruction signals in a second, addressable memory including instruction signals as to the sequence of addresses of the stored phonemes and phoneme groups necessary to reproduce words and sentences of the vocabulary, wherein the signal compressing and storing steps include the following steps:
-
(a) selecting signals representative of certain phonemes and phoneme groups from the time quantized signals and replacing portions of these selected signals corresponding to parts of the pitch periods of the certain phonemes and phoneme groups by a constant amplitude signal while generating instruction signals as to which phonemes and phoneme groups have been so selected, and (b) Fourier transforming the time quantized signals to generate a set of discrete amplitudes and phase angles, adjusting the phase angles so that the inverse Fourier transformation of the amplitudes and new phases is symmetric, inverse Fourier transforming the phase adjusted amplitudes and phases, storing one-half of a selected waveform as representative of each discrete set of phase adjusted amplitudes and phases and discarding the other half of the selected waveform. - View Dependent Claims (9, 10, 12)
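The phase adjustment in step (b) can be illustrated with a small sketch. It assumes the simplest phase choice that yields a symmetric waveform, namely setting every phase angle to zero; the claim only requires that the adjusted phases give a symmetric inverse transform, so the patent's actual selection of phases may differ. A plain DFT is used for self-containment, and all function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform (O(n^2), fine for one pitch period)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def phase_adjust_symmetric(period):
    """Keep the DFT amplitudes, set all phases to zero, and inverse transform."""
    n = len(period)
    amplitudes = [abs(c) for c in dft(period)]
    return [sum(amplitudes[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def store_half(adjusted):
    """Store samples 0 .. n//2; the rest follow from x[t] == x[n - t]."""
    return adjusted[: len(adjusted) // 2 + 1]


def expand_half(half, n):
    """Recreate the full period of length n by mirroring the stored half."""
    return half + [half[n - t] for t in range(len(half), n)]
```

Because the amplitudes of a real signal satisfy A[k] = A[n - k], the zero-phase inverse transform is real and even, so the discarded half carries no extra information.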
-
-
14. An improved speech synthesizer of the type having first addressable memory means for storing digital signal representations of analog electrical signals which represent portions of spoken words of a predetermined vocabulary, second addressable memory means for storing first instruction signals as to the addresses in the first memory means of signals representing portions of the vocabulary words, third addressable memory means for storing second instruction signals as to the addresses in the second memory means of the sequences of the first instruction signals necessary to form selected words of the vocabulary, reproduction means responsive to a digital signal output from the first memory means for reproducing these digital signals in audible form, and control logic means wherein the improvement comprises:
- the first addressable memory means stores digital signal representations of the spoken vocabulary words after having been reduced by predetermined compression techniques and the second addressable memory means further stores compression instruction signals for controlling the operation of the control logic means, the compression instruction signals corresponding to the predetermined compression techniques used to reduce the digital signal representations stored in the first addressable memory means, the control logic means being responsive to the compression instruction signals and modifying the output of first memory means in accordance with the compression instruction signals, and wherein the digital signal representations stored in the first addressable memory means and the corresponding compression instruction signals stored in the second addressable memory means are derived from the following predetermined compression techniques;
(a) the digital signals stored in the first addressable memory means are the time quantization of the derivative with respect to time of analog electrical signals representing the phonemes and phoneme groups which are the constituents of the predetermined vocabulary,
(b) the digital signals stored in the first addressable memory means are only selected portions of the digital signals representative of the spoken vocabulary words, with the portions being selected over a repetition period equal to the pitch period of the voiced speech to be synthesized and only those digital signals corresponding to every nth pitch period being stored, and the compression instruction signals stored in the second memory means include instruction signals to the control logic means as to the number of times, n, that each such selected portion of data is to be repeatedly extracted from the first addressable memory means before a different signal portion is to be extracted,
(c) the compression instruction signals stored by the second addressable memory means include instructions as to the addresses in the first addressable memory means of digital signals corresponding to phonemes and phoneme groups which naturally blend with any other phoneme and phoneme group, including voiced and unvoiced fricatives, voiced and unvoiced stop consonants, and nasal consonants,
(d) selected ones of the digital signals are representative of a predetermined fraction x of the latter part of the analog electrical signal within each pitch period of the spoken word, the compression instruction signals stored in the second memory means including x-period zeroing instruction signals as to the addresses of the selected ones of the digital signals in the first memory means and the control logic means includes means responsive to the x-period zeroing instruction signals for supplying to the reproduction means constant amplitude signals having durations equal to the remaining portions of the waveforms of the voiced phonemes and phoneme groups which are constituents of the predetermined vocabulary,
(e) the digital signals are representative of the amplitude of the analog electrical signal over a regular, sampling time interval, the digital signals further being delta modulated by setting the value of the ith digitization of the sampled analog signal equal to the value of the (i-1)th digitization of the sampled analog signal plus $f(\Delta_{i-1}, \Delta_i)$, where $f(\Delta_{i-1}, \Delta_i)$ is an arbitrary function having the property that changes of waveform of less than two levels from one digitization to the next are reproduced exactly while greater changes in either direction are accommodated by slewing in either direction by three levels per digitization,
(f) the stored digital signals representative of spoken words are separated into two or more parts, and
(g) the stored digital signals represent only one symmetric half of one selected waveform obtained by Mozer phase adjusting the waveform by Fourier transforming the digital signals to generate a set of discrete amplitudes and phase angles, adjusting the phase angles so that the inverse Fourier transform waveforms are symmetric, and selecting the one waveform as representative of the set of symmetric waveforms, said control logic means including means responsive to receipt of instruction signals specifying digital signals stored in said first addressable memory means as Mozer phase adjusted signals for causing said reproduction means to expand said Mozer phase adjusted signals in audible form.
- View Dependent Claims (15, 16)
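The delta modulation property stated in element (e) can be sketched with one possible step function. The claim leaves f arbitrary and makes it depend on two successive delta values; the simplified version below is an assumption that looks only at the current difference between the target sample and the reconstructed level. It reproduces changes of less than two levels exactly and slews by three levels otherwise.

```python
def step(delta):
    """Simplified stand-in for f: exact for |delta| < 2, else slew by 3 levels."""
    if abs(delta) < 2:
        return delta
    return 3 if delta > 0 else -3


def delta_encode(samples, start=0):
    """Turn integer sample levels into a list of per-sample steps."""
    steps, level = [], start
    for s in samples:
        d = step(s - level)   # compare against the reconstructed level
        steps.append(d)
        level += d
    return steps


def delta_decode(steps, start=0):
    """Accumulate the steps back into sample levels."""
    out, level = [], start
    for d in steps:
        level += d
        out.append(level)
    return out
```

A slowly varying input such as `[0, 1, 2, 2, 1, 0, -1]` survives a round trip unchanged, while a jump from 0 to 10 is followed over several samples in steps of three.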
-
17. An improved speech synthesizer of the type having first addressable memory means for storing digital signal representations of analog electrical signals which represent portions of spoken words of a predetermined vocabulary, second addressable memory means for storing first instruction signals as to the addresses in the first memory means of signals representing portions of the vocabulary words, third addressable memory means for storing second instruction signals as to the addresses in the second memory means of the sequences of the first instruction signals necessary to form selected words of the vocabulary, reproduction means responsive to a digital signal output from the first memory means for reproducing these digital signals in audible form, and control logic means for selectively, sequentially extracting the second instruction signals from the third memory means and using these extracted second instruction signals for sequentially extracting selected first instruction signals from the second memory means, and using these extracted first instruction signals to sequentially extract selected digital signals from the first memory means to audibly reproduce selected words of the vocabulary through the reproduction means, wherein the improvement comprises:
the first addressable memory means stores digital signal representations of the spoken vocabulary words after having been reduced by predetermined compression techniques and the second addressable memory means further stores compression instruction signals for controlling the operation of the control logic means, the compression instruction signals corresponding to the predetermined compression techniques used to reduce the digital signal representations stored in the first addressable memory means, the control logic means being responsive to the compression instruction signals and modifying the output of first memory means in accordance with the compression instruction signals, and wherein the digital signal representations stored in the first addressable memory means and the corresponding compression instruction signals stored in the second addressable memory means are derived from the following predetermined compression techniques;
(a) selected ones of the digital signals are representative of a predetermined fraction x of the latter part of the analog electrical signal within each pitch period of the spoken word, the compression instruction signals stored in the second memory means including x-period zeroing instruction signals as to the addresses of the selected ones of the digital signals in the first memory means and the control logic means includes means responsive to the x-period zeroing instruction signals for supplying to the reproduction means constant amplitude signals having durations equal to the remaining portions of the waveforms of the voiced phonemes and phoneme groups which are constituents of the predetermined vocabulary, and
(b) the stored digital signals represent only one symmetric half of one selected waveform obtained by Fourier transforming the digital signals to generate a set of discrete amplitudes and phase angles, adjusting the phase angles so that the inverse Fourier transform waveforms are symmetric, and selecting the one waveform as representative of the set of symmetric waveforms.
- View Dependent Claims (18, 19)
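The x-period zeroing of element (a) can be sketched as below. The sketch assumes that the trailing fraction x of each pitch period is the part that is discarded and later replaced by a constant amplitude; the function names and the default constant of zero are illustrative assumptions, not details stated in the claim.

```python
def x_period_zero(period, x):
    """Drop the trailing fraction x of a pitch period.

    Returns the samples that are kept and the number of samples dropped,
    the latter standing in for the x-period zeroing instruction.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie between 0 and 1")
    dropped = int(round(len(period) * x))
    keep = len(period) - dropped
    return period[:keep], dropped


def restore_period(kept, dropped, constant=0):
    """Refill the dropped span with a constant amplitude signal."""
    return list(kept) + [constant] * dropped
```

With x = 0.5, an eight-sample period is stored as four samples, and playback pads the remaining four positions with the constant level.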
-
20. A speech synthesizer comprising
first addressable memory means for storing digital signal representations of electrical signals which represent portions of spoken words of a predetermined vocabulary, all of the digital signals stored in the first memory means being the delta modulated, time quantization of the derivative with respect to time of analog electrical signals representing the phonemes and phoneme groups which are the constituents of the predetermined vocabulary, and the stored digital signals further representing only one symmetric half of one selected waveform obtained by Fourier transforming the delta modulated, time quantized derivative of the analog signals to generate a set of discrete amplitudes and phase angles, adjusting the phase angles so that on inverse Fourier transformation the waveforms are symmetric, and selecting the one waveform as representative of the set of symmetric waveforms, second addressable memory means for storing first instruction signals as to the addresses in the first addressable memory means of signals representing portions of the vocabulary words, third addressable memory means for storing second instruction signals as to the addresses in the second memory means of the sequences of the first instruction signals necessary to form selected words of the vocabulary, reproduction means responsive to the digital signal output of the first memory means for reproducing these digital signals in audible form, and control logic means for selectively, sequentially extracting the second instruction signals from the third memory means and using these extracted second instruction signals for sequentially extracting selected first instruction signals from the second memory means, and using these extracted first instruction signals to sequentially extract selected digital signals from the first memory means to audibly reproduce selected words of the vocabulary through the reproduction means.
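The three-level addressing that claim 20 describes can be illustrated with a toy lookup. The memory contents, the dictionary representation, and the word names below are all hypothetical; the sketch only shows the chain from the third memory (words) to the second memory (address sequences) to the first memory (waveform data).

```python
# First memory: address -> stored digital waveform samples (made-up data).
WAVEFORM_MEMORY = {0: [1, 2, 1], 1: [0, -1, 0], 2: [3, 3]}

# Second memory: address -> sequence of first-memory addresses.
ADDRESS_SEQUENCES = {0: [0, 1], 1: [2, 0]}

# Third memory: vocabulary word -> address in the second memory.
WORD_TABLE = {"word_a": 0, "word_b": 1}


def synthesize(words):
    """Follow third -> second -> first memory and concatenate the samples."""
    out = []
    for word in words:
        sequence_address = WORD_TABLE[word]
        for waveform_address in ADDRESS_SEQUENCES[sequence_address]:
            out.extend(WAVEFORM_MEMORY[waveform_address])
    return out
```

Because both example words reference waveform address 0, the same stored samples are reused without being stored twice.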
22. A method of compressing information bearing signals such as speech to reduce the information content thereof without destroying the intelligibility thereof, said method comprising the steps of Mozer phase adjusting said signals to produce equivalent signals having symmetric portions, and deleting selected redundant portions of said equivalent signals.
-
35. A method of synthesizing signals from information signals previously compressed by the technique of phase adjusting original signals to produce equivalent signals having symmetric portions, deleting selected fractional portions of said symmetric portions of said equivalent signals and generating instruction signals identifying the selected fractional portions so deleted, and from said instruction signals, said method comprising the steps of:
-
(a) reproducing said compressed information signals; (b) expanding the reproduced signals to supply said fractional portions in accordance with said instruction signals; and (c) converting the expanded reproduced signals to audible form. - View Dependent Claims (36, 37, 38, 39, 40)
-
-
41. A system for compressing information bearing input signals such as speech to reduce the information content thereof without destroying the intelligibility thereof, said system comprising:
-
input means adapted to receive said input signals; means for Mozer phase adjusting said signals to produce equivalent signals having symmetric portions; and means for deleting selected redundant portions of said equivalent signals. - View Dependent Claims (42, 43, 44, 45, 46, 47, 48, 49, 50)
-
-
51. A system for synthesizing signals from compressed information signals having the form of an inverse transformation of a partially symmetric phase adjusted transform of the original signals, said compressed information signals being devoid of selected portions corresponding to a fraction of the partially symmetric portions of said phase adjusted transform, and instruction signals identifying the selected portions, said system comprising:
-
means for reproducing said compressed information signals; means coupled to said reproducing means for expanding the reproduced signals to supply said fractional portions in accordance with said instruction signals; and means for converting the expanded reproduced signals to audible form. - View Dependent Claims (52, 53, 54, 55, 56)
-
-
57. A method of processing information bearing signals to initially reduce the information content thereof without destroying the intelligibility of the information contained therein and to synthesize signals from the processed signals, said method comprising the steps of:
-
(a) Mozer phase adjusting said information bearing signals to produce equivalent signals having substantially symmetric portions;
(b) deleting selected redundant portions of said equivalent signals;
(c) X period zeroing said information bearing signals by deleting preselected relatively low power portions of the signals resulting from steps (a) and (b);
(d) generating instruction signals specifying those portions of said signals deleted in steps (b) and (c);
(e) reproducing the signals resulting from said steps of (a) Mozer phase adjusting, (b) deleting and (c) X period zeroing;
(f) expanding said reproduced signals to supply said deleted redundant portions in accordance with said instruction signals;
(g) inserting substantially constant amplitude signals between the non-deleted portions of the signals resulting from step (f) in accordance with said instruction signals so that said deleted relatively low power signal portions are replaced by said signals of substantially constant amplitude; and
(h) converting the signals resulting from step (g) to perceivable form.
- View Dependent Claims (58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69)
-
-
70. A system for processing information bearing input signals to initially compress said input signals by reducing the information content thereof without destroying the intelligibility thereof and subsequently synthesizing signals from said compressed signals, said system comprising:
-
input means adapted to receive said input signals;
means coupled to said input means for Mozer phase adjusting said input signals to produce equivalent signals having substantially symmetric portions;
means for deleting selected redundant portions of said equivalent signals;
means for X period zeroing the signals processed by said Mozer phase adjusting means and said deleting means by deleting preselected relatively low power portions of the processed signals;
means for generating instruction signals specifying those portions of said input signals deleted by said deleting means and said X period zeroing means;
means for reproducing the signals processed by said X period zeroing means;
means for expanding the reproduced signals to supply said deleted redundant portions in accordance with said instruction signals;
means for inserting substantially constant amplitude signals between the non-deleted portions of the signals generated by said expanding means in accordance with said instruction signals so that said deleted relatively low power signal portions are replaced by said signals of substantially constant amplitude; and
means for converting the signals output from said inserting means to perceivable form.
- View Dependent Claims (71, 72, 73, 74, 75, 76, 77, 78, 79, 80)
-
-
81. In a synthesizer of original information bearing time domain signals from compressed information time domain signals produced by predetermined different signal compression techniques, said compressed information time domain signals comprising an inverse transformation of a Mozer phase adjusted transform of said original time domain signals, a memory device comprising:
means for storing said compressed information time domain signals and instruction signals specifying the particular compression technique applied to said original information bearing time domain signals to produce corresponding portions of said compressed information time domain signals, said compressed information time domain signals comprising a plurality of samples resulting from said predetermined signal compression techniques, the number of said different signal compression techniques applied to said original signal being greater than 2, the ratio of said plurality of samples to the minimum number of samples required to uniquely and intelligibly identify said original information bearing signals being no greater than about 0.2, and means for expanding said compressed signals comprising said inverse transform. - View Dependent Claims (82, 83, 84, 85, 86, 87, 88, 89)
-
90. A method of compressing information bearing signals comprising the steps of:
-
(a) phase adjusting said information bearing signals to produce equivalent signals having substantially symmetric portions; (b) deleting selected redundant portions of said equivalent signals; and (c) processing said equivalent signals by the additional signal compression technique of X period zeroing said information bearing signals. - View Dependent Claims (91, 92, 93, 94, 95)
-
-
97. For use with a memory element containing compressed information time domain signals produced by predetermined signal compression techniques and instruction signals specifying the particular compression techniques applied to original information bearing time domain signals to produce corresponding portions of said compressed information time domain signals, said predetermined signal compression techniques including Mozer phase adjusting of said original information bearing time domain signals, a controller device for synthesizing said original information bearing time domain signals, said controller device comprising:
-
controller storage means having an input adapted to be coupled to said memory element for sequentially receiving ordered ones of said compressed information time domain signals;
means adapted to be coupled to said controller storage means for generating control signals enabling said ordered ones of said compressed information time domain signals to be coupled to said controller storage means, said control signal generator means including means for receiving corresponding ones of said instruction signals identifying the type of compression technique applied to said ordered ones of said compressed information time domain signals associated with said control signals;
converter means coupled to said controller storage means for converting said ordered ones of said compressed information time domain signals to synthetic analog signals corresponding to said original information bearing time domain signals; and
means responsive to receipt of a Mozer phase adjust instruction signal from said memory element for causing compressed information time domain signals stored in said controller storage means to be sequentially coupled to said converter means in a first ordered manner and subsequently causing the same signals stored in said controller storage means to be sequentially coupled to said converter means in a reverse manner from said first ordered manner.
- View Dependent Claims (98, 99, 100)
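The forward-then-reverse playback recited in the last element of claim 97 can be sketched as a small dispatch loop. The instruction codes and the block representation are hypothetical, and the sketch replays every stored sample in reverse, which is an assumption about how the end samples of the stored half are handled.

```python
MOZER = "mozer_phase_adjusted"   # hypothetical instruction code
PLAIN = "plain"                  # hypothetical instruction code


def controller_output(blocks):
    """Feed stored samples to the converter according to their instruction.

    Each block is a pair (instruction, samples). Blocks flagged as Mozer
    phase adjusted are sent once in stored order and then once in reverse.
    """
    out = []
    for instruction, samples in blocks:
        out.extend(samples)                # first ordered manner
        if instruction == MOZER:
            out.extend(reversed(samples))  # same samples, reverse order
    return out
```

A block stored as `[3, 4, 5]` with the Mozer instruction is therefore played as `[3, 4, 5, 5, 4, 3]`, recreating the symmetric waveform from its stored half.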
-
Specification