Generating speech from digitally stored coarticulated speech segments
Abstract
Coarticulated speech segment data are extracted from spoken carrier syllables and digitally compressed for storage using adaptive differential pulse code modulation (ADPCM). Beginning seed quantization and PCM values are generated for each coarticulated speech segment and stored together with the ADPCM encoded data in a coarticulated speech segment library. ADPCM encoded data are recovered from the coarticulated speech segment library and blown back using the initial quantization and PCM seed values to reconstruct and concatenate in real time the sequence of coarticulated speech segments required by a text to speech program to generate a desired high quality spoken message. In the preferred embodiment of the invention, the coarticulated speech segments are diphones.
181 Citations
23 Claims
1. A method of generating speech using prerecorded real speech diphones, said method comprising the steps of:
digitally recording with a bandwidth of at least 3 KHz spoken carrier syllables in which desired diphone sounds are embedded;
extracting digital data samples representing beginning, ending, and intermediate diphone sounds from the digitally recorded at least 3 KHz carrier syllables at a substantially common preselected location in the waveform of each diphone;
storing data samples representing said extracted digital diphone sounds in a digital memory device;
generating a selected text to speech sequence of diphones required to generate a desired message;
recovering stored data from said digital memory device for each diphone in said selected sequence of diphones;
concatenating said selected sequence of diphones directly without any interpolation signals, in real time, using the recovered data; and
applying the concatenated diphone data to sound generating means to generate a desired message with at least a 3 KHz bandwidth.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
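The sequencing and concatenation steps of claim 1 can be illustrated with a minimal sketch. It is not taken from the patent: the function names, the `#` silence marker, the `a-b` diphone naming, and the tiny sample library are all hypothetical, and the 16-bit sample format is an assumption. The sketch only shows that stored segments are appended end to end with no interpolation samples inserted between them.

```python
from array import array

SILENCE = "#"  # hypothetical marker for the silence before and after an utterance


def phonemes_to_diphones(phonemes):
    """Pair each phoneme with its successor, padding both ends with silence."""
    padded = [SILENCE] + list(phonemes) + [SILENCE]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def concatenate_diphones(sequence, library):
    """Append the stored samples of each diphone directly, with no smoothing."""
    out = array("h")  # signed 16-bit PCM (assumed sample format)
    for name in sequence:
        out.extend(library[name])
    return out


# Made-up sample data; real entries would be cut from recorded carrier syllables.
demo_library = {
    "#-h": array("h", [0, 10]),
    "h-e": array("h", [10, 40, 25]),
    "e-#": array("h", [25, 0]),
}
demo_pcm = concatenate_diphones(phonemes_to_diphones(["h", "e"]), demo_library)
```

Because every segment is cut at a substantially common location in the waveform, adjacent segments are expected to meet at similar sample values, which is what lets a plain append stand in for interpolation.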
10. A method of time domain compression of pulse code modulated (PCM) data samples of beginning, ending and intermediate coarticulated speech segments extracted from digitally recorded carrier syllables comprising the steps of:
assuming a quantizer for the first data sample;
time domain compressing the PCM data for each of a selected number of data samples in succession as a function of a quantizer generated from the quantizer for the preceding sample starting with the assumed value of the quantizer for the first data sample;
reconstructing said PCM data from said compressed data for each of said selected number of data samples as a function of a quantizer generated from the quantizer for the preceding sample starting with the assumed value of the quantizer for the first data sample;
comparing the reconstructed data with said PCM data for said selected data samples;
iteratively repeating the above steps for selected different assumed values of said quantizer for the first data sample;
selecting as the final value of said quantizer for the first data sample the value which generates a predetermined comparison between the reconstructed data and the PCM data;
storing said final value of said quantizer for the first data sample; and
time domain compressing PCM data for all data points in said coarticulated speech segment as a function of a quantizer generated from the quantizer for the preceding data sample beginning with the final assumed value of said quantizer for the first data sample.
Dependent claims: 11, 12
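The seed search recited in claim 10 can be sketched as follows. This is an illustration under stated assumptions, not the patented encoder: the 4-bit code layout, the `ADAPT` step multipliers, the step limits, and all function names are hypothetical, and "predetermined comparison" is taken here to mean the smallest summed squared error over the probe samples.

```python
# Illustrative step multipliers indexed by the 3-bit magnitude (assumed values).
ADAPT = [0.9, 0.9, 0.9, 0.9, 1.2, 1.6, 2.0, 2.4]
STEP_MIN, STEP_MAX = 1, 2048


def _advance(code, predicted, step):
    """Reconstruct one sample and derive the quantizer for the next one."""
    mag = code & 0x7
    delta = mag * step + step // 2
    if code & 0x8:  # sign bit set: the difference was negative
        delta = -delta
    value = max(-32768, min(32767, predicted + delta))
    next_step = max(STEP_MIN, min(STEP_MAX, int(step * ADAPT[mag])))
    return value, next_step


def encode(samples, seed_step):
    """Compress samples[1:] to 4-bit codes; samples[0] is kept as the PCM seed."""
    predicted, step = samples[0], seed_step
    codes = []
    for sample in samples[1:]:
        diff = sample - predicted
        code = min(abs(diff) // step, 7) | (0x8 if diff < 0 else 0)
        predicted, step = _advance(code, predicted, step)  # track the decoder
        codes.append(code)
    return codes


def decode(seed_pcm, seed_step, codes):
    """Rebuild PCM samples from the seed values and the stored codes."""
    out, value, step = [seed_pcm], seed_pcm, seed_step
    for code in codes:
        value, step = _advance(code, value, step)
        out.append(value)
    return out


def squared_error(samples, seed_step):
    recon = decode(samples[0], seed_step, encode(samples, seed_step))
    return sum((a - b) ** 2 for a, b in zip(samples, recon))


def choose_seed_step(samples, candidates, n_probe=8):
    """Try each assumed quantizer on the first n_probe samples; keep the best."""
    probe = samples[:n_probe]
    return min(candidates, key=lambda step: squared_error(probe, step))


def compress_segment(samples, candidates, n_probe=8):
    """Pick the seed quantizer, then compress the whole segment with it."""
    seed_step = choose_seed_step(samples, candidates, n_probe)
    return samples[0], seed_step, encode(samples, seed_step)
```

Only a short probe window is re-encoded for each candidate, so the search stays cheap even when the final pass covers every sample in the segment.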
13. A method of generating speech using prerecorded real speech coarticulated speech segments, said method comprising the steps of:
digitally recording as PCM (pulse code modulated) data samples spoken carrier syllables in which desired coarticulated speech segment sounds are embedded;
extracting the PCM data samples representing desired beginning, ending and intermediate coarticulated segment sounds from the digitally recorded carrier syllables at a substantially common preselected location in the waveform of each coarticulated speech segment;
digitally compressing the PCM data samples of said coarticulated speech segments using adaptive differential pulse code modulation (ADPCM) to generate ADPCM encoded data;
storing the ADPCM compressed data representing said extracted digital coarticulated speech segment sounds in a digital memory device;
generating a selected text to speech sequence of coarticulated speech segments required to generate a desired message;
recovering stored ADPCM encoded data from said digital memory device for each coarticulated speech segment in said selected sequence of coarticulated speech segments;
reconstructing the PCM coarticulated speech segment data samples from said recovered ADPCM encoded data;
concatenating said reconstructed PCM coarticulated speech segment data samples in said selected text to speech sequence of coarticulated speech segments directly without any interpolation signals, in real time; and
applying the concatenated reconstructed coarticulated speech segment data samples to sound generating means to generate said desired message.
Dependent claims: 14, 15, 16, 17, 18
19. Apparatus for generating speech from pulse code modulated (PCM) data samples of coarticulated speech segments extracted from the beginning, middle and end of carrier syllables digitally recorded with a bandwidth of at least 3 KHz, said apparatus comprising:
means for digitally compressing the PCM data samples, including means for adaptive differential pulse code modulation (ADPCM) encoding said PCM data samples and for generating a quantizer for the first data sample of each coarticulated speech segment;
means for storing the digitally compressed data samples, including means for storing as seed values said quantizer and said PCM data for the first data sample in each coarticulated speech segment;
means for generating a selected text to speech sequence of coarticulated speech segments required to generate a desired message;
means responsive to said means for generating said selected text to speech sequence of coarticulated speech segments for recovering the stored digitally compressed data samples for each coarticulated speech segment in said selected sequence of coarticulated speech segments, including means for recovering said seed quantizer and said seed PCM data;
means for reconstructing PCM data from said recovered compressed data in said selected sequence, including means for using said seed PCM value as the reconstructed PCM data for the first data sample and for generating the reconstructed PCM value of the second data sample as a function of the reconstructed PCM data for the first data sample, said seed quantizer, and the stored ADPCM data for the second data sample; and
means responsive to said sequence of reconstructed PCM data for generating an acoustic wave containing said desired message.
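The storage and reconstruction means of claim 19 can be pictured with a short sketch. It is an assumption-laden illustration rather than the patented apparatus: `StoredSegment`, `reconstruct`, `speak`, the 4-bit code layout, and the `ADAPT` multipliers are all hypothetical. What it shows is that each library entry carries its own seed quantizer and seed PCM value, so any segment can be decoded on its own, in whatever order the text to speech sequence requires.

```python
from dataclasses import dataclass
from typing import Dict, List

# Illustrative step multipliers indexed by the 3-bit magnitude (assumed values).
ADAPT = [0.9, 0.9, 0.9, 0.9, 1.2, 1.6, 2.0, 2.4]


@dataclass
class StoredSegment:
    seed_pcm: int      # PCM value of the first sample, stored uncompressed
    seed_step: int     # quantizer chosen for the first sample
    codes: List[int]   # 4-bit ADPCM codes for the second sample onward


def reconstruct(segment: StoredSegment) -> List[int]:
    """Decode one segment starting from its own stored seed values."""
    out = [segment.seed_pcm]  # the seed PCM value is the first output sample
    value, step = segment.seed_pcm, segment.seed_step
    for code in segment.codes:
        mag = code & 0x7
        delta = mag * step + step // 2
        if code & 0x8:  # sign bit set: step downward
            delta = -delta
        value = max(-32768, min(32767, value + delta))
        step = max(1, min(2048, int(step * ADAPT[mag])))
        out.append(value)
    return out


def speak(sequence: List[str], library: Dict[str, StoredSegment]) -> List[int]:
    """Decode each requested segment and append the results directly."""
    pcm: List[int] = []
    for name in sequence:
        pcm.extend(reconstruct(library[name]))
    return pcm
```

In this sketch the second sample depends only on the seed PCM value, the seed quantizer, and the first stored code, mirroring the dependency spelled out in the claim.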
20. A system for generating speech using prerecorded real speech diphones, said system comprising:
means for digitally recording with a bandwidth of at least 3 KHz spoken carrier syllables in which desired diphone sounds are embedded;
means for extracting digital data samples representing beginning, ending, and intermediate diphone sounds from the digitally recorded at least 3 KHz carrier syllables at a substantially common preselected location in the waveform of each diphone;
means for storing data samples representing said extracted digital diphone sounds;
means for generating a selected text to speech sequence of diphones required to generate a desired message;
means responsive to the means for generating said text to speech sequence of diphones for recovering from said storing means stored data for each diphone in said selected sequence of diphones;
means for concatenating said selected sequence of diphones directly without any interpolation signals, in real time, using the recovered data; and
sound generating means responsive to said concatenated diphones to generate acoustic waves with at least a 3 KHz bandwidth containing said desired message.
Dependent claims: 21, 22, 23
Specification