Generating speech from digitally stored coarticulated speech segments

US 5,153,913 A
Filed: 06/19/1989
Issued: 10/06/1992
Est. Priority Date: 10/09/1987
Status: Expired due to Term

Abstract

Coarticulated speech segment data are extracted from spoken carrier syllables and digitally compressed for storage using adaptive differential pulse code modulation (ADPCM). Beginning seed quantization and PCM values are generated for each coarticulated speech segment and stored together with the ADPCM encoded data in a coarticulated speech segment library. ADPCM encoded data are recovered from the coarticulated speech segment library and blown back using the initial quantization and PCM seed values to reconstruct and concatenate in real time the sequence of coarticulated speech segments required by a text to speech program to generate a desired high quality spoken message. In the preferred embodiment of the invention, the coarticulated speech segments are diphones.

181 Citations

23 Claims

1. A method of generating speech using prerecorded real speech diphones, said method comprising the steps of:
- digitally recording with a bandwidth of at least 3 KHz spoken carrier syllables in which desired diphone sounds are embedded;
  
  extracting digital data samples representing beginning, ending, and intermediate diphone sounds from the digitally recorded at least 3 KHz carrier syllables at a substantially common preselected location in the waveform of each diphone;
  
  storing data samples representing said extracted digital diphone sounds in a digital memory device;
  
  generating a selected text to speech sequence of diphones required to generate a desired message;
  
  recovering stored data from said digital memory device for each diphone in said selected sequence of diphones;
  
  concatenating said selected sequence of diphones directly without any interpolation signals, in real time, using the recovered data; and
  
  applying the concatenated diphone data to sound generating means to generate a desired message with at least a 3 KHz bandwidth.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The method of claim 1 including time domain compressing the data samples representing said extracted digital diphone sounds prior to storage in said digital memory device, and wherein recovering said stored data includes reconstructing the diphone data from said time domain compressed data.
  - 3. The method of claim 2 wherein said step of time domain compressing said diphone data includes generating a quantizer for each compressed data sample, wherein storing includes storing a seed quantizer for each diphone, and wherein reconstructing includes generating a quantizer for each compressed data sample from the quantizer for the preceding data sample beginning with said seed quantizer.
  - 4. The method of claim 3 wherein storing includes storing uncompressed digital data for the first data sample in each diphone as a seed value for the diphone data, and wherein reconstructing includes using said diphone data seed value as the value for the first data sample in a reconstructed diphone and using the seed quantizer and stored compressed data for the second data sample to generate the reconstructed data value of the second data sample.
  - 5. The method of claim 4 wherein said time domain compressing comprises adaptive differential pulse code modulation.
  - 6. The method of claim 5 wherein generating said seed quantizer for the data samples for said diphones includes a) assuming a quantizer for the first data sample, b) time domain compressing a selected number of data samples, c) reconstructing the data samples from the compressed data, d) comparing the reconstructed compressed data with the original data, e) iteratively adjusting the value of the assumed quantizer and repeating steps b through d, and f) selecting as the seed quantizer the assumed value thereof which satisfies selected criteria of said comparison step.
  - 7. The method of claim 6 wherein said comparison includes generating an absolute value of the difference between the reconstructed and original values of the diphone data for each data sample and summing said absolute values to generate a total error, and wherein the step of selecting comprises selecting as the seed quantizer the assumed quantizer value which produces the minimum total error.
  - 8. The method of claim 1 wherein said diphones are extracted from the recorded carrier syllables substantially at the digital data sample closest to a zero crossing with each waveform traveling in the same direction.
  - 9. The method of claim 8 wherein said diphone sounds are digitally recorded at a bandwidth of about 4 KHz.
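
The zero-crossing extraction recited in claim 8 can be illustrated with a minimal sketch. It assumes the recorded carrier syllable is a plain Python list of signed PCM samples and that "the same direction" means a rising crossing; the claim only requires that every cut use one common direction. The function names are hypothetical and do not come from the patent.

```python
def rising_zero_crossings(samples):
    """Return, for each negative-to-non-negative transition, the index of
    the sample closest to zero (assumed reading of 'closest to a zero crossing')."""
    cuts = []
    for i in range(1, len(samples)):
        if samples[i - 1] < 0 <= samples[i]:
            # Of the two samples straddling the crossing, keep the smaller one.
            cuts.append(i if abs(samples[i]) <= abs(samples[i - 1]) else i - 1)
    return cuts


def snap_to_crossing(samples, approx_index):
    """Move a roughly chosen boundary to the nearest rising zero crossing."""
    cuts = rising_zero_crossings(samples)
    if not cuts:
        raise ValueError("no rising zero crossing in this recording")
    return min(cuts, key=lambda c: abs(c - approx_index))


def extract_segment(samples, approx_start, approx_end):
    """Cut a segment whose start and end both sit at rising zero crossings."""
    start = snap_to_crossing(samples, approx_start)
    end = snap_to_crossing(samples, approx_end)
    return samples[start:end]


recording = [5, -4, -1, 3, 6, 2, -3, -2, 1, 4]
print(extract_segment(recording, 1, 9))
```

Because every segment cut this way starts near zero with the same slope sign and stops just before the next such point, two segments can be butted together without a large jump at the joint, which is presumably what makes concatenation without interpolation workable.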

10. A method of time domain compression of pulse code modulated (PCM) data samples of beginning, ending and intermediate coarticulated speech segments extracted from digitally recorded carrier syllables comprising the steps of:
- assuming a quantizer for the first data sample;
  
  time domain compressing the PCM data for each of a selected number of data samples in succession as a function of a quantizer generated from the quantizer for the preceding sample starting with the assumed value of the quantizer for the first data sample;
  
  reconstructing said PCM data from said compressed data for each of said selected number of data samples as a function of a quantizer generated from the quantizer for the preceding sample starting with the assumed value of the quantizer for the first data sample;
  
  comparing the reconstructed data with said PCM data for said selected data samples;
  
  iteratively repeating the above steps for selected different assumed values of said quantizer for the first data sample;
  
  selecting as the final value of said quantizer for the first data sample the value which generates a predetermined comparison between the reconstructed data and the PCM data;
  
  storing said final value of said quantizer for the first data sample; and
  
  time domain compressing PCM data for all data points in said coarticulated speech segment as a function of a quantizer generated from the quantizer for the preceding data sample beginning with the final assumed value of said quantizer for the first data sample.
- Dependent claims (11, 12):
  - 11. The method of claim 10 wherein said step of comparing reconstructed data with the PCM data comprises generating an absolute value of the difference between the reconstructed data and PCM data for each data sample and summing said absolute values to generate a total error, and wherein the step of selecting the final value of the quantizer for the first data sample comprises selecting the assumed quantizer which produces the minimum total error.
  - 12. The method of claim 11 wherein adaptive differential pulse code modulation is used for time domain compressing said PCM data.
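
The iterative seed quantizer selection of claims 10 and 11 can be sketched as follows. The claims do not specify the step-size adaptation rule, the code width, or the candidate values, so the toy coder below (signed codes in -7..7, step doubled on large codes and reduced by a quarter on small ones) is an assumption made only for illustration, and all names are hypothetical.

```python
STEP_MIN, STEP_MAX = 1, 4096
PCM_MIN, PCM_MAX = -32768, 32767


def clamp(value, lo, hi):
    return max(lo, min(hi, value))


def next_step(step, code):
    """Assumed adaptation rule: each quantizer is derived from the previous one."""
    if abs(code) >= 5:
        step *= 2
    elif abs(code) <= 1:
        step = step * 3 // 4
    return clamp(step, STEP_MIN, STEP_MAX)


def encode(samples, seed_q):
    """Compress samples[1:] as differences; samples[0] is kept as the PCM seed."""
    codes, prev, q = [], samples[0], seed_q
    for x in samples[1:]:
        code = clamp(round((x - prev) / q), -7, 7)
        codes.append(code)
        prev = clamp(prev + code * q, PCM_MIN, PCM_MAX)  # track decoder state
        q = next_step(q, code)
    return codes


def decode(first_sample, seed_q, codes):
    """Reconstruct PCM starting from the seed sample and seed quantizer."""
    out, q = [first_sample], seed_q
    for code in codes:
        out.append(clamp(out[-1] + code * q, PCM_MIN, PCM_MAX))
        q = next_step(q, code)
    return out


def total_abs_error(samples, seed_q, n):
    """Compress n samples, reconstruct them, and sum the absolute differences."""
    head = samples[:n]
    recon = decode(head[0], seed_q, encode(head, seed_q))
    return sum(abs(a - b) for a, b in zip(head, recon))


def find_seed_quantizer(samples, candidates, n=32):
    """Try each assumed quantizer and keep the one with minimum total error."""
    return min(candidates, key=lambda q: total_abs_error(samples, q, n))


ramp = [0, 100, 200, 300, 400, 500]
print(find_seed_quantizer(ramp, [1, 10, 100, 1000], n=6))
```

Once the seed quantizer is chosen, the whole segment would be compressed with `encode(samples, seed_q)`, which corresponds to the final step of claim 10.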

13. A method of generating speech using prerecorded real speech coarticulated speech segments, said method comprising the steps of:
- digitally recording as PCM (pulse code modulated) data samples spoken carrier syllables in which desired coarticulated speech segment sounds are embedded;
  
  extracting the PCM data samples representing desired beginning, ending and intermediate coarticulated segment sounds from the digitally recorded carrier syllables at a substantially common preselected location in the waveform of each coarticulated speech segment;
  
  digitally compressing the PCM data samples of said coarticulated speech segments using adaptive differential pulse code modulation (ADPCM) to generate ADPCM encoded data;
  
  storing the ADPCM compressed data representing said extracted digital coarticulated speech segment sounds in a digital memory device;
  
  generating a selected text to speech sequence of coarticulated speech segments required to generate a desired message;
  
  recovering stored ADPCM encoded data from said digital memory device for each coarticulated speech segment in said selected sequence of coarticulated speech segments;
  
  reconstructing the PCM coarticulated speech segment data samples from said recovered ADPCM encoded data;
  
  concatenating said reconstructed PCM coarticulated speech segment data samples in said selected text to speech sequence of coarticulated speech segments directly without any interpolation signals, in real time;
  
  and applying the concatenated reconstructed coarticulated speech segment data samples to sound generating means to generate said desired message.
- Dependent claims (14, 15, 16, 17, 18):
  - 14. The method of claim 13 wherein compressing the PCM data samples includes generating a seed quantizer for the first data sample in each coarticulated speech segment, wherein storing includes storing said seed quantizer for the first data sample, and wherein reconstructing the coarticulated speech segment data samples includes using the stored seed quantizer to initiate reconstruction of the PCM coarticulated speech segment data samples from the ADPCM encoded data.
  - 15. The method of claim 14 wherein said storing includes storing the PCM value for the first data sample for each coarticulated speech segment as the PCM seed value together with the seed quantizer and the ADPCM encoded data, and wherein reconstructing said PCM data comprises using the stored PCM seed value as the reconstructed PCM value for the first data sample and generating the reconstructed PCM value of the second data sample as a function of the PCM seed value, the seed quantizer and the stored ADPCM encoded data for the second sample.
  - 16. The method of claim 15 wherein said seed quantizer for the first data point in each coarticulated speech segment is iteratively determined as an assumed value which best matches the reconstructed data for a selected number of samples in the coarticulated speech segment with the PCM data for those selected samples.
  - 17. The method of claim 16 wherein said beginning, ending and intermediate coarticulated speech segment sounds are extracted from said carrier syllables substantially at the PCM data point closest to a zero crossing of each waveform, with each waveform traveling in the same direction.
  - 18. The method of claim 17 wherein said carrier syllables are digitally recorded with a bandwidth of at least 3 KHz.
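
The storage layout and reconstruction described in claims 14 and 15 can be pictured with a small sketch: each library entry carries a PCM seed value, a seed quantizer, and the encoded samples, so any segment can be reconstructed on its own regardless of what was played before it. The record layout, the step adaptation rule, and every name below are assumptions for illustration; the claims do not fix these details.

```python
from dataclasses import dataclass


@dataclass
class SegmentRecord:
    seed_pcm: int        # uncompressed value of the first data sample
    seed_quantizer: int  # step size used for the second data sample
    codes: list          # encoded differences for samples 2..n (assumed -7..7)


def next_step(step, code):
    """Assumed rule deriving each quantizer from the preceding one."""
    if abs(code) >= 5:
        step *= 2
    elif abs(code) <= 1:
        step = step * 3 // 4
    return max(1, min(4096, step))


def reconstruct(record):
    """First sample is the PCM seed; the second uses the seed quantizer."""
    out, q = [record.seed_pcm], record.seed_quantizer
    for code in record.codes:
        out.append(max(-32768, min(32767, out[-1] + code * q)))
        q = next_step(q, code)
    return out


def render(library, sequence):
    """Reconstruct each requested segment and append it directly to the output."""
    pcm = []
    for name in sequence:
        pcm.extend(reconstruct(library[name]))  # no interpolation between segments
    return pcm


library = {
    "seg_a": SegmentRecord(seed_pcm=0, seed_quantizer=100, codes=[1, 1, 2, 2, 2]),
    "seg_b": SegmentRecord(seed_pcm=10, seed_quantizer=4, codes=[5, -1]),
}
print(render(library, ["seg_a", "seg_b"]))
```

Since the decoder state is reset from the stored seeds at the start of every segment, the order in which segments are requested does not affect how any single segment is reconstructed.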

19. Apparatus for generating speech from pulse code modulated (PCM) data samples of coarticulated speech segments extracted from the beginning, middle and end of carrier syllables digitally recorded with a bandwidth of at least 3 KHz, said apparatus comprising:
- means for digitally compressing the PCM data samples, including means for adaptive differential pulse code modulation (ADPCM) encoding said PCM data samples and for generating a quantizer for the first data sample of each coarticulated speech segment;
  
  means for storing the digitally compressed data samples, including means for storing as seed values said quantizer and said PCM data for the first data sample in each coarticulated speech segment;
  
  means for generating a selected text to speech sequence of coarticulated speech segments required to generate a desired message;
  
  means responsive to said means for generating said selected text to speech sequence of coarticulated speech segments for recovering the stored digitally compressed data samples for each coarticulated speech segment in said selected sequence of coarticulated speech segments, including means for recovering said seed quantizer and said seed PCM data;
  
  means for reconstructing PCM data from said recovered compressed data in said selected sequence, including means for using said seed PCM value as the reconstructed PCM data for the first data sample and for generating the reconstructed PCM value of the second data sample as a function of the reconstructed PCM data for the first data sample, said seed quantizer, and the stored ADPCM data for the second data sample; and
  
  means responsive to said sequence of reconstructed PCM data for generating an acoustic wave containing said desired message.

20. A system for generating speech using prerecorded real speech diphones, said system comprising:
- means for digitally recording with a bandwidth of at least 3 KHz spoken carrier syllables in which desired diphone sounds are embedded;
  
  means for extracting digital data samples representing beginning, ending, and intermediate diphone sounds from the digitally recorded at least 3 KHz carrier syllables at a substantially common preselected location in the waveform of each diphone;
  
  means for storing data samples representing said extracted digital diphone sounds;
  
  means for generating a selected text to speech sequence of diphones required to generate a desired message;
  
  means responsive to the means for generating said text to speech sequence of diphones for recovering from said storing means stored data for each diphone in said selected sequence of diphones;
  
  means for concatenating said selected sequence of diphones directly without any interpolation signals, in real time, using the recovered data; and
  
  sound generating means responsive to said concatenated diphones to generate acoustic waves with at least a 3 KHz bandwidth containing said desired message.
- Dependent claims (21, 22, 23):
  - 21. The system of claim 20 including means for time domain compressing the data samples representing said extracted digital diphone sounds for storage in said storage means, and wherein said means for recovering said stored data includes means for reconstructing the diphone data from said time domain compressed data.
  - 22. The system of claim 21 wherein said means for time domain compressing data samples comprises means for adaptive differential pulse code modulation (ADPCM) encoding of such data samples and includes means for generating a seed quantizer for the first data sample in each diphone, wherein said storing means includes means for storing said seed quantizer, and wherein said means for reconstructing said PCM data includes means for utilizing said seed quantizer to reconstruct the first ADPCM encoded sample.
  - 23. The system of claim 22 wherein said means for generating said seed quantizer includes means for assuming a value for said seed quantizer, means for ADPCM encoding a selected number of data samples starting with said assumed seed quantizer value, means for reconstructing the selected number of data samples from the compressed data beginning with the assumed quantizer value, means for comparing the reconstructed compressed data with the PCM data, means for iteratively adjusting the assumed value of the seed quantizer and means for selecting as the seed quantizer the assumed value thereof which satisfies selected criteria of said comparison means.
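
The sequencing and concatenation elements of claim 20 can be illustrated with a short sketch. It assumes that a text-to-speech front end has already produced a phoneme list, that "#" stands for silence, and that library keys are written as "left-right" phoneme pairs; none of these conventions come from the patent, and the sample values are placeholders.

```python
def phonemes_to_diphones(phonemes):
    """Pair each phoneme with its neighbour, padding with silence so that
    beginning, intermediate, and ending diphones are all produced."""
    padded = ["#"] + list(phonemes) + ["#"]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


def concatenate(library, diphones):
    """Join the stored diphone samples end to end, with no interpolation."""
    output = []
    for name in diphones:
        output.extend(library[name])
    return output


# Placeholder library: real entries would hold recorded PCM data.
library = {
    "#-k": [0, 3, 1],
    "k-ae": [0, 5, -2],
    "ae-t": [0, 4, -1],
    "t-#": [0, 2, 0],
}

sequence = phonemes_to_diphones(["k", "ae", "t"])
print(sequence)
print(concatenate(library, sequence))
```

The resulting sample list is what would be handed to the sound generating means.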

Current Assignee: Sound Entertainment Incorporated
Original Assignee: Sound Entertainment Incorporated
Inventors: Mosenfelder, James R.; Kandefer, Edward M.
Primary Examiner(s): Shaw, Dale M.
Assistant Examiner(s): Doerrler, Michelle
Application Number: US07/382,675
Time in Patent Office: 1,205 Days
Field of Search: 381/51-53, 381/35, 381/36, 381/37-40, 364/513.5
US Class Current: 704/260
CPC Class Codes: G10L 13/07 Concatenation rules
