Compression of stored waveforms for artificial speech
Abstract
In a digital text-to-speech conversion system of the type usually supplied in all-software form on a floppy disk, memory requirements are reduced and speech quality is improved by compression and anti-distortion techniques that interact to produce clear speech at widely varying speeds with a minimum of memory. These techniques include: using Huffman coding to advantage by encoding only the differences between successive waveforms where feasible; relocating delta tables and using them repetitively; organizing the speech into demi-diphones so that the same instruction lists serve several sounds; and combining selective deletion or repetition of waveforms with selective interpolation to vary speed without slurring.
16 Claims
1. In a method of digitally converting text to speech, said method including the step of encoding waveforms representing sounds in the form of digital indices representing addresses in a table containing selected values of Δ, and the successive samples of the waveform are computed by the formula

$$S_t = aS_{t-1} + bS_{t-2} + \Delta$$

where $S_t$ is the sample being computed, $S_{t-1}$ is the next preceding sample, $S_{t-2}$ is the second preceding sample, $a$ and $b$ are constants, and Δ is the value stored in said table at the address defined by the index corresponding to $S_t$, the improvement comprising encoding said indices by a Huffman coding in which the shortest codes of said Huffman coding represent the addresses of the Δ values occurring most frequently in the computation of said waveform.
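Claim 1 combines two mechanisms: a second-order recurrence that rebuilds each sample from the two previous samples plus a looked-up Δ, and a Huffman code over the table indices so that the most frequently used addresses cost the fewest bits. The Python sketch below illustrates both under stated assumptions: the delta table contents, the constants `a` and `b`, the zero initial conditions, and every function name are hypothetical, not values from the patent. The code construction is the standard textbook Huffman algorithm (repeatedly merging the two least frequent subtrees).

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(indices):
    """Build a prefix code {table_index: bit_string}; frequent indices get short codes."""
    freq = Counter(indices)
    if len(freq) == 1:  # a lone symbol still needs a 1-bit code
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so heap tuples never compare the dicts
    heap = [(weight, next(tiebreak), {sym: ""}) for sym, weight in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)  # the two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def encode(indices, code):
    return "".join(code[i] for i in indices)


def decode(bits, code):
    """Emit an index each time the accumulated bits form a complete codeword."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free, so the first match is correct
            out.append(inverse[current])
            current = ""
    return out


def synthesize(indices, delta_table, a, b, s1=0.0, s2=0.0):
    """S_t = a*S_{t-1} + b*S_{t-2} + delta_table[index_t], starting from s1, s2."""
    samples = []
    for idx in indices:
        s = a * s1 + b * s2 + delta_table[idx]
        samples.append(s)
        s2, s1 = s1, s
    return samples


# Hypothetical data: index 3 is by far the most common table address.
DELTA_TABLE = [-2.0, -1.0, 0.0, 1.0]
indices = [3, 3, 3, 1, 3, 0, 3, 1]
code = huffman_code(indices)
bits = encode(indices, code)
samples = synthesize(decode(bits, code), DELTA_TABLE, a=1.0, b=0.0)
```

Here index 3 receives a one-bit codeword while the rarer indices 0 and 1 receive two bits each, which is the property the claim relies on. The setting `a=1.0, b=0.0` reduces the recurrence to a plain accumulator only to keep the arithmetic easy to follow; a real formant-like resonator would use other constants.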
4. In a real-time text-to-speech conversion system in which waveforms are encoded in the form of digital indices representing addresses in a table containing selected values of Δ, and the successive samples of the waveform are computed by the formula

$$S_t = aS_{t-1} + bS_{t-2} + \Delta$$

where $S_t$ is the sample being computed, $S_{t-1}$ is the next preceding sample, $S_{t-2}$ is the second preceding sample, $a$ and $b$ are constants, and Δ is the value stored in said table at the address defined by the index corresponding to $S_t$, the improvement comprising:
(a) encoding successive waveforms so that each $S_t$ represents the difference between the corresponding waveform sample and the corresponding sample of a preceding waveform; and
(b) adding each $S_t$ to the corresponding sample of said preceding waveform to form the corresponding sample of the waveform being computed.
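Claim 4 stores a waveform as its sample-by-sample difference from the preceding waveform and restores it by adding the two back together. The Python sketch below shows only that difference layer; in the claim the difference values are themselves produced by the table-driven recurrence, which is omitted here. It assumes all waveforms have the same number of samples, and the function names and sample data are hypothetical.

```python
def encode_differences(waveforms):
    """Keep the first waveform whole; store each later one as per-sample
    differences from the waveform before it (equal lengths assumed)."""
    encoded = [list(waveforms[0])]
    for prev, cur in zip(waveforms, waveforms[1:]):
        encoded.append([c - p for p, c in zip(prev, cur)])
    return encoded


def decode_differences(encoded):
    """Rebuild each waveform by adding its differences to the previous one (step b)."""
    decoded = [list(encoded[0])]
    for diff in encoded[1:]:
        prev = decoded[-1]
        decoded.append([p + d for p, d in zip(prev, diff)])
    return decoded


# Hypothetical, slowly changing periods of a voiced sound.
periods = [[0, 5, 9, 5, 0], [0, 6, 10, 5, -1], [1, 6, 10, 4, -1]]
stored = encode_differences(periods)
restored = decode_differences(stored)
```

When successive periods are similar, the stored differences cluster around zero, which tends to need fewer distinct Δ values and gives a more skewed distribution for a Huffman code to exploit.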
6. In a real-time text-to-speech conversion system in which waveforms are encoded in the form of digital indices representing addresses in a table containing selected values of Δ, and the successive samples of the waveform are computed by the formula

$$S_t = aS_{t-1} + bS_{t-2} + \Delta$$

where $S_t$ is the sample being computed, $S_{t-1}$ is the next preceding sample, $S_{t-2}$ is the second preceding sample, $a$ and $b$ are constants, and Δ is the value stored in said table at the address defined by the index corresponding to $S_t$, the improvement comprising providing a single table for a plurality of waveforms, said single table being addressable by the indices of each waveform.
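Claim 6 lets several waveforms share one Δ table, so N waveforms cost one table of T entries instead of N×T entries. The Python sketch below decodes two index streams against the same table. The table values, waveform names, and index lists are hypothetical, and `a=1, b=0` is chosen only to keep the arithmetic readable.

```python
# Hypothetical shared delta table: one copy serves every waveform below.
SHARED_DELTA_TABLE = [-8, -3, -1, 0, 1, 3, 8]


def synthesize(indices, table, a, b):
    """S_t = a*S_{t-1} + b*S_{t-2} + table[index_t], starting from zero."""
    s1 = s2 = 0
    out = []
    for idx in indices:
        s = a * s1 + b * s2 + table[idx]
        out.append(s)
        s2, s1 = s1, s
    return out


# Each waveform is stored only as a list of addresses into the same table.
waveform_indices = {
    "vowel_a": [6, 4, 3, 2, 0],
    "vowel_e": [5, 3, 3, 1, 2],
}

decoded = {
    name: synthesize(idx, SHARED_DELTA_TABLE, a=1, b=0)
    for name, idx in waveform_indices.items()
}
```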
7. In a real-time text-to-speech conversion system in which waveforms are encoded in the form of digital indices representing addresses in a table containing selected values of Δ, and the successive samples of the waveform are computed by the formula

$$S_t = aS_{t-1} + bS_{t-2} + \Delta$$

where $S_t$ is the sample being computed, $S_{t-1}$ is the next preceding sample, $S_{t-2}$ is the second preceding sample, $a$ and $b$ are constants, and Δ is the value stored in said table at the address defined by the index corresponding to $S_t$, the improvement comprising encoding, in place of a plurality of different actual waveforms, a single compromise waveform precomputed off-line to contain the fewest possible differences from each of said plurality of waveforms.
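Claim 7 does not say how "fewest possible differences" is measured. One reading, assumed here, is the smallest total number of index positions at which the compromise disagrees with the input waveforms; under that reading the problem splits position by position, and the most common index at each position is optimal. A minimax reading (fewest differences from the worst-matching waveform) would need a different search and is not shown. Function names and data are hypothetical, and equal-length index sequences are assumed.

```python
from collections import Counter


def compromise_waveform(waveforms):
    """Index sequence with the fewest TOTAL mismatches against all inputs
    (assumed metric). Ties go to the index encountered first in the column."""
    length = len(waveforms[0])
    if any(len(w) != length for w in waveforms):
        raise ValueError("all waveforms must have the same length")
    return [Counter(column).most_common(1)[0][0] for column in zip(*waveforms)]


def mismatches(x, y):
    """Number of positions at which two index sequences differ."""
    return sum(p != q for p, q in zip(x, y))


# Hypothetical index sequences for three similar sounds.
variants = [[6, 4, 3, 2, 0], [6, 4, 3, 1, 0], [5, 4, 3, 2, 0]]
compromise = compromise_waveform(variants)
```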
8. In a method of digitally converting text to speech, said method including the step of producing fricative sounds by concatenating a plurality of segments each containing a plurality of repetitions of a digitally encoded waveform, the improvement comprising the step of increasingly truncating said waveform for each of said repetitions in any given segment.
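Claim 8 builds a fricative from segments of repeated waveforms, cutting each successive repetition shorter than the last. The claim does not say by how much; the Python sketch below assumes, purely for illustration, a fixed truncation step per repetition and a floor of one sample. The names `truncated_repetitions`, `fricative`, and `step` are hypothetical.

```python
def truncated_repetitions(waveform, repetitions, step):
    """Concatenate `repetitions` copies of `waveform`, each `step` samples
    shorter than the previous copy (never shorter than one sample)."""
    out = []
    for i in range(repetitions):
        keep = max(1, len(waveform) - i * step)
        out.extend(waveform[:keep])
    return out


def fricative(segments, repetitions, step):
    """Concatenate segments, each made of increasingly truncated repetitions."""
    out = []
    for seg in segments:
        out.extend(truncated_repetitions(seg, repetitions, step))
    return out


# Hypothetical noise-like snippets for two segments of one fricative.
sound = fricative([[1, 2, 3, 4, 5], [9, 8, 7, 6, 5]], repetitions=2, step=1)
```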
10. In a method of digitally converting text to speech, said method including the step of producing fricative sounds by concatenating a plurality of segments each containing a plurality of repetitions of a digitally encoded waveform, said segments containing different amplitudes of said waveform, the improvement comprising the step of progressively interpolating the waveform of one of said segments with the waveform of an adjacent segment.
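Claim 10 smooths the step between segments of different amplitude by progressively blending one segment's waveform toward the next. The Python sketch below assumes a linear crossfade across the repetitions of the current segment: the first repetition is unchanged and each later one mixes in a larger share of the following segment's waveform. The linear weighting and the function name are assumptions, not details from the claim.

```python
def interpolated_segment(current, following, repetitions):
    """Repeat `current`, blending a growing fraction of `following` into each
    repetition so the segment ends close to the next segment's waveform."""
    if len(current) != len(following):
        raise ValueError("waveforms must have the same length")
    out = []
    for i in range(repetitions):
        w = i / repetitions  # 0 for the first repetition, rising toward 1
        out.extend((1 - w) * c + w * f for c, f in zip(current, following))
    return out


# Hypothetical: the same waveform shape at two amplitudes.
quiet = [0.0, 4.0, 0.0, -4.0]
loud = [0.0, 8.0, 0.0, -8.0]
ramp = interpolated_segment(quiet, loud, repetitions=4)
```

With four repetitions the peak grows 4, 5, 6, 7, so the following segment (peak 8) continues the ramp instead of jumping.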
11. In a method of digitally converting text to speech, said method including the step of producing speech by concatenating a plurality of segments each containing at least one concatenated repetition of a stored digitally encoded waveform, a method of varying the speed of the speech comprising the steps of:
(a) reiteratively counting said concatenated waveform repetitions;
(b) deleting or repeating one of said waveform repetitions when said count reaches a selectable number; and
(c) varying said number.
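The three steps of claim 11 map onto a simple counting loop. In the Python sketch below, each time the count reaches `every_n` that repetition is either dropped (faster speech) or emitted twice (slower speech) and the count restarts; changing `every_n` implements step (c). The function name, parameter names, and the `"delete"`/`"repeat"` mode strings are hypothetical.

```python
def vary_speed(repetitions, every_n, mode):
    """Count waveform repetitions (step a); when the count reaches `every_n`,
    delete or repeat that repetition and restart the count (step b)."""
    if every_n < 1:
        raise ValueError("every_n must be at least 1")
    if mode not in ("delete", "repeat"):
        raise ValueError("mode must be 'delete' or 'repeat'")
    out, counter = [], 0
    for rep in repetitions:
        counter += 1
        if counter == every_n:
            counter = 0
            if mode == "delete":
                continue  # drop this repetition: speech gets faster
            out.append(rep)  # extra copy: speech gets slower
        out.append(rep)
    return out


# Step (c): a smaller every_n changes the speed more strongly.
reps = list(range(6))  # stand-ins for six concatenated waveform repetitions
faster = vary_speed(reps, 3, "delete")
much_faster = vary_speed(reps, 2, "delete")
slower = vary_speed(reps, 3, "repeat")
```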
14. In a method of digitally converting text to speech, said method including the step of producing speech by concatenating a plurality of segments each containing at least one concatenated repetition of a stored digitally encoded waveform, and in which the waveforms of predetermined successive segments are interpolated in accordance with the formula ##EQU3## where $S_t^{out}$ is the output signal for a given sample; $S_t^{in}$ is the input signal for that sample; $S_{t-1}^{out}$ is the output signal for the previous sample; and $k$ is a non-negative integer, a method of preventing the slurring of formants comprising the step of varying the value of $k$ in accordance with the speed of the speech.
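The interpolation formula of claim 14 (EQU3) is not reproduced in this text, so the Python sketch below does not implement it. Purely as an assumed stand-in built from the same four quantities (current input, previous output, new output, non-negative integer k), it uses a first-order recursive smoother, $S_t^{out} = S_{t-1}^{out} + (S_t^{in} - S_{t-1}^{out}) / 2^k$, and a hypothetical rule that lowers k as speech speeds up, so that shorter transitions are smoothed less. Both the formula and the speed-to-k mapping are assumptions.

```python
def smooth(samples, k, initial=0.0):
    """ASSUMED stand-in, not the patent's EQU3:
    out_t = out_{t-1} + (in_t - out_{t-1}) / 2**k.
    k = 0 passes the input through; larger k smooths more heavily."""
    if k < 0:
        raise ValueError("k must be a non-negative integer")
    out, prev = [], initial
    for x in samples:
        prev = prev + (x - prev) / 2 ** k
        out.append(prev)
    return out


def k_for_speed(speed_factor, base_k=3):
    """HYPOTHETICAL mapping: speed_factor 1.0 is the normal rate; each whole
    step faster removes one unit of k (less smoothing), never below 0."""
    return max(0, base_k - int(speed_factor - 1))


# Hypothetical step input smoothed at normal and at double speed.
step_input = [8.0, 8.0, 8.0]
normal = smooth(step_input, k_for_speed(1.0))
fast = smooth(step_input, k_for_speed(2.0))
```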
Specification