Compression of stored waveforms for artificial speech

US 4,852,168 A
Filed: 11/18/1986
Issued: 07/25/1989
Est. Priority Date: 11/18/1986
Status: Expired due to Fees


Abstract

In a digital text-to-speech conversion system of the type usually contained in all-software form on a floppy disk, memory requirements are reduced while speech quality is improved, by providing compression techniques and anti-distortion techniques which interact to provide clear speech at widely varying speeds with a minimum of memory. These techniques include using Huffman coding to advantage by encoding only differences between successive waveforms where feasible; relocating delta tables and using them repetitively; using a demi-diphone organization of the speech to allow use of the same instruction lists for several sounds; and combining selective deletion or repetition of waveforms with selective interpolation to vary speed without slurring.

16 Claims

1. In a method of digitally converting text to speech, said method including the step of encoding waveforms representing sounds in the form of digital indices representing addresses in a table containing selected values of Δ, and the successive samples of the waveform are computed by the formula

S_t = a S_{t-1} + b S_{t-2} + Δ

where S_t is the sample being computed, S_{t-1} is the next preceding sample, S_{t-2} is the second preceding sample, a and b are constants, and Δ is the value stored in said table at the address defined by the index corresponding to S_t, the improvement comprising encoding said indices by a Huffman coding in which the shortest codes of said Huffman coding represent the addresses of the Δ values occurring most frequently in the computation of said waveform.

2. The improvement of claim 1, in which said table is stored in system memory immediately following said indices.

3. The improvement of claim 2, in which said table contains less than the maximum number of Δ values that can be addressed by said indices.
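
The scheme recited in claims 1 to 3 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the table contents, and the choice of zero initial samples are assumptions made for illustration only. It builds a Huffman code over the table indices, so that the most frequent index gets the shortest code, and then reconstructs samples with S_t = a S_{t-1} + b S_{t-2} + Δ.

```python
import heapq
from collections import Counter


def huffman_code(indices):
    """Return {index: bitstring}; more frequent indices get shorter codes."""
    freq = Counter(indices)
    if len(freq) == 1:  # a lone symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(indices, code):
    """Concatenate the code words of the indices into one bit string."""
    return "".join(code[i] for i in indices)


def decode_indices(bits, code):
    """Invert encode(); works because a Huffman code is prefix-free."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


def reconstruct(indices, delta_table, a, b):
    """Apply S_t = a*S_{t-1} + b*S_{t-2} + delta_table[index_t].

    Assumption: the two samples before the waveform start are zero.
    """
    s1 = s2 = 0
    samples = []
    for idx in indices:
        s = a * s1 + b * s2 + delta_table[idx]
        samples.append(s)
        s2, s1 = s1, s
    return samples


# Hypothetical data: a four-entry delta table and a stream of indices into it.
delta_table = [0, 1, -1, 4]
indices = [0, 0, 0, 1, 0, 2, 0, 0, 3, 0, 1, 0]
code = huffman_code(indices)
bits = encode(indices, code)
samples = reconstruct(decode_indices(bits, code), delta_table, a=2, b=-1)
```

In this example index 0 occurs eight times out of twelve, so it receives a one-bit code, and the twelve indices take 18 bits instead of the 24 a fixed two-bit index would need.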

4. In a real-time text-to-speech conversion system in which waveforms are encoded in the form of digital indices representing addresses in a table containing selected values of Δ, and the successive samples of the waveform are computed by the formula

S_t = a S_{t-1} + b S_{t-2} + Δ

where S_t is the sample being computed, S_{t-1} is the next preceding sample, S_{t-2} is the second preceding sample, a and b are constants, and Δ is the value stored in said table at the address defined by the index corresponding to S_t, the improvement comprising:

(a) encoding successive waveforms so that each S_t represents the difference between the corresponding waveform sample and the corresponding sample of a preceding waveform; and

(b) adding each S_t to the corresponding sample of said preceding waveform to form the corresponding sample of the waveform being computed.

5. The improvement of claim 4, in which, when said preceding waveform and said waveform being computed have different numbers of samples, the shorter waveform is treated in the computation as if it were padded with sufficient zero value samples to equal the number of samples in the longer waveform.
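
The waveform-to-waveform differencing of claims 4 and 5 can be sketched as follows. This is an illustrative Python fragment under stated assumptions, not the patent's code: the function names and the explicit `length` argument are hypothetical, and the sketch shows only the differencing layer, not the delta-table encoding of the resulting difference samples.

```python
from itertools import zip_longest


def diff_waveform(current, previous):
    """Per-sample difference current - previous.

    The shorter waveform is treated as if padded with zero-valued samples,
    as described in claim 5.
    """
    return [c - p for c, p in zip_longest(current, previous, fillvalue=0)]


def add_waveform(diff, previous, length=None):
    """Rebuild a waveform by adding the difference to the preceding waveform.

    `length` is an assumed extra parameter: the true sample count of the
    rebuilt waveform, used to drop samples that exist only through padding.
    """
    rebuilt = [d + p for d, p in zip_longest(diff, previous, fillvalue=0)]
    return rebuilt if length is None else rebuilt[:length]


# Hypothetical waveforms of different lengths.
previous = [10, 20, 30, 40]
current = [11, 19, 30]
diff = diff_waveform(current, previous)
restored = add_waveform(diff, previous, length=len(current))
```

When successive waveforms are similar, most difference samples are small, which tends to concentrate the indices on a few table entries and suits the Huffman coding of claim 1.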

6. In a real-time text-to-speech conversion system in which waveforms are encoded in the form of digital indices representing addresses in a table containing selected values of Δ, and the successive samples of the waveform are computed by the formula

S_t = a S_{t-1} + b S_{t-2} + Δ

where S_t is the sample being computed, S_{t-1} is the next preceding sample, S_{t-2} is the second preceding sample, a and b are constants, and Δ is the value stored in said table at the address defined by the index corresponding to S_t, the improvement comprising providing a single table for a plurality of waveforms, said single table being addressable by the indices of each waveform.

7. In a real-time text-to-speech conversion system in which waveforms are encoded in the form of digital indices representing addresses in a table containing selected values of Δ, and the successive samples of the waveform are computed by the formula

S_t = a S_{t-1} + b S_{t-2} + Δ

where S_t is the sample being computed, S_{t-1} is the next preceding sample, S_{t-2} is the second preceding sample, a and b are constants, and Δ is the value stored in said table at the address defined by the index corresponding to S_t, the improvement comprising encoding, in place of a plurality of different actual waveforms, a single compromise waveform precomputed off-line to contain the fewest possible differences from each of said plurality of waveforms.

8. In a method of digitally converting text to speech, said method including the step of producing fricative sounds by concatenating a plurality of segments each containing a plurality of repetitions of a digitally encoded waveform, the improvement comprising the step of increasingly truncating said waveform for each of said repetitions in any given segment.

9. The improvement of claim 8, in which the number n of samples in the i'th repetition of a waveform containing N samples is ##EQU2##

10. In a method of digitally converting text to speech, said method including the step of producing fricative sounds by concatenating a plurality of segments each containing a plurality of repetitions of a digitally encoded waveform, said segments containing different amplitudes of said waveform, the improvement comprising the step of progressively interpolating the waveform of one of said segments with the waveform of an adjacent segment.

11. In a method of digitally converting text to speech, said method including the step of producing speech by concatenating a plurality of segments each containing at least one concatenated repetition of a stored digitally encoded waveform, a method of varying the speed of the speech comprising the steps of:

(a) reiteratively counting said concatenated waveform repetitions;

(b) deleting or repeating one of said waveform repetitions when said count reaches a selectable number; and

(c) varying said number.

12. The method of claim 11, in which said segments are defined by segment blocks containing an index representing the number of waveform repetitions in the segment, and said index is temporarily incremented or decremented whenever said count reaches said selectable number.

13. The method of claim 12, in which said segment is omitted when said index becomes other than a positive integer as a result of being decremented.
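
One possible reading of the speed-variation steps in claims 11 to 13 is sketched below in Python. The function name, its parameters, and the choice to reset the counter after each adjustment are assumptions for illustration; the claims do not fix these details.

```python
def adjust_repetitions(segment_reps, every, faster):
    """Return the adjusted repetition count for each segment.

    segment_reps: stored repetition index of each segment (claim 12).
    every: the selectable number; each time the running count of waveform
           repetitions reaches it, one repetition is deleted (faster=True)
           or repeated (faster=False).
    A result of 0 means the segment is omitted (claim 13).
    """
    adjusted = []
    count = 0  # running count across all segments
    for reps in segment_reps:
        n = reps  # temporary copy; the stored index itself is left untouched
        for _ in range(reps):
            count += 1
            if count == every:
                count = 0
                n += -1 if faster else 1
        adjusted.append(n)
    return adjusted


# Hypothetical segment list: three segments with 3, 1 and 4 repetitions.
stored = [3, 1, 4]
sped_up = adjust_repetitions(stored, every=2, faster=True)
slowed = adjust_repetitions(stored, every=2, faster=False)
```

Varying `every` corresponds to step (c): a smaller value changes the speed more strongly, while a value larger than the total number of repetitions leaves the speech unchanged.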

14. In a method of digitally converting text to speech, said method including the step of producing speech by concatenating a plurality of segments each containing at least one concatenated repetition of a stored digitally encoded waveform, and in which the waveforms of predetermined successive segments are interpolated in accordance with the formula ##EQU3## where

S_t out is the output signal for a given sample;

S_t in is the input signal for that sample;

S_{t-1} out is the output signal for the previous sample; and

k is a non-negative integer,

a method of preventing the slurring of formants comprising the step of varying the value of k in accordance with the speed of the speech.

15. The method of claim 14, in which the value of k is 2 for normal speech, 1 for accelerated speech, and 3 or 4 for slowed speech.

16. The method of claim 14, in which the interpolation of said waveforms is selectively disabled by making k equal to zero.

Current Assignee: Sierra Entertainment, Inc. (Vivendi SE)
Original Assignee: First Byte
Inventors: Sprague, Richard P.
Primary Examiner(s): Shoop, Jr., William M.
Assistant Examiner(s): Young, Brian K
Application Number: US06/932,165
Time in Patent Office: 980 Days
Field of Search: 381/31, 381/34, 381/35, 381/52, 381/36-40, 381/51, 381/53, 358/260, 358/261, 365/45, 340/347, 341/50, 341/59, 341/65, 364/513.5
US Class Current: 704/211
CPC Class Codes:
G10L 13/07   Concatenation rules
H03M 7/30   Compression speech analysis...
H03M 7/42   using table look-up for the...
H03M 7/50   Conversion to or from non-l...
