Real-time text-to-speech conversion system

US 4,692,941 A
Filed: 04/10/1984
Issued: 09/08/1987
Est. Priority Date: 04/10/1984
Status: Expired due to Fees

3 Assignments

0 Petitions

Abstract

A high-quality, real-time text-to-speech synthesizer system handles an unlimited vocabulary with a minimum of hardware by using a microcomputer-software-compatible time-domain methodology which requires a minimum of memory and computational power. The system first compares text words to an exception dictionary. If the word is not found therein, the system applies standard pronunciation rules to the text word. In either instance, the text word is converted to a phoneme sequence. By the use of look-up tables addressed by pointers contained in a phoneme-and-transition matrix, the synthesizer translates the sequence of phonemes and transitions therebetween into sequences of small speech segments capable of being expressed in terms of repetitions of variable-length portions of short digitally stored waveforms. In general, unvoiced transitions are produced by a sequence of segments which can be concatenated in forward or reverse order to generate different transitions out of the same segments, while voiced transitions are produced by interpolating adjacent phonemes for additional memory savings. Pitch can be varied for naturalness of sound, and/or for intonation changes derived from key words and/or punctuation in the text, by truncating or extending the waveforms of individual voice periods corresponding to voiced segments.
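
The two-stage pronunciation step in the abstract (exception dictionary first, standard rules otherwise; see also claim 2) can be sketched as follows. This is a minimal illustration, not the patented rule set: the ARPAbet-style phoneme symbols, the dictionary entries, and the digraph/letter table are all hypothetical, and real letter-to-sound rules depend on surrounding context.

```python
from __future__ import annotations

# Hypothetical exception dictionary: words that do not follow the rules,
# stored with pre-computed phoneme codes.
EXCEPTIONS: dict[str, list[str]] = {
    "one": ["W", "AH", "N"],
    "two": ["T", "UW"],
    "said": ["S", "EH", "D"],
}

# Hypothetical pronunciation rules: two-letter patterns are tried before
# single letters. Real rule sets also inspect neighbouring letters.
DIGRAPHS: dict[str, list[str]] = {
    "sh": ["SH"], "th": ["TH"], "ch": ["CH"], "ee": ["IY"], "oo": ["UW"],
}
SINGLE: dict[str, list[str]] = {
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"],
    "f": ["F"], "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"],
    "k": ["K"], "l": ["L"], "m": ["M"], "n": ["N"], "o": ["AA"],
    "p": ["P"], "q": ["K"], "r": ["R"], "s": ["S"], "t": ["T"],
    "u": ["AH"], "v": ["V"], "w": ["W"], "x": ["K", "S"], "y": ["Y"],
    "z": ["Z"],
}


def word_to_phonemes(word: str) -> list[str]:
    """Look the word up in the exception list, else apply letter rules."""
    w = word.lower()
    if w in EXCEPTIONS:
        return list(EXCEPTIONS[w])          # pre-stored phonetic codes
    phonemes: list[str] = []
    i = 0
    while i < len(w):
        pair = w[i:i + 2]
        if pair in DIGRAPHS:                # longer pattern wins
            phonemes.extend(DIGRAPHS[pair])
            i += 2
        elif w[i] in SINGLE:
            phonemes.extend(SINGLE[w[i]])
            i += 1
        else:
            i += 1                          # no rule (digit, apostrophe): skip
    return phonemes
```

For example, `word_to_phonemes("Two")` is served from the exception list, while `word_to_phonemes("sheet")` falls through to the rules and yields `["SH", "IY", "T"]`.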

17 Claims

1. A machine method of converting electrical signals representing text to audible speech in real time, comprising the steps of:
- (a) storing, in the memory of a data processing device, a plurality of digitized waveforms consisting of groups of digitally encoded samples, said waveforms being representative of portions of phonemes and of transitions between phonemes;
  
  (b) analyzing said signals to determine a sequence of phonemes and transitions indicative of the pronunciation of said text;
  
  (c) generating a sequence of codes representing said phonemes and transitions;
  
  (d) using said codes to select groups of said digitized waveforms, each said group representing a phoneme or a transition;
  
  (e) concatenating the waveforms in each of said groups to form a waveform representing the speech sound corresponding to one of said phonemes or transitions;
  
  (f) alternatingly concatenating said phoneme-representing waveform groups and said transition-representing waveform groups to form a composite waveform train representing, in digitized form, the spoken equivalent of said text; and
  
  (g) converting said digitized composite waveform into an audible analog signal representative thereof.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8)
- - 2. The method of claim 1, in which said analyzing step includes the steps of:
    - (i) comparing each group of electrical signals representing a word of said text to a list of words which do not conform to predetermined pronunciation rules; and
      
      (ii) if said word is in said list, determining said code sequence from phonetic code information pre-stored in said list;
      
      or (iii) if said word is not in said list, determining said code sequence from a letter-by-letter analysis of said word in accordance with pre-stored pronunciation rules.
  - 3. The method of claim 1, in which said analyzing step includes the steps of:
    - (i) comparing each group of electrical signals representing a word of said text to a stored list of signals representing key words affecting the intonation of said text;
      
      (ii) using thus identified key words, and signals representing punctuation in said text, to modify said digital representation in accordance with predetermined intonation patterns derived from said key words and punctuation.
  - 4. The method of claim 1, further comprising the steps of:
    - (i) translating said phoneme and transition code sequence into a sequence of speech segments each defined by one or more speech segment blocks in said data processing device memory, each speech segment block identifying a specific waveform, the presence or absence of voicing, and the number of repetitions of said waveform in said segment; and
      
      (ii) concatenating said speech segments and retrieving the waveforms identified thereby to form said waveform groups.
  - 5. The method of claim 4, in which said waveforms are stored in the form of digital samples, and the pitch of voiced speech segments is altered by truncating samples from the end of each voice period or adding zero-value samples to the end of each voice period.
  - 6. The method of claim 1, in which predetermined ones of said transitions are formed by substituting, for at least an initial portion of the waveform representing the phoneme following said transition, an interpolation of that waveform with the waveform representing the phoneme preceding said transition.
  - 7. The method of claim 6, in which said interpolation is linear.
  - 8. The method of claim 4, in which, whenever two adjacent segments of said speech segment sequence are both voiced, at least a portion of the waveform identified by one of said segments adjacent the other is replaced by an interpolation of the waveforms identified by said two adjacent segments.
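
Claims 4 through 8 combine three mechanisms: segment blocks that name a waveform, a voicing flag, and a repetition count; pitch change by truncating each voice period or padding it with zero-value samples; and linear interpolation where two adjacent segments are both voiced. The sketch below is a minimal illustration under stated assumptions: integer PCM samples, a hypothetical waveform table `WAVEFORMS`, a hypothetical fixed `period_len` for the new pitch, and a hypothetical `interp_periods` count of initial periods replaced by the interpolation. The claims do not fix any of these values.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional

# Hypothetical waveform memory: each entry is one voice period (voiced)
# or one short noise chunk (unvoiced) of integer PCM samples.
WAVEFORMS: dict[int, list[int]] = {
    0: [0, 40, 80, 40, 0, -40, -80, -40],
    1: [0, 20, 40, 20, 0, -20, -40, -20],
    2: [5, -3, 7, -6, 2, -1],
}


@dataclass
class SegmentBlock:
    waveform_id: int   # which stored waveform to play
    voiced: bool       # presence or absence of voicing
    repetitions: int   # how many times the waveform is repeated


def set_period(period: list[int], target_len: int) -> list[int]:
    """Shorten a voice period by dropping trailing samples (higher pitch)
    or lengthen it by appending zero-value samples (lower pitch)."""
    if target_len <= len(period):
        return period[:target_len]
    return period + [0] * (target_len - len(period))


def interpolate(a: list[int], b: list[int], steps: int) -> list[list[int]]:
    """Return `steps` periods that move linearly from waveform a toward b."""
    n = min(len(a), len(b))
    periods = []
    for k in range(1, steps + 1):
        t = k / (steps + 1)
        periods.append([round((1 - t) * a[i] + t * b[i]) for i in range(n)])
    return periods


def render(blocks: list[SegmentBlock], period_len: Optional[int] = None,
           interp_periods: int = 2) -> list[int]:
    """Concatenate segment blocks into one sample train."""
    samples: list[int] = []
    prev: Optional[SegmentBlock] = None
    for block in blocks:
        wave = WAVEFORMS[block.waveform_id]
        periods = [wave] * block.repetitions
        if prev is not None and prev.voiced and block.voiced:
            # Both neighbours voiced: replace the first periods of this
            # segment with a linear interpolation from the previous waveform.
            k = min(interp_periods, block.repetitions)
            periods[:k] = interpolate(WAVEFORMS[prev.waveform_id], wave, k)
        for period in periods:
            if block.voiced and period_len is not None:
                period = set_period(period, period_len)  # pitch change
            samples.extend(period)
        prev = block
    return samples
```

Because interpolation reuses the two stored waveforms instead of storing a separate transition, it trades a little arithmetic for memory, which matches the memory-saving motivation in the abstract.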

9. A machine method of converting electrical signals representing text to audible speech, comprising the steps of:
- (a) identifying, in a train of signals representing a text of substantially unlimited vocabulary including words and punctuation, signals representing key words affecting intonation;
  
  (b) determining, on the basis of said key words and/or punctuation, intonation patterns determining the pitch of individual words or syllables, and pauses therebetween;
  
  (c) producing, on the basis of said determined intonation patterns and pauses, prosody indicia representative thereof;
  
  (d) producing a string of phoneme codes representative of the phonemes making up the pronunciation of said text;
  
  (e) interlacing said phoneme codes and said prosody indicia to form a code stream;
  
  (f) storing in the memory of a data processing device, a plurality of waveforms;
  
  (g) storing, in said memory, sequences of digital data representing segment blocks corresponding to particular phonemes and transitions therebetween, each block identifying one of said stored waveforms and containing voicing information and information regarding the repetition of said identified waveform to produce a sound;
  
  (h) storing, in said memory, for each of said phonemes and transitions, information identifying the sequence of segment blocks corresponding to the phoneme or transition represented thereby, and the order in which it is to be read;
  
  (i) concatenating the waveforms identified by said segment blocks in accordance with the sequence of segment blocks identified by the phoneme codes of said code stream to form a waveform train;
  
  (j) modifying said waveforms in accordance with said prosody indicia of said code stream; and
  
  (k) converting said waveform train to a sequence of audible sounds.
- Dependent Claims (10)
- - 10. The method of claim 9, in which said step of storing said sequence-identifying information also includes the storing of information defining whether transitions between phonemes are to be produced by interpolation of phoneme segments or by retrieval of a separate segment block sequence.
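
Steps (a) through (e) of claim 9 identify key words and punctuation, derive intonation and pauses from them, and interlace the resulting prosody indicia with the phoneme codes. The sketch below shows one way such a code stream could look. The intonation rules (a yes/no question rises on its last word; a statement, or a question opened by a key word such as "where", falls), the pause lengths, the `Prosody` record, and the toy lexicon are all hypothetical; the claim only requires predetermined patterns without specifying them.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical key words: a sentence opened by one of these is treated as a
# question whose pitch falls at the end rather than rising.
KEY_WORDS = {"who", "what", "where", "when", "why", "how"}

# Hypothetical pause lengths, in milliseconds, triggered by punctuation.
PAUSES = {",": 150, ".": 400, "?": 400}


@dataclass(frozen=True)
class Prosody:
    pitch: int = 0      # relative pitch step for the following word
    pause_ms: int = 0   # silence to insert at this point


def build_code_stream(tokens: list[str],
                      lexicon: dict[str, list[str]]) -> list[str | Prosody]:
    """Interlace prosody indicia with phoneme codes, word by word."""
    words = [t.lower() for t in tokens if t not in PAUSES]
    opens_with_key_word = bool(words) and words[0] in KEY_WORDS
    stream: list[str | Prosody] = []
    for i, tok in enumerate(tokens):
        if tok in PAUSES:
            stream.append(Prosody(pause_ms=PAUSES[tok]))
            continue
        following = tokens[i + 1] if i + 1 < len(tokens) else "."
        pitch = 0
        if following == "?":
            pitch = -1 if opens_with_key_word else 2  # rise on yes/no questions
        elif following == ".":
            pitch = -1                                # fall at end of statement
        stream.append(Prosody(pitch=pitch))           # indicium before the word
        stream.extend(lexicon[tok.lower()])           # then its phoneme codes
    return stream
```

With a lexicon mapping "is" and "it" to phoneme codes, the tokens `["is", "it", "?"]` produce a rising indicium before "it" and a 400 ms pause indicium at the question mark.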

11. A method of converting a string of digital phoneme codes into a sound signal, comprising the steps of:
- (a) storing, in a data processing device, first and second adjacent phoneme codes of said string as left and right phoneme codes, respectively;
  
  (b) producing a sound signal corresponding to the transition between the phonemes represented by said left and right phoneme codes;
  
  (c) producing a sound signal corresponding to the phoneme represented by said right phoneme code;
  
  (d) substituting said right phoneme code for said left phoneme code to become a new left phoneme code, and storing the next phoneme code of said string as a new right phoneme code; and
  
  (e) repeating steps (b) through (d) above to process said phoneme code string.
- Dependent Claims (12, 13, 14, 15, 16, 17)
- - 12. The method of claim 11, in which said phoneme code string extends over a plurality of words, and silence is encoded as a phoneme.
  - 13. The method of claim 11, in which said sound-producing steps include:
    - (i) storing, in a first table, a first address pointer for each encodable phoneme and for each possible transition between two encodable phonemes;
      
      (ii) storing, in a second table, a plurality of speech segment blocks containing second pointers, said blocks being stored at locations addressable by said first or second pointers;
      
      said segment blocks also containing third pointers;
      
      (iii) storing, in a third table, a plurality of waveforms representing portions of intelligible sounds;
      
      said waveforms being addressable by said third pointers; and
      
      (iv) producing intelligible sound by concatenating said waveforms in the order established by said first and second pointers.
  - 14. The method of claim 13, in which each pointer in said first table is associated with a directional flag;
    - said segment blocks are arranged in sequences determined by said second pointers; and
      
      said sequences are concatenated in forward or reverse order depending upon the condition of said directional flag.
  - 15. The method of claim 14, in which, whenever two consecutive blocks in said sequences are voiced, an interpolation of the waveform addressed by the first of said blocks with the waveform addressed by the second of said blocks is substituted for at least a portion of the waveform addressed by the second of said blocks.
  - 16. The method of claim 14, in which said sound-producing steps further include the step of varying the pitch of segments including repetitions of voiced waveforms by truncating or extending the end of each repetition in accordance with prosody indicia inserted into said phoneme code string.
  - 17. The method of claim 13, in which, when said first pointer has a predetermined value, said sound signal corresponding to said transition is produced by substituting, for at least a portion of said sound signal representing said right phoneme, an interpolation of the signal representing said left phoneme with the signal representing said right phoneme.
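
Claims 11 through 14 describe a left/right code window that slides along the phoneme string, a first table holding one pointer (plus a direction flag) per phoneme and per transition, a second table of segment blocks linked by second pointers, and a third table of waveforms addressed by third pointers. A minimal sketch, assuming tiny hypothetical tables and symbols (`_` for silence, `s`, `a`) and a string that begins with silence so the first left code needs no sound of its own; prosody and the interpolation of claims 15 and 17 are omitted.

```python
from __future__ import annotations
from typing import Optional, Union

SILENCE = "_"  # silence is encoded as a phoneme, as in claim 12

# Third table (hypothetical): waveforms addressed by third pointers.
WAVEFORMS: list[list[int]] = [
    [0, 0, 0, 0],        # 0: silence
    [3, -2, 4, -1],      # 1: "s" noise
    [1, -1, 2, -2],      # 2: part of the s/a transition
    [0, 20, 0, -20],     # 3: part of the s/a transition
    [0, 50, 0, -50],     # 4: one period of vowel "a"
]

# Second table (hypothetical): segment blocks as (second pointer to the next
# block or None, third pointer to a waveform).
BLOCKS: list[tuple[Optional[int], int]] = [
    (None, 0),   # 0: silence
    (None, 1),   # 1: "s"
    (3, 2),      # 2: transition sequence, first block
    (None, 3),   # 3: transition sequence, last block
    (5, 4),      # 4: vowel "a", first period
    (None, 4),   # 5: vowel "a", second period
]

# First table (hypothetical): (first pointer, direction flag) per phoneme and
# per transition. "a"->"s" reuses the "s"->"a" blocks in reverse order.
FIRST: dict[Union[str, tuple[str, str]], tuple[int, bool]] = {
    "_": (0, False), "s": (1, False), "a": (4, False),
    ("_", "s"): (0, False), ("s", "_"): (0, False),
    ("_", "a"): (0, False), ("a", "_"): (0, False),
    ("s", "a"): (2, False), ("a", "s"): (2, True),
}


def sound_for(key: Union[str, tuple[str, str]]) -> list[int]:
    """Follow the pointers for one phoneme or transition and concatenate."""
    start, reverse = FIRST[key]
    wave_ids: list[int] = []
    index: Optional[int] = start
    while index is not None:
        index, wave_id = BLOCKS[index]   # second pointer, third pointer
        wave_ids.append(wave_id)
    if reverse:
        wave_ids.reverse()               # direction flag set: reverse order
    samples: list[int] = []
    for wave_id in wave_ids:
        samples.extend(WAVEFORMS[wave_id])
    return samples


def synthesize(codes: list[str]) -> list[int]:
    """Slide a left/right window along the phoneme codes (claim 11)."""
    samples: list[int] = []
    left = codes[0]
    for right in codes[1:]:
        samples.extend(sound_for((left, right)))  # transition left -> right
        samples.extend(sound_for(right))          # the right phoneme itself
        left = right                              # right becomes new left
    return samples
```

The reversed entry illustrates the memory saving described in the abstract: two transitions share one stored block sequence and differ only in the direction flag.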

Current Assignee: Sierra Entertainment, Inc. (Vivendi SE)
Original Assignee: First Byte
Inventors: Sprague, Richard P.; Jacks, Richard P.
Primary Examiner(s): Kemeny, E. S. Matt
Application Number: US06/598,892
Time in Patent Office: 1,246 days
Field of Search: 381/51-53, 364/513.5
US Class (Current): 704/260
CPC Class Codes: G10L 13/04 Details of speech synthesis...
