Real-time text-to-speech conversion system
Abstract
A high-quality, real-time text-to-speech synthesizer system handles an unlimited vocabulary with a minimum of hardware by using a microcomputer-software-compatible time domain methodology which requires a minimum of memory and computational power. The system first compares text words to an exception dictionary. If a word is not found there, the system applies standard pronunciation rules to it. In either case, the text word is converted to a phoneme sequence. Using look-up tables addressed by pointers contained in a phoneme-and-transition matrix, the synthesizer translates the sequence of phonemes and the transitions between them into sequences of small speech segments, each of which can be expressed as repetitions of variable-length portions of short digitally stored waveforms. In general, unvoiced transitions are produced by a sequence of segments which can be concatenated in forward or reverse order to generate different transitions out of the same segments, while voiced transitions are produced by interpolating adjacent phonemes for additional memory savings. Pitch can be varied for naturalness of sound, and/or for intonation changes derived from key words and/or punctuation in the text, by truncating or extending the waveforms of individual voice periods corresponding to voiced segments.
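The text-to-phoneme front end described above (exception dictionary first, pronunciation rules as the fallback) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented rule set: the phoneme symbols, the `EXCEPTIONS` and `RULES` tables, and the two-letters-before-one matching strategy are all hypothetical.

```python
# Hypothetical exception dictionary: whole-word pronunciations that the
# letter-to-sound rules would get wrong (phoneme symbols are illustrative).
EXCEPTIONS = {
    "one": ["W", "AH", "N"],
    "two": ["T", "UW"],
}

# Hypothetical letter-to-sound rules keyed by grapheme.
RULES = {
    "sh": ["SH"], "th": ["TH"], "ee": ["IY"],
    "a": ["AE"], "e": ["EH"], "i": ["IH"], "o": ["AA"], "u": ["AH"],
}


def word_to_phonemes(word: str) -> list[str]:
    """Look the word up in the exception dictionary, else apply the rules."""
    word = word.lower()
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])
    phonemes = []
    i = 0
    while i < len(word):
        # Try two-letter graphemes before single letters.
        for size in (2, 1):
            chunk = word[i:i + size]
            if len(chunk) == size and chunk in RULES:
                phonemes.extend(RULES[chunk])
                i += size
                break
        else:
            # No rule matched: treat the letter as its own phoneme symbol.
            phonemes.append(word[i].upper())
            i += 1
    return phonemes
```

For example, `word_to_phonemes("One")` is answered from the exception dictionary, while `word_to_phonemes("sheet")` falls through to the rules and yields `["SH", "IY", "T"]`.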
17 Claims
1. A machine method of converting electrical signals representing text to audible speech in real time, comprising the steps of:
(a) storing, in the memory of a data processing device, a plurality of digitized waveforms consisting of groups of digitally encoded samples, said waveforms being representative of portions of phonemes and of transitions between phonemes;
(b) analyzing said signals to determine a sequence of phonemes and transitions indicative of the pronunciation of said text;
(c) generating a sequence of codes representing said phonemes and transitions;
(d) using said codes to select groups of said digitized waveforms, each said group representing a phoneme or a transition;
(e) concatenating the waveforms in each of said groups to form a waveform representing the speech sound corresponding to one of said phonemes or transitions;
(f) alternatingly concatenating said phoneme-representing waveform groups and said transition-representing waveform groups to form a composite waveform train representing in digitized form, the spoken equivalent of said text; and
(g) converting said digitized composite waveform into an audible analog signal representative thereof.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
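Steps (d) through (f) of claim 1 can be sketched as below: codes select groups of stored waveforms, each group is concatenated, and phoneme groups alternate with the transition groups between them. This is a hedged illustration only; the waveform ids, sample values, and the `PHONEME_GROUPS` and `TRANSITION_GROUPS` tables are hypothetical, and step (g), the digital-to-analog conversion, is left to the audio hardware.

```python
# Hypothetical waveform store (step (a)): waveform id -> digitized samples.
WAVEFORMS = {
    "w_a1": [0, 40, 80, 40], "w_a2": [0, -40, -80, -40],
    "w_s1": [5, -5, 5, -5],
    "t_as": [10, 0, -10], "t_sa": [-10, 0, 10],
}

# Step (d): codes select groups of waveform ids (tables are illustrative).
PHONEME_GROUPS = {"AA": ["w_a1", "w_a2"], "S": ["w_s1"]}
TRANSITION_GROUPS = {("AA", "S"): ["t_as"], ("S", "AA"): ["t_sa"]}


def concat_group(ids):
    """Step (e): join the waveforms of one group end to end."""
    out = []
    for wid in ids:
        out.extend(WAVEFORMS[wid])
    return out


def compose(phonemes):
    """Step (f): alternate phoneme groups with the transitions between them."""
    train = []
    for i, ph in enumerate(phonemes):
        if i > 0:
            # Transition from the previous phoneme into this one.
            train.extend(concat_group(TRANSITION_GROUPS[(phonemes[i - 1], ph)]))
        train.extend(concat_group(PHONEME_GROUPS[ph]))
    return train
```

`compose(["S", "AA"])` therefore yields the "S" samples, then the S-to-AA transition samples, then the two "AA" waveforms, in that order.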
9. A machine method of converting electrical signals representing text to audible speech, comprising the steps of:
(a) identifying, in a train of signals representing a text of substantially unlimited vocabulary including words and punctuation, signals representing key words affecting intonation;
(b) determining, on the basis of said key words and/or punctuation, intonation patterns determining the pitch of individual words or syllables, and pauses therebetween;
(c) producing, on the basis of said determined intonation patterns and pauses, prosody indicia representative thereof;
(d) producing a string of phoneme codes representative of the phonemes making up the pronunciation of said text;
(e) interlacing said phoneme codes and said prosody indicia to form a code stream;
(f) storing, in the memory of a data processing device, a plurality of waveforms;
(g) storing, in said memory, sequences of digital data representing segment blocks corresponding to particular phonemes and transitions therebetween, each block identifying one of said stored waveforms and containing voicing information and information regarding the repetition of said identified waveform to produce a sound;
(h) storing, in said memory, for each of said phonemes and transitions, information identifying the sequence of segment blocks corresponding to the phoneme or transition represented thereby, and the order in which it is to be read;
(i) concatenating the waveforms identified by said segment blocks in accordance with the sequence of segment blocks identified by the phoneme codes of said code stream to form a waveform train;
(j) modifying said waveforms in accordance with said prosody indicia of said code stream; and
(k) converting said waveform train to a sequence of audible sounds.
Dependent claim: 10.
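Steps (e) through (j) of claim 9 can be sketched as follows: segment blocks carry a waveform id, a voicing flag, and a repeat count; each phoneme or transition names a block sequence and the order in which to read it, so two transitions can share one unvoiced sequence read forward or in reverse (as the abstract describes); and a prosody indicium in the code stream sets the voice-period length, which is applied by truncating or extending each voiced period. Everything named here (`SegmentBlock`, `SEGMENT_LISTS`, `SEGMENT_TABLE`, the `("PERIOD", n)` indicium, the sample values) is a hypothetical stand-in, and extending a period by zero-padding is an assumption, since the text does not say how extension is done.

```python
from dataclasses import dataclass


@dataclass
class SegmentBlock:
    waveform_id: str  # which stored waveform to play
    voiced: bool      # voicing information
    repeats: int      # how many times to repeat the waveform


# Hypothetical stored waveforms (step (f)); "v1" is one voice period.
WAVEFORMS = {
    "v1": [0, 30, 60, 30, 0, -30, -60, -30],
    "n1": [3, -2, 4, -1],
    "n2": [1, 1, -1, -1],
}

# Hypothetical segment-block sequences (step (g)).
SEGMENT_LISTS = {
    "AA": [SegmentBlock("v1", True, 2)],
    "S_T": [SegmentBlock("n1", False, 1), SegmentBlock("n2", False, 2)],
}

# Step (h): each phoneme/transition names a block sequence and a read order;
# the two transitions reuse the same unvoiced segments in opposite orders.
SEGMENT_TABLE = {
    "AA": ("AA", "forward"),
    "S>T": ("S_T", "forward"),
    "T>S": ("S_T", "reverse"),
}


def fit_period(period, target_len):
    """Truncate a voice period, or extend it with zeros (an assumption)."""
    if target_len <= len(period):
        return period[:target_len]
    return period + [0] * (target_len - len(period))


def render(code_stream):
    """Steps (i)-(j): expand segment blocks, applying pitch indicia."""
    train = []
    period_len = None  # None: play voice periods at their stored length
    for item in code_stream:
        if isinstance(item, tuple):  # prosody indicium, e.g. ("PERIOD", 6)
            period_len = item[1]
            continue
        list_name, order = SEGMENT_TABLE[item]
        blocks = SEGMENT_LISTS[list_name]
        if order == "reverse":
            blocks = blocks[::-1]
        for block in blocks:
            wave = WAVEFORMS[block.waveform_id]
            if block.voiced and period_len is not None:
                wave = fit_period(wave, period_len)
            train.extend(wave * block.repeats)
    return train
```

In this sketch a shorter period raises the pitch: `render([("PERIOD", 4), "AA"])` plays only the first four samples of each voice period, twice.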
11. A method of converting a string of digital phoneme codes into a sound signal, comprising the steps of:
(a) storing, in a data processing device, first and second adjacent phoneme codes of said string as left and right phoneme codes, respectively;
(b) producing a sound signal corresponding to the transition between the phonemes represented by said left and right phoneme codes;
(c) producing a sound signal corresponding to the phoneme represented by said right phoneme code;
(d) substituting said right phoneme code for said left phoneme code to become a new left phoneme code, and storing the next phoneme code of said string as a new right phoneme code; and
(e) repeating steps (b) through (d) above to process said phoneme code string.
Dependent claims: 12, 13, 14, 15, 16, 17.
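The left/right procedure of claim 11 amounts to a two-code sliding window over the phoneme string. A minimal sketch follows; `process_phonemes` and its two callback parameters are hypothetical names, and the callbacks stand in for whatever actually produces the transition and phoneme sound signals.

```python
from typing import Callable, Iterable, List


def process_phonemes(
    codes: Iterable[str],
    transition_sound: Callable[[str, str], List],
    phoneme_sound: Callable[[str], List],
) -> List:
    """Slide a (left, right) window over the phoneme codes."""
    it = iter(codes)
    left = next(it, None)  # step (a): first code becomes the left code
    out: List = []
    if left is None:
        return out
    for right in it:  # step (a)/(d): next code becomes the right code
        out.extend(transition_sound(left, right))  # step (b)
        out.extend(phoneme_sound(right))            # step (c)
        left = right                                # step (d): right -> left
    return out  # step (e): loop ends when the string is exhausted
```

Because only the right-hand phoneme is voiced in step (c), the first code is reached only through a transition. Starting the string with a hypothetical silence code `#` handles this: `["#", "K", "AE", "T", "#"]` produces `#-K`, `K`, `K-AE`, `AE`, `AE-T`, `T`, `T-#`, `#` when the callbacks simply return labels.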
Specification