Text-to-speech system using vector quantization based speech enconding/decoding
First Claim
1. An apparatus for converting text to speech, comprising:
- means for translating the text to a sequence of sound segment codes representing speech;
means for generating a set of noise compensated quantization vectors by encoding the sound segment codes representing speech using a first set of quantization vectors and then performing a noise shaping filter operation on the first set of quantization vectors;
memory storing the set of noise compensated quantization vectors;
means, responsive to sound segment codes in the sequence, for identifying strings of noise compensated quantization vectors in the set of noise compensated quantization vectors for respective sound segment codes in the sequence;
means, coupled to the means for identifying and the memory, for generating a speech data sequence in response to the strings of noise compensated quantization vectors; and
an audio transducer, coupled to the means for generating, to generate sound in response to the speech data sequence.
1 Assignment
0 Petitions
Accused Products
Abstract
A text-to-speech system includes a memory storing a set of quantization vectors. A first processing module is responsive to the sound segment codes generated in response to text in the sequence to identify strings of noise compensated quantization vectors for respective sound segment codes in the sequence. A decoder generates a speech data sequence in response to the strings of quantization vectors. An audio transducer is coupled to the processing modules, and generates sound in response to the speech data sequence. The quantization vectors represent a quantization of a sound segment data having a pre-emphasis to de-correlate the sound samples used for quantization and the quantization noise. In decompressing the sound segment data, an inverse linear prediction filter is applied to the identified strings of quantization vectors to reverse the pre-emphasis. Also, the quantization vectors represent quantization of results of pitch filtering of sound segment data. Thus, an inverse pitch filter is applied to the identified strings of quantization vectors in the module of generating the speech data sequence.
35 Citations
27 Claims
-
1. An apparatus for converting text to speech, comprising:
-
means for translating the text to a sequence of sound segment codes representing speech; means for generating a set of noise compensated quantization vectors by encoding the sound segment codes representing speech using a first set of quantization vectors and then performing a noise shaping filter operation on the first set of quantization vectors; memory storing the set of noise compensated quantization vectors; means, responsive to sound segment codes in the sequence, for identifying strings of noise compensated quantization vectors in the set of noise compensated quantization vectors for respective sound segment codes in the sequence; means, coupled to the means for identifying and the memory, for generating a speech data sequence in response to the strings of noise compensated quantization vectors; and an audio transducer, coupled to the means for generating, to generate sound in response to the speech data sequence. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
-
-
17. A computer system that translates text to speech, comprising:
-
a programmable processor to execute routines to produce a speech data sequence in response to an input text; an audio transducer, coupled to the processor, to generate sound in response to the speech data sequence; a table memory, coupled to the programmable processor, storing a set of noise compensated quantization vectors produced by encoding a sequence of sound segment codes representing speech using a first set of quantization vectors and then performing a noise shaping filter operation on the first set of quantization vectors, and a table of encoded diphones having entries including the sound segment codes representing speech, the sound segment codes identifying a string of noise compensated quantization vectors in the set of noise compensated quantization vectors for respective diphones; and an instruction memory, coupled to the processor, storing a translator routine for execution by the processor to translate the input text to a sequence of diphone indices, and a decoder routine for execution by the processor including means, responsive to diphone indices in the sequence, for accessing the table of encoded diphones to identify strings of noise compensated quantization vectors in the set of noise compensated quantization vectors for diphones in the input text; and means, coupled to the means for accessing and the table memory, for retrieving the identified strings of noise compensated quantization vectors; means, coupled with the means for retrieving, for producing diphone data strings in response to the identified strings of noise compensated quantization vectors, wherein the diphone data strings each have a beginning and an ending; means, coupled to the means for producing, for blending the ending of a particular diphone data string in the sequence with the beginning of an adjacent diphone data string in the sequence to smooth discontinuities between the particular and adjacent diphone data strings to produce a smoothed string of quantized speech data; and means, responsive to the text and the smoothed string of quantized speech data, for adjusting pitch and duration of the identified strings of noise compensated quantization vectors for the diphones in the sequence to produce the speech data sequence for supply to the audio transducer. - View Dependent Claims (18, 19, 20, 21, 22, 23, 24, 25, 26, 27)
-
Specification