Text-to-speech system using vector quantization based speech encoding/decoding

US 5,717,827 A
Filed: 04/15/1996
Issued: 02/10/1998
Est. Priority Date: 01/21/1993
Status: Expired due to Term

1 Assignment

0 Petitions

Abstract

A text-to-speech system includes a memory storing a set of quantization vectors. A first processing module responds to a sequence of sound segment codes generated from the input text by identifying strings of noise compensated quantization vectors for the respective sound segment codes in the sequence. A decoder generates a speech data sequence in response to the strings of quantization vectors, and an audio transducer coupled to the processing modules generates sound in response to that speech data sequence. The quantization vectors represent a quantization of sound segment data to which pre-emphasis has been applied, to de-correlate the sound samples used for quantization and the quantization noise. When the sound segment data are decompressed, an inverse linear prediction filter is applied to the identified strings of quantization vectors to reverse the pre-emphasis. The quantization vectors also represent a quantization of the results of pitch filtering the sound segment data, so the module that generates the speech data sequence additionally applies an inverse pitch filter to the identified strings of quantization vectors.
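
The order of operations described above can be illustrated with a minimal Python sketch. It assumes a fixed first-order linear prediction (pre-emphasis) coefficient `ALPHA`, a single-tap pitch predictor with gain `BETA` and lag `PITCH_LAG`, and a toy codebook of two-sample vectors; all of these names and values are hypothetical, not taken from the patent. The encoder is open-loop and leaves out the noise shaping step, so the sketch only shows LP filtering, pitch filtering and vector quantization on the way in, and codebook lookup, inverse pitch filtering and inverse LP filtering on the way out (the decoding order recited in claim 8).

```python
from typing import List, Sequence

# Hypothetical parameters, chosen for illustration only (not from the patent).
ALPHA = 0.875   # first-order linear prediction (pre-emphasis) coefficient
BETA = 0.5      # single-tap pitch predictor gain
PITCH_LAG = 4   # pitch period in samples


def lp_filter(x: Sequence[float]) -> List[float]:
    """Pre-emphasis / LP analysis: e[n] = x[n] - ALPHA * x[n-1]."""
    out, prev = [], 0.0
    for s in x:
        out.append(s - ALPHA * prev)
        prev = s
    return out


def inverse_lp_filter(e: Sequence[float]) -> List[float]:
    """LP synthesis, undoing lp_filter: x[n] = e[n] + ALPHA * x[n-1]."""
    out, prev = [], 0.0
    for s in e:
        prev = s + ALPHA * prev
        out.append(prev)
    return out


def pitch_filter(x: Sequence[float]) -> List[float]:
    """Pitch analysis: r[n] = x[n] - BETA * x[n - PITCH_LAG]."""
    return [s - (BETA * x[n - PITCH_LAG] if n >= PITCH_LAG else 0.0)
            for n, s in enumerate(x)]


def inverse_pitch_filter(r: Sequence[float]) -> List[float]:
    """Pitch synthesis, undoing pitch_filter: y[n] = r[n] + BETA * y[n - PITCH_LAG]."""
    out: List[float] = []
    for n, s in enumerate(r):
        out.append(s + (BETA * out[n - PITCH_LAG] if n >= PITCH_LAG else 0.0))
    return out


def vq_encode(signal: Sequence[float], codebook: Sequence[Sequence[float]]) -> List[int]:
    """Replace each block of len(codebook[0]) samples with the index of the nearest
    codebook vector (squared Euclidean distance); a short final block is zero-padded."""
    dim = len(codebook[0])
    indices = []
    for i in range(0, len(signal), dim):
        block = list(signal[i:i + dim]) + [0.0] * max(0, i + dim - len(signal))
        best = min(range(len(codebook)),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(block, codebook[k])))
        indices.append(best)
    return indices


def vq_decode(indices: Sequence[int], codebook: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the codebook vectors named by the indices."""
    return [v for k in indices for v in codebook[k]]


def encode(samples: Sequence[float], codebook: Sequence[Sequence[float]]) -> List[int]:
    # Encoder: LP filter, then pitch filter, then vector-quantize the residual.
    return vq_encode(pitch_filter(lp_filter(samples)), codebook)


def decode(indices: Sequence[int], codebook: Sequence[Sequence[float]]) -> List[float]:
    # Decoder: codebook lookup, inverse pitch filter, then inverse LP filter.
    return inverse_lp_filter(inverse_pitch_filter(vq_decode(indices, codebook)))
```

If the codebook happens to contain the exact residual blocks, `decode(encode(x, cb), cb)` reproduces `x`; with a coarser codebook, the quantization error passes through both synthesis filters on its way to the output.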

35 Citations

27 Claims

1. An apparatus for converting text to speech, comprising:
- means for translating the text to a sequence of sound segment codes representing speech;
  
  means for generating a set of noise compensated quantization vectors by encoding the sound segment codes representing speech using a first set of quantization vectors and then performing a noise shaping filter operation on the first set of quantization vectors;
  
  memory storing the set of noise compensated quantization vectors;
  
  means, responsive to sound segment codes in the sequence, for identifying strings of noise compensated quantization vectors in the set of noise compensated quantization vectors for respective sound segment codes in the sequence;
  
  means, coupled to the means for identifying and the memory, for generating a speech data sequence in response to the strings of noise compensated quantization vectors; and
  
  an audio transducer, coupled to the means for generating, to generate sound in response to the speech data sequence.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
  - 2. The apparatus of claim 1, wherein the sound segment codes comprise data encoded using the first set of quantization vectors, and the set of noise compensated quantization vectors is different from the first set of quantization vectors according to the noise shaping filter function.
  - 3. The apparatus of claim 1, wherein the first set of quantization vectors represent quantization of filtered sound segment data, and the means for generating a speech data sequence includes:
    - means for applying an inverse filter to the identified strings of noise compensated quantization vectors in generation of the speech data sequence, wherein the inverse filter includes parameters chosen so that any multiplies are replaced by shift and/or add operations in application of the inverse filter.
  - 4. The apparatus of claim 1, wherein means for translating includes a table of encoded diphones, having entries including data identifying a string of noise compensated quantization vectors in the set of noise compensated quantization vectors for respective diphones, and the sequence of sound segment codes comprises a sequence of indices to the table of encoded diphones representing the text;
    - and the means for identifying strings of noise compensated quantization vectors includes means responsive to the sound segment codes for accessing the entries in the table of encoded diphones.
  - 5. The apparatus of claim 1, wherein the first set of quantization vectors represent quantization of filtered sound segment data, and the means for generating a speech data sequence includes:
    - means for applying an inverse filter to the identified strings of the noise compensated quantization vectors in generation of the speech data sequence.
  - 6. The apparatus of claim 1, wherein the first set of quantization vectors represent quantization of results of linear prediction filtering of sound segment data, and the means for generating a speech data sequence includes:
    - means for applying an inverse linear prediction filter to the identified strings of noise compensated quantization vectors in generation of the speech data sequence.
  - 7. The apparatus of claim 1, wherein the first set of quantization vectors represent quantization of results of pitch filtering of sound segment data, and the means for generating a speech data sequence includes:
    - means for applying an inverse pitch filter to the identified strings of noise compensated quantization vectors in generation of the speech data sequence.
  - 8. The apparatus of claim 1, wherein the first set of quantization vectors represent quantization of results of pitch filtering and linear prediction filtering of sound segment data, and the means for generating a speech data sequence includes:
    - means for applying an inverse pitch filter to the identified strings of noise compensated quantization vectors in generation of the speech data sequence to produce a filtered data sequence; and
      
      means for applying an inverse linear prediction filter to the filtered data sequence in generation of the speech data sequence.
  - 9. The apparatus of claim 1, wherein the means for generating a speech data sequence includes:
    - means for concatenating the identified strings of noise compensated quantization vectors and supplying the concatenated strings for the speech data sequence.
  - 10. The apparatus of claim 1, wherein the identified strings of noise compensated quantization vectors each have a beginning and an ending, and the means for generating a speech data sequence includes:
    - means for supplying the identified strings of noise compensated quantization vectors for respective sound segment codes in sequence; and
      
      means for blending the ending of an identified string of noise compensated quantization vectors of a particular sound segment code in the sequence with the beginning of an identified string of noise compensated quantization vectors of an adjacent sound segment code in the sequence to smooth discontinuities between the particular and adjacent sound segment codes in the speech data sequence.
  - 11. The apparatus of claim 1, wherein the means for generating a speech data sequence includes:
    - means, responsive to the sound segment codes, for adjusting pitch and duration of the identified strings of noise compensated quantization vectors in the speech data sequence.
  - 12. The apparatus of claim 1, wherein the identified strings of noise compensated quantization vectors each have a beginning and an ending, and the means for generating a speech data sequence includes:
    - means for supplying the identified strings of noise compensated quantization vectors for respective sound segment codes in sequence;
      
      means for blending the ending of an identified string of noise compensated quantization vectors of a particular sound segment code in the sequence with the beginning of an identified string of noise compensated quantization vectors of an adjacent sound segment code in the sequence to smooth discontinuities between the particular and adjacent sound segment codes in the speech data sequence; and
      
      means, responsive to the sound segment codes, for adjusting pitch and duration of the identified strings of noise compensated quantization vectors in the speech data sequence.
  - 13. The apparatus of claim 1, further including an encoder including:
    - a store for an encoding set of quantization vectors different from the set of noise compensated quantization vectors used in decoding; and
      
      means for generating the sound segment codes in response to the encoding set and sound segment data.
  - 14. The apparatus of claim 13, wherein the encoder further includes a linear prediction filter.
  - 15. The apparatus of claim 13, wherein the encoder further includes a pitch filter.
  - 16. The apparatus of claim 13, wherein the encoder further includes a linear prediction filter and a pitch filter.
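
Claim 3 recites an inverse filter whose parameters are chosen so that multiplications are replaced by shift and add operations. A minimal sketch of that idea follows, assuming a hypothetical first-order predictor coefficient of 0.875, written as `2**0 - 2**-3`; the `ShiftTerms` encoding and the function names are illustrative, not the patent's.

```python
from typing import List, Sequence, Tuple

# A coefficient written as a sum of signed powers of two: each (sign, k) term
# contributes sign * 2**-k. Hypothetical choice: 0.875 = 2**0 - 2**-3.
ShiftTerms = Sequence[Tuple[int, int]]
LP_COEF: ShiftTerms = ((1, 0), (-1, 3))


def scale_shift_add(value: int, terms: ShiftTerms) -> int:
    """Approximate value * coefficient using only right shifts, adds and subtracts."""
    acc = 0
    for sign, k in terms:
        shifted = value >> k  # arithmetic shift: floor(value / 2**k)
        acc += shifted if sign > 0 else -shifted
    return acc


def lp_filter_int(x: Sequence[int], terms: ShiftTerms = LP_COEF) -> List[int]:
    """Integer analysis filter: e[n] = x[n] - scale(x[n-1]), no multiplies."""
    out, prev = [], 0
    for s in x:
        out.append(s - scale_shift_add(prev, terms))
        prev = s
    return out


def inverse_lp_filter_int(e: Sequence[int], terms: ShiftTerms = LP_COEF) -> List[int]:
    """Integer synthesis (inverse) filter: x[n] = e[n] + scale(x[n-1]), no multiplies."""
    out, prev = [], 0
    for s in e:
        prev = s + scale_shift_add(prev, terms)
        out.append(prev)
    return out
```

Because the analysis and synthesis filters use the same shift-add predictor on integers, the round trip here is exact. The shifts drop low bits, so each product is a floored approximation: `scale_shift_add(9, LP_COEF)` gives 8 rather than 7.875.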

17. A computer system that translates text to speech, comprising:
- a programmable processor to execute routines to produce a speech data sequence in response to an input text;
  
  an audio transducer, coupled to the processor, to generate sound in response to the speech data sequence;
  
  a table memory, coupled to the programmable processor, storing a set of noise compensated quantization vectors produced by encoding a sequence of sound segment codes representing speech using a first set of quantization vectors and then performing a noise shaping filter operation on the first set of quantization vectors, and a table of encoded diphones having entries including the sound segment codes representing speech, the sound segment codes identifying a string of noise compensated quantization vectors in the set of noise compensated quantization vectors for respective diphones; and
  
  an instruction memory, coupled to the processor, storing a translator routine for execution by the processor to translate the input text to a sequence of diphone indices, and a decoder routine for execution by the processor including:
  
  means, responsive to diphone indices in the sequence, for accessing the table of encoded diphones to identify strings of noise compensated quantization vectors in the set of noise compensated quantization vectors for diphones in the input text; and
  
  means, coupled to the means for accessing and the table memory, for retrieving the identified strings of noise compensated quantization vectors;
  
  means, coupled with the means for retrieving, for producing diphone data strings in response to the identified strings of noise compensated quantization vectors, wherein the diphone data strings each have a beginning and an ending;
  
  means, coupled to the means for producing, for blending the ending of a particular diphone data string in the sequence with the beginning of an adjacent diphone data string in the sequence to smooth discontinuities between the particular and adjacent diphone data strings to produce a smoothed string of quantized speech data; and
  
  means, responsive to the text and the smoothed string of quantized speech data, for adjusting pitch and duration of the identified strings of noise compensated quantization vectors for the diphones in the sequence to produce the speech data sequence for supply to the audio transducer.
- Dependent Claims (18, 19, 20, 21, 22, 23, 24, 25, 26, 27)
  - 18. The apparatus of claim 17, wherein the data identifying a string of noise compensated quantization vectors comprise data encoded using the first set of quantization vectors, and the set of noise compensated quantization vectors is different from the first set of quantization vectors according to the noise shaping filter operation.
  - 19. The apparatus of claim 17, wherein the first set of quantization vectors represent quantization of filtered sound segment data, and the means for generating a speech data sequence includes:
    - means for applying an inverse filter to the identified strings of noise compensated quantization vectors in generation of the speech data sequence, wherein the inverse filter includes parameters chosen so that any multiplies are replaced by shift and/or add operations in application of the inverse filter.
  - 20. The apparatus of claim 17, wherein the first set of quantization vectors represent quantization of filtered sound segment data, and the means for producing diphone data strings includes:
    - means for applying an inverse filter to the identified strings of noise compensated quantization vectors.
  - 21. The apparatus of claim 17, wherein the first set of quantization vectors represent quantization of results of linear prediction filtering of sound segment data, and the means for producing diphone data strings includes:
    - means for applying an inverse linear prediction filter to the identified strings of noise compensated quantization vectors.
  - 22. The apparatus of claim 17, wherein the first set of quantization vectors represent quantization of results of pitch filtering of sound segment data, and the means for producing diphone data strings includes:
    - means for applying an inverse pitch filter to the identified strings of noise compensated quantization vectors.
  - 23. The apparatus of claim 17, wherein the first set of quantization vectors represent quantization of results of pitch filtering and linear prediction filtering of sound segment data, and the means for producing diphone data strings includes:
    - means for applying an inverse pitch filter to the identified strings of noise compensated quantization vectors to produce a filtered data sequence; and
      
      means for applying an inverse linear prediction filter to the filtered data sequence.
  - 24. The apparatus of claim 17, further including an encoder including:
    - a store for an encoding set of quantization vectors different from the set of noise compensated quantization vectors used in decoding; and
      
      means for generating the sound segment codes in response to the encoding set and sound segment data.
  - 25. The apparatus of claim 24, wherein the encoder further includes a linear prediction filter.
  - 26. The apparatus of claim 24, wherein the encoder further includes a pitch filter.
  - 27. The apparatus of claim 24, wherein the encoder further includes a linear prediction filter and a pitch filter.
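
The decoder routine of claim 17 looks up each diphone's string of quantization vectors and blends the ending of one diphone string into the beginning of the next. A minimal sketch follows. It assumes the table of encoded diphones is a dictionary from hypothetical diphone indices to lists of codebook indices, and it uses a linear cross-fade as the blending rule, which is one simple choice that the claim does not specify. Inverse filtering and the pitch and duration adjustment are left out.

```python
from typing import Dict, List, Sequence


def lookup_strings(diphone_indices: Sequence[int],
                   diphone_table: Dict[int, Sequence[int]],
                   codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """For each diphone index, fetch its string of codebook indices from the
    (hypothetical) table and expand it into samples by concatenating vectors."""
    strings = []
    for d in diphone_indices:
        samples: List[float] = []
        for k in diphone_table[d]:
            samples.extend(codebook[k])
        strings.append(samples)
    return strings


def blend(strings: Sequence[Sequence[float]], overlap: int) -> List[float]:
    """Join diphone strings, cross-fading the last `overlap` samples of the
    output so far with the first `overlap` samples of the next string."""
    if not strings:
        return []
    out = list(strings[0])
    for nxt in strings[1:]:
        n = min(overlap, len(out), len(nxt))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming string ramps up linearly
            out[start + i] = (1.0 - w) * out[start + i] + w * nxt[i]
        out.extend(nxt[n:])
    return out
```

For example, blending `[1.0] * 4` into `[0.0] * 4` with `overlap=2` yields six samples, the two overlapped samples stepping down from about 0.67 to about 0.33 instead of jumping from 1.0 to 0.0.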

Current Assignee: Apple Inc.
Original Assignee: Apple Computer Incorporated (Apple Inc.)
Inventors: Narayan, Shankar
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Sax, Robert L.
Application Number: US08/632,121
Time in Patent Office: 666 days
Field of Search: 395/2.67, 395/2.71, 395/2.73, 395/2.75, 395/2.78, 395/2.31
US Class Current: 704/260
CPC Class Codes:
- G10L 13/047: Architecture of speech synt...
- G10L 13/07: Concatenation rules
- G10L 19/04: using predictive techniques
