Compressing & using a concatenative speech database in text-to-speech systems

US 20020143543A1
Filed: 03/30/2001
Published: 10/03/2002
Est. Priority Date: 03/30/2001
Status: Active Grant

First Claim

1. A method comprising:

receiving diphone waveforms;

compressing the diphone waveforms into diphone residuals, wherein the compressing is performed using an encoder;

generating linear predictive coding (LPC) coefficients, wherein the LPC coefficients are generated by the encoder; and

storing the diphone residuals and the encoder-generated LPC coefficients in a compressed packet, wherein the compressed packet is generated by the encoder.

Abstract

A method and apparatus are provided for compressing and using a concatenative speech database in text-to-speech (TTS) systems. The aim is to improve the quality of speech output generated by handheld TTS systems by allowing synthesis to occur on the client. According to one embodiment of the present invention, a G.723 encoder receives diphone waveforms and compresses them into diphone residuals. While compressing the diphone waveforms, the encoder generates Linear Predictive Coding (LPC) coefficients. The diphone residuals and the encoder-generated LPC coefficients are then stored in an encoder-generated compressed packet.
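
The relationship between a waveform, its LPC coefficients, and its residual can be illustrated with a short sketch. This is not the G.723 encoder named in the abstract: it is a plain autocorrelation-method LPC analysis (Levinson-Durbin recursion), and the function names, the test signal, and the order of 2 are all assumptions chosen for illustration.

```python
def autocorrelation(signal, max_lag):
    """Return r[0..max_lag], the autocorrelation of `signal`."""
    n = len(signal)
    return [sum(signal[i] * signal[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]


def lpc_coefficients(signal, order):
    """Levinson-Durbin recursion (autocorrelation method).

    Returns a[1..order] such that s[n] is approximated by
    sum(a[k] * s[n - k] for k in 1..order).
    """
    r = autocorrelation(signal, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        if error == 0.0:
            break  # silent or perfectly predicted input
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        reflection = acc / error
        new_a = a[:]
        new_a[i] = reflection
        for k in range(1, i):
            new_a[k] = a[k] - reflection * a[i - k]
        a = new_a
        error *= 1.0 - reflection * reflection
    return a[1:]


def residual(signal, coeffs):
    """Prediction error e[n] = s[n] - sum(a[k] * s[n - k])."""
    out = []
    for n, sample in enumerate(signal):
        prediction = sum(a * signal[n - k]
                         for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(sample - prediction)
    return out


# Demo signal (hypothetical): impulse response of the resonance
# s[n] = 1.6 * s[n-1] - 0.8 * s[n-2].
signal = [1.0, 1.6]
for _ in range(198):
    signal.append(1.6 * signal[-1] - 0.8 * signal[-2])

coeffs = lpc_coefficients(signal, order=2)  # approximately [1.6, -0.8]
excitation = residual(signal, coeffs)       # far less energy than `signal`
```

The residual carries much less energy than the waveform it came from, which is what makes it cheaper to code; the coefficients describe the resonances that the residual no longer contains.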

27 Claims

1. A method comprising:
- receiving diphone waveforms;
  
  compressing the diphone waveforms into diphone residuals, wherein the compressing is performed using an encoder;
  
  generating linear predictive coding (LPC) coefficients, wherein the LPC coefficients are generated by the encoder; and
  
  storing the diphone residuals and the encoder-generated LPC coefficients in a compressed packet, wherein the compressed packet is generated by the encoder.
- - 2. The method of claim 1 further comprising:
    - a waveform synthesizer requesting diphone residuals;
      
      locating the requested diphone residuals in the compressed packet;
      
      extracting the located diphone residuals from the compressed packet;
      
      decompressing the extracted diphone residuals, wherein the decompressing is performed using a decoder; and
      
      supplying the diphone residuals to the waveform synthesizer.
  - 3. The method of claim 2 further comprising supplying the encoder-generated LPC coefficients to the waveform synthesizer.
  - 4. The method of claim 2 further comprising supplying pitch marks to the waveform synthesizer.
  - 5. The method of claim 2 further comprising the waveform synthesizer producing speech output.
  - 6. The method of claim 1, wherein the encoder is a G.723 encoder.
  - 7. The method of claim 1, wherein the decoder is a modified G.723 decoder.
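
The request, locate, extract, decompress, and supply steps of claim 2 can be sketched as follows. The claims do not specify a packet layout, so everything here is an assumption: the packet is treated as a run of fixed-size coded frames (the 24-byte size is borrowed from the 6.3 kbit/s mode of ITU-T G.723.1), the index structure is hypothetical, and `toy_decode` is a stand-in, not a G.723 decoder.

```python
from typing import Callable, Dict, List, Tuple

FRAME_BYTES = 24  # assumption: fixed-size coded frames

# Hypothetical index: diphone name -> (first frame, number of frames)
Index = Dict[str, Tuple[int, int]]


def locate(index: Index, diphone: str) -> Tuple[int, int]:
    """Return the (start, end) byte range of a diphone inside the packet."""
    first_frame, frame_count = index[diphone]
    start = first_frame * FRAME_BYTES
    return start, start + frame_count * FRAME_BYTES


def extract(packet: bytes, byte_range: Tuple[int, int]) -> List[bytes]:
    """Slice the packet and split the slice into coded frames."""
    start, end = byte_range
    if end > len(packet):
        raise ValueError("index points past the end of the packet")
    return [packet[i:i + FRAME_BYTES] for i in range(start, end, FRAME_BYTES)]


def fetch_residual(packet: bytes, index: Index, diphone: str,
                   decode_frame: Callable[[bytes], List[int]]) -> List[int]:
    """Locate, extract, and decode the residual for one diphone."""
    samples: List[int] = []
    for frame in extract(packet, locate(index, diphone)):
        samples.extend(decode_frame(frame))
    return samples


def toy_decode(frame: bytes) -> List[int]:
    """Stand-in decoder for the demo only: one byte becomes one sample."""
    return [b - 128 for b in frame]


packet = bytes(range(72))               # three 24-byte frames
index = {"a-b": (0, 1), "b-c": (1, 2)}  # hypothetical diphone names
residual_bc = fetch_residual(packet, index, "b-c", toy_decode)
```

Passing the decoder in as a callable keeps the lookup logic independent of the codec, so a real frame decoder could replace `toy_decode` without touching `locate` or `extract`.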

8. A method comprising:
- receiving diphone waveforms;
  
  compressing the diphone waveforms into diphone residuals, wherein the compressing is performed using an encoder;
  
  generating linear predictive coding (LPC) coefficients, wherein the LPC coefficients are generated by the encoder;
  
  storing the diphone residuals and the coder-generated LPC coefficients in a compressed packet, wherein the compressed packet is generated by the encoder;
  
  a waveform synthesizer requesting the diphone residuals;
  
  locating the requested diphone residuals in the compressed packet;
  
  extracting the located diphone residuals from the compressed packet; and
  
  decompressing the extracted diphone residuals, wherein the decompressing is performed using a decoder; and
  
  supplying the diphone residuals and the encoder-generated LPC coefficients to the waveform synthesizer.
- - 9. The method of claim 8 further comprising supplying pitch marks to the waveform synthesizer.
  - 10. The method of claim 8, wherein the encoder is a G.723 encoder.
  - 11. The method of claim 8, wherein the decoder is a G.723 decoder.

12. A system for compressing and using concatenative speech databases in text-to-speech systems comprising:
- a text-to-speech system;
  
  a concatenative speech database; and
  
  a coder.
- - 13. The system of claim 12, wherein the text-to-speech system comprising:
    - a text analysis module for processing a text into forms of linguistic representations;
      
      a linguistic and prosodic analysis module for analyzing the forms of linguistic representations corresponding to their assigned language system; and
      
      a waveform synthesizer for producing a speech output.
  - 14. The system of claim 12, wherein the concatenative speech database comprising:
    - diphone waveforms;
      
      LPC coefficients; and
      
      pitch marks.
  - 15. The system of claim 14, wherein the diphone waveforms are compressed to diphone residuals.
  - 16. The system of claim 12, wherein the coder is a G.723 coder.
  - 17. The system of claim 16, wherein the G.723 coder comprises:
    - a G.723 encoder for compressing the concatenative speech database; and
      
      a G.723 decoder for decompressing the concatenative speech database.

18. A method of producing a compressed concatenative diphone database comprising:
- compressing diphone waveforms and generating linear predictive coding (LPC) coefficients by applying an audio encoder to the diphone waveforms; and
  
  storing compressed packets produced by the audio encoder and uncompressed pitch mark values as a compressed concatenative diphone database.
- - 19. The method of claim 18, wherein the compressed packets comprising diphone residuals and audio encoder-generated LPC coefficients.
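
A database that pairs compressed packets with uncompressed pitch mark values, as recited in claim 18, might be laid out as in the sketch below. The record format, the names `DiphoneEntry`, `build_database`, `serialize`, and `deserialize`, and the sample data are all hypothetical. Note in particular that `zlib` is only a lossless stand-in for the audio encoder; it is not G.723 and does not produce residuals or LPC coefficients.

```python
import struct
import zlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DiphoneEntry:
    packet: bytes           # compressed waveform data (encoder output)
    pitch_marks: List[int]  # sample positions, kept as plain integers


def build_database(waveforms, pitch_marks, encode=zlib.compress):
    """Compress each waveform; store its pitch marks uncompressed."""
    return {name: DiphoneEntry(encode(pcm), list(pitch_marks[name]))
            for name, pcm in waveforms.items()}


def serialize(db: Dict[str, DiphoneEntry]) -> bytes:
    """Flatten the database: entry count, then name, packet, pitch marks."""
    out = bytearray(struct.pack("<I", len(db)))
    for name, entry in db.items():
        key = name.encode("utf-8")
        out += struct.pack("<HII", len(key), len(entry.packet),
                           len(entry.pitch_marks))
        out += key
        out += entry.packet
        out += struct.pack(f"<{len(entry.pitch_marks)}I", *entry.pitch_marks)
    return bytes(out)


def deserialize(blob: bytes) -> Dict[str, DiphoneEntry]:
    """Inverse of serialize()."""
    (count,) = struct.unpack_from("<I", blob, 0)
    pos = 4
    db = {}
    for _ in range(count):
        key_len, packet_len, n_marks = struct.unpack_from("<HII", blob, pos)
        pos += struct.calcsize("<HII")
        name = blob[pos:pos + key_len].decode("utf-8")
        pos += key_len
        packet = blob[pos:pos + packet_len]
        pos += packet_len
        marks = list(struct.unpack_from(f"<{n_marks}I", blob, pos))
        pos += 4 * n_marks
        db[name] = DiphoneEntry(packet, marks)
    return db


# Hypothetical sample data: raw bytes standing in for PCM audio.
waveforms = {"a-b": b"\x00\x01" * 50, "b-c": b"\x05" * 80}
marks = {"a-b": [0, 40, 80], "b-c": [10, 55]}
db = build_database(waveforms, marks)
restored = deserialize(serialize(db))
```

Because the pitch marks bypass the encoder, they come back from storage as exactly the integers that went in, regardless of how the waveform data is coded.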

20. The method for a handheld device with a text-to-speech system using a compressed concatenative diphone database comprising:
- compressing diphone waveforms into diphone residuals and generating linear predictive coding (LPC) coefficients by applying an audio encoder to the diphone waveforms;
  
  storing compressed packets produced by the audio encoder and uncompressed pitch mark values as a compressed concatenative diphone database;
  
  decompressing the compressed concatenative diphone database by applying an audio decoder to the diphone residuals and the LPC coefficients; and
  
  synthesizing the decompressed concatenative diphone database including the uncompressed pitch mark values to produce an output by applying a waveform synthesizer.
- - 21. The method of claim 20 further comprising the handheld device downloading a customizable speech database.
  - 22. The method of claim 20, wherein the synthesizing is client-based.

23. A concatenative speech database structure comprising:
- diphone waveforms indicating smallest units of speech for efficient text-to-speech conversion that are derived from phonemes;
  
  linear predictive coefficients of a difference equation for characterizing formants; and
  
  pitch mark values marking positions in an utterance indicating varying pitch.
- - 24. The concatenative speech database structure of claim 23, wherein the diphone waveforms are reduced to diphone residuals after compression.
  - 25. The concatenative speech database structure of claim 23, wherein the difference equation is a linear predictor expressing each new sample of a signal as a linear combination of previous samples.
  - 26. The concatenative speech database structure of claim 23, wherein the formants are the resonance characterizing vocal tract.
  - 27. The concatenative speech database structure of claim 23, wherein the pitch mark values correspond to changes in fundamental frequency.
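
The linear predictor of claim 25 can be run in both directions: as an analysis filter it turns a waveform into a residual, and as a synthesis filter it rebuilds the waveform from the residual and the coefficients, which is the role the waveform synthesizer plays in claim 20. The sketch below is a minimal illustration under assumed names, coefficients, and sample values; it omits the pitch-mark handling and diphone concatenation that a real synthesizer would perform.

```python
def analyze(signal, coeffs):
    """Analysis filter: e[n] = s[n] - sum(a[k] * s[n - k])."""
    out = []
    for n, sample in enumerate(signal):
        prediction = sum(a * signal[n - k]
                         for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(sample - prediction)
    return out


def synthesize(residual, coeffs):
    """All-pole synthesis filter: s[n] = e[n] + sum(a[k] * s[n - k])."""
    out = []
    for n, excitation in enumerate(residual):
        feedback = sum(a * out[n - k]
                       for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(excitation + feedback)
    return out


# Hypothetical coefficients and samples, for illustration only.
coeffs = [1.6, -0.8]
waveform = [0.0, 3.0, 5.0, 4.0, 1.0, -2.0, -4.0, -3.0]

excitation = analyze(waveform, coeffs)
rebuilt = synthesize(excitation, coeffs)  # matches `waveform` up to rounding
```

With an uncoded residual the round trip is exact up to floating-point rounding. When the residual has passed through a lossy codec, the rebuilt waveform is only an approximation of the original.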

Current Assignee: Intel Corporation
Original Assignee: Intel Corporation
Inventors: Sirivara, Sudheer
Granted Patent: US 7,035,794 B2
US Class Current: 704/260
CPC Class Codes:
G10L 13/06 Elementary speech units use...
G10L 19/06 Determination or coding of ...