Compressing and using a concatenative speech database in text-to-speech systems

US 7,035,794 B2
Filed: 03/30/2001
Issued: 04/25/2006
Est. Priority Date: 03/30/2001
Status: Expired due to Fees

Abstract

A method and apparatus are provided for compressing and using a concatenative speech database in text-to-speech (TTS) systems to improve the quality of speech output generated by handheld TTS systems by allowing synthesis to occur on the client. According to one embodiment of the present invention, a G.723 encoder receives diphone waveforms and compresses them into diphone residuals. While compressing the diphone waveforms, the encoder generates Linear Predictive Coding (LPC) coefficients. The diphone residuals and the encoder-generated LPC coefficients are then stored in an encoder-generated compressed packet.
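
To make the relationship between a waveform, its LPC coefficients, and its residual concrete, the following is a minimal sketch of plain LPC analysis and resynthesis. It is not the G.723 codec and is not taken from the patent: the autocorrelation method with the Levinson-Durbin recursion, the predictor order, the synthetic test signal, and every function name are assumptions chosen for illustration. A real encoder would also quantize the coefficients and the residual, which is where the actual compression comes from.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Return r[k] = sum over n of x[n] * x[n - k], for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r.

    Returns coefficients a, where a[k] weights the sample (k + 1) steps back.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate signal; leave remaining coefficients at zero
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]


def lpc_residual(x, a):
    """Residual = actual sample minus the linear prediction from past samples."""
    res = []
    for n in range(len(x)):
        pred = sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        res.append(x[n] - pred)
    return res


def lpc_synthesize(res, a):
    """Inverse of lpc_residual: rebuild the waveform from residual and coefficients."""
    y = []
    for n in range(len(res)):
        pred = sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        y.append(res[n] + pred)
    return y


# Synthetic stand-in for one diphone waveform (assumption: not real speech data).
rng = random.Random(0)
signal = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n + 0.4)
          + 0.05 * rng.uniform(-1, 1) for n in range(240)]

ORDER = 10  # illustrative predictor order
coeffs = levinson_durbin(autocorrelation(signal, ORDER), ORDER)
residual = lpc_residual(signal, coeffs)
rebuilt = lpc_synthesize(residual, coeffs)
```

Because the residual here is left unquantized, `rebuilt` matches `signal` up to floating-point error. The residual carries less energy than the waveform itself, which is what makes it cheaper to encode.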

14 Claims

1. A method, comprising:
- receiving input text at a client device;
  
  analyzing the input text to determine diphones;
  
  sending a request to a server for diphone waveform data based on the determined diphones;
  
  locating the requested diphone waveform data by searching a concatenative diphone waveform database at the server;
  
  generating a set of compressed diphone residuals and Linear Predictive Coding (LPC) coefficients by compressing results of the searched diphone waveform database;
  
  storing the set of compressed diphone residuals and the LPC coefficients in a compressed packet;
  
  transmitting the compressed packet to the client device; and
  
  upon receiving the compressed packet, the client device decompresses the compressed packet back to diphone waveform data available for use in a text-to-speech synthesizer.
- Dependent claims (2, 3, 4, 5):
  - 2. The method of claim 1, wherein the generating of the set of compressed diphone residuals is performed using an encoder.
  - 3. The method of claim 1, further comprising receiving the request from the text-to-speech synthesizer, the text-to-speech synthesizer residing at the client device.
  - 4. The method of claim 1, further comprising providing pitch marks to the text-to-speech synthesizer.
  - 5. The method of claim 2, wherein the encoder comprises a G.723 encoder.
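
The client-side steps of claim 1 (receive text, determine diphones, request them from the server) can be sketched as follows. This is only an illustration under stated assumptions: the toy lexicon, the ARPAbet-like phoneme symbols, the `SIL` boundary marker, the request dictionary, and all function names are hypothetical and do not come from the patent, which does not specify how the text analysis is performed.

```python
# Hypothetical toy lexicon: word -> phoneme sequence (ARPAbet-like symbols).
TOY_LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}
SILENCE = "SIL"  # assumed marker for utterance boundaries


def text_to_phonemes(text):
    """Look up each word in the toy lexicon, padding the utterance with silence."""
    phonemes = [SILENCE]
    for raw_word in text.lower().split():
        word = raw_word.strip(".,!?;:")
        if not word:
            continue
        if word not in TOY_LEXICON:
            raise KeyError(f"word not in toy lexicon: {word!r}")
        phonemes.extend(TOY_LEXICON[word])
    phonemes.append(SILENCE)
    return phonemes


def phonemes_to_diphones(phonemes):
    """A diphone spans two adjacent phonemes, so pair each phoneme with the next."""
    return [f"{a}-{b}" for a, b in zip(phonemes, phonemes[1:])]


def build_request(text):
    """Build a hypothetical request listing each needed diphone once, in order."""
    diphones = phonemes_to_diphones(text_to_phonemes(text))
    return {"diphones": list(dict.fromkeys(diphones))}


request = build_request("Hello world.")
```

Deduplicating the diphone list keeps the request small when an utterance reuses the same transition several times; whether the claimed method does this is not stated in the source.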

6. A system comprising:
- a server;
  
  a client device coupled to the server, the client device to receive input text, analyze the input text to determine diphones, and send a request to the server for diphone waveform data based on the determined diphones;
  
  the server to locate diphone waveform data by searching a concatenative diphone waveform database, generate a set of compressed diphone residuals and Linear Predictive Coding (LPC) coefficients by compressing results of the searched diphone waveform database, store the set of compressed diphone residuals and the LPC coefficients in a compressed packet, and transmit the compressed packet to the client device; and
  
  the client device to decompress the compressed packet back to diphone waveform data available for use in a text-to-speech synthesizer.
- Dependent claims (7, 8, 9, 10):
  - 7. The system of claim 6, wherein the server is further to generate the set of compressed diphone residuals using an encoder, the encoder including a G.723 encoder.
  - 8. The system of claim 6, wherein the server is further to provide pitch marks to the text-to-speech synthesizer at the client device.
  - 9. The system of claim 8, wherein the text-to-speech synthesizer at the client is further to receive the pitch marks.
  - 10. The system of claim 6, wherein the client device comprises a handheld device including one or more of the following:
    - a telephone, a pocket computer system, and a personal digital assistant (PDA).
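
The packet exchange between server and client in claim 6 can be illustrated with a small serialization sketch. The byte layout below is a hypothetical one invented for this example (it is not the G.723 bitstream and not a format described in the patent), and the function names and sample values are assumptions. It only shows how a set of LPC coefficients and residual samples for several diphones might travel in one packet and be recovered on the client.

```python
import struct


def pack_packet(records):
    """Serialize (diphone_name, lpc_coeffs, residual) tuples into one packet.

    Assumed layout, little-endian: uint16 record count, then per record a
    header (uint8 name length, uint8 coefficient count, uint16 residual count),
    the ASCII name, float32 coefficients, and int16 residual samples.
    """
    out = [struct.pack("<H", len(records))]
    for name, coeffs, residual in records:
        name_bytes = name.encode("ascii")
        out.append(struct.pack("<BBH", len(name_bytes), len(coeffs), len(residual)))
        out.append(name_bytes)
        out.append(struct.pack(f"<{len(coeffs)}f", *coeffs))
        out.append(struct.pack(f"<{len(residual)}h", *residual))
    return b"".join(out)


def unpack_packet(data):
    """Client side: recover the per-diphone coefficients and residuals."""
    (count,) = struct.unpack_from("<H", data, 0)
    offset = 2
    records = []
    for _ in range(count):
        name_len, n_coeffs, n_res = struct.unpack_from("<BBH", data, offset)
        offset += 4
        name = data[offset:offset + name_len].decode("ascii")
        offset += name_len
        coeffs = list(struct.unpack_from(f"<{n_coeffs}f", data, offset))
        offset += 4 * n_coeffs
        residual = list(struct.unpack_from(f"<{n_res}h", data, offset))
        offset += 2 * n_res
        records.append((name, coeffs, residual))
    return records


# Made-up values; the coefficients are chosen to be exactly representable as float32.
server_records = [
    ("HH-AH", [0.5, -0.25], [3, -2, 0, 1]),
    ("AH-L", [0.75], [-1, 4]),
]
packet = pack_packet(server_records)
client_records = unpack_packet(packet)
```

After unpacking, the client would still need to run each residual through an LPC synthesis filter with the matching coefficients to obtain diphone waveform data for the synthesizer; that step is outside this sketch.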

11. A machine-readable medium having stored thereon data comprising sets of instructions which, when executed by a machine, cause the machine to:
- receive input text at a client device;
  
  analyze the input text to determine diphones;
  
  send a request to a server for diphone waveform data based on the determined diphones;
  
  locate the requested diphone waveform data by searching a concatenative diphone waveform database at the server;
  
  generate a set of compressed diphone residuals and Linear Predictive Coding (LPC) coefficients by compressing results of the searched diphone waveform database;
  
  store the set of compressed diphone residuals and LPC coefficients in a compressed packet;
  
  transmit the compressed packet to the client device; and
  
  upon receiving the compressed packet, the client device decompresses the compressed packet back to diphone waveform data available for use in a text-to-speech synthesizer.
- Dependent claims (12, 13, 14):
  - 12. The machine-readable medium of claim 11, wherein the generating of the set of compressed diphone residuals is performed using an encoder.
  - 13. The method of claim 11, wherein the sets of instructions which, when executed by the machine, further cause the machine to receive the request from the text-to-speech synthesizer, the text-to-speech synthesizer residing at the client device.
  - 14. The machine-readable medium of claim 11, wherein the sets of instructions which, when executed by the machine, further cause the machine to provide pitch marks to the text-to-speech synthesizer.

Current Assignee: Intel Corporation
Original Assignee: Intel Corporation
Inventors: Sirivara, Sudheer
Primary Examiner(s): Young, W. R.
Assistant Examiner(s): Vo, Huyen X.

Application Number: US09/822,547
Publication Number: US 20020143543A1
Time in Patent Office: 1,852 Days
Field of Search: 704/260, 704/267, 704/258, 704/219, 704/262
US Class Current: 704/219
CPC Class Codes:
G10L 13/06 Elementary speech units use...
G10L 19/06 Determination or coding of ...
