Compressing and using a concatenative speech database in text-to-speech systems
Abstract
A method and apparatus are provided for compressing and using a concatenative speech database in text-to-speech (TTS) systems to improve the quality of speech output generated by handheld TTS systems by allowing synthesis to occur on the client. According to one embodiment of the present invention, a G.723 encoder receives diphone waveforms and compresses them into diphone residuals. While compressing the diphone waveforms, the encoder generates Linear Predictive Coding (LPC) coefficients. The diphone residuals and the encoder-generated LPC coefficients are then stored in an encoder-generated compressed packet.
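The abstract names a G.723 encoder but does not describe its internals. As a rough illustration of what "LPC coefficients" and a "residual" are, the minimal sketch below uses the textbook autocorrelation method with the Levinson-Durbin recursion. It is not the G.723 codec: nothing is quantized, so nothing is actually compressed, and the function names, frame length, and predictor order are assumptions chosen only for the example.

```python
import math

def autocorrelation(x, max_lag):
    """Return r[0..max_lag], where r[lag] = sum of x[i] * x[i + lag]."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc_coefficients(x, order):
    """Levinson-Durbin recursion. Returns coeffs such that
    x[n] is approximated by sum(coeffs[k] * x[n - 1 - k])."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)  # a[0] is unused; a[1..order] are the predictor taps
    err = r[0]               # prediction error energy so far
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent frame: nothing left to predict
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def predict(history, coeffs, n):
    """Predict sample n from earlier samples; missing history counts as zero."""
    return sum(c * history[n - 1 - k]
               for k, c in enumerate(coeffs) if n - 1 - k >= 0)

def analyze(x, coeffs):
    """Encoder side: residual = actual sample minus predicted sample."""
    return [x[n] - predict(x, coeffs, n) for n in range(len(x))]

def synthesize(residual, coeffs):
    """Decoder side: rebuild each sample from the residual plus the prediction."""
    y = []
    for n, e in enumerate(residual):
        y.append(e + predict(y, coeffs, n))
    return y

# Assumed test frame: two sinusoids, 240 samples, 4th-order predictor.
frame = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n + 0.4) for n in range(240)]
coeffs = lpc_coefficients(frame, order=4)
residual = analyze(frame, coeffs)
restored = synthesize(residual, coeffs)
```

The residual carries less energy than the original frame, which is what makes it cheaper to encode. A real codec would additionally quantize both the residual and the coefficients before packing them.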
14 Claims
1. A method, comprising:
receiving input text at a client device;
analyzing the input text to determine diphones;
sending a request to a server for diphone waveform data based on the determined diphones;
locating the requested diphone waveform data by searching a concatenative diphone waveform database at the server;
generating a set of compressed diphone residuals and Linear Predictive Coding (LPC) coefficients by compressing results of the searched diphone waveform database;
storing the set of compressed diphone residuals and the LPC coefficients in a compressed packet;
transmitting the compressed packet to the client device; and
upon receiving the compressed packet, the client device decompresses the compressed packet back to diphone waveform data available for use in a text-to-speech synthesizer.
Dependent claims: 2, 3, 4, 5
6. A system comprising:
a server;
a client device coupled to the server, the client device to receive input text, analyze the input text to determine diphones, and send a request to the server for diphone waveform data based on the determined diphones;
the server to locate diphone waveform data by searching a concatenative diphone waveform database, generate a set of compressed diphone residuals and Linear Predictive Coding (LPC) coefficients by compressing results of the searched diphone waveform database, store the set of compressed diphone residuals and the LPC coefficients in a compressed packet, and transmit the compressed packet to the client device; and
the client device to decompress the compressed packet back to diphone waveform data available for use in a text-to-speech synthesizer.
Dependent claims: 7, 8, 9, 10
11. A machine-readable medium having stored thereon data comprising sets of instructions which, when executed by a machine, cause the machine to:
receive input text at a client device;
analyze the input text to determine diphones;
send a request to a server for diphone waveform data based on the determined diphones;
locate the requested diphone waveform data by searching a concatenative diphone waveform database at the server;
generate a set of compressed diphone residuals and Linear Predictive Coding (LPC) coefficients by compressing results of the searched diphone waveform database;
store the set of compressed diphone residuals and LPC coefficients in a compressed packet;
transmit the compressed packet to the client device; and
upon receiving the compressed packet, the client device decompresses the compressed packet back to diphone waveform data available for use in a text-to-speech synthesizer.
Dependent claims: 12, 13, 14
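The client/server exchange recited in the independent claims can be pictured with a small round-trip sketch. Everything named here is hypothetical and for illustration only: the toy lexicon, the synthetic waveform database, the JSON request, and the packet layout are all assumptions. In particular, zlib stands in as a lossless placeholder for the claimed LPC-based compression, so the block shows only the flow of data, not the codec.

```python
import json
import struct
import zlib

# Hypothetical toy lexicon (word -> phonemes); not taken from the patent.
LEXICON = {"hi": ["HH", "AY"], "you": ["Y", "UW"]}

def text_to_diphones(text):
    """Client: map words to phonemes, then pair adjacent phonemes into diphones."""
    phonemes = ["_"]  # "_" marks silence at the utterance boundaries
    for word in text.lower().split():
        phonemes.extend(LEXICON[word])
    phonemes.append("_")
    return [f"{a}-{b}" for a, b in zip(phonemes, phonemes[1:])]

def fake_waveform(name, length=8):
    """Deterministic stand-in for recorded 16-bit diphone samples."""
    seed = sum(ord(c) for c in name)
    return [((seed * (i + 1)) % 200) - 100 for i in range(length)]

# Server: stand-in for the concatenative diphone waveform database.
DATABASE = {d: fake_waveform(d)
            for d in ["_-HH", "HH-AY", "AY-Y", "Y-UW", "UW-_"]}

def build_request(diphones):
    """Client: serialize the list of wanted diphones."""
    return json.dumps(diphones).encode("utf-8")

def server_handle(request):
    """Server: look up each diphone, pack the samples, compress the packet."""
    parts = []
    for name in json.loads(request.decode("utf-8")):
        samples = DATABASE[name]
        encoded = name.encode("utf-8")
        # Record layout: name length (1 byte), sample count (2 bytes), name, samples.
        parts.append(struct.pack("<BH", len(encoded), len(samples)))
        parts.append(encoded)
        parts.append(struct.pack(f"<{len(samples)}h", *samples))
    return zlib.compress(b"".join(parts))  # placeholder for LPC-based coding

def client_unpack(packet):
    """Client: decompress the packet back to (diphone, samples) pairs."""
    raw = zlib.decompress(packet)
    offset, units = 0, []
    while offset < len(raw):
        name_len, count = struct.unpack_from("<BH", raw, offset)
        offset += 3
        name = raw[offset:offset + name_len].decode("utf-8")
        offset += name_len
        samples = list(struct.unpack_from(f"<{count}h", raw, offset))
        offset += 2 * count
        units.append((name, samples))
    return units

diphones = text_to_diphones("hi you")
units = client_unpack(server_handle(build_request(diphones)))
# Concatenate the returned units into one waveform for the synthesizer.
waveform = [s for _, samples in units for s in samples]
```

Because the placeholder codec is lossless, the client recovers the stored samples exactly. With a speech codec in its place, the recovered waveform would only approximate the database entry.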
Specification