Method for facilitating text to speech synthesis using a differential vocoder
Abstract
A text to speech system (100) uses differential voice coding (230, 416) to compress a database of digitized speech waveform segments (210). A seed waveform (535) is used to precondition each speech waveform prior to encoding which, upon encoding, provides a seeded preconditioned encoded speech token (550). The seed portion (541) may be removed and the preconditioned encoded speech token portion (542) may be stored in a database for text to speech synthesis. When speech is to be synthesized, upon requesting the appropriate speech waveform for the present sound to be produced, the seed portion is pre-appended to the preconditioned encoded speech token for differential decoding.
19 Claims
1. A method for facilitating text to speech synthesis, comprising:
providing a database of preconditioned encoded speech tokens, each of the preconditioned encoded speech tokens in a differential encoding format;
receiving a call from a text to speech engine for a requested speech waveform unit, the requested speech waveform unit corresponding to a text segment to be synthesized into speech;
retrieving from the database of preconditioned encoded speech tokens a preconditioned encoded speech token corresponding to the requested speech waveform unit;
pre-appending a seed token onto the preconditioned encoded speech token, to provide a seeded preconditioned encoded speech token;
decoding the seeded preconditioned encoded speech token with a differential vocoder to provide a seeded speech waveform unit having a seed portion followed by a speech waveform portion;
removing the seed portion from the seeded speech waveform unit to provide the requested speech waveform unit; and
returning the requested speech waveform unit to the text to speech engine.
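The steps of claim 1 can be illustrated with a minimal sketch. It assumes a toy lossless frame-delta codec standing in for the differential vocoder: the first frame of a stream is coded absolutely and each later frame as a difference from the previous one. The names `SEED_TOKEN`, `differential_decode`, `get_waveform_unit`, the frame size, and the sample data are all hypothetical and do not come from the patent; a real vocoder would be lossy and far more elaborate.

```python
from typing import Dict, List

Frame = List[int]
Token = List[Frame]  # one coded frame per entry

FRAME_SIZE = 4  # assumed frame length for this toy example
# Assumed seed token: the coded form of an all-zero (null) reference frame.
SEED_TOKEN: Token = [[0] * FRAME_SIZE]


def differential_decode(token: Token) -> List[Frame]:
    """Toy decoder: first coded frame is absolute, later ones are deltas."""
    frames: List[Frame] = []
    prev = None
    for coded in token:
        if prev is None:
            frame = list(coded)
        else:
            frame = [p + d for p, d in zip(prev, coded)]
        frames.append(frame)
        prev = frame
    return frames


def get_waveform_unit(database: Dict[str, Token], unit_id: str) -> List[Frame]:
    """Serve one call from the text to speech engine."""
    token = database[unit_id]               # retrieve the stored token
    seeded = SEED_TOKEN + token             # pre-append the seed token
    seeded_waveform = differential_decode(seeded)
    return seeded_waveform[len(SEED_TOKEN):]  # remove the seed portion


# Hypothetical database holding one preconditioned token (deltas only).
database = {"ah": [[3, 1, -2, 5], [1, 1, 1, -1]]}
unit = get_waveform_unit(database, "ah")
print(unit)
```

In this sketch the stored token contains only deltas, so it cannot be decoded on its own; the seed token supplies the known starting state that the first delta was computed against.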
11. A method of generating a database of preconditioned encoded speech tokens from a speech waveform database having a plurality of speech waveform units, each one of the plurality of speech waveform units corresponding to a speech sound, the method comprising:
retrieving from the speech waveform database one of the plurality of speech waveform units;
pre-appending a null reference frame to the speech waveform unit to provide a pre-appended speech waveform unit;
encoding the pre-appended speech waveform unit into a seeded preconditioned encoded speech token using a differential vocoder;
removing the seeded token from the seeded preconditioned encoded speech token, to provide a preconditioned encoded speech token; and
indexing the preconditioned encoded speech token to correspond with an index entry of the speech waveform token.
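The database generation steps of claim 11 can be sketched in the same spirit. This is only an illustration under stated assumptions: a toy lossless frame-delta encoder stands in for the differential vocoder (first frame absolute, later frames as differences), and `NULL_FRAME`, `differential_encode`, `build_token_database`, and the sample waveform data are hypothetical names and values, not taken from the patent.

```python
from typing import Dict, List

Frame = List[int]
Token = List[Frame]  # one coded frame per entry

FRAME_SIZE = 4  # assumed frame length for this toy example
NULL_FRAME: Frame = [0] * FRAME_SIZE  # assumed null reference frame


def differential_encode(frames: List[Frame]) -> Token:
    """Toy encoder: first frame is stored as-is, later ones as deltas."""
    token: Token = []
    prev = None
    for frame in frames:
        if prev is None:
            coded = list(frame)
        else:
            coded = [x - p for x, p in zip(frame, prev)]
        token.append(coded)
        prev = frame
    return token


def build_token_database(waveform_db: Dict[str, List[Frame]]) -> Dict[str, Token]:
    """Encode every waveform unit against a null reference frame."""
    token_db: Dict[str, Token] = {}
    for index_entry, unit in waveform_db.items():
        pre_appended = [NULL_FRAME] + unit           # pre-append null frame
        seeded_token = differential_encode(pre_appended)
        token_db[index_entry] = seeded_token[1:]     # drop the seed token
    return token_db


# Hypothetical waveform database with a single two-frame unit.
waveform_db = {"ah": [[3, 1, -2, 5], [4, 2, -1, 4]]}
token_db = build_token_database(waveform_db)
print(token_db["ah"])
```

Because every unit is encoded against the same null frame, the seed token is identical for all entries in this toy setup and need not be stored with each one; it only has to be pre-appended again before decoding.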
Specification