Speech signal distribution system providing supplemental parameter associated data
Abstract
A speech signal distribution system includes a transmitting subsystem and one or more receiving subsystems. The transmitting subsystem has a text to speech converter for converting text into a data stream of formant parameters. A supplemental parameter generator inserts into the data stream supplemental data, including linguistic boundary data indicating which parameters in the stream of formant parameters are associated with predefined linguistic boundaries in the text. In one preferred embodiment, the boundary data indicates which formant parameters in the data stream are associated with sentence boundaries. In addition, the supplemental parameter generator optionally inserts the text, lip position data corresponding to phonemes in the text, and voice setting data into the data stream. The resulting data stream is compressed and transmitted to the receiving subsystems. The receiving subsystem receives the transmitted compressed data stream, decompresses the data stream to regenerate the full data stream, and splits off the supplemental data. The formant data is buffered until boundary data is received indicating that a full sentence, or other linguistic unit, has been received. Then the formant data is processed by an audio signal generator that converts the formant parameters into an audio speech signal in accordance with a vocal tract model. Voice settings in the supplemental data are passed to the audio signal generator, which modifies audio signal generation accordingly. Lip position data in the supplemental data may be processed by an animation program to generate animated pictures of a person speaking.
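The boundary insertion and sentence-level buffering described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the record types `FormantFrame` and `Boundary` and the functions `insert_boundaries` and `buffer_until_boundary` are hypothetical names chosen for illustration, the frame layout is assumed, and compression, voice settings, and lip position data are left out.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Union


@dataclass(frozen=True)
class FormantFrame:
    """One frame of formant parameters (hypothetical layout)."""
    values: tuple


@dataclass(frozen=True)
class Boundary:
    """Supplemental record marking the end of a linguistic unit."""
    kind: str  # e.g. "sentence" or "phrase"


Record = Union[FormantFrame, Boundary]


def insert_boundaries(sentences: Iterable[List[FormantFrame]]) -> Iterator[Record]:
    """Transmit side: emit each sentence's frames, then a boundary record."""
    for frames in sentences:
        yield from frames
        yield Boundary("sentence")


def buffer_until_boundary(stream: Iterable[Record]) -> Iterator[List[FormantFrame]]:
    """Receive side: hold frames until a boundary arrives, then release them."""
    pending: List[FormantFrame] = []
    for record in stream:
        if isinstance(record, Boundary):
            if pending:
                yield pending  # a full unit is ready for the audio signal generator
                pending = []
        else:
            pending.append(record)
    # Frames after the last boundary are not released: no complete unit arrived.
```

In this sketch each list yielded by `buffer_until_boundary` stands for one complete sentence that could be handed to an audio signal generator, so synthesis never starts on a partially received sentence.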
23 Claims
1. A speech signal distribution system comprising:
a text to speech parameter converter for converting text containing sentences into a data stream, said data stream including a stream of speech signal parameters representing spoken text and lacking phrase-level and sentence-level prosodic content, being suitable for driving an audio signal generator that converts said stream of parameters into an audio speech signal in accordance with a vocal tract model;
a supplemental parameter generator in communication with the text to speech parameter converter, such generator inserting into said data stream additional data, representative of linguistic boundaries, that indicate which parameters in said stream of parameters are associated with predefined boundaries of at least one of phrases and sentences in said text; and
a transmitter for transmitting said data stream.
Dependent claims: 2, 3, 4, 5, 6, 20, 21, 22, 23
7. A speech signal distribution system, comprising:
a text to speech parameter converter for converting text containing sentences into a data stream, said data stream including a stream of parameters suitable for driving an audio signal generator that converts said stream of parameters into an audio speech signal in accordance with a vocal tract model;
said text including a sequence of words;
a supplemental parameter generator for inserting into said data stream text data representing at least a subset of the words in said text, wherein said text data is inserted at positions in said data stream coinciding with the corresponding parameters in said stream of parameters; and
a transmitter for transmitting said data stream.
Dependent claims: 8
9. A speech signal distribution method comprising the steps of:
a. converting text containing sentences into a data stream, said data stream including a stream of speech signal parameters representing spoken text and lacking phrase-level and sentence-level prosodic content, being suitable for driving an audio signal generator that converts said stream of parameters into an audio speech signal in accordance with a vocal tract model;
b. inserting into said data stream, established by step (a), additional data, representative of linguistic boundaries, that indicate which parameters in said stream of parameters are associated with predefined boundaries of at least one of phrases and sentences in said text; and
c. transmitting said data stream.
Dependent claims: 10, 11, 12, 13, 14
15. A speech signal distribution method, comprising the steps of:
converting text containing sentences into a data stream, said data stream including a stream of parameters suitable for driving an audio signal generator that converts said stream of parameters into an audio speech signal in accordance with a vocal tract model;
said text including a sequence of words;
inserting into said data stream text data representing at least a subset of the words in said text, wherein said text data is inserted at positions in said data stream coinciding with the corresponding parameters in said stream of parameters; and
transmitting said data stream.
Dependent claims: 16
17. A speech signal distribution system comprising:
a receiving subsystem that receives a data stream transmitted by a remotely located subsystem, said received data stream including (i) a stream of speech signal parameters representing spoken text and lacking phrase-level and sentence-level prosodic content, and (ii) additional data, representative of linguistic boundaries, that indicate which parameters in said stream of speech signal parameters are associated with predefined boundaries of at least one of phrases and sentences in said text;
said receiving subsystem including:
an audio signal generator that converts said stream of speech signal parameters into an audio speech signal in accordance with a vocal tract model; and
a data stream buffer for storing said received data stream in a buffer until said received data stream includes boundary data indicating a linguistic boundary of at least one of phrases and sentences, and for then enabling said stored data stream up to said linguistic boundary to be processed by said audio signal generator.
Dependent claims: 18, 19
Specification