Retaining prosody during speech analysis for later playback
First Claim
1. A method of communicating speech signals comprising the steps of:
storing at a first location a plurality of input voice fonts, each input voice font comprising information describing a plurality of speech segments, each speech segment identified by a segment ID;
selecting one of the plurality of input voice fonts;
designating one of a plurality of voice fonts to be used as an output voice font;
receiving an analog speech signal, said analog speech signal comprising a plurality of speech segments;
digitizing the analog speech signal;
identifying each of the plurality of speech segments in the received speech signal;
measuring one or more prosodic parameters for each of said identified segments in relation to the segments of the selected input voice font; and
transmitting a data signal from the first location to a second location, said data signal comprising segment IDs, values of the measured prosodic parameters of the speech segments in the received speech signal, and an output voice font ID identifying the designated output voice font;
storing at the second location a plurality of output voice fonts, each output voice font comprising information describing a plurality of speech segments, each speech segment identified by a segment ID;
receiving the transmitted data signal at the second location;
identifying in said received data signal the segment IDs, the values of the measured prosodic parameters, and the designated output voice font corresponding to the received output voice font ID;
selecting, in the designated output voice font, the information describing a plurality of speech segments corresponding to the received segment IDs;
modifying the selected speech segment information according to the received values of the corresponding prosodic parameters; and
generating a speech signal based on the modified speech segment information.
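The claim measures prosody "in relation to" the segments of the input voice font and then modifies the matching segments of the output voice font. One way to read that, offered only as a sketch, is to transmit each parameter as a ratio against the input font's reference value and to rescale the output font's reference value by the same ratio. The ratio encoding, the `SegmentRef` fields, the font contents, and the function names below are all assumptions made for illustration; the claim does not prescribe them.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SegmentRef:
    """Reference prosody stored in a voice font for one segment (hypothetical layout)."""
    pitch_hz: float
    duration_ms: float
    amplitude: float


# Hypothetical voice fonts keyed by segment ID.
INPUT_FONT: Dict[str, SegmentRef] = {"AH": SegmentRef(120.0, 80.0, 0.5)}
OUTPUT_FONT: Dict[str, SegmentRef] = {"AH": SegmentRef(200.0, 100.0, 0.4)}


def measure_relative(seg_id: str, pitch_hz: float, duration_ms: float,
                     amplitude: float, font: Dict[str, SegmentRef]) -> Tuple[float, float, float]:
    """Express measured prosody as ratios against the input font's reference segment."""
    ref = font[seg_id]
    return (pitch_hz / ref.pitch_hz,
            duration_ms / ref.duration_ms,
            amplitude / ref.amplitude)


def apply_relative(seg_id: str, ratios: Tuple[float, float, float],
                   font: Dict[str, SegmentRef]) -> SegmentRef:
    """Rescale the output font's reference segment by the received ratios."""
    ref = font[seg_id]
    pitch_ratio, duration_ratio, amplitude_ratio = ratios
    return SegmentRef(ref.pitch_hz * pitch_ratio,
                      ref.duration_ms * duration_ratio,
                      ref.amplitude * amplitude_ratio)
```

Under this reading, a speaker who raises pitch by 10% over the input font's reference causes the output voice to rise by 10% over its own reference, so the contour is kept while the voice changes.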
Abstract
A speech system includes a speech encoding system and a speech decoding system. The speech encoding system includes a speech analyzer for identifying each of the speech segments (i.e., phonemes) in the received digitized speech signal. A pitch detector, a duration detector, and an amplitude detector are each coupled to the memory and the analyzer and detect various prosodic parameters of each received speech segment. A speech encoder generates a data signal that includes the speech segment IDs and the values of the corresponding prosodic parameters. The speech decoding system includes a digital data decoder and a speech synthesizer for generating a speech signal based on the segment IDs and prosodic parameter values.
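The data signal described in the abstract carries segment IDs and prosodic parameter values rather than audio samples. A minimal sketch of such a signal follows, assuming a fixed binary layout that is not taken from the patent: a header with an output voice font ID and a segment count, then one 7-byte record per segment. The field widths, units, and the names `encode_signal` and `decode_signal` are hypothetical.

```python
import struct
from typing import List, Tuple

# Assumed layout (little-endian, no padding), not specified by the patent:
HEADER = struct.Struct("<HH")    # output voice font ID, number of segments
RECORD = struct.Struct("<HHHB")  # segment ID, pitch (Hz), duration (ms), amplitude (0-255)

Segment = Tuple[int, int, int, int]


def encode_signal(font_id: int, segments: List[Segment]) -> bytes:
    """Pack the font ID and the per-segment prosody records into one byte string."""
    parts = [HEADER.pack(font_id, len(segments))]
    parts.extend(RECORD.pack(*seg) for seg in segments)
    return b"".join(parts)


def decode_signal(data: bytes) -> Tuple[int, List[Segment]]:
    """Recover the font ID and the list of per-segment records."""
    font_id, count = HEADER.unpack_from(data, 0)
    segments = [
        RECORD.unpack_from(data, HEADER.size + i * RECORD.size)
        for i in range(count)
    ]
    return font_id, segments
```

With this layout each segment costs 7 bytes regardless of how long it is spoken, which illustrates why sending IDs and prosody can be far more compact than sending digitized audio.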
108 Citations
10 Claims
1. A method of communicating speech signals comprising the steps of:
storing at a first location a plurality of input voice fonts, each input voice font comprising information describing a plurality of speech segments, each speech segment identified by a segment ID;
selecting one of the plurality of input voice fonts;
designating one of a plurality of voice fonts to be used as an output voice font;
receiving an analog speech signal, said analog speech signal comprising a plurality of speech segments;
digitizing the analog speech signal;
identifying each of the plurality of speech segments in the received speech signal;
measuring one or more prosodic parameters for each of said identified segments in relation to the segments of the selected input voice font; and
transmitting a data signal from the first location to a second location, said data signal comprising segment IDs, values of the measured prosodic parameters of the speech segments in the received speech signal, and an output voice font ID identifying the designated output voice font;
storing at the second location a plurality of output voice fonts, each output voice font comprising information describing a plurality of speech segments, each speech segment identified by a segment ID;
receiving the transmitted data signal at the second location;
identifying in said received data signal the segment IDs, the values of the measured prosodic parameters, and the designated output voice font corresponding to the received output voice font ID;
selecting, in the designated output voice font, the information describing a plurality of speech segments corresponding to the received segment IDs;
modifying the selected speech segment information according to the received values of the corresponding prosodic parameters; and
generating a speech signal based on the modified speech segment information.
Dependent claims: 2, 3, 4, 5
6. An apparatus for encoding speech signals comprising:
a memory storing a plurality of voice fonts, each said voice font comprising a digitized pattern for each of a plurality of speech segments, each speech segment identified by a segment ID;
an A/D converter adapted to receive an analog speech signal and having an output;
a speech analyzer coupled to said memory and said A/D converter, said speech analyzer adapted to receive a digitized speech signal and identify each of the segments in the digitized speech signal based on a selected one of said voice fonts, said speech analyzer adapted to output the segment ID for each of said identified speech segments;
one or more prosodic parameter detectors coupled to said memory and said speech analyzer, said detectors adapted to measure values of the prosodic parameters of each received digitized speech segment; and
a data encoder coupled to said speech analyzer and adapted to generate a digital data signal for transmission or storage, said digital data signal comprising a segment ID and the measured values of the corresponding measured prosodic parameters for each of the identified speech segments and a voice font ID identifying one of a plurality of output voice fonts for use in regenerating the speech signal.
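The speech analyzer in claim 6 identifies segments "based on a selected one of said voice fonts", each font holding a digitized pattern per segment ID. As a rough illustration only, the sketch below picks the segment ID whose stored pattern has the smallest mean squared difference from an incoming frame. This nearest-pattern rule is an assumption chosen for brevity; the claim does not say how the comparison is made, and a practical recognizer would be considerably more elaborate. The names `pattern_distance` and `identify_segment` are hypothetical.

```python
from typing import Dict, Sequence


def pattern_distance(frame: Sequence[float], pattern: Sequence[float]) -> float:
    """Mean squared difference over the overlapping part of two sample sequences."""
    n = min(len(frame), len(pattern))
    if n == 0:
        raise ValueError("frame and pattern must both be non-empty")
    return sum((frame[i] - pattern[i]) ** 2 for i in range(n)) / n


def identify_segment(frame: Sequence[float],
                     font: Dict[int, Sequence[float]]) -> int:
    """Return the segment ID of the stored pattern closest to the frame."""
    return min(font, key=lambda seg_id: pattern_distance(frame, font[seg_id]))
```

Because the comparison is made against the selected font's own patterns, choosing a font that resembles the speaker should make the closest match more reliable.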
7. A computer for encoding speech signals comprising:
a CPU;
an audio input device adapted to receive an analog audio or speech signal and having an output;
an A/D converter having an input coupled to the output of said audio input device and an output coupled to said CPU;
a memory coupled to said CPU, said memory storing software and a plurality of voice fonts, each voice font comprising a digitized pattern and a corresponding segment ID for each of a plurality of speech segments; and
said CPU being adapted to:
identify, using a selected one of said voice fonts as an input voice font, each of a plurality of speech segments in a received digitized speech signal;
measure one or more prosodic parameters for each of the identified segments; and
generate a data signal comprising segment IDs and values of the measured prosodic parameters of each of the identified speech segments and a voice font ID designating one of a plurality of voice fonts to be used as an output voice font for use in regenerating the speech signal.
Dependent claims: 8
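The measuring step in claim 7 covers prosodic parameters such as the pitch, duration, and amplitude named in the abstract. The sketch below shows one conventional way software might measure them on a single digitized segment: an autocorrelation peak for pitch, sample count for duration, and root mean square for amplitude. These particular estimators, the search range of 50 to 400 Hz, and the names `estimate_pitch` and `measure_prosody` are assumptions, not methods stated in the claims.

```python
import math
from typing import NamedTuple, Sequence


class Prosody(NamedTuple):
    pitch_hz: float
    duration_ms: float
    amplitude: float


def estimate_pitch(samples: Sequence[float], sample_rate: int,
                   fmin: float = 50.0, fmax: float = 400.0) -> float:
    """Pick the lag with the largest autocorrelation inside the allowed pitch range."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # 0.0 signals that no periodicity was found (for example, silence).
    return sample_rate / best_lag if best_lag else 0.0


def measure_prosody(samples: Sequence[float], sample_rate: int) -> Prosody:
    """Measure pitch, duration, and RMS amplitude of one non-empty segment."""
    duration_ms = 1000.0 * len(samples) / sample_rate
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return Prosody(estimate_pitch(samples, sample_rate), duration_ms, rms)
```

For example, `measure_prosody(segment_samples, 8000)` would return the three values for one segment sampled at 8 kHz, where `segment_samples` is whatever slice of the digitized signal the identification step assigned to that segment.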
9. An apparatus for decoding speech signals comprising:
a memory storing a plurality of output voice fonts, each output voice font comprising a digitized pattern for each of a plurality of speech segments, each speech segment identified by a segment ID;
a data decoder coupled to said memory and receiving a digital data stream from a transmission medium, said decoder identifying in the received data stream a voice font ID designating one of a plurality of voice fonts to be used as an output voice font, a segment ID and values of one or more corresponding prosodic parameters for each of the plurality of speech segments in the received data stream;
a speech synthesizer coupled to said memory and said decoder, said synthesizer selecting digitized patterns in the designated output voice font corresponding to the identified segment IDs, modifying the selected digitized patterns according to the values of the corresponding prosodic parameters, and outputting the modified speech patterns to generate a speech signal.
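The synthesizer in claim 9 selects digitized patterns by segment ID, modifies them according to the prosodic values, and outputs the result. A deliberately simplified sketch of that select, modify, and concatenate loop is given below, handling only duration (by linear-interpolation resampling) and amplitude (by a gain factor). Pitch modification is left out because plain resampling also shifts pitch, and a realistic synthesizer would need a method that changes the two independently. The record format `(segment ID, duration scale, gain)` and the names `stretch` and `synthesize` are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def stretch(pattern: Sequence[float], n_out: int) -> List[float]:
    """Resample a pattern to n_out samples by linear interpolation."""
    if n_out <= 0 or not pattern:
        return []
    if n_out == 1 or len(pattern) == 1:
        return [pattern[0]] * n_out
    step = (len(pattern) - 1) / (n_out - 1)
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), len(pattern) - 2)  # keep lo + 1 inside the pattern
        frac = pos - lo
        out.append(pattern[lo] * (1 - frac) + pattern[lo + 1] * frac)
    return out


def synthesize(font: Dict[int, Sequence[float]],
               records: Iterable[Tuple[int, float, float]]) -> List[float]:
    """Concatenate the font's patterns after applying duration scale and gain."""
    signal: List[float] = []
    for seg_id, duration_scale, gain in records:
        pattern = font[seg_id]
        n_out = max(1, round(len(pattern) * duration_scale))
        signal.extend(gain * s for s in stretch(pattern, n_out))
    return signal
```

Swapping `font` for a different output voice font changes the voice of the generated signal while the received duration and gain values stay the same.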
10. A method of speech encoding comprising the steps of:
selecting one of a plurality of voice fonts to be used as an input voice font;
designating one of a plurality of voice fonts to be used as an output voice font, said output voice font being different from said input voice font;
receiving an analog speech signal, said analog speech signal comprising a plurality of speech segments;
digitizing the analog speech signal;
identifying each of the plurality of speech segments in the received speech signal;
measuring one or more prosodic parameters for each of said identified segments in relation to segments of the selected input voice font;
outputting a data signal comprising a voice font ID identifying the designated output voice font, segment IDs and values of the measured prosodic parameters of the speech segments in the received speech signal;
receiving the data signal; and
generating a speech signal using the designated output voice font based on the segment IDs and the values of the measured prosodic parameters in the data signal.
Specification