Communication system and method using a speaker dependent time-scaling technique
Abstract
A method and apparatus for time-scale modification of speech using a modified version of the Waveform Similarity based Overlap-Add technique (WSOLA) comprises the steps of storing a portion of an input speech signal in a memory, analyzing the portion of the input speech signal to determine at least one filtered pitch value, calculating an estimated pitch value (12) from the at least one filtered pitch value, determining a segment size (14) in response to the estimated pitch value (12), the segment size (14) having a value greater than the estimated pitch value (12), and time-scale compressing (18) the input speech signal in response to the segment size determined.
48 Claims
1. A method for time-scale modification of speech using a modified version of the Waveform Similarity based Overlap-Add technique (WSOLA), the method comprising the steps of:
a) storing a portion of an input speech signal in a memory;
b) analyzing the portion of the input speech signal to determine at least one filtered pitch value;
c) calculating an estimated pitch value from the at least one filtered pitch value;
d) determining a segment size in response to the estimated pitch value, the segment size having a value greater than the estimated pitch value; and
e) time-scale compressing the input speech signal in response to the segment size determined.
Dependent claims: 2, 3, 4, 5
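Steps b) through d) of claim 1 can be illustrated with a minimal Python sketch. The claim does not say how the pitch values are filtered or combined, so the median filter, the averaging step, and the names `median_filter`, `estimate_pitch`, and `segment_size` are all assumptions made for illustration. The factor of two comes from the "approximately twice" wording of claims 13 and 17, and pitch values are assumed to be pitch periods measured in samples.

```python
from statistics import median


def median_filter(raw_pitch, width=3):
    """Smooth raw pitch-period estimates (assumed filter; width should be odd)."""
    half = width // 2
    filtered = []
    for i in range(len(raw_pitch)):
        # The window shrinks at the edges instead of padding.
        window = raw_pitch[max(0, i - half): i + half + 1]
        filtered.append(median(window))
    return filtered


def estimate_pitch(filtered_pitch):
    """Combine filtered pitch values into one estimate (assumed: plain average)."""
    return sum(filtered_pitch) / len(filtered_pitch)


def segment_size(pitch_estimate, factor=2.0):
    """Pick a segment size larger than the pitch estimate (assumed: factor of 2)."""
    return int(round(factor * pitch_estimate))


# Hypothetical raw pitch periods in samples; 160 is an octave error.
raw = [80, 82, 160, 81, 79]
pitch = estimate_pitch(median_filter(raw))
size = segment_size(pitch)
```

In this sketch the median filter suppresses the isolated octave error before averaging, so the segment size stays close to two pitch periods.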
6. A method for time-scale modification of speech using a modified version of the Waveform Similarity based Overlap-Add technique (WSOLA), the method comprising the steps of:
a) storing a portion of an input speech signal in a memory;
b) determining at least one filtered pitch value from the portion of the input speech signal;
c) calculating an estimated pitch value from the at least one filtered pitch value;
d) determining a segment size in response to the estimated pitch value, the segment size having a value greater than the estimated pitch value;
e) time-scale compressing the input speech signal in response to the segment size determined; and
f) time-scale expanding the input speech signal in response to the segment size determined.
Dependent claims: 7, 8, 9, 10
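For orientation, the compression and expansion steps can be sketched with a plain, textbook-style WSOLA loop in Python. This is not the modified technique the claims describe; it only shows where a pitch-derived segment size would enter. The function name `wsola`, the `rate` and `tol` parameters, the periodic Hann window, and the 50% overlap are assumptions of this sketch.

```python
import math


def wsola(x, rate, seg, tol=None):
    """Time-scale x by `rate` (>1 compresses, <1 expands); `seg` assumed even."""
    hop = seg // 2                      # synthesis hop: 50% overlap
    tol = hop // 2 if tol is None else tol
    # Periodic Hann window: overlapped copies sum to 1 at 50% overlap.
    win = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / seg) for n in range(seg)]
    limit = len(x) - seg - tol - hop    # last safe nominal read position
    if limit < 0:
        raise ValueError("input is too short for this segment size")
    n_frames = int(limit / (hop * rate)) + 1
    out = [0.0] * (n_frames * hop + seg)
    prev = 0
    for k in range(n_frames):
        target = int(round(k * hop * rate))   # nominal read position
        pos = target
        if k > 0:
            # Natural continuation of the previously copied segment.
            template = prev + hop
            best = None
            for cand in range(max(0, target - tol), target + tol + 1):
                # Waveform similarity measured by cross-correlation.
                score = sum(x[cand + i] * x[template + i] for i in range(seg))
                if best is None or score > best:
                    best, pos = score, cand
        for i in range(seg):                  # windowed overlap-add
            out[k * hop + i] += win[i] * x[pos + i]
        prev = pos
    return out


# Hypothetical input: a tone with a pitch period of 80 samples.
tone = [math.sin(2.0 * math.pi * n / 80) for n in range(4000)]
compressed = wsola(tone, rate=2.0, seg=160)
expanded = wsola(tone, rate=0.5, seg=160)
```

Choosing `seg` as roughly two pitch periods, as in this usage, lets the similarity search line up consecutive segments on whole pitch cycles.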
11. A method for use in a voice capable device for time-scale modification of speech using a modified version of the Waveform Similarity based Overlap-Add technique (WSOLA) to form an output signal, comprising the steps of:
at an output device:
a) determining at least one filtered pitch value from a portion of an input speech signal;
b) calculating an estimated pitch value from the at least one filtered pitch value;
c) determining an analysis segment size in response to the estimated pitch value, the analysis segment size having a value greater than the estimated pitch value; and
d) time-scale expanding the input speech signal to provide a resultant output speech signal.
Dependent claims: 12
13. A method for time-scale modification of speech dependent upon a pitch period of a speaker using a modified version of the Waveform Similarity based Overlap-Add technique (WSOLA), comprising the steps of:
a) determining at least one filtered pitch value from a portion of an input speech signal;
b) calculating an estimated pitch value from the at least one filtered pitch value;
c) determining an analysis segment size being approximately twice the estimated pitch value;
d) increasing a time-scaling factor above an average time-scaling factor if the estimated pitch value is below a predetermined threshold; and
e) decreasing the time-scaling factor below an average time-scaling factor if the estimated pitch value is above the predetermined threshold.
Dependent claims: 14, 15, 16
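Steps d) and e) of claim 13 amount to a threshold rule on the time-scaling factor, which a short Python sketch can make concrete. The function name `adapt_scaling_factor`, the fixed `step` size, and the choice to leave the factor unchanged exactly at the threshold are assumptions; the claim states only the direction of the adjustment.

```python
def adapt_scaling_factor(pitch_estimate, average_factor, threshold, step=0.25):
    """Move the time-scaling factor above or below its average by pitch.

    `step` is a hypothetical adjustment size; the claim gives only the
    direction of the change.
    """
    if pitch_estimate < threshold:
        return average_factor + step   # step d): increase above the average
    if pitch_estimate > threshold:
        return average_factor - step   # step e): decrease below the average
    return average_factor              # assumed: no change at the threshold


# Hypothetical values: pitch estimates of 60 and 100 against a threshold of 80.
low_pitch_factor = adapt_scaling_factor(60, average_factor=2.0, threshold=80)
high_pitch_factor = adapt_scaling_factor(100, average_factor=2.0, threshold=80)
```

The practical effect is that the scaling factor varies from speaker to speaker around the average factor.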
17. A method for compressing a plurality of voice signals within a voice communication resource having a given bandwidth within a voice communication system, comprising the steps of:
(a) subchanneling the voice communication resource and simultaneously placing at least one voice signal of the plurality of voice signals on a subchannel of a plurality of subchannels;
(b) compressing a time of the at least one voice signal within the subchannel, wherein the step of compressing the time of the at least one voice signal includes the steps of:
(c) determining at least one filtered pitch value from a portion of the at least one voice signal;
(d) calculating an estimated pitch value from the at least one filtered pitch value for the at least one voice signal;
(e) determining a segment size for analysis approximately twice the estimated pitch value;
(f) increasing a time-scaling factor above an average time-scaling factor if the estimated pitch value is below a predetermined threshold; and
(g) decreasing the time-scaling factor below an average time-scaling factor if the estimated pitch value is above the predetermined threshold, wherein the result of steps (a) through (g) provides a plurality of compressed voice signals.
Dependent claims: 18, 19, 20
21. A communication system using voice compression having at least one transmitter base station and a plurality of selective call receivers, comprising:
at the at least one transmitter base station:
an input device for receiving an audio signal,
a processing device which compresses the audio signal to produce a compressed audio signal and which modulates the compressed audio signal using quadrature amplitude modulation to provide a processed signal, said processing device compresses the audio signal in accordance with the steps of
a) analyzing a portion of the audio signal to determine at least one filtered pitch value,
b) calculating an estimated pitch value from the at least one filtered pitch value,
c) determining a segment size in response to the estimated pitch value, the segment size having a value greater than the estimated pitch value, and
d) time-scale compressing the audio signal in response to the segment size determined, and
a quadrature amplitude modulation transmitter for transmitting the processed signal; and
at each of the plurality of selective call receivers:
a selective call receiver for receiving the processed signal which is transmitted,
a processing device for demodulating the processed signal which is received using a quadrature amplitude demodulation technique and for time-scale expanding the processed signal which is demodulated to provide a reconstructed signal, and
an amplifier for amplifying the reconstructed signal into a reconstructed audio signal.
Dependent claims: 22, 23, 24, 25, 26
27. A selective call receiver for receiving compressed voice signals, comprising:
a selective call receiver for receiving a processed signal which is transmitted, the processed signal being processed in accordance with the steps of:
a) analyzing a portion of an input speech signal to determine at least one filtered pitch value,
b) calculating an estimated pitch value from the at least one filtered pitch value,
c) determining a segment size in response to the estimated pitch value, the segment size having a value greater than the estimated pitch value, and
d) time-scale expanding the input speech signal in response to the segment size determined;
a processing device for demodulating the processed signal which is received using a single side band demodulation technique and a time-scale expansion technique to provide a reconstructed signal; and
an amplifier for amplifying the reconstructed signal into a reconstructed audio signal.
Dependent claims: 28, 29
30. A selective call paging base station for transmitting selective call signals on a communication resource having a predetermined bandwidth, comprising:
an input device for receiving a plurality of audio signals;
a means for subchanneling the communication resource into a predetermined number of subchannels;
an amplitude compression and filtering module, for each subchannel of the predetermined number of subchannels, for compressing an amplitude of a respective audio signal and for filtering the respective audio signal;
a time-scale compression module which provides compression of the respective audio signal for each of the predetermined number of subchannels, said time-scale compression module operating to generate a processed signal in accordance with the steps of:
a) analyzing a portion of an input speech signal to determine at least one filtered pitch value,
b) calculating an estimated pitch value from the at least one filtered pitch value,
c) determining a segment size in response to the estimated pitch value, the segment size having a value greater than the estimated pitch value, and
d) time-scale compressing the input speech signal in response to the segment size determined; and
a quadrature amplitude modulation transmitter for transmitting the processed signal.
Dependent claims: 31, 32, 33
34. A selective call receiver, comprising:
a receiver having an analog to digital converter for receiving a compressed voice signal that has been compressed using a modified version of the Waveform Similarity based Overlap-Add (WSOLA) compression technique that uses a compression factor that is dependent upon a pitch period of a voice signal which is input in accordance with the steps of:
a) analyzing a portion of the voice signal which is input to determine at least one filtered pitch value,
b) calculating an estimated pitch value from the at least one filtered pitch value,
c) determining a segment size in response to the estimated pitch value, the segment size having a value greater than the estimated pitch value, and
d) time-scale compressing the voice signal in response to the segment size determined to generate the compressed voice signal,
and providing therefrom a digitized received signal, wherein the compressed voice signal further contains data for determining an expansion factor from the compression factor used in compressing the voice signal; and
a signal processor for processing the digitized received signal and for expanding the digitized received signal in accordance with the expansion factor to generate a processed signal.
Dependent claims: 35, 36, 37, 38, 39
40. An electronic device that uses a modified version of the Waveform Similarity based Overlap-Add technique (WSOLA) for time-scale modification of speech, comprising:
memory for storing a portion of an input speech signal;
a processor for analyzing a portion of an input speech signal to determine at least one filtered pitch value, for calculating an estimated pitch value from the at least one filtered pitch value, and for further determining a segment size in response to the estimated pitch value, the segment size having a value greater than the estimated pitch value; and
a means for time-scaling the input speech signal in response to the segment size determined.
Dependent claims: 41, 42, 43, 44, 45
46. A method for time-scale and frequency-scale modification of speech using a modified version of the Waveform Similarity based Overlap-Add technique (WSOLA), the method comprising the steps of:
a) storing a portion of an input speech signal in a memory;
b) analyzing the portion of the input speech signal to determine at least one filtered pitch value;
c) calculating an estimated pitch value from the at least one filtered pitch value;
d) determining a segment size in response to the estimated pitch value, the segment size having a value greater than the estimated pitch value;
e) time-scaling the input speech signal in response to the segment size determined and a predetermined time-scaling factor, wherein time-scaling provides a time-scaled signal; and
f) frequency-scaling of the time-scaled signal.
Dependent claims: 47, 48
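Claim 46 does not state how the frequency-scaling in step f) is carried out. One common way to scale frequency after time-scaling is to resample the time-scaled signal, and the Python sketch below assumes that approach. The function name `resample_linear`, the use of linear interpolation, and the `factor` parameter are illustrative assumptions rather than details taken from the claim.

```python
def resample_linear(signal, factor):
    """Read `signal` `factor` times faster using linear interpolation.

    Played back at the original sample rate, the result has its
    frequencies multiplied by `factor` and its duration divided by it.
    """
    n_out = int((len(signal) - 1) / factor) + 1
    out = []
    for m in range(n_out):
        t = m * factor                 # fractional read position
        i = int(t)
        frac = t - i
        # Clamp the right-hand neighbour at the end of the signal.
        nxt = signal[min(i + 1, len(signal) - 1)]
        out.append((1.0 - frac) * signal[i] + frac * nxt)
    return out


# Hypothetical ramp signal, read twice as fast and half as fast.
ramp = [0, 1, 2, 3, 4]
faster = resample_linear(ramp, 2)
slower = resample_linear(ramp, 0.5)
```

Under this assumption, expanding a signal in time by some factor and then resampling it by the same factor restores the original duration while shifting its frequency content by that factor.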
Specification