Speech approximation using successive sinusoidal overlap-add models and pitch-scale modifications
Abstract
A method and apparatus for the automatic analysis, synthesis and modification of audio signals, based on an overlap-add sinusoidal model, is disclosed. Automatic analysis of amplitude, frequency and phase parameters of the model is achieved using an analysis-by-synthesis procedure which incorporates successive approximation, yielding synthetic waveforms which are very good approximations to the original waveforms and are perceptually identical to the original sounds. A generalized overlap-add sinusoidal model is introduced which can modify audio signals without objectionable artifacts. In addition, a new approach to pitch-scale modification allows for the use of arbitrary spectral envelope estimates and addresses the problems of high-frequency loss and noise amplification encountered with prior art methods. The overlap-add synthesis method provides the ability to synthesize sounds with computational efficiency rivaling that of synthesis using the discrete short-time Fourier transform (DSTFT) while eliminating the modification artifacts associated with that method.
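The successive-approximation idea in the abstract can be illustrated with a minimal sketch. It is not the patented analysis procedure: it assumes a plain (unwindowed) squared-error criterion, a fixed grid of candidate frequencies, and hypothetical helper names (`fit_sinusoid`, `successive_approximation`). At each pass it fits the single sinusoid that removes the most energy from the residual, then subtracts it.

```python
import math

def fit_sinusoid(residual, w):
    """Least-squares fit of a*cos(w*k) + b*sin(w*k); returns (energy removed, a, b)."""
    n = len(residual)
    c = [math.cos(w * k) for k in range(n)]
    s = [math.sin(w * k) for k in range(n)]
    cc = sum(v * v for v in c)
    ss = sum(v * v for v in s)
    cs = sum(u * v for u, v in zip(c, s))
    rc = sum(u * v for u, v in zip(residual, c))
    rs = sum(u * v for u, v in zip(residual, s))
    det = cc * ss - cs * cs
    if abs(det) < 1e-12:          # degenerate basis (for example w == 0)
        return None
    a = (rc * ss - rs * cs) / det  # solution of the 2x2 normal equations
    b = (rs * cc - rc * cs) / det
    return a * rc + b * rs, a, b

def successive_approximation(frame, freqs, count):
    """Greedy analysis-by-synthesis: returns [(amp, freq, phase)] and the final residual."""
    residual = list(frame)
    components = []
    for _ in range(count):
        fits = [(fit_sinusoid(residual, w), w) for w in freqs]
        fits = [(f, w) for f, w in fits if f is not None]
        (gain, a, b), w = max(fits, key=lambda item: item[0][0])
        residual = [r - a * math.cos(w * k) - b * math.sin(w * k)
                    for k, r in enumerate(residual)]
        # a*cos + b*sin == A*cos(w*k + phase) with A = hypot(a, b), phase = atan2(-b, a)
        components.append((math.hypot(a, b), w, math.atan2(-b, a)))
    return components, residual
```

Each pass works on the residual left by the previous one, so the synthetic waveform approaches the original as components are added. A window-weighted error, as an overlap-add model would normally use, is deliberately left out here to keep the sketch short.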
24 Claims
1. A method of synthesizing artifact-free modified speech signals from a parameter set and a sequence of frequency-scale modification factors,
the parameter set comprising a sequence of coefficient sets representative of a sequence of estimates of the frequency response of a human vocal tract, a corresponding sequence of estimates of a fundamental frequency, and a corresponding sequence of quasi-harmonic sinusoidal model parameter sets;
each one of the estimates of a fundamental frequency and the corresponding quasi-harmonic sinusoidal model parameter set comprising a representation of one of a sequence of overlapping speech data frames;
the method comprising the steps of;
(a) estimating, with a pitch onset time estimator responsive to the sequence of coefficient sets, the sequence of estimates of a fundamental frequency, and the sequence of quasi-harmonic sinusoidal model parameter sets, a sequence of excitation times relative to the centers of each one of the corresponding overlapping speech data frames in the sequence of speech data frames at which an excitation pulse occurs;
(b) generating a frequency-domain sequence of data frames from a discrete Fourier transform assignment means responsive to the sequence of excitation times, the corresponding sequence of quasi-harmonic sinusoidal model parameter sets, the sequence of frequency-scale modification factors, and the sequence of estimates of a fundamental frequency,
(c) transforming the frequency-domain sequence of data frames with an inverse discrete Fourier transform means to produce a time-domain sequence of data frames;
(d) generating a contiguous sequence of speech data representative of the modified speech signal from an overlap-add means responsive to the time-domain sequence of data frames; and
(e) converting the contiguous sequence of speech data into an analog signal using a digital-to-analog converter means to produce the modified speech signal.

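Steps (b) through (d) of claim 1 can be sketched as follows. This is an illustrative simplification, not the claimed implementation: it assumes each frame is described by (amplitude, frequency in radians per sample, phase at frame center) triples, rounds each scaled frequency to the nearest DFT bin, uses a triangular synthesis window, and omits the excitation-time handling of step (a). The function names are hypothetical.

```python
import cmath
import math

def dft_assign(components, beta, size):
    """Scale each frequency by beta and place the component in the nearest DFT bin."""
    bins = [0j] * size
    for amp, freq, phase in components:
        k = round(beta * freq * size / (2 * math.pi))
        if 0 < k < size // 2:                      # drop DC and anything at or above Nyquist
            value = 0.5 * amp * size * cmath.exp(1j * phase)
            bins[k] += value
            bins[size - k] += value.conjugate()    # conjugate symmetry gives a real signal
    return bins

def inverse_dft(bins):
    """Naive inverse DFT (O(size^2)); sample 0 corresponds to the frame center."""
    size = len(bins)
    return [
        sum(bins[k] * cmath.exp(2j * math.pi * k * n / size) for k in range(size)).real / size
        for n in range(size)
    ]

def overlap_add(frames, hop):
    """Triangular-window overlap-add of periodic frames whose centers are hop samples apart."""
    out = [0.0] * (hop * (len(frames) + 1))
    for i, frame in enumerate(frames):
        center = hop * (i + 1)
        for n in range(-hop, hop):
            out[center + n] += (1 - abs(n) / hop) * frame[n % len(frame)]
    return out

def synthesize(frame_params, betas, size, hop):
    """Requires size >= 2 * hop so that one DFT period covers the whole window."""
    frames = [inverse_dft(dft_assign(c, b, size)) for c, b in zip(frame_params, betas)]
    return overlap_add(frames, hop)
```

Because the sketch leaves the phases untouched, neighboring frames stay phase-coherent after scaling only in special cases, such as the bin-centered, zero-phase test signal used to check it. In a production version the naive inverse DFT would be replaced by an FFT.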
3. A method of synthesizing artifact-free modified speech signals from a parameter set and a sequence of time-scale modification factors,
the parameter set comprising a sequence of coefficient sets representative of a sequence of estimates of the frequency response of a human vocal tract, a corresponding sequence of estimates of a fundamental frequency, and a corresponding sequence of quasi-harmonic sinusoidal model parameter sets;
each one of the estimates of a fundamental frequency and the corresponding quasi-harmonic sinusoidal model parameter set comprising a representation of one of a sequence of overlapping speech data frames;
the method comprising the steps of;
(a) estimating, with a pitch onset time estimator responsive to the sequence of coefficient sets, the sequence of estimates of a fundamental frequency, and the sequence of quasi-harmonic sinusoidal model parameter sets, a sequence of excitation times relative to the centers of each one of the corresponding overlapping speech data frames in the sequence of speech data frames at which an excitation pulse occurs;
(b) generating a frequency-domain sequence of data frames from a discrete Fourier transform assignment means responsive to the sequence of excitation times, the corresponding sequence of quasi-harmonic sinusoidal model parameter sets, the sequence of estimates of a fundamental frequency, and the sequence of time-scale modification factors;
(c) transforming the frequency-domain sequence of data frames with an inverse discrete Fourier transform means to produce a time-domain sequence of data frames;
(d) generating a contiguous sequence of speech data representative of the modified speech signal from an overlap-add means responsive to the time-domain sequence of data frames and the sequence of time-scale modification factors; and
(e) converting the contiguous sequence of speech data into an analog signal using a digital-to-analog converter means to produce the modified speech signal.

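In claim 3 the overlap-add stage itself is responsive to the time-scale modification factors. One way to picture this, offered only as a sketch under assumptions, is to stretch or shrink the spacing between frame centers and crossfade adjacent frames over the new spacing. The function name `overlap_add_scaled`, the linear crossfade, and the periodic extension of each frame are all hypothetical choices; the claim also makes the DFT assignment responsive to the time-scale factors, which this sketch does not model.

```python
import math

def overlap_add_scaled(frames, hop, rhos):
    """Crossfade consecutive frames over a spacing of round(rho * hop) samples.

    frames[i] is one period of a periodic frame with sample 0 at the frame center;
    rhos[i] is the time-scale factor applied between frame i and frame i + 1.
    """
    out = []
    for i in range(len(frames) - 1):
        gap = max(1, round(rhos[i] * hop))
        left, right = frames[i], frames[i + 1]
        for n in range(gap):
            fade = n / gap
            # left frame runs forward from its center, right frame backward from its center
            out.append((1 - fade) * left[n % len(left)]
                       + fade * right[(n - gap) % len(right)])
    return out
```

With every factor equal to 1 this reduces to an ordinary triangular-window overlap-add between frame centers. For other factors the frames remain phase-coherent only if their phases are adjusted to the new spacing, which is left out here.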
5. A method of synthesizing artifact-free modified speech signals from a parameter set and a sequence of time-scale modification factors,
the parameter set comprising a sequence of coefficient sets representative of a sequence of estimates of the frequency response of a human vocal tract, a corresponding sequence of estimates of a fundamental frequency, and a corresponding sequence of unmodified quasi-harmonic sinusoidal model parameter sets;
each one of the estimates of a fundamental frequency and the corresponding quasi-harmonic sinusoidal model parameter set comprising a representation of one of a sequence of overlapping speech data frames;
the method comprising the steps of;
(a) estimating, with a pitch onset time estimator responsive to the sequence of coefficient sets, the sequence of estimates of a fundamental frequency, and the sequence of unmodified quasi-harmonic sinusoidal model parameter sets, a sequence of excitation times relative to the centers of each one of the corresponding overlapping speech data frames in the sequence of speech data frames at which an excitation pulse occurs;
(b) generating a sequence of modified quasi-harmonic sinusoidal model parameter sets with a phasor interpolator responsive to the sequence of excitation times, the sequence of pitch-scale modification factors, the sequence of estimates of the fundamental frequency, the sequence of coefficient sets, and the sequence of unmodified quasi-harmonic sinusoidal model parameter sets, each of the modified quasi-harmonic sinusoidal model parameter sets comprising a set of modified amplitudes, a corresponding set of modified frequencies, and a corresponding set of modified phases;
(c) generating a frequency-domain sequence of data frames from a discrete Fourier transform assignment means responsive to the sequence of excitation times, the corresponding sequence of modified quasi-harmonic sinusoidal model parameter sets, the sequence of pitch-scale modification factors, and the sequence of estimates of a fundamental frequency;
(d) transforming the frequency-domain sequence of data frames with an inverse discrete Fourier transform means to produce a time-domain sequence of data frames;
(e) generating a contiguous sequence of speech data representative of the modified speech signal from an overlap-add means responsive to the time-domain sequence of data frames; and
(f) converting the contiguous sequence of speech data into an analog signal using a digital-to-analog converter means to produce the modified speech signal.

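The amplitude side of step (b) in claim 5 can be pictured with the sketch below. It is an assumption-laden illustration rather than the claimed phasor interpolator: it assumes the coefficient sets are all-pole (LPC-style) coefficients, treats the measured amplitudes as envelope times excitation, interpolates only the excitation magnitudes linearly in frequency, and ignores phases and excitation times. The names `envelope` and `pitch_scale` are hypothetical.

```python
import cmath
import math

def envelope(coeffs, w):
    """Magnitude of an all-pole response 1 / (1 + sum a_m * exp(-j*w*m)) at frequency w."""
    denom = 1 + sum(a * cmath.exp(-1j * w * (m + 1)) for m, a in enumerate(coeffs))
    return 1 / abs(denom)

def pitch_scale(amps, f0, beta, coeffs):
    """Return [(freq, amp)] for harmonics of beta * f0, keeping the spectral envelope fixed.

    amps[k] is the amplitude of the original harmonic at (k + 1) * f0 radians per sample.
    """
    # Remove the envelope to estimate the excitation magnitude at each original harmonic.
    excitation = [a / envelope(coeffs, (k + 1) * f0) for k, a in enumerate(amps)]
    modified = []
    j = 1
    while j * beta * f0 < math.pi:
        w = j * beta * f0
        # Fractional index into the excitation list, clamped to the measured range.
        pos = min(max(w / f0 - 1, 0.0), len(excitation) - 1.0)
        lo = int(pos)
        hi = min(lo + 1, len(excitation) - 1)
        e = excitation[lo] + (pos - lo) * (excitation[hi] - excitation[lo])
        # Re-apply the envelope at the shifted harmonic frequency.
        modified.append((w, e * envelope(coeffs, w)))
        j += 1
    return modified
```

Sampling the envelope at the new harmonic frequencies, instead of moving the old amplitudes along with the harmonics, is what keeps the formant structure in place when the pitch changes.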
7. A method of synthesizing artifact-free modified musical tone signals from a parameter set and a sequence of frequency-scale modification factors;
the parameter set comprising a sequence of fundamental frequency estimates and a sequence of quasi-harmonic sinusoidal model parameter sets;
the method comprising the steps of;
(a) generating a frequency-domain sequence of data frames from a discrete Fourier transform assignment means responsive to the sequence of fundamental frequency estimates, the corresponding sequence of quasi-harmonic sinusoidal model parameter sets, and the sequence of frequency-scale modification factors;
(b) transforming the frequency-domain sequence of data frames with an inverse discrete Fourier transform means to produce a time-domain sequence of data frames;
(c) generating a contiguous sequence of music data representative of the modified musical tone signals from an overlap-add means responsive to the time-domain sequence of data frames; and
(d) converting the contiguous sequence of music data into an analog signal using a digital-to-analog converter means to produce the modified musical tone signal.

9. A method of synthesizing artifact-free modified musical tone signals from a parameter set and a sequence of time-scale modification factors;
the parameter set comprising a sequence of fundamental frequency estimates and a sequence of quasi-harmonic sinusoidal model parameter sets;
the method comprising the steps of;
(a) generating a frequency-domain sequence of data frames from a discrete Fourier transform assignment means responsive to the sequence of fundamental frequency estimates, the corresponding sequence of quasi-harmonic sinusoidal model parameter sets, and the sequence of time-scale modification factors;
(b) transforming the frequency-domain sequence of data frames with an inverse discrete Fourier transform means to produce a time-domain sequence of data frames;
(c) generating a contiguous sequence of music data representative of the modified musical tone signals from an overlap-add means responsive to the time-domain sequence of data frames and the sequence of time-scale modification factors; and
(d) converting the contiguous sequence of music data into an analog signal using a digital-to-analog converter means to produce the modified musical tone signal.

11. A method of synthesizing artifact-free modified musical tone signals from a parameter set and a sequence of pitch-scale modification factors;
the parameter set comprising, a sequence of coefficient sets representative of a sequence of estimates of a spectral envelope, a corresponding sequence of estimates of a fundamental frequency, and a corresponding sequence of unmodified quasi-harmonic sinusoidal model parameter sets;
each one of the estimates of a fundamental frequency and the corresponding quasi-harmonic sinusoidal model parameter set comprising a representation of one of a sequence of overlapping musical tone data frames;
the method comprising the steps of;
(a) estimating, with a pitch onset time estimator responsive to the sequence of coefficient sets, the sequence of estimates of a fundamental frequency, and the sequence of unmodified quasi-harmonic sinusoidal model parameter sets, a sequence of excitation times relative to the centers of each one of the corresponding musical tone data frames in the sequence of speech data frames at which an excitation pulse occurs;
(b) generating a sequence of modified quasi-harmonic sinusoidal model parameter sets with a phasor interpolator responsive to the sequence of excitation times, the sequence of pitch-scale modification factors, the sequence of estimates of the fundamental frequency, the sequence of coefficient sets and the sequence of unmodified quasi-harmonic sinusoidal model parameter sets;
(c) generating a frequency-domain sequence of data frames from a discrete Fourier transform assignment means responsive to the sequence of modified quasi-harmonic sinusoidal model parameter sets, the sequence of pitch-scale modification factors, and the sequence of estimates of a fundamental frequency;
(d) transforming the frequency-domain sequence of data frames with an inverse discrete Fourier transform means to produce a time-domain sequence of data frames;
(e) generating a contiguous sequence of musical data representative of the modified musical tone signal from an overlap-adder responsive to the time-domain sequence of data frames; and
(f) converting the contiguous sequence of speech data into an analog signal using a digital-to-analog converter means to produce the modified tone signal.

13. An apparatus for generating a signal representative of a synthetic speech waveform from a set of parameters representative of overlapping speech data frames stored in a memory means, and a sequence of frequency scale modification factors;
the set of parameters comprising a sequence of quasi-harmonic sinusoidal model parameter sets, a sequence of coefficient sets representative of a frequency response of a human vocal tract, and a sequence of fundamental frequency estimates, the apparatus comprising;
(a) a pitch onset time estimator means electrically coupled to the memory means and responsive to the sequence of coefficient sets, the sequence of fundamental frequency estimates, and the sequence of quasi-harmonic sinusoidal model parameter sets for generating a first signal representative of a sequence of excitation times relative to the center of each of the corresponding speech data frames at which an excitation pulse occurs;
(b) a discrete Fourier transform assignment means electrically coupled to the memory means and responsive to the sequence of fundamental frequency estimates, the sequence of quasi-harmonic sinusoidal model parameter sets, the first signal, and the sequence of frequency-scale modification factors for producing a second signal from which a modified synthetic contribution may be generated using a discrete Fourier transform algorithm;
(c) a discrete Fourier transform means responsive to the second signal for generating a transformed signal; and
(d) an overlap-add means responsive to the transformed signal for generating the signal representative of the synthetic speech waveform.

15. An apparatus for generating a signal representative of a synthetic speech waveform from a set of parameters representative of overlapping speech data frames stored in a memory means and a sequence of time-scale modification factors,
the set of parameters comprising a sequence of quasi-harmonic sinusoidal model parameter sets, a sequence of coefficient sets representative of a frequency response of a human vocal tract, and a sequence of fundamental frequency estimates, the apparatus comprising:
(a) a pitch onset time estimator means electrically coupled to the memory means and responsive to the sequence of coefficient sets, the sequence of fundamental frequency estimates, and the sequence of quasi-harmonic sinusoidal model parameter sets for generating a first signal representative of a sequence of excitation times relative to the center of each of the corresponding speech data frames at which an excitation pulse occurs;
(b) a discrete Fourier transform assignment means electrically coupled to the memory means and responsive to the sequence of fundamental frequency estimates, the sequence of quasi-harmonic sinusoidal model parameter sets, the first signal, and the sequence of time-scale modification factors for producing a second signal from which a modified synthetic contribution may be generated using a discrete Fourier transform algorithm;
(c) a discrete Fourier transform means responsive to the second signal for generating a transformed signal; and
(d) an overlap-add means responsive to the transformed signal and the sequence of time-scale modification factors for generating the signal representative of the synthetic speech waveform.

17. An apparatus for generating a synthetic speech waveform from a set of parameters representative of overlapping speech data frames stored in a memory means and a sequence of pitch-scale modification factors;
the speech information comprising a sequence of quasi-harmonic sinusoidal model parameter sets, a sequence of coefficient sets representative of a frequency response of a human vocal tract, and a sequence of fundamental frequency estimates, the apparatus comprising;
(a) a pitch onset time estimator means electrically coupled to the memory means and responsive to the sequence of coefficient sets, the sequence of fundamental frequency estimates, and the sequence of quasi-harmonic sinusoidal model parameter sets for generating a first signal representative of a sequence of time estimates relative to the center of each of the frames at which an excitation pulse occurs;
(b) a phasor interpolator means electrically coupled to the memory means and the pitch onset time estimator means and responsive to the sequence of coefficient sets, the sequence of fundamental frequency estimates, the sequence of quasi-harmonic sinusoidal model parameter sets, the first signal, and the sequence of pitch-scale modification factors for generating a sequence of modified quasi-harmonic sinusoidal model parameter sets;
(c) a discrete Fourier transform assignment means electrically coupled to the phasor interpolator means and the pitch onset time estimator means and responsive to the sequence of fundamental frequency estimates, the sequence of modified quasi-harmonic sinusoidal model parameter sets, the first signal and the sequence of pitch-scale modification factors for producing a second signal from which a modified synthetic contribution may be generated using a discrete Fourier transform algorithm;
(d) a discrete Fourier transform means responsive to the second signal for generating a transformed signal; and
(e) an overlap-add means responsive to the transformed signal for generating the signal representative of the synthetic speech waveform.

19. An apparatus for generating a signal representative of a synthetic musical waveform from a set of parameters representative of overlapping musical tone data frames stored in a memory means and a sequence of frequency scale modification factors;
the parameter set comprising a sequence of quasi-harmonic sinusoidal model parameter sets and a sequence of fundamental frequency estimates, the apparatus comprising;
(a) a discrete Fourier transform assignment means electrically coupled to the memory means and responsive to the sequence of fundamental frequency estimates, the sequence of quasi-harmonic sinusoidal model parameter sets, and the sequence of frequency-scale modification factors for producing a first signal from which a modified synthetic contribution may be generated using a discrete Fourier transform algorithm;
(b) a discrete Fourier transform means responsive to the first signal for generating a transformed signal; and
(c) an overlap-add means responsive to the transformed signal for generating the signal representative of the synthetic musical waveform.

21. An apparatus for generating a signal representative of a synthetic musical waveform from a set of parameters representative of overlapping musical tone data frames stored in a memory means and a sequence of time-scale modification factors;
the parameter set comprising a sequence of quasi-harmonic sinusoidal model parameter sets and a sequence of fundamental frequency estimates, the apparatus comprising;
(a) a discrete Fourier transform assignment means electrically coupled to the memory means and responsive to the sequence of fundamental frequency estimates, the sequence of quasi-harmonic sinusoidal model parameter sets, and the sequence of frequency-scale modification factors for producing a first signal from which a modified synthetic contribution may be generated using a discrete Fourier transform algorithm;
(b) a discrete Fourier transform means responsive to the first signal for generating a transformed signal; and
(c) an overlap-add means responsive to the transformed signal and the sequence of time-scale modification factors for generating the signal representative of the synthetic musical waveform.

23. An apparatus for generating a signal representative of a synthetic musical tone waveform from a set of parameters representative of overlapping frames of musical data stored in a memory means and a sequence of pitch-scale modification factors;
the musical information comprising a sequence of quasi-harmonic sinusoidal model parameter sets, a sequence of coefficient sets representative of estimates of a spectral envelope, and a sequence of fundamental frequency estimates, the apparatus comprising;
(a) a pitch onset time estimator means electrically coupled to the memory means and responsive to the sequence of coefficient sets, the sequence of fundamental frequency estimates, and the sequence of quasi-harmonic sinusoidal model parameter sets for generating a first signal representative of a sequence of time estimates relative to the center of each of the frames at which an excitation pulse occurs;
(b) a phasor interpolator means electrically coupled to the memory means and the pitch onset time estimator means and responsive to the sequence of coefficient sets, the sequence of fundamental frequency estimates, the sequence of quasi-harmonic sinusoidal model parameter sets, the first signal, and the sequence of pitch-scale modification factors for generating a sequence of modified quasi-harmonic sinusoidal model parameter sets;
(c) a discrete Fourier transform assignment means electrically coupled to the phasor interpolator means and responsive to the sequence of fundamental frequency estimates, the sequence of modified quasi-harmonic sinusoidal model parameter sets, and the sequence of pitch-scale modification factors for producing a second signal from which a modified synthetic contribution may be generated using a discrete Fourier transform algorithm;
(d) a discrete Fourier transform means responsive to the second signal for generating a transformed signal; and
(e) an overlap-add means responsive to the transformed signal for generating the signal representative of the synthetic musical tone waveform.

Specification