Speech approximation using successive sinusoidal overlap-add models and pitch-scale modifications

US 5,504,833 A
Filed: 05/04/1994
Issued: 04/02/1996
Est. Priority Date: 08/22/1991
Status: Expired due to Fees

Abstract

A method and apparatus for the automatic analysis, synthesis and modification of audio signals, based on an overlap-add sinusoidal model, is disclosed. Automatic analysis of amplitude, frequency and phase parameters of the model is achieved using an analysis-by-synthesis procedure which incorporates successive approximation, yielding synthetic waveforms which are very good approximations to the original waveforms and are perceptually identical to the original sounds. A generalized overlap-add sinusoidal model is introduced which can modify audio signals without objectionable artifacts. In addition, a new approach to pitch-scale modification allows for the use of arbitrary spectral envelope estimates and addresses the problems of high-frequency loss and noise amplification encountered with prior art methods. The overlap-add synthesis method provides the ability to synthesize sounds with computational efficiency rivaling that of synthesis using the discrete short-time Fourier transform (DSTFT) while eliminating the modification artifacts associated with that method.
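
To make the overlap-add sinusoidal model in the abstract concrete, the following is a minimal sketch, not the patented implementation. It assumes each frame is described by a list of hypothetical `(amplitude, frequency, phase)` triples with phase measured at the frame centre, a triangular synthesis window, and a hop of half the frame length. None of these choices is stated in the text above.

```python
import math

def synth_frame(params, length):
    """Sum of sinusoids for one frame; phases are referenced to the frame centre."""
    half = length // 2
    return [
        sum(a * math.cos(w * (n - half) + phi) for a, w, phi in params)
        for n in range(length)
    ]

def triangular_window(length):
    """Triangular window (assumed here); copies spaced length // 2 apart sum to one."""
    half = length // 2
    return [1.0 - abs(n - half) / half for n in range(length)]

def overlap_add(frames, hop):
    """Window each frame and add it into the output at multiples of `hop`."""
    length = len(frames[0])
    window = triangular_window(length)
    out = [0.0] * (hop * (len(frames) - 1) + length)
    for i, frame in enumerate(frames):
        for n, x in enumerate(frame):
            out[i * hop + n] += window[n] * x
    return out
```

When successive frames carry phases that agree with one another, the windowed contributions blend into a continuous waveform in the region covered by two frames.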

24 Claims

1. A method of synthesizing artifact-free modified speech signals from a parameter set and a sequence of frequency-scale modification factors, the parameter set comprising a sequence of coefficient sets representative of a sequence of estimates of the frequency response of a human vocal tract, a corresponding sequence of estimates of a fundamental frequency, and a corresponding sequence of quasi-harmonic sinusoidal model parameter sets;
- each one of the estimates of a fundamental frequency and the corresponding quasi-harmonic sinusoidal model parameter set comprising a representation of one of a sequence of overlapping speech data frames;
  
  the method comprising the steps of:
  
  (a) estimating, with a pitch onset time estimator responsive to the sequence of coefficient sets, the sequence of estimates of a fundamental frequency, and the sequence of quasi-harmonic sinusoidal model parameter sets, a sequence of excitation times relative to the centers of each one of the corresponding overlapping speech data frames in the sequence of speech data frames at which an excitation pulse occurs;
  
  (b) generating a frequency-domain sequence of data frames from a discrete Fourier transform assignment means responsive to the sequence of excitation times, the corresponding sequence of quasi-harmonic sinusoidal model parameter sets, the sequence of frequency-scale modification factors, and the sequence of estimates of a fundamental frequency;
  
  (c) transforming the frequency-domain sequence of data frames with an inverse discrete Fourier transform means to produce a time-domain sequence of data frames;
  
  (d) generating a contiguous sequence of speech data representative of the modified speech signal from an overlap-add means responsive to the time-domain sequence of data frames; and
  
  (e) converting the contiguous sequence of speech data into an analog signal using a digital-to-analog converter means to produce the modified speech signal.
- - 2. The method of claim 1 wherein the parameter set further comprises an envelope stream representative of time-varying average magnitude, the sequence of overlapping speech data frames is further represented by the envelope stream, and the overlap-add means is additionally responsive to the envelope stream.
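
The DFT assignment and inverse DFT steps recited in claim 1 can be illustrated roughly as follows. This is a sketch under stated assumptions, not the claimed assignment means: each scaled frequency is rounded to the nearest DFT bin, a single hypothetical factor `beta` and excitation time `tau` are used for the frame, the phase at the excitation time is held fixed while the frequency changes, and sample 0 of the inverse DFT output is taken to be the frame centre.

```python
import cmath
import math

def dft_assign(params, beta, tau, n_fft):
    """Place frequency-scaled sinusoids into DFT bins (nearest-bin rounding assumed)."""
    spectrum = [0j] * n_fft
    for amp, omega, phase in params:
        k = round(beta * omega * n_fft / (2 * math.pi))
        if not 0 < k < n_fft // 2:
            continue  # drop components pushed to DC or to Nyquist and beyond
        bin_omega = 2 * math.pi * k / n_fft
        onset_phase = phase + omega * tau          # phase at the excitation time
        new_phase = onset_phase - bin_omega * tau  # keep that phase at the new frequency
        half = 0.5 * n_fft * amp * cmath.exp(1j * new_phase)
        spectrum[k] += half
        spectrum[n_fft - k] += half.conjugate()    # conjugate symmetry gives a real frame
    return spectrum

def inverse_dft(spectrum):
    """Direct O(N^2) inverse DFT; an FFT would replace this in practice."""
    n_fft = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_fft)
            for k in range(n_fft)).real / n_fft
        for n in range(n_fft)
    ]
```

The resulting time-domain frames would then be windowed and overlap-added as in step (d).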

3. A method of synthesizing artifact-free modified speech signals from a parameter set and a sequence of time-scale modification factors, the parameter set comprising a sequence of coefficient sets representative of a sequence of estimates of the frequency response of a human vocal tract, a corresponding sequence of estimates of a fundamental frequency, and a corresponding sequence of quasi-harmonic sinusoidal model parameter sets;
- each one of the estimates of a fundamental frequency and the corresponding quasi-harmonic sinusoidal model parameter set comprising a representation of one of a sequence of overlapping speech data frames;
  
  the method comprising the steps of:
  
  (a) estimating, with a pitch onset time estimator responsive to the sequence of coefficient sets, the sequence of estimates of a fundamental frequency, and the sequence of quasi-harmonic sinusoidal model parameter sets, a sequence of excitation times relative to the centers of each one of the corresponding overlapping speech data frames in the sequence of speech data frames at which an excitation pulse occurs;
  
  (b) generating a frequency-domain sequence of data frames from a discrete Fourier transform assignment means responsive to the sequence of excitation times, the corresponding sequence of quasi-harmonic sinusoidal model parameter sets, the sequence of estimates of a fundamental frequency, and the sequence of time-scale modification factors;
  
  (c) transforming the frequency-domain sequence of data frames with an inverse discrete Fourier transform means to produce a time-domain sequence of data frames;
  
  (d) generating a contiguous sequence of speech data representative of the modified speech signal from an overlap-add means responsive to the time-domain sequence of data frames and the sequence of time-scale modification factors; and
  
  (e) converting the contiguous sequence of speech data into an analog signal using a digital-to-analog converter means to produce the modified speech signal.
- - 4. The method of claim 3 wherein the parameter set further comprises an envelope stream representative of time-varying average magnitude, the sequence of overlapping speech data frames is further represented by the envelope stream, and the overlap-add means is additionally responsive to the envelope stream.
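
In claim 3 the overlap-add step also responds to the time-scale modification factors. One simple way to picture this, offered only as a sketch, is to change the spacing between frame centres and advance each sinusoid's phase by its frequency times the new spacing. The single constant factor `rho`, the `(amplitude, frequency)` pairs, and the triangular window are assumptions for illustration. The claim recites a sequence of factors and does not prescribe this phase rule.

```python
import math

def time_scale_synthesis(frames, hop, rho):
    """Resynthesize frames of (amp, omega) pairs with frame spacing scaled by rho."""
    new_hop = max(1, round(rho * hop))
    length = 2 * new_hop
    out = [0.0] * (new_hop * (len(frames) + 1))
    phases = [0.0] * len(frames[0])  # assumes every frame has the same component count
    for i, params in enumerate(frames):
        if i > 0:
            # advance each phase across the stretched (or compressed) frame spacing
            phases = [p + w * new_hop for p, (_, w) in zip(phases, params)]
        for n in range(length):
            window = 1.0 - abs(n - new_hop) / new_hop
            sample = sum(a * math.cos(w * (n - new_hop) + p)
                         for (a, w), p in zip(params, phases))
            out[i * new_hop + n] += window * sample
    return out
```

With `rho = 2` the output is about twice as long while each component keeps its original frequency, which is the defining property of time-scale modification.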

5. A method of synthesizing artifact-free modified speech signals from a parameter set and a sequence of time-scale modification factors, the parameter set comprising a sequence of coefficient sets representative of a sequence of estimates of the frequency response of a human vocal tract, a corresponding sequence of estimates of a fundamental frequency, and a corresponding sequence of unmodified quasi-harmonic sinusoidal model parameter sets;
- each one of the estimates of a fundamental frequency and the corresponding quasi-harmonic sinusoidal model parameter set comprising a representation of one of a sequence of overlapping speech data frames;
  
  the method comprising the steps of:
  
  (a) estimating, with a pitch onset time estimator responsive to the sequence of coefficient sets, the sequence of estimates of a fundamental frequency, and the sequence of unmodified quasi-harmonic sinusoidal model parameter sets, a sequence of excitation times relative to the centers of each one of the corresponding overlapping speech data frames in the sequence of speech data frames at which an excitation pulse occurs;
  
  (b) generating a sequence of modified quasi-harmonic sinusoidal model parameter sets with a phasor interpolator responsive to the sequence of excitation times, the sequence of pitch-scale modification factors, the sequence of estimates of the fundamental frequency, the sequence of coefficient sets, and the sequence of unmodified quasi-harmonic sinusoidal model parameter sets, each of the modified quasi-harmonic sinusoidal model parameter sets comprising a set of modified amplitudes, a corresponding set of modified frequencies, and a corresponding set of modified phases;
  
  (c) generating a frequency-domain sequence of data frames from a discrete Fourier transform assignment means responsive to the sequence of excitation times, the corresponding sequence of modified quasi-harmonic sinusoidal model parameter sets, the sequence of pitch-scale modification factors, and the sequence of estimates of a fundamental frequency;
  
  (d) transforming the frequency-domain sequence of data frames with an inverse discrete Fourier transform means to produce a time-domain sequence of data frames;
  
  (e) generating a contiguous sequence of speech data representative of the modified speech signal from an overlap-add means responsive to the time-domain sequence of data frames; and
  
  (f) converting the contiguous sequence of speech data into an analog signal using a digital-to-analog converter means to produce the modified speech signal.
- - 6. The method of claim 5 wherein the parameter set further comprises an envelope stream representative of time-varying average magnitude, the sequence of overlapping speech data frames is further represented by the envelope stream, and the overlap-add means is additionally responsive to the envelope stream.
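
Step (b) of claim 5 produces modified amplitudes, frequencies, and phases from the unmodified parameters, the coefficient sets, and the excitation times. The sketch below shows one plausible reading and is not the claimed phasor interpolator: it assumes the coefficient set is an all-pole (LPC-style) filter 1/A(z), divides each component by the vocal tract response to get an excitation phasor at the excitation time `tau`, moves that phasor to the scaled frequency, and reapplies the response there. The function names and the decision to carry each phasor along rather than interpolate between neighbours are illustrative assumptions.

```python
import cmath
import math

def vocal_tract_response(coeffs, omega):
    """H(e^{j omega}) for an assumed all-pole model 1 / (1 + sum_k c_k e^{-j omega k})."""
    denom = 1 + sum(c * cmath.exp(-1j * omega * (k + 1)) for k, c in enumerate(coeffs))
    return 1 / denom

def pitch_scale_params(params, coeffs, beta, tau):
    """Return modified (amp, omega, phase) triples for pitch-scale factor beta."""
    modified = []
    for amp, omega, phase in params:
        new_omega = beta * omega
        if new_omega >= math.pi:
            continue  # component would land at or above Nyquist
        # excitation phasor: remove the envelope, reference phase to the excitation time
        excitation = (amp * cmath.exp(1j * (phase + omega * tau))
                      / vocal_tract_response(coeffs, omega))
        # re-impose the envelope at the shifted frequency
        shaped = excitation * vocal_tract_response(coeffs, new_omega)
        modified.append((abs(shaped), new_omega, cmath.phase(shaped) - new_omega * tau))
    return modified
```

Because the envelope is re-evaluated at each shifted frequency, the formant structure stays in place while the harmonics move, which is the usual aim of pitch-scale modification.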

7. A method of synthesizing artifact-free modified musical tone signals from a parameter set and a sequence of frequency-scale modification factors;
- the parameter set comprising a sequence of fundamental frequency estimates and a sequence of quasi-harmonic sinusoidal model parameter sets;
  
  the method comprising the steps of:
  
  (a) generating a frequency-domain sequence of data frames from a discrete Fourier transform assignment means responsive to the sequence of fundamental frequency estimates, the corresponding sequence of quasi-harmonic sinusoidal model parameter sets, and the sequence of frequency-scale modification factors;
  
  (b) transforming the frequency-domain sequence of data frames with an inverse discrete Fourier transform means to produce a time-domain sequence of data frames;
  
  (c) generating a contiguous sequence of music data representative of the modified musical tone signals from an overlap-add means responsive to the time-domain sequence of data frames; and
  
  (d) generating the contiguous sequence of music data into an analog signal using a digital-to-analog converter means to produce the modified musical tone signal.
- - 8. The method of claim 7 wherein the parameter set further comprises an envelope stream representative of time-varying average magnitude, and the overlap-add means is additionally responsive to the envelope stream.
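
The envelope stream of the dependent claims is described only as representing time-varying average magnitude. As a hedged illustration, the sketch below estimates such an envelope with a centred moving average of absolute sample values and applies it as a sample-by-sample gain on the overlap-add output. Both the estimator and the multiplicative use are assumptions for illustration, and the function names are hypothetical.

```python
def average_magnitude(signal, half_width):
    """Centred moving average of |x[n]|, shortened at the signal edges."""
    envelope = []
    for n in range(len(signal)):
        lo = max(0, n - half_width)
        hi = min(len(signal), n + half_width + 1)
        envelope.append(sum(abs(x) for x in signal[lo:hi]) / (hi - lo))
    return envelope

def apply_envelope(ola_output, envelope):
    """Scale the overlap-add output by the envelope, one gain value per sample."""
    if len(ola_output) != len(envelope):
        raise ValueError("envelope and signal must have the same length")
    return [g * x for g, x in zip(envelope, ola_output)]
```

Separating the slowly varying gain from the sinusoidal frames lets the frames be modelled at a roughly constant level, with the envelope restoring the original dynamics afterwards.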

9. A method of synthesizing artifact-free modified musical tone signals from a parameter set and a sequence of time-scale modification factors;
- the parameter set comprising a sequence of fundamental frequency estimates and a sequence of quasi-harmonic sinusoidal model parameter sets;
  
  the method comprising the steps of:
  
  (a) generating a frequency-domain sequence of data frames from a discrete Fourier transform assignment means responsive to the sequence of fundamental frequency estimates, the corresponding sequence of quasi-harmonic sinusoidal model parameter sets, and the sequence of time-scale modification factors;
  
  (b) transforming the frequency-domain sequence of data frames with an inverse discrete Fourier transform means to produce a time-domain sequence of data frames;
  
  (c) generating a contiguous sequence of music data representative of the modified musical tone signals from an overlap-add means responsive to the time-domain sequence of data frames and the sequence of time-scale modification factors; and
  
  (d) converting the contiguous sequence of music data into an analog signal using a digital-to-analog converter means to produce the modified musical tone signal.
- - 10. The method of claim 9 wherein the parameter set further comprises an envelope stream representative of time-varying average magnitude, and the overlap-add means is additionally responsive to the envelope stream.

11. A method of synthesizing artifact-free modified musical tone signals from a parameter set and a sequence of pitch-scale modification factors;
- the parameter set comprising a sequence of coefficient sets representative of a sequence of estimates of a spectral envelope, a corresponding sequence of estimates of a fundamental frequency, and a corresponding sequence of unmodified quasi-harmonic sinusoidal model parameter sets;
  
  each one of the estimates of a fundamental frequency and the corresponding quasi-harmonic sinusoidal model parameter set comprising a representation of one of a sequence of overlapping musical tone data frames;
  
  the method comprising the steps of:
  
  (a) estimating, with a pitch onset time estimator responsive to the sequence of coefficient sets, the sequence of estimates of a fundamental frequency, and the sequence of unmodified quasi-harmonic sinusoidal model parameter sets, a sequence of excitation times relative to the centers of each one of the corresponding musical tone data frames in the sequence of speech data frames at which an excitation pulse occurs;
  
  (b) generating a sequence of modified quasi-harmonic sinusoidal model parameter sets with a phasor interpolator responsive to the sequence of excitation times, the sequence of pitch-scale modification factors, the sequence of estimates of the fundamental frequency, the sequence of coefficient sets and the sequence of unmodified quasi-harmonic sinusoidal model parameter sets;
  
  (c) generating a frequency-domain sequence of data frames from a discrete Fourier transform assignment means responsive to the sequence of modified quasi-harmonic sinusoidal model parameter sets, the sequence of pitch-scale modification factors, and the sequence of estimates of a fundamental frequency;
  
  (d) transforming the frequency-domain sequence of data frames with an inverse discrete Fourier transform means to produce a time-domain sequence of data frames;
  
  (e) generating a contiguous sequence of musical data representative of the modified musical tone signal from an overlap-adder responsive to the time-domain sequence of data frames; and
  
  (f) converting the contiguous sequence of speech data into an analog signal using a digital-to-analog converter means to produce the modified tone signal.
- - 12. The method of claim 11 wherein the parameter set further comprises an envelope stream representative of time-varying average magnitude, and the overlap-adder is additionally responsive to the envelope stream.

13. An apparatus for generating a signal representative of a synthetic speech waveform from a set of parameters representative of overlapping speech data frames stored in a memory means, and a sequence of frequency scale modification factors;
- the set of parameters comprising a sequence of quasi-harmonic sinusoidal model parameter sets, a sequence of coefficient sets representative of a frequency response of a human vocal tract, and a sequence of fundamental frequency estimates, the apparatus comprising:
  
  (a) a pitch onset time estimator means electrically coupled to the memory means and responsive to the sequence of coefficient sets, the sequence of fundamental frequency estimates, and the sequence of quasi-harmonic sinusoidal model parameter sets for generating a first signal representative of a sequence of excitation times relative to the center of each of the corresponding speech data frames at which an excitation pulse occurs;
  
  (b) a discrete Fourier transform assignment means electrically coupled to the memory means and responsive to the sequence of fundamental frequency estimates, the sequence of quasi-harmonic sinusoidal model parameter sets, the first signal, and the sequence of frequency-scale modification factors for producing a second signal from which a modified synthetic contribution may be generated using a discrete Fourier transform algorithm;
  
  (c) a discrete Fourier transform means responsive to the second signal for generating a transformed signal; and
  
  (d) an overlap-add means responsive to the transformed signal for generating the signal representative of the synthetic speech waveform.
- - 14. The apparatus of claim 13, wherein the speech information further comprises an envelope stream representative of time-varying average magnitude, and the overlap-add means is electrically coupled to the memory means and is additionally responsive to the envelope stream.
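
The pitch onset time estimator of element (a) takes the coefficient sets, the fundamental frequency estimates, and the sinusoidal parameters and returns an excitation time relative to the frame centre. The claims do not say how the estimate is computed. The sketch below uses a common heuristic as a stand-in: it searches one pitch period for the time at which the sinusoids, with an assumed all-pole vocal tract phase removed, come closest to all being at zero phase, as they would for an excitation pulse. The grid search, the `steps` parameter, and the function names are hypothetical.

```python
import cmath
import math

def vocal_tract_phase(coeffs, omega):
    """Phase of an assumed all-pole response 1 / (1 + sum_k c_k e^{-j omega k})."""
    denom = 1 + sum(c * cmath.exp(-1j * omega * (k + 1)) for k, c in enumerate(coeffs))
    return cmath.phase(1 / denom)

def estimate_onset_time(params, coeffs, omega0, steps=400):
    """Grid-search one pitch period around the frame centre for the best pulse alignment."""
    period = 2 * math.pi / omega0
    best_tau, best_score = 0.0, -math.inf
    for i in range(steps):
        tau = -period / 2 + i * period / steps
        # the score is largest when every excitation phase is near zero at time tau
        score = sum(
            amp * math.cos(phase + omega * tau - vocal_tract_phase(coeffs, omega))
            for amp, omega, phase in params
        )
        if score > best_score:
            best_tau, best_score = tau, score
    return best_tau
```

For a harmonic pulse train with a flat spectral envelope, the search recovers the pulse position to within the grid spacing.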

15. An apparatus for generating a signal representative of a synthetic speech waveform from a set of parameters representative of overlapping speech data frames stored in a memory means and a sequence of time-scale modification factors, the set of parameters comprising a sequence of quasi-harmonic sinusoidal model parameter sets, a sequence of coefficient sets representative of a frequency response of a human vocal tract, and a sequence of fundamental frequency estimates, the apparatus comprising:
- (a) a pitch onset time estimator means electrically coupled to the memory means and responsive to the sequence of coefficient sets, the sequence of fundamental frequency estimates, and the sequence of quasi-harmonic sinusoidal model parameter sets for generating a first signal representative of a sequence of excitation times relative to the center of each of the corresponding speech data frames at which an excitation pulse occurs;
  
  (b) a discrete Fourier transform assignment means electrically coupled to the memory means and responsive to the sequence of fundamental frequency estimates, the sequence of quasi-harmonic sinusoidal model parameter sets, the first signal, and the sequence of time-scale modification factors for producing a second signal from which a modified synthetic contribution may be generated using a discrete Fourier transform algorithm;
  
  (c) a discrete Fourier transform means responsive to the second signal for generating a transformed signal; and
  
  (d) an overlap-add means responsive to the transformed signal and the sequence of time-scale modification factors for generating the signal representative of the synthetic speech waveform.
- - 16. The apparatus of claim 15, wherein the speech information further comprises an envelope stream representative of time-varying average magnitude, and the overlap-add means is electrically coupled to the memory means and is additionally responsive to the envelope stream.

17. An apparatus for generating a synthetic speech waveform from a set of parameters representative of overlapping speech data frames stored in a memory means and a sequence of pitch-scale modification factors;
- the speech information comprising a sequence of quasi-harmonic sinusoidal model parameter sets, a sequence of coefficient sets representative of a frequency response of a human vocal tract, and a sequence of fundamental frequency estimates, the apparatus comprising:
  
  (a) a pitch onset time estimator means electrically coupled to the memory means and responsive to the sequence of coefficient sets, the sequence of fundamental frequency estimates, and the sequence of quasi-harmonic sinusoidal model parameter sets for generating a first signal representative of a sequence of time estimates relative to the center of each of the frames at which an excitation pulse occurs;
  
  (b) a phasor interpolator means electrically coupled to the memory means and the pitch onset time estimator means and responsive to the sequence of coefficient sets, the sequence of fundamental frequency estimates, the sequence of quasi-harmonic sinusoidal model parameter sets, the first signal, and the sequence of pitch-scale modification factors for generating a sequence of modified quasi-harmonic sinusoidal model parameter sets;
  
  (c) a discrete Fourier transform assignment means electrically coupled to the phasor interpolator means and the pitch onset time estimator means and responsive to the sequence of fundamental frequency estimates, the sequence of modified quasi-harmonic sinusoidal model parameter sets, the first signal and the sequence of pitch-scale modification factors for producing a second signal from which a modified synthetic contribution may be generated using a discrete Fourier transform algorithm;
  
  (d) a discrete Fourier transform means responsive to the second signal for generating a transformed signal; and
  
  (e) an overlap-add means responsive to the transformed signal for generating the signal representative of the synthetic speech waveform.
- - 18. The apparatus of claim 17, wherein the speech information further comprises an envelope stream representative of time-varying average magnitude, and the overlap-add means is electrically coupled to the memory means and is additionally responsive to the envelope stream.

19. An apparatus for generating a signal representative of a synthetic musical waveform from a set of parameters representative of overlapping musical tone data frames stored in a memory means and a sequence of frequency scale modification factors;
- the parameter set comprising a sequence of quasi-harmonic sinusoidal model parameter sets and a sequence of fundamental frequency estimates, the apparatus comprising:
  
  (a) a discrete Fourier transform assignment means electrically coupled to the memory means and responsive to the sequence of fundamental frequency estimates, the sequence of quasi-harmonic sinusoidal model parameter sets, and the sequence of frequency-scale modification factors for producing a first signal from which a modified synthetic contribution may be generated using a discrete Fourier transform algorithm;
  
  (b) a discrete Fourier transform means responsive to the first signal for generating a transformed signal; and
  
  (c) an overlap-add means responsive to the transformed signal for generating the signal representative of the synthetic musical waveform.
- - 20. The apparatus of claim 19 wherein the musical information further comprises an envelope stream representative of time-varying average magnitude, and the overlap-add means is electrically coupled to the memory means and is additionally responsive to the envelope stream.

21. An apparatus for generating a signal representative of a synthetic musical waveform from a set of parameters representative of overlapping musical tone data frames stored in a memory means and a sequence of time-scale modification factors;
- the parameter set comprising a sequence of quasi-harmonic sinusoidal model parameter sets and a sequence of fundamental frequency estimates, the apparatus comprising:
  
  (a) a discrete Fourier transform assignment means electrically coupled to the memory means and responsive to the sequence of fundamental frequency estimates, the sequence of quasi-harmonic sinusoidal model parameter sets, and the sequence of frequency-scale modification factors for producing a first signal from which a modified synthetic contribution may be generated using a discrete Fourier transform algorithm;
  
  (b) a discrete Fourier transform means responsive to the first signal for generating a transformed signal; and
  
  (c) an overlap-add means responsive to the transformed signal and the sequence of time-scale modification factors for generating the signal representative of the synthetic musical waveform.
- - 22. The apparatus of claim 21 wherein the musical information further comprises an envelope stream representative of time-varying average magnitude, and the overlap-add means is electrically coupled to the memory means and is additionally responsive to the envelope stream.

23. An apparatus for generating a signal representative of a synthetic musical tone waveform from a set of parameters representative of overlapping frames of musical data stored in a memory means and a sequence of pitch-scale modification factors;
- the musical information comprising a sequence of quasi-harmonic sinusoidal model parameter sets, a sequence of coefficient sets representative of estimates of a spectral envelope, and a sequence of fundamental frequency estimates, the apparatus comprising:
  
  (a) a pitch onset time estimator means electrically coupled to the memory means and responsive to the sequence of coefficient sets, the sequence of fundamental frequency estimates, and the sequence of quasi-harmonic sinusoidal model parameter sets for generating a first signal representative of a sequence of time estimates relative to the center of each of the frames at which an excitation pulse occurs;
  
  (b) a phasor interpolator means electrically coupled to the memory means and the pitch onset time estimator means and responsive to the sequence of coefficient sets, the sequence of fundamental frequency estimates, the sequence of quasi-harmonic sinusoidal model parameter sets, the first signal, and the sequence of pitch-scale modification factors for generating a sequence of modified quasi-harmonic sinusoidal model parameter sets;
  
  (c) a discrete Fourier transform assignment means electrically coupled to the phasor interpolator means and responsive to the sequence of fundamental frequency estimates, the sequence of modified quasi-harmonic sinusoidal model parameter sets, and the sequence of pitch-scale modification factors for producing a second signal from which a modified synthetic contribution may be generated using a discrete Fourier transform algorithm;
  
  (d) a discrete Fourier transform means responsive to the second signal for generating a transformed signal; and
  
  (e) an overlap-add means responsive to the transformed signal for generating the signal representative of the synthetic musical tone waveform.
- - 24. The apparatus of claim 23, wherein the musical information further comprises an envelope stream representative of time-varying average magnitude, and the overlap-add means is electrically coupled to the memory means and is additionally responsive to the envelope stream.

Current Assignee
Georgia Tech Research Corporation (University System of Georgia)
Original Assignee
Georgia Tech Research Corporation (University System of Georgia)
Inventors
Smith, Mark J. T.; George, E. Bryan
Primary Examiner(s)
MacDonald, Allen R.
Assistant Examiner(s)
Grover, John M.

Application Number

US08/238,171
Time in Patent Office

699 Days
Field of Search

395/2.1-2.34, 395/2.39, 395/2.67-2.78, 381/51, 381/29-40, 381/41, 381/46-50, 381/52-53, 381/94
US Class Current

704/211
CPC Class Codes

G10L 15/12   using dynamic programming t...

G10L 19/02   using spectral analysis, e....

G10L 21/02   Speech enhancement, e.g. no...

G10L 25/24   the extracted parameters be...

G10L 25/27   characterised by the analys...
