Audio analysis/synthesis system

US 5,327,518 A
Filed: 08/22/1991
Issued: 07/05/1994
Est. Priority Date: 08/22/1991
Status: Expired due to Fees

Abstract

A method and apparatus for the automatic analysis, synthesis and modification of audio signals, based on an overlap-add sinusoidal model, is disclosed. Automatic analysis of amplitude, frequency and phase parameters of the model is achieved using an analysis-by-synthesis procedure which incorporates successive approximation, yielding synthetic waveforms which are very good approximations to the original waveforms and are perceptually identical to the original sounds. A generalized overlap-add sinusoidal model is introduced which can modify audio signals without objectionable artifacts. In addition, a new approach to pitch-scale modification allows for the use of arbitrary spectral envelope estimates and addresses the problems of high-frequency loss and noise amplification encountered with prior art methods. The overlap-add synthesis method provides the ability to synthesize sounds with computational efficiency rivaling that of synthesis using the discrete short-time Fourier transform (DSTFT) while eliminating the modification artifacts associated with that method.
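As a rough illustration of the overlap-add sinusoidal model named in the abstract, the sketch below builds each frame as a sum of sinusoids, weights it with a triangular synthesis window, and sums the overlapping frames. The function name `ola_synthesize`, the triangular window, the 50% overlap, and the parameter layout are assumptions made for illustration; the text above does not specify them.

```python
import math

def ola_synthesize(frames, hop):
    """Overlap-add synthesis from per-frame sinusoidal parameters.

    frames: list of frames; each frame is a list of (amplitude, omega, phase)
            tuples, with omega in radians/sample and phase referenced to the
            frame center.
    hop:    spacing between frame centers in samples (assumed 50% overlap,
            so each frame spans 2*hop samples).
    """
    total = hop * (len(frames) + 1)
    out = [0.0] * total
    for k, params in enumerate(frames):
        center = (k + 1) * hop  # frame k is centered on this sample
        for n in range(-hop, hop):
            # Triangular window: adjacent frames' weights sum to 1.
            w = 1.0 - abs(n) / hop
            s = sum(a * math.cos(om * n + ph) for a, om, ph in params)
            out[center + n] += w * s
    return out
```

With a stationary sinusoid whose per-frame phases are consistent with its frequency, the overlapped frames sum back to the original waveform between the first and last frame centers, which is the property that makes the window choice work here.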

28 Claims

1. A method of extracting a set of parameters representative of input speech signals representing speech from a human vocal tract, the vocal tract having a frequency response capable of representation as a set of coefficients, such that artifact-free, modified synthetic speech signals can be generated from said parameters, comprising the steps of:
- (a) digitizing the input speech signals into a speech data stream;
  
  (b) isolating a sequence of overlapping speech data frames from the speech data stream, each of said speech data frames having a fundamental frequency;
  
  (c) analyzing the sequence of overlapping speech data frames to produce a corresponding sequence of coefficient sets representative of an estimate of the frequency response of the human vocal tract;
  
  (d) multiplying each of the overlapping speech data frames by an analysis window function to create a corresponding sequence of windowed data frames;
  
  (e) calculating the discrete Fourier transform of each of the windowed data frames to produce a corresponding sequence of transformed data frames;
  
  (f) approximating the corresponding sequence of overlapping speech data frames with a sequence of sinusoidal parameter sets using a first iterative analysis-by-synthesis means responsive to the sequence of transformed data frames and a discrete Fourier transform of the analysis window function;
  
  (g) analyzing the sequence of sinusoidal parameter sets and the corresponding sequence of coefficient sets with a fundamental frequency estimator means to produce a sequence of estimates of the fundamental frequency of the corresponding overlapping speech data frames; and
  
  (h) analyzing the sequence of fundamental frequency estimates and the corresponding sequence of sinusoidal parameter sets with a harmonic assignment means to produce a sequence of quasi-harmonic sinusoidal model parameter sets;
  
  the set of parameters representative of the input speech signals comprising the sequence of coefficient sets representative of the estimate of the frequency response of the human vocal tract, the sequence of estimates of the fundamental frequency, and the sequence of quasi-harmonic sinusoidal model parameter sets.
- 2. The method of claim 1 wherein each of the quasi-harmonic sinusoidal model parameter sets comprises a set of amplitudes, a corresponding set of frequencies, and a corresponding set of phases.
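Step (h) of claim 1 does not say how sinusoidal components are assigned to harmonics. One plausible reading, sketched below, maps each component to the nearest multiple of the fundamental frequency estimate and keeps the strongest component per harmonic. The function name `assign_harmonics`, the nearest-multiple rule, and the largest-amplitude tie-break are all assumptions, not the patent's stated procedure.

```python
def assign_harmonics(components, f0):
    """Map free sinusoidal components onto harmonic slots of f0.

    components: list of (amplitude, frequency_hz, phase) tuples.
    Returns {harmonic_number: (amplitude, frequency_hz, phase)}; when several
    components compete for the same slot, the largest amplitude wins.
    """
    slots = {}
    for amp, freq, phase in components:
        h = round(freq / f0)  # nearest harmonic number
        if h < 1:
            continue  # well below the fundamental: no slot
        if h not in slots or amp > slots[h][0]:
            slots[h] = (amp, freq, phase)
    return slots
```

The result is "quasi-harmonic" in the sense that each retained frequency lies near, but not necessarily exactly at, an integer multiple of `f0`, and each slot carries the amplitude, frequency, and phase listed in claim 2.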

3. A method of extracting a set of parameters representative of input speech signals having a time-varying average magnitude and representing speech from a human vocal tract, the vocal tract having a frequency response capable of representation as a set of coefficients, such that artifact-free, modified synthetic speech signals can be generated from said parameters, comprising the steps of:
- (a) digitizing the input speech signals into a speech data stream having a time-varying average magnitude;
  
  (b) isolating a sequence of overlapping speech data frames from the speech data stream, each of said speech data frames having a fundamental frequency;
  
  (c) analyzing the sequence of overlapping speech data frames to produce a corresponding sequence of coefficient sets representative of an estimate of the frequency response of the human vocal tract;
  
  (d) calculating an envelope stream reflective of the time-varying average magnitude from the speech data stream;
  
  (e) isolating a sequence of overlapping envelope data frames from the envelope stream;
  
  (f) multiplying each one of the sequence of overlapping envelope data frames with a corresponding one of the sequence of overlapping speech data frames to produce a corresponding sequence of first product frames;
  
  (g) squaring each of the overlapping envelope data frames to produce a corresponding sequence of squared envelope data frames;
  
  (h) multiplying each one of the sequence of first product frames by an analysis window function to produce a corresponding sequence of second product frames;
  
  (i) multiplying each one of the sequence of squared envelope data frames by an analysis window function to produce a corresponding sequence of third product frames;
  
  (j) calculating the discrete Fourier transform of each of the sequence of second product frames to produce a sequence of first transform frames;
  
  (k) calculating the discrete Fourier transform of each of the sequence of third product frames to produce a sequence of second transform frames;
  
  (l) approximating the corresponding sequence of overlapping speech data frames with a sequence of sinusoidal parameter sets produced from a first iterative analysis-by-synthesis means responsive to the sequence of first transform frames and the sequence of second transform frames;
  
  (m) analyzing the sequence of sinusoidal parameter sets and the corresponding sequence of coefficient sets with a fundamental frequency estimator means to produce a sequence of estimates of the fundamental frequency of the corresponding overlapping speech data frames; and
  
  (n) analyzing the sequence of estimates of the fundamental frequency and the corresponding sequence of sinusoidal parameter sets with a harmonic assignment means to produce a sequence of quasi-harmonic sinusoidal model parameter sets;
  
  the set of parameters representative of the input speech signals comprising the sequence of coefficient sets representative of the estimate of the frequency response of the human vocal tract, the sequence of estimates of the fundamental frequency, the sequence of quasi-harmonic sinusoidal model parameter sets, and the envelope stream.
- 4. The method of claim 3 wherein each of the quasi-harmonic sinusoidal model parameter sets comprises a set of amplitudes, a corresponding set of frequencies, and a corresponding set of phases.
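Steps (d) through (g) of claim 3 can be sketched as follows. The claim only requires an envelope stream "reflective of the time-varying average magnitude"; the moving average of absolute values used here, along with the function names and the `half_width` parameter, is an assumed choice for illustration.

```python
def envelope_stream(x, half_width):
    """Moving average of |x| over a window of up to 2*half_width + 1 samples."""
    env = []
    for n in range(len(x)):
        lo = max(0, n - half_width)
        hi = min(len(x), n + half_width + 1)
        env.append(sum(abs(v) for v in x[lo:hi]) / (hi - lo))
    return env

def overlapping_frames(stream, frame_len, hop):
    """Slice a stream into frames of frame_len samples spaced hop apart."""
    return [stream[i:i + frame_len]
            for i in range(0, len(stream) - frame_len + 1, hop)]

def product_frames(speech_frames, env_frames):
    """Return (first product frames, squared envelope frames)."""
    first = [[e * s for e, s in zip(ef, sf)]
             for ef, sf in zip(env_frames, speech_frames)]
    squared = [[e * e for e in ef] for ef in env_frames]
    return first, squared
```

In the claim, both outputs are then multiplied by the analysis window and transformed (steps (h) through (k)) before they reach the analysis-by-synthesis stage.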

5. A method of extracting a set of parameters representative of input musical tone signals having a time-varying average magnitude and a nominal expected pitch frequency, such that artifact-free, modified synthetic musical tones can be generated from said parameters, comprising the steps of:
- (a) digitizing the input musical tone signals into a musical tone data stream having a time-varying average magnitude;
  
  (b) isolating a sequence of overlapping musical tone data frames from the musical tone data stream, each of the musical tone data frames having a fundamental frequency;
  
  (c) calculating an envelope stream reflective of the time-varying changes in the average magnitude of the musical tone from the musical tone data stream and the nominal expected pitch frequency;
  
  (d) isolating a sequence of overlapping envelope data frames from the envelope stream;
  
  (e) multiplying each one of the sequence of overlapping envelope data frames with a corresponding one of the sequence of overlapping musical tone data frames to produce a corresponding sequence of first product frames;
  
  (f) squaring each of the overlapping envelope data frames to produce a corresponding sequence of squared envelope data frames;
  
  (g) multiplying each one of the sequence of first product frames by an analysis window function to produce a corresponding sequence of second product frames;
  
  (h) multiplying each one of the sequence of squared envelope data frames by an analysis window function to produce a corresponding sequence of third product frames;
  
  (i) calculating the discrete Fourier transform of each of the sequence of second product frames to produce a sequence of first transform frames;
  
  (j) calculating the discrete Fourier transform of each of the sequence of third product frames to produce a sequence of second transform frames;
  
  (k) analyzing the sequence of first transform frames, the sequence of second transform frames, and the nominal expected pitch frequency with a harmonically constrained iterative analysis-by-synthesis means to produce a sequence of quasi-harmonic sinusoidal parameter sets that approximate the corresponding sequence of overlapping musical tone data frames, each of the quasi-harmonic sinusoidal parameter sets having a spectral envelope, and a sequence of fundamental frequency estimates of the corresponding overlapping musical tone data frames; and
  
  (l) analyzing the sequence of overlapping musical tone data frames with a system estimator means to produce a sequence of coefficient sets representative of the spectral envelope of the quasi-harmonic sinusoidal model parameters;
  
  the set of parameters representative of the input musical tone signals comprising the envelope stream reflective of the time-varying changes in the average magnitude of the input musical tone signals;
  
  the sequence of sets of coefficients representative of the spectral envelope of the quasi-harmonic sinusoidal model parameters;
  
  the sequence of fundamental frequency estimates; and
  
  the sequence of sets of sinusoidal parameters that approximate the corresponding sequence of overlapping musical tone data frames.
- 6. The method of claim 5 wherein each of the quasi-harmonic sinusoidal model parameter sets comprises a set of amplitudes, a corresponding set of frequencies, and a corresponding set of phases.

7. A method of extracting a set of parameters representative of input musical tone signals having a nominal expected pitch frequency, such that artifact-free, modified synthetic musical tones can be generated from said parameters, comprising the steps of:
- (a) digitizing the input musical tone signals into a musical tone data stream;
  
  (b) isolating a sequence of overlapping musical tone data frames from the musical tone data stream;
  
  (c) multiplying each one of the sequence of overlapping musical tone data frames with an analysis window function to produce a corresponding sequence of first product frames;
  
  (d) calculating the discrete Fourier transform of each of the sequence of first product frames to produce a sequence of first transform frames;
  
  (e) producing a sequence of sets of sinusoidal parameters that approximate the corresponding sequence of overlapping musical tone data frames and a sequence of fundamental frequency estimates from a first harmonically constrained iterative analysis-by-synthesis means responsive to the sequence of first transform frames, a discrete Fourier transform of the analysis window function, and the nominal expected pitch frequency; and
  
  (f) analyzing the sequence of overlapping musical tone data frames with a system estimator means to produce a sequence of coefficient sets representative of the spectral envelope of the quasi-harmonic sinusoidal model parameters;
  
  the set of parameters representative of the input musical tone signals comprising the sequence of sets of coefficients representative of the spectral envelope of the quasi-harmonic sinusoidal model parameters;
  
  the sequence of fundamental frequency estimates; and
  
  the sequence of sets of sinusoidal parameters that approximate the corresponding sequence of musical tone data frames.
- 8. The method of claim 7 wherein each of the quasi-harmonic sinusoidal model parameter sets comprises a set of amplitudes, a corresponding set of frequencies, and a corresponding set of phases.

9. A method of synthesizing artifact-free modified speech signals from a parameter set, a sequence of frequency-scale modification factors and a sequence of time-scale modification factors, the parameter set comprising a sequence of coefficient sets representative of a sequence of estimates of the frequency response of a human vocal tract, a corresponding sequence of estimates of a fundamental frequency, and a corresponding sequence of quasi-harmonic sinusoidal model parameter sets;
- each one of the estimates of a fundamental frequency and the corresponding quasi-harmonic sinusoidal model parameter set comprising a representation of one of a sequence of overlapping speech data frames;
  
  the method comprising the steps of:
  
  (a) estimating, with a pitch onset time estimator means responsive to the sequence of coefficient sets, the sequence of estimates of a fundamental frequency, and the sequence of quasi-harmonic sinusoidal model parameter sets, a sequence of excitation times relative to the centers of each one of the corresponding overlapping speech data frames in the sequence of speech data frames at which an excitation pulse occurs;
  
  (b) generating a frequency-domain sequence of data frames from a discrete Fourier transform assignment means responsive to the sequence of excitation times, the corresponding sequence of quasi-harmonic sinusoidal model parameter sets, the sequence of frequency-scale modification factors, the sequence of estimates of a fundamental frequency, and the sequence of time-scale modification factors;
  
  (c) transforming the frequency-domain sequence of data frames with an inverse discrete Fourier transform means to produce a time-domain sequence of data frames;
  
  (d) generating a contiguous sequence of speech data representative of the modified speech signal from an overlap-add means responsive to the time-domain sequence of data frames and the sequence of time-scale modification factors; and
  
  (e) converting the contiguous sequence of speech data into an analog signal using a digital-to-analog converter means to produce the modified speech signal.
- 10. The method of claim 9 wherein the parameter set further comprises an envelope stream representative of time-varying average magnitude, the sequence of overlapping speech data frames is further represented by the envelope stream, and the overlap-add means is additionally responsive to the envelope stream.
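Steps (b) and (c) of claim 9 can be pictured with a much-simplified sketch: each sinusoid's frequency is multiplied by a frequency-scale factor, written into the nearest DFT bin together with its conjugate mirror, and the frame is recovered by an inverse DFT. The names `dft_assign` and `inverse_dft`, the nearest-bin rounding, and the omission of excitation times and time-scale factors are assumptions made to keep the example short; phases here are referenced to the start of the frame.

```python
import cmath
import math

def dft_assign(params, beta, n_fft):
    """Build one frequency-domain frame from sinusoidal parameters.

    params: list of (amplitude, omega, phase), omega in radians/sample.
    beta:   frequency-scale modification factor applied to every omega.
    Each scaled frequency is rounded to the nearest DFT bin (a simplification).
    """
    spectrum = [0j] * n_fft
    for amp, omega, phase in params:
        k = round(beta * omega * n_fft / (2 * math.pi))
        if 0 < k < n_fft / 2:  # skip DC, Nyquist, and out-of-band components
            spectrum[k] += 0.5 * n_fft * amp * cmath.exp(1j * phase)
            spectrum[n_fft - k] += 0.5 * n_fft * amp * cmath.exp(-1j * phase)
    return spectrum

def inverse_dft(spectrum):
    """Direct O(N^2) inverse DFT; returns the real part of each sample."""
    n_fft = len(spectrum)
    return [sum(c * cmath.exp(2j * math.pi * k * n / n_fft)
                for k, c in enumerate(spectrum)).real / n_fft
            for n in range(n_fft)]
```

A practical implementation would use a fast Fourier transform in place of the direct sum, and the resulting time-domain frames would then feed the overlap-add of step (d).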

11. A method of synthesizing artifact-free modified speech signals from a parameter set, a sequence of pitch-scale modification factors and a sequence of time-scale modification factors, the parameter set comprising a sequence of coefficient sets representative of a sequence of estimates of the frequency response of a human vocal tract, a corresponding sequence of estimates of a fundamental frequency, and a corresponding sequence of unmodified quasi-harmonic sinusoidal model parameter sets;
- each one of the estimates of a fundamental frequency and the corresponding quasi-harmonic sinusoidal model parameter set comprising a representation of one of a sequence of overlapping speech data frames;
  
  the method comprising the steps of:
  
  (a) estimating, with a pitch onset time estimator means responsive to the sequence of coefficient sets, the sequence of estimates of a fundamental frequency, and the sequence of unmodified quasi-harmonic sinusoidal model parameter sets, a sequence of excitation times relative to the centers of each one of the corresponding overlapping speech data frames in the sequence of speech data frames at which an excitation pulse occurs;
  
  (b) generating a sequence of modified quasi-harmonic sinusoidal model parameter sets with a phasor interpolator means responsive to the sequence of excitation times, the sequence of pitch-scale modification factors, the sequence of estimates of the fundamental frequency, the sequence of coefficient sets, and the sequence of unmodified quasi-harmonic sinusoidal model parameter sets, each of the modified quasi-harmonic sinusoidal model parameter sets comprising a set of modified amplitudes, a corresponding set of modified frequencies, and a corresponding set of modified phases;
  
  (c) generating a frequency-domain sequence of data frames from a discrete Fourier transform assignment means responsive to the sequence of excitation times, the corresponding sequence of modified quasi-harmonic sinusoidal model parameter sets, the sequence of pitch-scale modification factors, the sequence of estimates of a fundamental frequency, and the sequence of time-scale modification factors;
  
  (d) transforming the frequency-domain sequence of data frames with an inverse discrete Fourier transform means to produce a time-domain sequence of data frames;
  
  (e) generating a contiguous sequence of speech data representative of the modified speech signal from an overlap-add means responsive to the time-domain sequence of data frames and the sequence of time-scale modification factors; and
  
  (f) converting the contiguous sequence of speech data into an analog signal using a digital-to-analog converter means to produce the modified speech signal.
- 12. The method of claim 11 wherein the parameter set further comprises an envelope stream representative of time-varying average magnitude, and the overlap-add means is additionally responsive to the envelope stream.
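The idea behind pitch-scale modification with a spectral envelope estimate, as used in step (b) of claim 11, can be sketched in a few lines: divide each amplitude by the envelope at its original frequency, move the frequency by the pitch-scale factor, and re-apply the envelope at the new frequency. This is a simplified stand-in for the claimed phasor interpolator, not a reproduction of it; the function name `pitch_scale`, the callable `envelope`, and the pass-through of phases are assumptions.

```python
def pitch_scale(params, rho, envelope):
    """Scale sinusoid frequencies by rho, re-reading amplitudes from envelope.

    params:   list of (amplitude, frequency_hz, phase).
    rho:      pitch-scale modification factor (rho > 1 raises pitch).
    envelope: callable mapping frequency_hz -> positive envelope magnitude.
    """
    modified = []
    for amp, freq, phase in params:
        excitation = amp / envelope(freq)  # remove the envelope at the old frequency
        new_freq = rho * freq
        new_amp = excitation * envelope(new_freq)  # re-apply it at the new frequency
        # Phase is passed through unchanged here; this is a simplification.
        modified.append((new_amp, new_freq, phase))
    return modified
```

Because the envelope stays fixed while the harmonics move underneath it, the formant structure is preserved when the pitch changes, which is the reason the claim feeds the coefficient sets into this stage.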

13. A method of synthesizing artifact-free modified musical tone signals from a parameter set, a sequence of frequency-scale modification factors, and a sequence of time-scale modification factors;
- the parameter set comprising a sequence of fundamental frequency estimates and a sequence of quasi-harmonic sinusoidal model parameter sets;
  
  the method comprising the steps of:
  
  (a) generating a frequency-domain sequence of data frames from a discrete Fourier transform assignment means responsive to the sequence of fundamental frequency estimates, the corresponding sequence of quasi-harmonic sinusoidal model parameter sets, the sequence of frequency-scale modification factors, and the sequence of time-scale modification factors;
  
  (b) transforming the frequency-domain sequence of data frames with an inverse discrete Fourier transform means to produce a time-domain sequence of data frames;
  
  (c) generating a contiguous sequence of music data representative of the modified musical tone signals from an overlap-add means responsive to the time-domain sequence of data frames and the sequence of time-scale modification factors; and
  
  (d) converting the contiguous sequence of music data into an analog signal using a digital-to-analog converter means to produce the modified musical tone signal.
- 14. The method of claim 13 wherein the parameter set further comprises an envelope stream representative of time-varying average magnitude, and the overlap-add means is additionally responsive to the envelope stream.

15. A method of synthesizing artifact-free modified musical tone signals from a parameter set, a sequence of pitch-scale modification factors and a sequence of time-scale modification factors, the parameter set comprising a sequence of coefficient sets representative of a sequence of estimates of a spectral envelope, a corresponding sequence of estimates of a fundamental frequency, and a corresponding sequence of unmodified quasi-harmonic sinusoidal model parameter sets;
- each one of the estimates of a fundamental frequency and the corresponding quasi-harmonic sinusoidal model parameter set comprising a representation of one of a sequence of overlapping musical tone data frames;
  
  the method comprising the steps of:
  
  (a) estimating, with a pitch onset time estimator means responsive to the sequence of coefficient sets, the sequence of estimates of a fundamental frequency, and the sequence of unmodified quasi-harmonic sinusoidal model parameter sets, a sequence of excitation times relative to the centers of each one of the corresponding musical tone data frames in the sequence of musical tone data frames at which an excitation pulse occurs;
  
  (b) generating a sequence of modified quasi-harmonic sinusoidal model parameter sets with a phasor interpolator means responsive to the sequence of excitation times, the sequence of pitch-scale modification factors, the sequence of estimates of the fundamental frequency, the sequence of coefficient sets, and the sequence of unmodified quasi-harmonic sinusoidal model parameter sets;
  
  (c) generating a frequency-domain sequence of data frames from a discrete Fourier transform assignment means responsive to the sequence of modified quasi-harmonic sinusoidal model parameter sets, the sequence of pitch-scale modification factors, the sequence of estimates of a fundamental frequency, and the sequence of time-scale modification factors;
  
  (d) transforming the frequency-domain sequence of data frames with an inverse discrete Fourier transform means to produce a time-domain sequence of data frames;
  
  (e) generating a contiguous sequence of musical data representative of the modified musical tone signal from an overlap-add means responsive to the time-domain sequence of data frames and the sequence of time-scale modification factors; and
  
  (f) converting the contiguous sequence of musical data into an analog signal using a digital-to-analog converter means to produce the modified musical tone signal.
- 16. The method of claim 15 wherein the parameter set further comprises an envelope stream representative of time-varying average magnitude, and the overlap-add means is additionally responsive to the envelope stream.

17. An apparatus for analyzing a stream of digital signals representing a speech waveform, the speech waveform having characteristics including a frequency response of a human vocal tract;
- the apparatus comprising:
  
  (a) frame segmenting means responsive to the stream of digital signals for segmenting the stream of digital signals into a sequence of overlapping speech data frames;
  
  (b) system estimator means responsive to the sequence of overlapping speech data frames for producing a sequence of coefficient sets representative of a sequence of estimates of the frequency response of the human vocal tract;
  
  (c) analysis windowing means characterized by an analysis window function and responsive to the sequence of overlapping speech data frames for producing a sequence of windowed data frames;
  
  (d) discrete Fourier transform means responsive to the windowed data frames for producing a sequence of transformed data frames;
  
  (e) a first memory means for storing and retrieving a sequence of precomputed data representative of the discrete Fourier transform of the window function;
  
  (f) iterative analysis-by-synthesis means responsive to the sequence of transformed data frames and the retrieved sequence of precomputed data for generating a sequence of sinusoidal model parameter sets approximating the corresponding overlapping speech data frames;
  
  (g) fundamental frequency estimator means responsive to the sequence of sinusoidal model parameter sets and the sequence of coefficient sets for generating a sequence of estimates of fundamental frequency;
  
  (h) harmonic assignment means responsive to the sequence of fundamental frequency estimates and the sinusoidal model parameter sets to produce a sequence of quasi-harmonic sinusoidal model parameter sets; and
  
  (i) a second memory means for storing the sequence of quasi-harmonic sinusoidal model parameter sets, the sequence of coefficient sets, and the sequence of fundamental frequency estimates.
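Elements (a), (c), (d), and (e) of claim 17 amount to a conventional short-time analysis front end, which might be sketched as below. The Hann window, the direct O(N^2) transform, and the names `hann`, `dft`, and `analyze` are assumptions; the claim leaves the window function and the transform implementation open.

```python
import cmath
import math

def hann(n_len):
    """Periodic Hann analysis window (an assumed choice of window function)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_len) for n in range(n_len)]

def dft(frame):
    """Direct O(N^2) discrete Fourier transform."""
    n_len = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * n / n_len)
                for n, x in enumerate(frame))
            for k in range(n_len)]

def analyze(stream, frame_len, hop):
    """Segment, window, and transform a sample stream.

    Returns (transformed frames, DFT of the window). The window transform is
    computed once and reused, mirroring the precomputed data of element (e).
    """
    window = hann(frame_len)
    window_dft = dft(window)  # computed once, independent of the input
    transformed = []
    for start in range(0, len(stream) - frame_len + 1, hop):
        frame = stream[start:start + frame_len]
        windowed = [w * x for w, x in zip(window, frame)]
        transformed.append(dft(windowed))
    return transformed, window_dft
```

The two return values correspond to the two inputs of the iterative analysis-by-synthesis means in element (f); a real system would replace `dft` with a fast Fourier transform.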

18. An apparatus for analyzing a stream of digital signals representing a speech waveform, the speech waveform having characteristics including a time-varying average magnitude and a frequency response of a human vocal tract;
- the apparatus comprising:
  
  (a) a first frame segmenting means responsive to the stream of digital signals for segmenting the stream of digital signals into a sequence of overlapping speech data frames;
  
  (b) a time-varying average magnitude calculating means responsive to the stream of digital signals to produce an envelope stream representative of the time-varying changes in the average magnitude of the speech waveform;
  
  (c) a second frame segmenting means responsive to the envelope stream for segmenting the envelope stream into a sequence of overlapping envelope data frames;
  
  (d) a multiplying means responsive to the sequence of overlapping envelope data frames and the sequence of overlapping speech data frames for generating a sequence of first product frames;
  
  (e) a squaring means responsive to the sequence of overlapping envelope data frames for producing a sequence of squared envelope data frames;
  
  (f) a first analysis window means responsive to the sequence of first product frames for generating a first sequence of windowed frames;
  
  (g) a second analysis window means responsive to the sequence of squared envelope data frames for producing a second sequence of windowed frames;
  
  (h) a first discrete Fourier transform means responsive to the first sequence of windowed frames for generating a first sequence of transformed windowed frames;
  
  (i) a second discrete Fourier transform means responsive to the second sequence of windowed frames for generating a second sequence of transformed windowed frames;
  
  (j) an iterative analysis-by-synthesis means responsive to the first and the second sequence of transformed windowed frames to produce a sequence of sinusoidal model parameter sets;
  
  (k) system estimator means responsive to the sequence of overlapping speech data frames for producing a sequence of coefficient sets representative of the frequency response of the human vocal tract;
  
  (l) fundamental frequency estimator means responsive to the sequence of sinusoidal model parameter sets and the sequence of coefficient sets for generating a sequence of fundamental frequency estimates;
  
  (m) harmonic assignment means responsive to the sequence of fundamental frequency estimates and the sinusoidal model parameter sets to produce a sequence of quasi-harmonic sinusoidal model parameter sets; and
  
  (n) a memory means for storing the sequence of quasi-harmonic sinusoidal model parameter sets, the sequence of coefficient sets, the sequence of fundamental frequency estimates, and the envelope stream.

19. An apparatus for generating a synthetic speech waveform from a set of parameters representative of overlapping speech data frames stored in a memory means, a sequence of frequency-scale modification factors, and a sequence of time-scale modification factors, the set of parameters comprising a sequence of quasi-harmonic sinusoidal model parameter sets, a sequence of coefficient sets representative of a frequency response of a human vocal tract, and a sequence of fundamental frequency estimates, the apparatus comprising:
- (a) a pitch onset time estimator means electrically coupled to the memory means and responsive to the sequence of coefficient sets, the sequence of fundamental frequency estimates, and the sequence of quasi-harmonic sinusoidal model parameter sets to generate a first signal representative of a sequence of excitation times relative to the center of each of the corresponding speech data frames at which an excitation pulse occurs;
  
  (b) a discrete Fourier transform assignment means electrically coupled to the memory means and responsive to the sequence of fundamental frequency estimates, the sequence of quasi-harmonic sinusoidal model parameter sets, the first signal, the sequence of time-scale modification factors, and the sequence of frequency-scale modification factors to produce a second signal from which a modified synthetic contribution may be generated using a discrete Fourier transform algorithm;
  
  (c) a discrete Fourier transform means responsive to the second signal to generate a transformed signal; and
  
  (d) an overlap-add means responsive to the transformed signal and the sequence of time-scale modification factors to generate a third signal representative of the synthetic speech waveform.
- 20. The apparatus of claim 19, wherein the speech information further comprises an envelope stream representative of time-varying average magnitude, and the overlap-add means is electrically coupled to the memory means and is additionally responsive to the envelope stream.

21. An apparatus for generating a synthetic speech waveform from a set of parameters representative of overlapping speech data frames stored in a memory means, a sequence of pitch-scale modification factors, and a sequence of time-scale modification factors, the speech information comprising a sequence of quasi-harmonic sinusoidal model parameter sets, a sequence of coefficient sets representative of a frequency response of a human vocal tract, and a sequence of fundamental frequency estimates, the apparatus comprising:
- (a) a pitch onset time estimator means electrically coupled to the memory means and responsive to the sequence of coefficient sets, the sequence of fundamental frequency estimates, and the sequence of quasi-harmonic sinusoidal model parameter sets to generate a first signal representative of a sequence of time estimates relative to the center of each of the frames at which an excitation pulse occurs;
  
  (b) a phasor interpolator means electrically coupled to the memory means and the pitch onset time estimator means and responsive to the sequence of coefficient sets, the sequence of fundamental frequency estimates, the sequence of quasi-harmonic sinusoidal model parameter sets, the first signal, and the sequence of pitch-scale modification factors to generate a sequence of modified quasi-harmonic sinusoidal model parameter sets;
  
  (c) a discrete Fourier transform assignment means electrically coupled to the phasor interpolator means and the pitch onset time estimator means and responsive to the sequence of fundamental frequency estimates, the sequence of modified quasi-harmonic sinusoidal model parameter sets, the first signal, the sequence of time-scale modification factors, and the sequence of pitch-scale modification factors to produce a second signal from which a modified synthetic contribution may be generated using a discrete Fourier transform algorithm;
  
  (d) a discrete Fourier transform means responsive to the second signal to generate a transformed signal; and
  
  (e) an overlap-add means responsive to the transformed signal and the sequence of time-scale modification factors to generate a third signal representative of the synthetic speech waveform.
- - 22. The apparatus of claim 21, wherein the speech information further comprises an envelope stream representative of time-varying average magnitude, and the overlap-add means is electrically coupled to the memory means and is additionally responsive to the envelope stream.

23. An apparatus for generating a synthetic musical waveform from a set of parameters representative of overlapping musical tone data frames stored in a memory means, a sequence of frequency-scale modification factors, and a sequence of time-scale modification factors, the parameter set comprising a sequence of quasi-harmonic sinusoidal model parameter sets and a sequence of fundamental frequency estimates, the apparatus comprising:
- (a) a discrete Fourier transform assignment means electrically coupled to the memory means and responsive to the sequence of fundamental frequency estimates, the sequence of quasi-harmonic sinusoidal model parameter sets, the sequence of time-scale modification factors, and the sequence of frequency-scale modification factors to produce a first signal from which a modified synthetic contribution may be generated using a discrete Fourier transform algorithm;
  
  (b) a discrete Fourier transform means responsive to the first signal to generate a transformed signal; and
  
  (c) an overlap-add means responsive to the transformed signal and the sequence of time-scale modification factors to generate a second signal representative of the synthetic musical waveform.
- - 24. The apparatus of claim 23, wherein the musical information further comprises an envelope stream representative of time-varying average magnitude, and the overlap-add means is electrically coupled to the memory means and is additionally responsive to the envelope stream.
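
The signal path recited in claim 23 (DFT assignment, discrete Fourier transform, overlap-add) can be illustrated with a minimal Python sketch. It is a sketch under stated assumptions, not the patented method: each scaled harmonic is rounded to the nearest DFT bin, the synthesis window is triangular, the time-scale factor only stretches the output hop, and phase is not advanced from frame to frame. The function names `assign_dft` and `overlap_add` and all parameter names are hypothetical.

```python
import numpy as np

def assign_dft(amps, phases, f0, beta, fs, n_fft):
    """Build a one-sided spectrum with one phasor per scaled harmonic.

    Assumption: harmonic k sits at k * beta * f0 Hz and is rounded to the
    nearest bin, which is a simplification chosen for this sketch.
    """
    spec = np.zeros(n_fft // 2 + 1, dtype=complex)
    for k, (a, ph) in enumerate(zip(amps, phases), start=1):
        b = int(round(k * beta * f0 * n_fft / fs))
        if 0 < b < n_fft // 2:
            # Scaling so that irfft yields a * cos(2*pi*b*n/n_fft + ph).
            spec[b] += 0.5 * n_fft * a * np.exp(1j * ph)
    return spec

def overlap_add(specs, hop, rho, n_fft):
    """Inverse-transform each spectrum and overlap-add at hop * rho samples."""
    hop_out = int(round(rho * hop))
    size = 2 * hop_out
    if size > n_fft:
        raise ValueError("n_fft must cover two output hops")
    # Triangular window: copies shifted by hop_out sum to one.
    window = 1.0 - np.abs(np.arange(size) - hop_out) / hop_out
    out = np.zeros(hop_out * (len(specs) + 1))
    for i, spec in enumerate(specs):
        frame = np.fft.irfft(spec, n=n_fft)[:size]
        out[i * hop_out : i * hop_out + size] += window * frame
    return out
```

For example, `overlap_add([assign_dft([1.0], [0.0], 250.0, 1.0, 8000, 256)] * 6, 64, 2.0, 256)` produces an output twice as long as the same call with a time-scale factor of 1.0, while the tone stays at 250 Hz.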

25. An apparatus for generating a synthetic musical tone waveform from a set of parameters representative of overlapping frames of musical data stored in a memory means, a sequence of pitch-scale modification factors, and a sequence of time-scale modification factors, the musical information comprising a sequence of quasi-harmonic sinusoidal model parameter sets, a sequence of coefficient sets representative of estimates of a spectral envelope, and a sequence of fundamental frequency estimates, the apparatus comprising:
- (a) a pitch onset time estimator means electrically coupled to the memory means and responsive to the sequence of coefficient sets, the sequence of fundamental frequency estimates, and the sequence of quasi-harmonic sinusoidal model parameter sets to generate a first signal representative of a sequence of time estimates relative to the center of each of the frames at which an excitation pulse occurs;
  
  (b) a phasor interpolator means electrically coupled to the memory means and the pitch onset time estimator means and responsive to the sequence of coefficient sets, the sequence of fundamental frequency estimates, the sequence of quasi-harmonic sinusoidal model parameter sets, the first signal, and the sequence of pitch-scale modification factors to generate a sequence of modified quasi-harmonic sinusoidal model parameter sets;
  
  (c) a discrete Fourier transform assignment means electrically coupled to the phasor interpolator means and responsive to the sequence of fundamental frequency estimates, the sequence of modified quasi-harmonic sinusoidal model parameter sets, the sequence of time-scale modification factors, and the sequence of pitch-scale modification factors to produce a second signal from which a modified synthetic contribution may be generated using a discrete Fourier transform algorithm;
  
  (d) a discrete Fourier transform means responsive to the second signal to generate a transformed signal; and
  
  (e) an overlap-add means responsive to the transformed signal and the sequence of time-scale modification factors to generate a third signal representative of the synthetic musical tone waveform.
- - 26. The apparatus of claim 25, wherein the musical information further comprises an envelope stream representative of time-varying average magnitude, and the overlap-add means is electrically coupled to the memory means and is additionally responsive to the envelope stream.
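
Claims 21 and 25 make the phasor interpolator responsive to the coefficient sets as well as to the pitch-scale modification factors. One plausible reading, offered here only as an illustration, is that moving the harmonics while leaving the spectral envelope in place requires re-reading the envelope at the new harmonic frequencies. The Python sketch below assumes an all-pole envelope 1/A(z) and simply rescales each harmonic amplitude by the ratio of envelope gains. It is not the patent's phasor interpolator, and `envelope_gain` and `pitch_scale_amplitudes` are hypothetical names.

```python
import numpy as np

def envelope_gain(coeffs, freqs_hz, fs):
    """Magnitude of 1/A(e^{jw}), with A(z) = 1 + sum_m coeffs[m-1] * z**-m.

    Assumption: the coefficient set describes an all-pole envelope.
    """
    w = 2 * np.pi * np.asarray(freqs_hz, dtype=float) / fs
    m = np.arange(1, len(coeffs) + 1)
    a_of_w = 1 + np.exp(-1j * np.outer(w, m)) @ np.asarray(coeffs, dtype=float)
    return 1.0 / np.abs(a_of_w)

def pitch_scale_amplitudes(amps, f0, beta, coeffs, fs):
    """Rescale harmonic amplitudes so they follow the envelope at beta * f0."""
    k = np.arange(1, len(amps) + 1)
    old = envelope_gain(coeffs, k * f0, fs)
    new = envelope_gain(coeffs, k * beta * f0, fs)
    return np.asarray(amps, dtype=float) * new / old
```

With a pitch-scale factor of 1.0 the amplitudes are returned unchanged. With any other factor, amplitudes that originally lay on the envelope still lie on it at the shifted harmonic frequencies.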

27. An apparatus for analyzing a stream of digital signals representing input musical tones, the input musical tones having a time-varying average magnitude and a nominal expected pitch frequency represented by a nominal expected pitch frequency signal; the apparatus comprising:
- (a) a first frame segmenting means responsive to the stream of digital signals for segmenting the stream of digital signals into a sequence of overlapping musical tone data frames each having a fundamental frequency;
  
  (b) a time-varying average magnitude calculating means responsive to the stream of digital signals to produce an envelope stream representative of the time-varying changes in the average magnitude of the musical tones;
  
  (c) a second frame segmenting means responsive to the envelope stream for segmenting the envelope stream into a sequence of overlapping envelope data frames;
  
  (d) a first multiplying means responsive to the sequence of overlapping envelope data frames and the sequence of overlapping musical tone data frames for generating a sequence of first product frames;
  
  (e) a squaring means responsive to the sequence of overlapping envelope data frames for generating a sequence of squared envelope data frames;
  
  (f) a second multiplying means responsive to the sequence of first product frames for multiplying each one of the sequence of first product frames by an analysis window function to produce a sequence of second product frames;
  
  (g) a third multiplying means responsive to the sequence of squared envelope data frames for multiplying each one of the sequence of squared envelope data frames by an analysis window function to produce a corresponding sequence of third product frames;
  
  (h) a first discrete Fourier transform means responsive to the sequence of second product frames for calculating the discrete Fourier transform of each of the sequence of second product frames to produce a sequence of first transform frames;
  
  (i) a second discrete Fourier transform means responsive to the sequence of third product frames for calculating the discrete Fourier transform of each of the sequence of third product frames to produce a sequence of second transform frames;
  
  (j) a harmonically-constrained iterative analysis-by-synthesis means responsive to the sequence of first transform frames, the sequence of second transform frames, and the nominal expected pitch frequency signal for analyzing the sequence of first transform frames and the nominal expected pitch frequency signal to produce a sequence of fundamental frequency estimates, and to produce a sequence of quasi-harmonic sinusoidal parameter sets that approximate the sequence of musical tone data frames; and
  
  (k) a system estimator means responsive to the sequence of musical tone data frames for producing a sequence of coefficient sets representative of the spectral envelope of the quasi-harmonic sinusoidal model parameters.
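
Elements (a) through (i) of claim 27 describe a front end that produces two transform sequences per frame: one from the envelope-weighted tone frames and one from the squared envelope frames. A minimal Python sketch of that data flow follows. The moving average used for the "time-varying average magnitude", the Hann analysis window, the default sizes, and the names `average_magnitude`, `segment`, and `analysis_front_end` are all assumptions made for illustration; the claim does not specify them.

```python
import numpy as np

def average_magnitude(x, width):
    """Envelope stream: moving average of |x| (an assumed estimator)."""
    kernel = np.ones(width) / width
    return np.convolve(np.abs(x), kernel, mode="same")

def segment(x, size, hop):
    """Split x into overlapping frames of `size` samples, `hop` apart."""
    count = 1 + (len(x) - size) // hop
    idx = np.arange(size)[None, :] + hop * np.arange(count)[:, None]
    return x[idx]

def analysis_front_end(x, size=256, hop=128, env_width=65):
    env = average_magnitude(x, env_width)            # element (b)
    tone_frames = segment(x, size, hop)              # element (a)
    env_frames = segment(env, size, hop)             # element (c)
    window = np.hanning(size)
    first_product = env_frames * tone_frames         # element (d)
    squared_env = env_frames ** 2                    # element (e)
    # Elements (f)-(i): window each product, then take its DFT.
    first_transform = np.fft.rfft(window * first_product, axis=1)
    second_transform = np.fft.rfft(window * squared_env, axis=1)
    return env, first_transform, second_transform
```

Both transform sequences have one row per frame, so a later stage can consume them frame by frame alongside the nominal expected pitch frequency.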

28. An apparatus for analyzing a stream of digital signals representing input musical tones having a nominal expected pitch frequency represented by a nominal expected pitch frequency signal; the apparatus comprising:
- (a) a first frame segmenting means responsive to the stream of digital signals for segmenting the stream of digital signals into a sequence of overlapping musical tone data frames;
  
  (b) a multiplying means responsive to the sequence of overlapping musical tone data frames for multiplying each of the sequence of overlapping musical tone data frames with an analysis window function to produce a sequence of first product frames;
  
  (c) a discrete Fourier transform means responsive to the sequence of first product frames for calculating the discrete Fourier transform of each of the sequence of first product frames to produce a sequence of first transform frames;
  
  (d) a harmonically constrained iterative analysis-by-synthesis means responsive to the sequence of first transform frames, a discrete Fourier transform of the analysis window function, and the nominal expected pitch frequency signal for analyzing the sequences of first transform frames, the discrete Fourier transform of the analysis window function, and the nominal expected pitch frequency signal to produce a sequence of fundamental frequency estimates and a sequence of sets of sinusoidal parameters that approximate the corresponding sequence of musical tone data frames;
  
  (e) a system estimator means responsive to the sequence of musical tone data frames for producing a sequence of coefficient sets representative of the spectral envelope of the quasi-harmonic sinusoidal model parameters.
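
The harmonically constrained analysis in claims 27 and 28 takes a nominal expected pitch frequency and returns a fundamental frequency estimate together with sinusoidal parameters. The Python sketch below illustrates the idea of a pitch-constrained fit only. It swaps in a plain technique, a grid search over candidate fundamentals with a windowed least-squares fit of the harmonics in the time domain, in place of the iterative analysis-by-synthesis operating on transform frames that the claims recite. The names `harmonic_fit` and `estimate_f0`, the search span, and the step count are hypothetical.

```python
import numpy as np

def harmonic_fit(frame, f0, fs, n_harm, window):
    """Windowed least-squares fit of n_harm harmonics of f0 to one frame."""
    n = np.arange(len(frame)) - len(frame) // 2   # time relative to frame center
    k = np.arange(1, n_harm + 1)
    arg = 2 * np.pi * f0 / fs * np.outer(n, k)
    basis = np.hstack([np.cos(arg), np.sin(arg)]) * window[:, None]
    target = window * frame
    coef, *_ = np.linalg.lstsq(basis, target, rcond=None)
    err = np.sum((target - basis @ coef) ** 2)
    a, b = coef[:n_harm], coef[n_harm:]
    # a*cos(t) + b*sin(t) = A*cos(t + phi), with A = hypot(a, b), phi = atan2(-b, a)
    return np.hypot(a, b), np.arctan2(-b, a), err

def estimate_f0(frame, nominal_f0, fs, n_harm=4, span=0.06, steps=61):
    """Try fundamentals within +/- span of the nominal pitch; keep the best fit."""
    window = np.hanning(len(frame))
    candidates = nominal_f0 * np.linspace(1 - span, 1 + span, steps)
    fits = [harmonic_fit(frame, f, fs, n_harm, window) for f in candidates]
    best = int(np.argmin([fit[2] for fit in fits]))
    return candidates[best], fits[best][0], fits[best][1]
```

Restricting the candidates to a narrow band around the nominal pitch is what makes the fit "harmonically constrained" in this sketch: every sinusoid is forced onto a multiple of a single fundamental near the expected value.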

Current Assignee: Georgia Tech Research Corporation (University System of Georgia)
Original Assignee: Georgia Tech Research Corporation (University System of Georgia)
Inventors: George, E. Bryan; Smith, Mark J. T.
Primary Examiner(s): Fleming, Michael R.
Assistant Examiner(s): Doerrler, Michelle
Application Number: US07/748,544
Time in Patent Office: 1,048 Days
Field of Search: 381/29-41, 381/46-53, 381/94, 395/2
US Class Current: 704/211
CPC Class Codes:
G10L 15/12   using dynamic programming t...
G10L 19/02   using spectral analysis, e....
G10L 21/02   Speech enhancement, e.g. no...
G10L 25/24   the extracted parameters be...
G10L 25/27   characterised by the analys...
