SPEECH ANALYSIS AND SYNTHESIS BY THE USE OF THE LINEAR PREDICTION OF A SPEECH WAVE
Abstract
A short-time spectral analysis of a nonstationary signal, such as a speech signal, does not ordinarily yield control signal information sufficient for subsequent synthesis. However, more reliable control signals for a speech synthesizer can be obtained by making use of natural constraints, applicable to a speech wave, in the analysis procedure. For frequencies below 5 kHz, the human vocal tract can be modeled as an acoustic tube in which only plane waves propagate. Thus, for vowels and vowel-like sounds, the speech output of the vocal tract at any instant of time can be assumed to be a weighted sum of its past values and the input to the vocal tract at that instant of time. In the described invention, a speech wave is represented by the output of a linear filter which simulates an acoustic tube and which is excited by a combination of a quasi-periodic pulse train and white noise. The parameters of this filter are derived from the speech wave such that the mean-squared error between the synthetic speech samples at the output of the filter and the input speech samples is a minimum.
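The weighted-sum model in the abstract can be illustrated with a small least-squares fit. The sketch below is not the patented apparatus; it only assumes that each sample is predicted as s[n] ≈ a_1 s[n-1] + ... + a_p s[n-p] and that the weights are chosen to minimize the summed squared prediction error over a block of samples. The function names `solve_linear` and `predictor_coefficients`, the predictor order, and the synthetic test signal are all hypothetical choices made for illustration.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        acc = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / M[i][i]
    return x

def predictor_coefficients(s, p):
    """Return [a_1, ..., a_p] minimizing the sum over n >= p of
    (s[n] - sum_k a_k * s[n-k])**2 (assumed least-squares formulation)."""
    def phi(j, k):
        return sum(s[n - j] * s[n - k] for n in range(p, len(s)))
    # Normal equations: sum_k a_k * phi(j, k) = phi(j, 0) for j = 1..p
    A = [[phi(j, k) for k in range(1, p + 1)] for j in range(1, p + 1)]
    b = [phi(j, 0) for j in range(1, p + 1)]
    return solve_linear(A, b)

# Hypothetical test signal: one decaying resonance that obeys
# s[n] = a1*s[n-1] + a2*s[n-2] exactly for n >= 2.
r, theta = 0.95, 2 * math.pi * 0.1
true_a = [2 * r * math.cos(theta), -r * r]
s = [1.0, true_a[0]]
for n in range(2, 200):
    s.append(true_a[0] * s[n - 1] + true_a[1] * s[n - 2])

a = predictor_coefficients(s, 2)
```

Because the test signal follows a second-order recursion exactly, the fitted weights should match `true_a` up to rounding. Real speech would need a higher order and a fresh fit for each analysis interval.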
10 Claims
-
1. Speech analysis apparatus, which comprises:
- means for developing a first set of signals which specify linearly predictable characteristics of an applied speech signal, means for developing a second set of signals representative of the duration of individual pitch periods of said applied speech signal, means for developing a third set of signals representative of the energy of a speech signal and of the voicing character of speech signals within each of said pitch periods, and means for utilizing all of said developed signals together as a representation of said applied speech signal.
-
2. Speech signal analysis apparatus as defined in claim 1, wherein, said first set of signals which specify linearly predictable characteristics comprises a plurality of limited channel capacity parameter signals derived from past and current values of said applied speech signal for adjusting a resonant filter system, arranged to produce a replica of said applied speech signal when excited by voiced and unvoiced excitation signals.
-
3. Speech signal analysis apparatus, as defined in claim 1, wherein, said first set of signals comprises a sequence of signals a = a_1, ..., a_n, for each pitch period of said applied signals, which uniquely determine the frequencies and bandwidths of formants of said applied signal below approximately 5 kHz.
-
4. Speech signal analysis apparatus as defined in claim 3, in combination with, means supplied with said sequence of signals a for developing signals representative of the frequencies and bandwidths of formants of said applied speech signal during selected pitch periods.
-
5. Speech signal analysis apparatus as defined in claim 1, wherein, said first set of signals is developed by minimizing the mean-squared error between the actual values of samples of said applied speech signal and predicted values thereof based on a selected number of past sample values.
-
6. Speech signal apparatus, which comprises:
- at a transmitter station;
means for developing a first set of signals which specify linearly predictable characteristics of an applied speech signal, means for developing a second set of signals representative of the duration of individual pitch periods of said applied speech signal, means for developing a third set of signals representative of the energy of a speech signal in each of said pitch periods and of the voicing character of speech signals within said pitch periods, and means for combining all of said developed signals for transmission to a receiver station; and
at said receiver station;
means responsive to received signals of said first set for developing signals representative of predicted values of a speech signal, means responsive to received signals of said second set for developing a sequence of pitch period pulses, means for generating white noise signals, means responsive to received signals of said third set for individually adjusting the levels of said pitch period pulses and said white noise signals, and means for combining said adjusted pitch period pulses, said adjusted white noise signals, and said predicted value signals to form speech signal which is a replica of said applied speech signal.
-
7. Speech signal apparatus as defined in claim 6, wherein, said means at said receiver station for developing signals representative of predicted values of said speech signal comprises, a transversal filter supplied with a combination of adjusted pitch period pulses, adjusted noise signals, and signals selectively representative of past values of said applied signal.
-
8. Synthesis apparatus for developing artificial speech from signals representative of the pitch period, voicing character, and selected predictable characteristics of an applied speech signal, which comprises:
- means responsive to received signals representative of selected predictable characteristics of an applied speech signal for developing signals representative of selected predicted values of said speech signal, means responsive to received signals representative of the pitch period of said applied speech signal for developing a sequence of pitch period pulses, means for generating white noise signals, means responsive to received signals representative of the voicing character of said applied speech signal for individually adjusting the levels of said pitch period pulses and said white noise signals, and means for combining said adjusted pitch period pulses, said adjusted white noise signals, and said predicted value signals to form speech signal which is a replica of said applied speech signal.
-
9. Synthesis apparatus as defined in claim 8, wherein said means for developing signals representative of predicted values of said speech signal comprises a transversal filter supplied with said combined replica signal and adjusted by said predictable characteristic signals.
-
10. Synthesis apparatus as defined in claim 8, wherein, said predicted value signals are selected to represent a linear combination of preceding values of said replica of said applied speech signal.
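The synthesis arrangement recited in claims 8 to 10 can be sketched in software as a recursive filter driven by level-adjusted pitch pulses and white noise. This is a minimal illustration under stated assumptions, not the claimed apparatus: the function name `synthesize`, its parameters, the use of Gaussian noise as the white-noise source, and the example coefficient values are all hypothetical.

```python
import math
import random

def synthesize(a, pitch_period, pulse_level, noise_level, num_samples, seed=0):
    """Form each output sample as a linear combination of preceding output
    samples (weights a[0..p-1] standing for a_1..a_p) plus an excitation
    made of level-adjusted pitch pulses and level-adjusted white noise."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    out = []
    for n in range(num_samples):
        pulse = 1.0 if n % pitch_period == 0 else 0.0  # one pulse per pitch period
        noise = rng.gauss(0.0, 1.0)                    # assumed white-noise source
        excitation = pulse_level * pulse + noise_level * noise
        # Predicted value from past outputs; early samples have fewer than p
        # predecessors, so only the available ones are used.
        predicted = sum(a[k] * out[n - 1 - k] for k in range(min(len(a), n)))
        out.append(predicted + excitation)
    return out

# Hypothetical predictor weights for a single stable resonance.
r, theta = 0.95, 2 * math.pi * 0.1
coeffs = [2 * r * math.cos(theta), -r * r]

# Voiced-like output: pulses only. Unvoiced-like output: noise only.
voiced = synthesize(coeffs, pitch_period=80, pulse_level=1.0,
                    noise_level=0.0, num_samples=160)
unvoiced = synthesize(coeffs, pitch_period=80, pulse_level=0.0,
                      noise_level=0.1, num_samples=160)
```

Setting one level to zero gives purely voiced or purely unvoiced excitation. Intermediate settings mix the two sources.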
Specification