Speech analysis-synthesis method and apparatus therefor
Abstract
An impulse sequence at the pitch frequency is detected from a phase-equalized prediction residual of an input speech signal. A quasi-periodic impulse sequence is then obtained by processing this impulse sequence so that the fluctuation in its pitch frequency stays within an allowed limit range. The magnitudes of the quasi-periodic impulse sequence are determined so as to minimize the error between the waveform of the synthesized speech obtained by exciting an all-pole filter with the quasi-periodic impulse sequence and the waveform of the phase-equalized speech obtained by applying the input speech signal to a phase equalizing filter. Preferably, the quasi-periodic impulse sequence is passed through a zero filter, which gives it features of the prediction residual of the speech, before it is supplied to the all-pole filter. The coefficients of the zero filter are likewise determined so as to minimize the error between the waveforms of the synthesized speech and the phase-equalized speech.
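The synthesis chain described in the abstract (quasi-periodic impulses, an optional zero filter, then an all-pole filter) can be sketched as follows. This is a minimal illustration under assumed conventions, not the patented implementation: the function names are hypothetical, the all-pole filter is taken as 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and every filter starts from a zero state.

```python
from typing import List, Sequence


def impulse_train(length: int, positions: Sequence[int],
                  magnitudes: Sequence[float]) -> List[float]:
    """Excitation with an impulse of the given magnitude at each position."""
    x = [0.0] * length
    for p, g in zip(positions, magnitudes):
        x[p] += g
    return x


def zero_filter(x: Sequence[float], b: Sequence[float]) -> List[float]:
    """FIR ("zero") filter: y[n] = sum_k b[k] * x[n - k], zero initial state."""
    return [sum(bk * x[n - k] for k, bk in enumerate(b) if n - k >= 0)
            for n in range(len(x))]


def all_pole_filter(e: Sequence[float], a: Sequence[float]) -> List[float]:
    """1 / A(z) with A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p, zero initial state."""
    s: List[float] = []
    for n in range(len(e)):
        acc = e[n]
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc -= ai * s[n - i]
        s.append(acc)
    return s


def synthesize(length: int, positions: Sequence[int], magnitudes: Sequence[float],
               zero_coeffs: Sequence[float], lpc_coeffs: Sequence[float]) -> List[float]:
    """Impulse train -> zero filter -> all-pole filter."""
    excitation = impulse_train(length, positions, magnitudes)
    return all_pole_filter(zero_filter(excitation, zero_coeffs), lpc_coeffs)
```

With `zero_coeffs = [1.0]` the zero filter is transparent, which corresponds to exciting the all-pole filter with the impulses directly, as in the claims below.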
7 Claims
1. A speech analyzing apparatus comprising:
linear predictive analysis means for performing a linear predictive analysis of an input speech signal for each analysis window of a fixed length to obtain prediction coefficients, said linear predictive analysis means including means for determining whether said input speech signal in an analysis window of fixed length is voiced or unvoiced and for providing a voiced/unvoiced decision signal;
inverse filter means controlled by said prediction coefficients, for deriving a prediction residual from said input speech signal;
speech phase equalizing filter means for rendering the phase of said input speech signal into a zero phase to obtain a phase-equalized speech signal;
prediction residual phase equalizing filter means for rendering the phase of said prediction residual into a zero phase to obtain a phase-equalized prediction residual signal;
reference time point gathering means for detecting impulses of magnitudes larger than a predetermined threshold value in said phase-equalized prediction residual signal and for outputting the positions of said impulses as reference time points;
impulse position generating means responsive to said reference time points and said voiced/unvoiced decision signal for producing, based on said reference time points when said decision signal indicates that said speech signal is a voiced sound, differences between successive intervals of said reference time points for comparing the differences with a predetermined limit range, and for determining positions of impulses such that when the differences are within said predetermined limit range, said reference time points are determined as impulse positions, and when said differences are in excess of said predetermined limit range, impulse positions are determined by adding a time point to said reference time points or by omission of one of said reference time points or by shift of one of said reference time points so that the differences between the successive intervals of the processed reference time points are held within said limit range, said impulse positions thus determined being one of the parameters representing the excitation signal as a result of the speech analysis;
impulse sequence generating means for receiving said impulse positions from said impulse position generating means and generating impulses at said impulse positions;
all-pole filter means controlled by said prediction coefficients and excited by said generated impulse sequence to generate a synthesized speech; and
impulse magnitude calculating means for determining magnitude values of said impulses generated by said impulse sequence generating means which minimize an error between a waveform of a synthesized speech obtainable by exciting said all-pole filter means with said impulse sequence and a waveform of said phase-equalized speech supplied from said speech phase equalizing filter means, and means for outputting said impulse magnitudes for use as another one of the parameters representing the excitation signal as a result of the speech analysis by said speech analyzing apparatus.
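The linear predictive analysis means and the inverse filter means of claim 1 are commonly realized with the autocorrelation method and the Levinson-Durbin recursion. The sketch below assumes that choice, a Hamming window over the fixed-length frame, and the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p; the claim fixes none of these. The voiced/unvoiced decision is not sketched because the claim does not state how it is made. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve the normal equations for a[1..p]; returns (a[1..p], final prediction error).

    Assumes a non-silent frame (r[0] > 0).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:], err


def lpc_analysis(frame: Sequence[float], order: int, hamming: bool = True) -> List[float]:
    """Prediction coefficients for one fixed-length analysis window."""
    n = len(frame)
    if hamming and n > 1:
        frame = [v * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                 for i, v in enumerate(frame)]
    coeffs, _ = levinson_durbin(autocorrelation(frame, order), order)
    return coeffs


def inverse_filter(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """Prediction residual e[n] = x[n] + sum_i a[i] * x[n - i], zero initial state."""
    return [x[n] + sum(ai * x[n - i] for i, ai in enumerate(a, start=1) if n - i >= 0)
            for n in range(len(x))]
```

For example, the impulse response of the filter 1/(1 - 1.3 z^-1 + 0.4 z^-2), analyzed without a window, yields coefficients close to [-1.3, 0.4], and inverse filtering it leaves essentially a single impulse.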
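The claims in this excerpt do not say how the phase equalizing filters are designed. One way to illustrate "rendering the phase into a zero phase" is a matched filter whose coefficients are the time-reversed, energy-normalized waveform of a pitch pulse: the output spectrum becomes |X(w)|^2 times a pure delay, so each pulse turns into a peak that is symmetric about its center, which is what makes simple threshold detection of reference time points workable. The sketch assumes this design purely for illustration; the names are hypothetical.

```python
from typing import List, Sequence


def convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Full linear convolution, length len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def matched_filter_coeffs(pulse: Sequence[float]) -> List[float]:
    """Hypothetical phase equalizer: the time-reversed pulse, scaled to unit energy."""
    norm = sum(v * v for v in pulse) ** 0.5
    return [v / norm for v in reversed(pulse)]


def phase_equalize(signal: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """FIR filtering; a pulse at position p peaks at p + len(coeffs) - 1."""
    return convolve(signal, coeffs)
```

Under the further assumption that the speech and residual equalizers use the same coefficients, the phase-equalized speech equals the all-pole filter output excited by the phase-equalized residual, because linear time-invariant filters commute; this is what makes the waveform comparison in the impulse magnitude calculation consistent.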
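The reference time point gathering means and the impulse position generating means of claim 1 can be sketched as below. The correction rules are assumptions, not the patent's: a gap close to twice the previous interval is filled by adding a time point, a point that comes too early is omitted when the next point fits the previous interval, and any other outlier is shifted to the expected position. The first interval is assumed valid, the limit is in samples, and the function is meant for voiced windows only. Names are hypothetical.

```python
from typing import List, Optional, Sequence


def reference_time_points(residual: Sequence[float], threshold: float) -> List[int]:
    """Positions where |residual| exceeds the threshold; one point (the largest
    magnitude) is kept for each contiguous run of above-threshold samples."""
    points: List[int] = []
    best: Optional[int] = None
    for n, v in enumerate(residual):
        if abs(v) > threshold:
            if best is None or abs(v) > abs(residual[best]):
                best = n
        elif best is not None:
            points.append(best)
            best = None
    if best is not None:
        points.append(best)
    return points


def quasi_periodic_positions(points: Sequence[int], limit: int) -> List[int]:
    """Keep each point whose interval differs from the previous interval by at
    most `limit` samples; otherwise add, omit, or shift a point (hypothetical rules)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    out = pts[:2]                       # assumes the first interval is a valid period
    i = 2
    while i < len(pts):
        if pts[i] <= out[-1]:
            i += 1                      # already passed by an added or shifted point
            continue
        prev = out[-1] - out[-2]
        interval = pts[i] - out[-1]
        if abs(interval - prev) <= limit:
            out.append(pts[i])          # within the limit range: keep as is
            i += 1
        elif interval >= 2 * prev - limit:
            out.append(out[-1] + prev)  # missed pulse: add a time point, re-check pts[i]
        elif (interval < prev and i + 1 < len(pts)
              and abs((pts[i + 1] - out[-1]) - prev) <= limit):
            i += 1                      # spurious pulse: omit this point
        else:
            out.append(out[-1] + prev)  # otherwise: shift to the expected position
            i += 1
    return out
```

For instance, with a 5-sample limit, the reference points [10, 50, 130, 170] become [10, 50, 90, 130, 170] (a point is added), [10, 50, 70, 90, 130] becomes [10, 50, 90, 130] (a point is omitted), and [10, 50, 100, 130] becomes [10, 50, 90, 130] (a point is shifted).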
5. A method for analyzing speech to generate parameters representing an input speech waveform including parameters of an excitation signal for exciting a linear filter representing a speech spectral envelope characteristic, comprising the steps of:
producing a phase-equalized prediction residual of the input speech waveform;
determining reference time points where levels of said phase-equalized prediction residual exceed a predetermined threshold;
determining whether the input speech waveform in each of a plurality of successive analysis windows, each of which is of fixed time length, is voiced or unvoiced sound;
obtaining the difference between intervals of successive ones of said reference time points in each analysis window;
when the input speech waveform is voiced sound, selecting impulse positions based on said reference time points such that when the difference between the intervals of the successive reference time points in each analysis window is within a predetermined range, the reference time points are selected as impulse positions, and when the difference between the intervals of the successive reference time points exceeds the predetermined range, impulse positions are selected by moving or deleting the reference time points or inserting reference time points to define a sequence of quasi-periodic impulses so that the differences between successive reference time points are within said predetermined range, the positions of said quasi-periodic impulse sequence being one of the parameters representing said excitation signal; and
so selecting magnitudes of the respective impulses of the quasi-periodic sequence in each analysis window as to minimize an error between the phase-equalized speech waveform and a synthesized speech waveform obtained by exciting said linear filter with said quasi-periodic impulse sequence, the magnitudes of the quasi-periodic impulses being another of the parameters representing said excitation signal.
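Because the linear filter is linear, the synthesized waveform in the last step of claim 5 is a weighted sum of shifted impulse responses, s_hat[n] = sum_k g_k h[n - p_k], where h is the impulse response of the all-pole filter and p_k are the impulse positions. The magnitudes g_k that minimize the squared error against the phase-equalized speech s[n] therefore solve the normal equations Phi g = c, with Phi[j][k] = sum_n h[n - p_j] h[n - p_k] and c[j] = sum_n h[n - p_j] s[n]. The sketch below assumes a zero filter state at the start of the window (memory carried over from the previous window is ignored), the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and plain Gaussian elimination; names are hypothetical.

```python
from typing import List, Sequence


def impulse_response(lpc: Sequence[float], length: int) -> List[float]:
    """First `length` samples of the impulse response of 1 / A(z)."""
    h: List[float] = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for i, ai in enumerate(lpc, start=1):
            if n - i >= 0:
                acc -= ai * h[n - i]
        h.append(acc)
    return h


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def impulse_magnitudes(target: Sequence[float], positions: Sequence[int],
                       lpc: Sequence[float]) -> List[float]:
    """Least-squares impulse magnitudes for one analysis window."""
    n = len(target)
    h = impulse_response(lpc, n)
    # One basis waveform per impulse: h shifted to start at that impulse position.
    basis = [[h[t - p] if t >= p else 0.0 for t in range(n)] for p in positions]
    phi = [[sum(x * y for x, y in zip(bj, bk)) for bk in basis] for bj in basis]
    c = [sum(x * y for x, y in zip(bj, target)) for bj in basis]
    return solve(phi, c)
```

When the target is itself an exact weighted sum of shifted impulse responses, the error can be driven to zero and the sketch recovers the original weights; for real speech it returns the weights of the best approximation in the least-squares sense.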
Specification