Speech analysis-synthesis method and apparatus therefor

US 5,293,448 A
Filed: 09/03/1992
Issued: 03/08/1994
Est. Priority Date: 10/02/1989
Status: Expired due to Fees


Abstract

An impulse sequence of a pitch frequency is detected from a phase-equalized prediction residual of an input speech signal, and a quasi-periodic impulse sequence is obtained by processing the impulse sequence so that a fluctuation in its pitch frequency is within an allowed limit range. The magnitudes of the quasi-periodic impulse sequence are so determined as to minimize an error between the waveform of a synthesized speech obtainable by exciting an all-pole filter with the quasi-periodic impulse sequence and the waveform of a phase-equalized speech obtainable by applying the input speech signal to a phase equalizing filter. Preferably, the quasi-periodic impulse sequence is supplied to the all-pole filter after being applied to a zero filter in which it is given features of the prediction residual of the speech. Coefficients of the zero filter are also determined so that the error of the waveforms of the synthesized speech and the phase-equalized speech is minimum.
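The magnitude determination described in the abstract can be read as a linear least-squares problem: once the impulse positions are fixed, the synthesized waveform is a weighted sum of delayed copies of the all-pole filter's impulse response. The following is a minimal sketch under that reading, not the patent's own procedure. The function names, the sign convention A(z) = 1 - sum_k a[k] z^-k, and the use of normal equations solved by Gaussian elimination are all assumptions made for illustration.

```python
def all_pole_impulse_response(a, length):
    """Impulse response of 1/A(z), assuming A(z) = 1 - sum_k a[k-1] * z^-k."""
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * h[n - k]
        h.append(acc)
    return h


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def impulse_magnitudes(target, positions, a):
    """Magnitudes minimizing the squared error between `target`
    (the phase-equalized speech) and the all-pole filter output
    when excited by impulses at `positions` (hypothetical helper)."""
    n_samples = len(target)
    h = all_pole_impulse_response(a, n_samples)
    # Column i is the filter response to a unit impulse at positions[i].
    cols = [[h[n - p] if n >= p else 0.0 for n in range(n_samples)]
            for p in positions]
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(u * t for u, t in zip(ci, target)) for ci in cols]
    return solve_linear(gram, rhs)
```

Distinct impulse positions give linearly independent columns, so the normal equations have a unique solution; a practical coder would also need to carry filter state across analysis windows, which this sketch ignores.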

7 Claims

1. A speech analyzing apparatus comprising:
- linear predictive analysis means for performing a linear predictive analysis of an input speech signal for each analysis window of a fixed length to obtain prediction coefficients, said linear predictive analysis means including means for determining whether said input speech signal in an analysis window of fixed length is voiced or unvoiced and for providing a voiced/unvoiced decision signal;
  
  inverse filter means controlled by said prediction coefficients, for deriving a prediction residual from said input speech signal;
  
  speech phase equalizing filter means for rendering the phase of said input speech signal into a zero phase to obtain a phase-equalized speech signal;
  
  prediction residual phase equalizing filter means for rendering the phase of said prediction residual into a zero phase to obtain a phase-equalized prediction residual signal;
  
  reference time point gathering means for detecting impulses of magnitudes larger than a predetermined threshold value in said phase-equalized prediction residual signal and for outputting the positions of said impulses as reference time points;
  
  impulse position generating means responsive to said reference time points and said voiced/unvoiced decision signal for producing, based on said reference time points when said decision signal indicates that said speech signal is a voiced sound, differences between successive intervals of said reference time points for comparing the differences with a predetermined limit range, and for determining positions of impulses such that when the differences are within said predetermined limit range, said reference time points are determined as impulse positions, and when said differences are in excess of said predetermined limit range, impulse positions are determined by adding a time point to said reference time points or by omission of one of said reference time points or by shift of one of said reference time points so that the differences between the successive intervals of the processed reference time points are held within said limit range, said impulse positions thus determined being one of the parameters representing the excitation signal as a result of the speech analysis;
  
  impulse sequence generating means for receiving said impulse positions from said impulse position generating means and generating impulses at said impulse positions;
  
  all-pole filter means controlled by said prediction coefficients and excited by said generated impulse sequence to generate a synthesized speech; and
  
  impulse magnitude calculating means for determining magnitude values of said impulses generated by said impulse sequence generating means which minimize an error between a waveform of a synthesized speech obtainable by exciting said all-pole filter means with said impulse sequence and a waveform of said phase-equalized speech supplied from said speech phase equalizing filter means, and means for outputting said impulse magnitudes for use as another one of the parameters representing the excitation signal as a result of the speech analysis by said speech analyzing apparatus.
- - 2. The apparatus according to claim 1 further comprising:
    - zero filter means for providing said impulse sequence with features of the waveform of said phase-equalized prediction residual signal and supplying the output thereof to said all-pole filter means as the excitation signal; and
      
      zero filter coefficient calculating means for establishing the coefficients of said zero filter means which minimize an error between a waveform of a synthesized speech obtained by exciting said all-pole filter means with the output of said zero filter means and a waveform of said phase-equalized speech.
  - 3. The apparatus of claim 1 or 2, wherein said apparatus further includes random pattern generating means for generating a random pattern which minimizes an error between a waveform of a synthesized speech obtained by exciting said all-pole filter means with one of a plurality of predetermined random patterns and a waveform of said phase-equalized speech in a window during which said decision signal is unvoiced.
  - 4. The apparatus of claim 1 or 2, wherein said impulse sequence generating means includes vector quantizing means for vector quantizing the magnitude values of said impulses determined by said impulse magnitude calculating means.
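The impulse position generating means of claim 1 keeps the change between successive intervals inside a limit range by adding, omitting, or shifting reference time points. The claim does not say how to choose among the three operations, so the sketch below uses one hypothetical policy: add a point when the gap is more than 1.5 times the previous interval, omit a point when skipping it restores a regular interval, and shift otherwise. The function name, the 1.5 factor, and the assumption that `limit` is smaller than every pitch interval are illustrative choices, not taken from the patent.

```python
def regularize_positions(ref_points, limit):
    """Turn sorted reference time points into quasi-periodic impulse positions.

    Every change between successive output intervals stays within `limit`.
    Assumes strictly increasing points and limit < smallest pitch interval.
    """
    if len(ref_points) < 3:
        return list(ref_points)
    out = [ref_points[0], ref_points[1]]  # first two points set the initial interval
    i = 2
    while i < len(ref_points):
        prev_interval = out[-1] - out[-2]
        interval = ref_points[i] - out[-1]
        if abs(interval - prev_interval) <= limit:
            # Within the limit range: the reference point is used as is.
            out.append(ref_points[i])
            i += 1
        elif interval > prev_interval + limit:
            if interval > 1.5 * prev_interval:
                # Gap looks like a missed pulse: add a point, re-examine the same one.
                out.append(out[-1] + prev_interval)
            else:
                # Slightly late: shift the point earlier, onto the edge of the range.
                out.append(out[-1] + prev_interval + limit)
                i += 1
        else:
            nxt_ok = (i + 1 < len(ref_points) and
                      abs(ref_points[i + 1] - out[-1] - prev_interval) <= limit)
            if nxt_ok:
                # Spurious early pulse: omit it, the next point fits.
                i += 1
            else:
                # Slightly early: shift the point later, onto the edge of the range.
                out.append(out[-1] + prev_interval - limit)
                i += 1
    return out
```

For example, with a limit of 5 samples, the points 0, 80, 160, 320, 400 gain an added point at 240, while the points 0, 80, 100, 160, 240 lose the spurious point at 100.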

5. A method for analyzing a speech to generate parameters representing an input speech waveform including parameters of an excitation signal for exciting a linear filter representing a speech spectral envelope characteristic, comprising the steps of:
- producing a phase-equalized prediction residual of the input speech waveform;
  
  determining reference time points where levels of said phase-equalized prediction residual exceed a predetermined threshold;
  
  determining whether the input speech waveform in each of a plurality of successive analysis windows, each of which is of fixed time length, is voiced or unvoiced sound;
  
  obtaining the difference between intervals of successive ones of said reference time points in each analysis window;
  
  when the input speech waveform is voiced sound, selecting impulse positions based on said reference time points such that when the difference between the intervals of the successive reference time points in each analysis window is within a predetermined range, the reference time points are selected as impulse positions, and when the difference between the intervals of the successive reference time points exceeds the predetermined range, impulse positions are selected by moving or deleting the reference time points or inserting reference time points to define a sequence of quasi-periodic impulses so that the differences between successive reference time points are within said predetermined range, the positions of said quasi-periodic impulse sequence being one of the parameters representing said excitation signal; and
  
  so selecting magnitudes of the respective impulses of the quasi-periodic sequence in each analysis window as to minimize an error between the phase-equalized speech waveform and a synthesized speech waveform obtained by exciting said linear filter with said quasi-periodic impulse sequence, the magnitudes of the quasi-periodic impulses being another of the parameters representing said excitation signal.
- - 6. The method of claim 5 wherein, before being applied to said linear filter, said quasi-periodic impulses are processed by a zero filter, said method including the step of selecting coefficients of said zero filter which minimize an error between said phase-equalized speech waveform and a synthesized speech waveform obtained by exciting said linear filter with the output of said zero filter, whereby said processing of said quasi-periodic impulses by said zero filter gives the sequence of said quasi-periodic impulses features of the waveform of said phase-equalized prediction residual signal, and using said coefficients of said zero filter as one of said parameters representing said excitation signal.
  - 7. The method of claim 5 or 6 wherein said excitation signal is used for a voiced sound and a random sequence selected from a plurality of predetermined random patterns is used as an excitation signal for an unvoiced sound, said method including so selecting one of said predetermined random patterns representing said excitation signal for said unvoiced sound as to minimize an error between said phase-equalized speech waveform and a synthesized speech waveform obtainable by exciting said linear filter with said random patterns, and using said selected one of the predetermined random patterns to produce one of the parameters representing the input speech waveform.
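The random pattern selection of claim 7 for unvoiced windows can be pictured as an exhaustive codebook search: each predetermined pattern is passed through the linear filter and the one with the smallest waveform error is kept. The sketch below assumes an all-pole filter with the convention A(z) = 1 - sum_k a[k] z^-k, a plain squared-error measure, and no gain term; the function names and the codebook layout are hypothetical and are not specified by the claims.

```python
def all_pole_filter(excitation, a):
    """Filter `excitation` through 1/A(z), assuming A(z) = 1 - sum_k a[k-1] * z^-k."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * y[n - k]
        y.append(acc)
    return y


def select_random_pattern(target, codebook, a):
    """Return (index, error) of the codebook pattern whose synthesized
    waveform is closest, in squared error, to `target`
    (the phase-equalized speech in an unvoiced window)."""
    best_index, best_error = -1, float("inf")
    for index, pattern in enumerate(codebook):
        synthesized = all_pole_filter(pattern, a)
        error = sum((t - s) ** 2 for t, s in zip(target, synthesized))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error
```

Only the winning index needs to be transmitted as a parameter, since the decoder is assumed to hold the same set of predetermined patterns.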

Current Assignee: Nippon Telegraph and Telephone Corporation
Original Assignee: Nippon Telegraph and Telephone Corporation
Inventors: Honda, Masaaki
Primary Examiner(s): Knepper, David D.
Application Number: US07/939,049
Time in Patent Office: 551 Days
Field of Search: 395/2, 381/29-40
US Class Current: 704/208
CPC Class Codes: G10L 19/08 Determination or coding of ...
