Digital speech sinusoidal vocoder with transmission of only subset of harmonics

US 4,771,465 A
Filed: 09/11/1986
Issued: 09/13/1988
Est. Priority Date: 09/11/1986
Status: Expired due to Term

Abstract

A speech analyzer and synthesizer system using a sinusoidal encoding and decoding technique for voiced frames and noise excitation or multipulse excitation for unvoiced frames. For voiced frames, the analyzer transmits the pitch, values for a subset of offsets defining the differences between the actual harmonic frequencies and the theoretical harmonic frequencies derived from the fundamental frequency, the total frame energy, and the linear predictive coding (LPC) coefficients. The synthesizer is responsive to that information to determine the harmonic frequencies from the offset information for a subset of the harmonics and to determine the remaining harmonics from the fundamental frequency. The synthesizer then determines the phase for the fundamental frequency and harmonic frequencies and determines the amplitudes of the fundamental and harmonics using the total frame energy and the LPC coefficients. Once the phases and amplitudes have been determined for the fundamental and harmonic frequencies, the synthesizer performs a sinusoidal synthesis. In another embodiment, the remaining harmonic frequencies are determined by calculating the theoretical frequencies of the remaining harmonics and dividing these theoretical frequencies into groups, each containing as many frequencies as there are transmitted offsets. The offsets are then added to the corresponding theoretical harmonics of each group to generate the remaining harmonic frequencies. In a third embodiment, the offset signals are randomly permuted before being added to the groups of theoretical frequencies to generate the remaining harmonic frequencies.
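
The three embodiments in the abstract differ only in how the remaining harmonic frequencies are formed. The following is a minimal Python sketch of that step, not the patented implementation. The function name `harmonic_frequencies`, the `mode` and `seed` parameters, the assumption that the transmitted offsets belong to the lowest-numbered harmonics, and the choice to reshuffle the offsets for every group are all illustrative assumptions that do not come from the patent text.

```python
import random


def harmonic_frequencies(f0, offsets, num_harmonics, mode="plain", seed=0):
    """Rebuild harmonic frequencies in Hz from a pitch and a subset of offsets.

    Assumption (not from the patent): offsets[i] belongs to harmonic i + 1,
    whose theoretical frequency is (i + 1) * f0.
    mode: "plain"    -> remaining harmonics are exact multiples of f0
          "grouped"  -> the offsets are reused for each later group
          "permuted" -> as "grouped", but the offsets are shuffled per group
    """
    n = len(offsets)
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    group = list(offsets)
    freqs = []
    for h in range(1, num_harmonics + 1):
        theoretical = h * f0
        if h <= n:
            # Transmitted subset: theoretical frequency plus its own offset.
            freqs.append(theoretical + offsets[h - 1])
        elif mode == "plain" or n == 0:
            # First embodiment: remaining harmonics from the pitch alone.
            freqs.append(theoretical)
        else:
            idx = (h - 1) % n
            if idx == 0 and mode == "permuted":
                rng.shuffle(group)  # third embodiment: permute the offsets
            # Second and third embodiments: add an offset to each group member.
            freqs.append(theoretical + group[idx])
    return freqs


print(harmonic_frequencies(100.0, [1.0, -2.0], 5, mode="grouped"))
```

With a 100 Hz pitch and two offsets, the "grouped" mode applies the same two offsets to harmonics 3 and 4 and then starts a new group at harmonic 5.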

24 Claims

1. A processing system for synthesizing voice from encoded information representing speech frames each having a predetermined number of evenly spaced samples of instantaneous amplitude of speech with said encoded information for each frame representing frame energy and a set of speech parameters and a fundamental frequency signal of the speech and offset signals representing the difference between the theoretical harmonic frequencies as derived from a fundamental frequency signal and a subset of the actual harmonic frequencies, said system comprising:
- means responsive to the offset signals and the fundamental frequency signal of one of said frames for calculating a subset of harmonic phase signals corresponding to said offset signals;
  
  means responsive to said fundamental frequency signal for computing the remaining harmonic phase signals for said one of said frames;
  
  means responsive to the frame energy and the set of speech parameters of said one of said frames for determining the amplitudes of said fundamental signal and said subset of said harmonic phase signals and said remaining harmonic phase signals; and
  
  means for generating replicated speech in response to said fundamental signal and said subset of said harmonic phase signals and said remaining harmonic phase signals and the determined amplitudes for said one of said frames.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
- - 2. The system of claim 1 wherein said computing means comprises means for multiplying each harmonic number with said fundamental frequency signal to generate a frequency for each of said remaining harmonic phase signals;
    - means for arithmetically varying the generated frequencies; and
      
      means responsive to the varied frequencies for calculating said remaining harmonic phase signals.
  - 3. The system of claim 2 wherein said varying means comprises means for constraining an arithmetic signal generated by subtracting a variable signal multiplied by a first constant from the harmonic number multiplied by said fundamental frequency signal such that said arithmetic signal is less than a second constant; and
    - means for subtracting said variable signal multiplied by said first constant from said harmonic number multiplied times said fundamental frequency signal for each of said remaining harmonic phase signals to generate said varied frequencies.
  - 4. The system of claim 1 wherein said computing means comprises means for generating the remaining harmonic frequency signals corresponding to said remaining harmonic phase signals by multiplying said fundamental frequency signal by the harmonic number for each of said remaining harmonic phase signals;
    - means for grouping the multiplied frequency signals into a plurality of subsets, each having the same number of harmonics as said subset of harmonic phase signals; and
      
      means for adding each of said offset signals to the corresponding grouped frequency signals of each of said plurality of subsets to generate varied remaining harmonic frequency signals; and
      
      means for calculating said remaining harmonic phase signals from said varied harmonic frequency signals.
  - 5. The system of claim 1 wherein said computing means comprises means for generating the remaining harmonic frequency signals corresponding to said harmonic phase signals by multiplying said fundamental signal by the harmonic number for each of said remaining harmonic phase signals;
    - means for grouping the multiplied frequency signals into a plurality of subsets, each having the same number of harmonics as said subset of harmonic phase signals;
      
      means for permuting the order of said offset signals;
      
      means for adding each of said permuted offset signals to the corresponding grouped frequency signal of each of said plurality of subsets to generate varied remaining harmonic frequency signals; and
      
      means for calculating said remaining harmonic phase signals from the varied remaining harmonic frequency signals.
  - 6. The system of claim 1 wherein said determining means comprises means for calculating the unscaled energy of each of said harmonic phase signals from said set of speech parameters for said one of said frames;
    - means for summing said unscaled energy for all of said harmonic phase signals for said one of said frames; and
      
      means responsive to said harmonic energy of each of said harmonic signals and the summed unscaled energy and said frame energy for said one of said frames for computing the amplitudes of said harmonic phase signals.
  - 7. The system of claim 1 wherein each of said harmonic phase signals comprises a plurality of samples and said calculating means comprises means for adding each of said offset signals to said fundamental signal to obtain the corresponding harmonic sample for each harmonic phase signal of said subset;
    - said computing means comprises means for generating a corresponding harmonic sample for each of said remaining harmonic phase signals; and
      
      means responsive to the corresponding harmonic sample for said one of said frames and the corresponding harmonic samples for the previous and subsequent ones of said frames for each of said harmonic phase signals for interpolating to obtain said plurality of harmonic samples for each of said harmonic phase signals for said one of said frames upon said previous and subsequent ones of said frames being voiced frames.
  - 8. The system of claim 7 wherein the interpolating means performs a linear interpolation.
  - 9. The system of claim 8 wherein said corresponding harmonic signal for said one of said frames for each of said harmonic phase signals is located in the center of said one of said frames.
  - 10. The system of claim 9 wherein said interpolating means comprises a first means for setting a subset of said plurality of harmonic samples for each of said harmonic phase signals from each of said corresponding harmonic samples to the beginning of said frames equal to each of said corresponding harmonic samples upon said previous one of said frames being an unvoiced frame; and
    - a second means for setting another subset of said plurality of harmonic samples for each of said harmonic phase signals from each of said corresponding harmonic samples to the end of said one of said frames equal to said corresponding harmonic sample for each of said harmonic phase signals upon said sequential one of said frames being an unvoiced frame.
  - 11. The system of claim 10 wherein each of said frames further encoded by a set of speech parameters and multipulse excitation information and an excitation type signal upon said one of said frames being unvoiced and said system further comprises:
    - means for synthesizing said one of said frames of speech utilizing said set of speech parameter signals and said noise-like excitation upon said excitation type signal indicating noise excitation; and
      
      said synthesizing means further responsive to said speech parameter signals and said multipulse excitation information to synthesize said one of said frames of speech utilizing said multipulse excitation information and said set of speech parameter signals upon said excitation type signal indicating multipulse.
  - 12. The system of claim 11 wherein said synthesizing means further comprises means responsive to said set of parameter signals from said previous frames to initialize said synthesizing means upon said one of said frames being the first unvoiced frame of an unvoiced region.
  - 13. The system of claim 12 wherein said generating means performs a sinusoidal synthesis to produce the replicated speech utilizing said harmonic phase signals and said determined amplitudes for said one of said frames.
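
Claims 7 to 10 and 13 describe one frequency sample per harmonic at the center of each frame, linear interpolation toward the neighboring frames when they are voiced, a constant value toward an unvoiced neighbor, and sinusoidal synthesis from the resulting tracks. The sketch below illustrates that sequence in Python under simplifying assumptions that are not in the claims: frame centers are taken to be exactly one frame length apart, an unvoiced neighbor is signaled with `None`, and each harmonic is a cosine oscillator whose phase is accumulated sample by sample. The names `frequency_track` and `synthesize_frame` are hypothetical.

```python
import math


def frequency_track(prev_f, cur_f, next_f, frame_len):
    """Per-sample frequency (Hz) of one harmonic across one frame.

    cur_f sits at the frame center; prev_f / next_f are the center values
    of the neighboring frames, or None when that neighbor is unvoiced.
    Assumption: consecutive frame centers are frame_len samples apart.
    """
    half = frame_len // 2
    track = []
    for n in range(frame_len):
        if n < half:
            if prev_f is None:
                track.append(cur_f)  # hold the center value (claim 10)
            else:
                t = (n + frame_len - half) / frame_len
                track.append(prev_f + t * (cur_f - prev_f))
        else:
            if next_f is None:
                track.append(cur_f)  # hold the center value (claim 10)
            else:
                t = (n - half) / frame_len
                track.append(cur_f + t * (next_f - cur_f))
    return track


def synthesize_frame(tracks, amplitudes, sample_rate, start_phases=None):
    """Sum cosine oscillators; phase is the running sum of frequency."""
    if not tracks:
        return [], []
    phases = list(start_phases) if start_phases else [0.0] * len(tracks)
    out = []
    for n in range(len(tracks[0])):
        sample = 0.0
        for k, track in enumerate(tracks):
            phases[k] += 2.0 * math.pi * track[n] / sample_rate
            sample += amplitudes[k] * math.cos(phases[k])
        out.append(sample)
    return out, phases  # final phases can seed the next frame


track = frequency_track(100.0, 200.0, 300.0, 4)
print(track)
```

Returning the final phases lets a caller pass them as `start_phases` for the following frame, which keeps each oscillator continuous across frame boundaries.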

14. A processing system for encoding human speech comprising:
- means for segmenting the speech into a plurality of speech frames, each having a predetermined number of evenly spaced samples of instantaneous amplitudes of speech and each of which overlaps by a predefined number of samples with the previous and subsequent frames;
  
  means for calculating a set of speech parameter signals defining a vocal tract for each frame;
  
  means for calculating the frame energy per frame of the speech samples;
  
  means for performing a spectral analysis of said speech samples of each frame to produce a spectrum for each frame;
  
  means for detecting the fundamental frequency signal for each frame from the spectrum corresponding to each frame;
  
  means for determining a subset of harmonic frequency signals for each frame from the spectrum corresponding to each frame;
  
  means for determining offset signals representing the difference between each of said harmonic frequency signals and multiples of said fundamental frequency signal; and
  
  means for transmitting encoded representations of said frame energy and said set of speech parameters and said fundamental frequency signal and said offset signals for subsequent speech synthesis.
- Dependent Claims (15, 16, 17, 18)
- - 15. The system of claim 14 wherein said performing means comprises means for downsampling said speech samples thereby reducing the amount of computation.
  - 16. The system of claim 15 further comprising means for designating frames as voiced and unvoiced;
    - means for transmitting a signal to indicate the use of noise-like excitation upon speech of said one of said frames resulting from noise-like source in the human larynx and said designating means indicating an unvoiced frame;
      
      means for forming excitation information from a multipulse excitation source upon the absence of the noise-like source and upon said designating means indicating an unvoiced frame; and
      
      said transmitting means further responsive to said multipulse excitation information and said set of speech parameters for transmitting encoded representations of multipulse excitation information and said set of speech parameters for subsequent speech synthesis.
  - 17. The system of claim 14 wherein said detecting means comprises means for identifying the peak corresponding to said fundamental frequency signal; and
    - means for performing a second order interpolation around said peak to more accurately detect said fundamental frequency signal.
  - 18. The system of claim 14 wherein said determining means comprises means for identifying the peaks each corresponding to one of said harmonic frequency signals; and
    - means for performing a second order interpolation around each of said peaks to more accurately determine each of the corresponding harmonic frequency signals.
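
On the encoder side, claims 14, 17 and 18 call for locating spectral peaks, refining each with a second order interpolation, and transmitting the difference between each refined harmonic and the matching multiple of the fundamental. A minimal Python sketch follows. It assumes the common three-point parabolic fit as the second order interpolation, which the claims do not specify, and it assumes the measured harmonics are numbered consecutively from 1. The names `refine_peak`, `bin_to_hz` and `harmonic_offsets` are hypothetical.

```python
def refine_peak(mags, k):
    """Fractional bin position of a spectral peak near integer bin k.

    Fits a parabola through the magnitudes at bins k - 1, k, k + 1
    (an assumed form of the claimed second order interpolation).
    """
    a, b, c = mags[k - 1], mags[k], mags[k + 1]
    denom = a - 2.0 * b + c
    if denom == 0.0:
        return float(k)  # flat neighborhood: nothing to refine
    return k + 0.5 * (a - c) / denom


def bin_to_hz(position, sample_rate, fft_size):
    """Convert a (possibly fractional) bin position to a frequency in Hz."""
    return position * sample_rate / fft_size


def harmonic_offsets(f0, harmonic_freqs):
    """Offset of each measured harmonic from its theoretical multiple of f0.

    Assumption: harmonic_freqs[i] is harmonic number i + 1.
    """
    return [f - (i + 1) * f0 for i, f in enumerate(harmonic_freqs)]


# A parabola peaking at bin 5.3 is recovered from its integer-bin samples.
mags = [10.0 - (x - 5.3) ** 2 for x in range(10)]
print(refine_peak(mags, 5))
print(harmonic_offsets(100.0, [101.0, 198.0]))
```

Only the pitch and these small offsets need to be sent for the chosen subset, rather than the harmonic frequencies themselves.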

19. A method for synthesizing voice from encoded information representing speech frames each having a predetermined number of evenly spaced samples of instantaneous amplitude of speech with said encoded information for each frame comprising frame energy and a set of speech parameters and a fundamental frequency of speech and offset signals representing the difference between the theoretical harmonic frequencies as derived from a fundamental frequency signal and a subset of actual harmonic frequencies, comprising the steps of:
- calculating a subset of harmonic phase signals corresponding to said offset signals;
  
  computing the remaining harmonic phase signals for said one of said frames from said fundamental frequency signal;
  
  determining the amplitudes of said fundamental signal and said subset of harmonic phase signals and said remaining harmonic phase signals from the frame energy and the set of speech parameters of said one of said frames; and
  
  generating replicated speech in response to said fundamental signal and said subset and remaining harmonic phase signals and said determined amplitudes for said one of said frames.
- Dependent Claims (20, 21, 22, 23, 24)
- - 20. The method of claim 19 wherein said computing step comprises the steps of multiplying each harmonic number with said fundamental frequency signal to generate a frequency for each of said remaining harmonic phase signals;
    - arithmetically varying the generated frequencies; and
      
      calculating said remaining phase signals from said varied frequencies.
  - 21. The method of claim 19 wherein said computing step comprises the step of generating the remaining harmonic frequency signals corresponding to said remaining harmonic phase signals by multiplying said fundamental frequency signal by the harmonic number for each of said remaining harmonic signals;
    - grouping the multiplied frequency signals into a plurality of subsets, each having the same number of harmonics as said subset of harmonic phase signals;
      
      adding each of said offset signals to the corresponding grouped frequency signals of each of said plurality of subsets to generate varied remaining harmonic frequency signals; and
      
      calculating said remaining harmonic phase signals from said varied harmonic frequency signals.
  - 22. The method of claim 21 wherein said step of adding comprises the step of permuting the order of said offset signals before adding said signals to said corresponding grouped frequency signals of each of said plurality of subsets to generate said varied remaining harmonic frequency signals.
  - 23. The method of claim 19 wherein said determining step comprises the steps of calculating the unscaled energy of each of said harmonic phase signals from said set of speech parameters for said one of said frames;
    - summing said unscaled energy for all of said harmonic phase signals for said one of said frames; and
      
      computing the amplitudes of said harmonic phase signals in response to said harmonic energy of each of said harmonic signals and the summed unscaled energy and said frame energy for said one of said frames.
  - 24. The method of claim 19 wherein each of said frames further encoded by a set of speech parameters and multipulse excitation information and an excitation type signal upon said one of said frames being unvoiced, and said method further comprising the steps of synthesizing said one of said frames of speech utilizing said set of speech parameter signals and noise-like excitation upon said excitation type signal indicating noise excitation; and
    - further synthesizing in response to said speech parameter signals and said multipulse excitation information to synthesize said one of said frames of speech using said multipulse excitation information and said set of speech parameter signals upon said excitation type signal indicating multipulse.
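
Claim 23 determines the harmonic amplitudes in three steps: compute an unscaled energy per harmonic from the speech parameters, sum those energies, and scale by the frame energy. The Python sketch below is one possible reading, not the patented formula. It assumes the speech parameters are LPC coefficients of an all-pole filter 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, takes the unscaled energy as the filter's power response at each harmonic frequency, and normalizes so that the squared amplitudes sum to the frame energy. The function name `harmonic_amplitudes` and that normalization are assumptions.

```python
import cmath
import math


def harmonic_amplitudes(lpc, freqs_hz, sample_rate, frame_energy):
    """Amplitude of each harmonic from LPC coefficients and frame energy.

    lpc: [a_1, ..., a_p] of A(z) = 1 + sum(a_k * z**-k)  (assumed convention)
    Returns amplitudes whose squares sum to frame_energy (assumed scaling).
    """
    unscaled = []
    for f in freqs_hz:
        w = 2.0 * math.pi * f / sample_rate
        # Evaluate A(z) on the unit circle at this harmonic's frequency.
        a_val = 1.0 + sum(a * cmath.exp(-1j * w * (k + 1))
                          for k, a in enumerate(lpc))
        unscaled.append(1.0 / abs(a_val) ** 2)  # unscaled harmonic energy
    total = sum(unscaled)  # summed unscaled energy
    # Each harmonic receives its share of the transmitted frame energy.
    return [math.sqrt(frame_energy * e / total) for e in unscaled]


amps = harmonic_amplitudes([-0.9], [100.0, 200.0, 300.0], 8000.0, 1.0)
print(amps)
```

With the single coefficient used here the filter is low-pass, so the amplitudes fall as the harmonic number rises while their squares still add up to the frame energy.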

Current Assignee: American Telephone & Telegraph Company (AT&T, Inc.), Bell Telephone Laboratories, Inc. (Nokia Corporation)
Original Assignee: American Telephone & Telegraph Company (AT&T, Inc.)
Inventors: Jacobs, Thomas E.; Kleijn, Willem B.; Ketchum, Richard H.; Bronson, Edward C.; Hartwell, Walter T.
Primary Examiner(s): Shoop, Jr., William M.
Assistant Examiner(s): Young, Brian K
Application Number: US06/906,424
Time in Patent Office: 733 Days
Field of Search: 381/36-41, 381/53, 364/724
US Class Current: 704/207
CPC Class Codes:
G10L 19/02   using spectral analysis, e....
G10L 19/06   Determination or coding of ...
G10L 19/07   Line spectrum pair [LSP] vo...
