Method and apparatus for speech excitation waveform coding using multiple error waveforms

US 5,809,459 A
Filed: 05/21/1996
Issued: 09/15/1998
Est. Priority Date: 05/21/1996
Status: Expired due to Term

Abstract

A method and apparatus (100) for pitch-epoch-synchronous source-filter speech encoding by means of error component modeling methods (310) which capture fundamental orthogonal (uncorrelated) basis elements of an excitation source waveform. A periodic waveform model (318) along with four orthogonal error waveforms, desirably including phase error (319), ensemble error (321), standard deviation error (323), and mean error (324) waveforms, are incorporated together to form a complete description of the excitation. These error waveforms (319, 321, 323, 324) represent those portions of the excitation that are not represented by the purely periodic model. By thus orthogonalizing the error components, the perceptual effect of each element is isolated from the composite set, and can thus be encoded separately. In addition to high-quality, fixed-rate operation, the identity-system capability and low complexity of the speech encoding method and apparatus make them applicable to variable-rate applications without changing underlying modeling methods.

64 Claims

1. A method for encoding speech comprising the steps of:
- a) generating an excitation waveform by performing a linear prediction coding (LPC) analysis on a number of samples of input speech and inverse filtering the samples of input speech;
  
  b) selecting a source segment of the excitation waveform;
  
  c) computing a target segment as a representative portion of the excitation waveform, wherein the target segment represents a fundamental period of the excitation waveform;
  
  d) computing orthogonal error waveforms by computing at least one model, at least one model reference, and comparing the at least one model reference and the at least one model;
  
  e) encoding the orthogonal error waveforms and parameters describing the input speech; and
  
  f) creating a bitstream which includes encoded versions of the orthogonal error waveforms and the parameters.
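
Step a) of claim 1 names two standard operations: LPC analysis and inverse filtering. The following is a minimal sketch of one common way to carry them out, assuming the autocorrelation method with the Levinson-Durbin recursion. The claim does not specify the analysis method, window, or predictor order, and all function names here are illustrative rather than taken from the patent.

```python
def autocorrelation(samples, max_lag):
    """Return r[0..max_lag], where r[lag] = sum of samples[n] * samples[n + lag]."""
    n = len(samples)
    return [sum(samples[i] * samples[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for the prediction-error filter A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns (coefficients [1, a1, ..., ap], final prediction error energy).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable input: stop early
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err  # reflection coefficient for stage m
        updated = a[:]
        for j in range(1, m):
            updated[j] = a[j] + k * a[m - j]
        updated[m] = k
        a = updated
        err *= 1.0 - k * k
    return a, err


def inverse_filter(samples, a):
    """Apply A(z) to the speech samples: e[n] = sum_j a[j] * x[n - j]."""
    return [sum(a[j] * samples[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(samples))]


def excitation_from_speech(samples, order):
    """Assumed pipeline for step a): LPC analysis, then inverse filtering."""
    r = autocorrelation(samples, order)
    a, _ = levinson_durbin(r, order)
    return inverse_filter(samples, a)
```

For a decaying exponential input such as 0.9^n, an order-1 analysis yields a coefficient close to -0.9, and the residual is close to a single impulse at the first sample.
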
- - 2. The method as claimed in claim 1, wherein step b) comprises the step of:
    - b1) selecting the source segment as a prior target segment.
  - 3. The method as claimed in claim 1, wherein step c) comprises the steps of:
    - c1) selecting a first target as a first portion of the excitation waveform;
      
      c2) correlating the first target with a previous target which is an adjacent portion of the excitation waveform, resulting in an array of correlation coefficients;
      
      c3) aligning the first target segment with the previous target segment by shifting the first target segment by a lag corresponding to a maximum correlation coefficient of the array;
      
      c4) repeating steps c2) and c3) until all targets of the excitation waveform have been correlated and aligned, resulting in an aligned excitation waveform; and
      
      c5) computing the representative portion by performing an ensemble process on the aligned excitation waveform.
  - 4. The method as claimed in claim 3, further comprising the step of:
    - c6) normalizing the first target segment and the previous target segment to a uniform length prior to the correlating step.
  - 5. The method as claimed in claim 3, wherein the step c5) comprises the step of:
    - c5a) performing the ensemble process wherein the ensemble process is an ensemble mean.
  - 6. The method as claimed in claim 3, wherein the step c5) comprises the step of:
    - c5a) performing the ensemble process wherein the ensemble process is a synchronous matched filter average.
  - 7. The method as claimed in claim 3, wherein the step c5) comprises the step of:
    - c5a) performing the ensemble process wherein the ensemble process is an ensemble filter.
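
Claims 3 through 5 describe correlating each target with its neighbor, shifting it by the lag of maximum correlation, and averaging the aligned targets. A minimal sketch of that idea follows. It assumes the targets have already been normalized to a uniform length (claim 4), that the shift is cyclic, and that the ensemble process is the ensemble mean of claim 5. The function names are illustrative, not from the patent.

```python
def cyclic_correlation(segment, reference):
    """Return c[lag] = sum_n segment[(n + lag) % N] * reference[n] for all lags."""
    n = len(segment)
    return [sum(segment[(i + lag) % n] * reference[i] for i in range(n))
            for lag in range(n)]


def align_to(segment, reference):
    """Cyclically shift `segment` by the lag that maximizes its correlation
    with `reference`. Returns (aligned segment, chosen lag)."""
    corr = cyclic_correlation(segment, reference)
    best_lag = max(range(len(corr)), key=corr.__getitem__)
    n = len(segment)
    aligned = [segment[(i + best_lag) % n] for i in range(n)]
    return aligned, best_lag


def ensemble_mean(segments):
    """Average equal-length segments sample by sample."""
    count = len(segments)
    return [sum(column) / count for column in zip(*segments)]


def representative_segment(segments):
    """Align each segment to the previously aligned one, then average."""
    aligned = [list(segments[0])]
    for segment in segments[1:]:
        shifted, _ = align_to(segment, aligned[-1])
        aligned.append(shifted)
    return ensemble_mean(aligned)
```

Given several cyclically rotated copies of the same pulse, `representative_segment` recovers the pulse in the orientation of the first segment.
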
  - 8. The method as claimed in claim 1, wherein a first error waveform of the orthogonal error waveforms is an ensemble error waveform and step d) comprises the steps of:
    - d1) computing a periodic excitation model of the excitation waveform by ensemble interpolating between the target segment and the source segment;
      
      d2) creating a reference excitation waveform; and
      
      d3) computing the ensemble error waveform by computing errors between the reference excitation waveform and the periodic excitation model.
  - 9. The method as claimed in claim 8, wherein step d1) comprises the steps of:
    - d1a) energy normalizing the target segment and the source segment;
      
      d1b) correlating the target segment with the source segment, resulting in an array of correlation coefficients;
      
      d1c) aligning the target segment with the source segment by shifting the target segment by a lag corresponding to a maximum correlation coefficient of the array; and
      
      d1d) ensemble interpolating between an aligned version of the target segment and the source segment, resulting in a sequence of interpolated segments.
  - 10. The method as claimed in claim 9, wherein step d2) comprises the steps of:
    - d2a) computing a mean of an epoch of the excitation waveform;
      
      d2b) computing a standard deviation of the epoch of the excitation waveform;
      
      d2c) energy normalizing the epoch by subtracting the mean and dividing by the standard deviation for the epoch; and
      
      d2d) repeating steps d2a) through d2c) until all epochs have been energy normalized.
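
The energy normalization of claim 10 (subtract the epoch mean, divide by the epoch standard deviation, repeat for every epoch) can be sketched as follows. This sketch assumes the population standard deviation (dividing by N) and returns an all-zero epoch when the standard deviation is zero; the claim states neither choice. Function names are illustrative.

```python
import math


def energy_normalize(epoch):
    """Steps d2a) to d2c): return (normalized epoch, mean, standard deviation)."""
    n = len(epoch)
    mean = sum(epoch) / n
    variance = sum((x - mean) ** 2 for x in epoch) / n  # assumption: divide by N
    std = math.sqrt(variance)
    if std == 0.0:
        # Assumption: a constant epoch normalizes to all zeros.
        return [0.0] * n, mean, std
    return [(x - mean) / std for x in epoch], mean, std


def energy_normalize_all(epochs):
    """Step d2d): normalize every epoch, keeping the per-epoch statistics."""
    normalized, means, stds = [], [], []
    for epoch in epochs:
        out, mean, std = energy_normalize(epoch)
        normalized.append(out)
        means.append(mean)
        stds.append(std)
    return normalized, means, stds
```

The per-epoch means and standard deviations are kept because the mean and standard deviation error waveforms of claims 16 and 18 are derived from the same excitation waveform.
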
  - 11. The method as claimed in claim 10, further comprising the steps of:
    - d2e) aligning an energy normalized epoch with a second epoch of the sequence of interpolated segments; and
      
      d2f) repeating step d2e) until all energy normalized epochs have been aligned, producing the reference excitation waveform.
  - 12. The method as claimed in claim 11, wherein step d2e) comprises the steps of:
    - d2e1) correlating the energy normalized epoch with the second epoch, producing a maximum correlation offset index; and
      
      d2e2) cyclically shifting the energy normalized epoch by the maximum correlation offset index, producing a shifted normalized epoch.
  - 13. The method as claimed in claim 9, further comprising the step of:
    - d1e) pitch normalizing the target segment to a uniform normalizing length by upsampling the target segment when a target segment length is less than the uniform normalizing length and downsampling the target segment when the target segment length is more than the uniform normalizing length, wherein the correlating step is performed on the target segment after pitch normalizing.
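
Pitch normalization as described in claim 13 resamples a segment to a fixed length, upsampling short segments and downsampling long ones. The sketch below uses plain linear interpolation that preserves the first and last samples. That interpolation method is an assumption (the claim names no resampling filter), and a practical downsampler would normally low-pass filter first. The function name is illustrative.

```python
def pitch_normalize(segment, uniform_length):
    """Resample `segment` to `uniform_length` samples by linear interpolation."""
    n = len(segment)
    if n == uniform_length:
        return list(segment)
    if n == 1:
        return [segment[0]] * uniform_length
    if uniform_length == 1:
        return [segment[0]]
    # Map output index k onto a fractional position in the input segment.
    scale = (n - 1) / (uniform_length - 1)
    out = []
    for k in range(uniform_length):
        position = k * scale
        i = min(int(position), n - 2)  # left neighbor, clamped at the end
        frac = position - i
        out.append(segment[i] * (1.0 - frac) + segment[i + 1] * frac)
    return out
```

The same routine covers both branches of the claim: a target shorter than the uniform length is upsampled, and a longer one is downsampled.
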
  - 14. The method as claimed in claim 9, wherein step d3) comprises the steps of:
    - d3a) computing a phase error between a reference segment of the reference excitation waveform relative to an interpolated segment, producing an integer offset value;
      
      d3b) shifting the reference segment by the integer offset value, producing an aligned excitation reference;
      
      d3c) subtracting the aligned excitation reference from the interpolated segment, producing a segment representing an ensemble error; and
      
      d3d) repeating steps d3a) through d3c) for each of the interpolated segments and reference segments.
  - 15. The method as claimed in claim 14, wherein step d3a) comprises the step of:
    - d3a1) correlating the reference segment with the interpolated segment.
  - 16. The method as claimed in claim 1, wherein a second error waveform of the orthogonal error waveforms is a standard deviation error waveform, and step d) comprises the steps of:
    - d1) creating a standard deviation reference waveform from the excitation waveform;
      
      d2) computing a standard deviation model from standard deviation values derived from the target segment and a source segment; and
      
      d3) computing the standard deviation error waveform by computing the error between the standard deviation reference waveform and the standard deviation model.
  - 17. The method as claimed in claim 16, wherein step d2) comprises the steps of:
    - d2a) creating the standard deviation model by interpolating between a standard deviation of the source segment and the standard deviation of the target segment.
  - 18. The method as claimed in claim 1, wherein a third error waveform of the orthogonal error waveforms is a mean error waveform, and step d) comprises the steps of:
    - d1) creating a mean reference waveform from the excitation waveform;
      
      d2) computing a mean model from mean values derived from the target segment and the source segment; and
      
      d3) computing the mean error waveform by computing an error between the mean reference waveform and the mean model.
  - 19. The method as claimed in claim 18, wherein step d2) comprises the steps of:
    - d2a) creating the mean model by interpolating between a mean of the source segment and a mean of the target segment.
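
Claims 16 through 19 share one pattern: build a per-epoch reference from the excitation, build a model by interpolating between the source and target values, and take the difference as the error waveform. A minimal sketch for the mean error follows. It assumes linear interpolation with the source at position 0 and the target at the last epoch, and an error defined as reference minus model; the claims fix neither the interpolation rule nor the sign. Function names are illustrative.

```python
def segment_mean(segment):
    """Arithmetic mean of one segment or epoch."""
    return sum(segment) / len(segment)


def interpolate_parameter(source_value, target_value, num_epochs):
    """Model: one linearly interpolated value per epoch, ending at the target."""
    step = target_value - source_value
    return [source_value + step * (k + 1) / num_epochs for k in range(num_epochs)]


def parameter_error(reference, model):
    """Error waveform, here taken as reference minus model (an assumption)."""
    return [r - m for r, m in zip(reference, model)]


def mean_error_waveform(epochs, source_segment, target_segment):
    """Claim 18 pattern: mean reference, interpolated mean model, then error."""
    reference = [segment_mean(epoch) for epoch in epochs]
    model = interpolate_parameter(segment_mean(source_segment),
                                  segment_mean(target_segment),
                                  len(epochs))
    return parameter_error(reference, model)
```

Replacing `segment_mean` with a standard deviation helper gives the corresponding sketch for the standard deviation error of claims 16 and 17.
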
  - 20. The method as claimed in claim 1, wherein a fourth error waveform of the orthogonal error waveforms is a phase error waveform, and step d) comprises the steps of:
    - d1) computing a periodic excitation model of the excitation waveform by ensemble interpolating between the target segment and the source segment;
      
      d2) creating a second excitation waveform by pitch normalizing the excitation waveform; and
      
      d3) phase normalizing the second excitation waveform, resulting in a reference excitation waveform and a phase error waveform.
  - 21. The method as claimed in claim 20, wherein step d1) comprises the steps of:
    - d1a) energy normalizing the target segment and the source segment;
      
      d1b) correlating the target segment with the source segment, resulting in an array of correlation coefficients;
      
      d1c) aligning the target segment with the source segment by shifting the target segment by a lag corresponding to a maximum correlation coefficient of the array; and
      
      d1d) ensemble interpolating between an aligned version of the target segment and the source segment, resulting in a sequence of interpolated segments.
  - 22. The method as claimed in claim 21, wherein step d3) comprises the steps of:
    - d3a) computing a mean of an epoch of the second excitation waveform;
      
      d3b) computing a standard deviation of the epoch of the second excitation waveform;
      
      d3c) energy normalizing the epoch by subtracting the mean and dividing by the standard deviation for the epoch; and
      
      d3d) repeating steps d3a) through d3c) until all epochs have been energy normalized.
  - 23. The method as claimed in claim 22, further comprising the steps of:
    - d3e) aligning an energy normalized epoch with a second epoch of the sequence of interpolated segments; and
      
      d3f) repeating step d3e) until all energy normalized epochs have been aligned, producing the phase error waveform.
  - 24. The method as claimed in claim 23, wherein step d3e) comprises the steps of:
    - d3e1) correlating the energy normalized epoch with the second epoch, producing a maximum correlation offset index; and
      
      d3e2) cyclically shifting the energy normalized epoch by the maximum correlation offset index, producing a shifted normalized epoch.
  - 25. The method as claimed in claim 21, further comprising the step, performed before step d1b), of:
    - d1e) pitch normalizing the target segment and the source segment.
  - 26. The method as claimed in claim 20, wherein step d2) comprises the step of:
    - d2a) pitch normalizing the target segment to a uniform normalizing length by upsampling the target segment when a target segment length is less than the uniform normalizing length and downsampling the target segment when the target segment length is more than the uniform normalizing length.
  - 27. The method as claimed in claim 1, wherein the orthogonal error waveforms comprise an ensemble error waveform, a standard deviation error waveform, a mean error waveform, and a phase error waveform, and step d) comprises the steps of:
    - d1) computing a periodic excitation model of the excitation waveform by ensemble interpolating between the target segment which has been energy normalized and aligned and the source segment;
      
      d2) creating a second excitation waveform by pitch normalizing and energy normalizing the excitation waveform;
      
      d3) phase normalizing the second excitation waveform, resulting in a third excitation waveform, which will be used as a reference excitation waveform, and a phase error waveform;
      
      d4) computing the ensemble error waveform by computing errors between the reference excitation waveform and the periodic excitation model;
      
      d5) creating a standard deviation reference waveform from the excitation waveform;
      
      d6) computing a standard deviation model from standard deviation values derived from the target segment and the source segment;
      
      d7) computing the standard deviation error waveform by computing errors between the standard deviation reference waveform and the standard deviation model;
      
      d8) creating a mean reference waveform from the excitation waveform;
      
      d9) computing a mean model from mean values derived from the target segment and the source segment; and
      
      d10) computing the mean error waveform by computing errors between the mean reference waveform and the mean model.
  - 28. The method as claimed in claim 1, wherein step e) comprises the step of:
    - e1) encoding the orthogonal error waveforms and the parameters using one or more trellis-coded quantizers.
  - 29. The method as claimed in claim 1, wherein step e) comprises the step of:
    - e1) encoding the orthogonal error waveforms and parameters using one or more multi-stage vector quantizers, wherein a bitrate can be decreased by decreasing a number of stages employed by the one or more multi-stage vector quantizers, and the bitrate can be increased by increasing the number of stages employed by the one or more multi-stage vector quantizers.
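
The bitrate scaling described in claim 29 follows from how a multi-stage vector quantizer works: each stage quantizes the residual left by the previous stage, so dropping trailing stages sends fewer indices at the cost of a coarser reconstruction. The sketch below is a generic multi-stage quantizer with a nearest-neighbor search, not the patent's quantizer. The codebooks passed in are hypothetical, and the function names are illustrative.

```python
import math


def nearest(codebook, vector):
    """Index of the codevector with the smallest squared error to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vector)))


def msvq_encode(vector, stages, num_stages=None):
    """Quantize `vector` with the first `num_stages` codebooks (all if None)."""
    residual = list(vector)
    indices = []
    for codebook in stages[:num_stages]:
        index = nearest(codebook, residual)
        indices.append(index)
        # The next stage sees only what this stage failed to represent.
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return indices


def msvq_decode(indices, stages):
    """Sum the selected codevectors of the stages that were transmitted."""
    out = [0.0] * len(stages[0][0])
    for index, codebook in zip(indices, stages):
        out = [o + c for o, c in zip(out, codebook[index])]
    return out


def bits_used(stages, num_stages):
    """Bits per vector when only the first `num_stages` stages are sent."""
    return sum(math.log2(len(codebook)) for codebook in stages[:num_stages])
```

With three two-entry stages, sending all three indices costs 3 bits per vector, while sending only the first two costs 2 bits and leaves the last residual unquantized.
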
  - 30. The method as claimed in claim 1, wherein step e) comprises the step of:
    - e1) encoding a subset of the orthogonal error waveforms in order to decrease a bitrate to a desired bitrate, wherein a number of the orthogonal error waveforms in the subset depends on the desired bitrate.
  - 31. The method as claimed in claim 30, wherein step e1) comprises the step of:
    - e1a) selecting particular error waveforms of the orthogonal error waveforms for the subset based on a hierarchy.
  - 32. The method as claimed in claim 1, wherein step e) comprises the step of:
    - e1) encoding the orthogonal error waveforms and parameters using one or more vector quantizers, wherein a bitrate can be decreased by decreasing a size of a codebook used by the one or more vector quantizers, and the bitrate can be increased by increasing the size of the codebook used by the one or more vector quantizers.
  - 33. The method as claimed in claim 1, wherein a first error waveform of the orthogonal error waveforms is an ensemble error waveform and step e) comprises the steps of:
    - a) characterizing the ensemble error waveform by filtering the ensemble error waveform, resulting in a filtered error waveform;
      
      b) transforming the filtered error waveform into a frequency domain, resulting in an inphase waveform and a quadrature waveform;
      
      c) selecting a codebook subset based on a degree-of-periodicity of the excitation waveform;
      
      d) encoding a subset of samples of the inphase waveform using the codebook subset; and
      
      e) encoding a subset of samples of the quadrature waveform using the codebook subset.
  - 34. The method as claimed in claim 33, further comprising the steps of:
    - f) determining whether a spectral model is to be used;
      
      g) if the spectral model is to be used, performing a linear prediction coding (LPC) analysis on the filtered error waveform;
      
      h) quantizing spectral parameters associated with the spectral model; and
      
      i) using a quantized version of the spectral parameters to inverse filter the filtered error waveform, resulting in a spectral error model excitation waveform which is used as the filtered error waveform in step b).
  - 35. The method as claimed in claim 1, wherein a second error waveform of the orthogonal error waveforms is a standard deviation error waveform and step e) comprises the steps of:
    - e1) determining whether more than one segment exists in the excitation waveform;
      
      e2) when more than one segment exists, upsampling the standard deviation error waveform to a common vector length;
      
      e3) selecting a first codebook subset based on a degree-of-periodicity of the excitation waveform; and
      
      e4) encoding the standard deviation error waveform using the first codebook subset, resulting in a characterized, encoded standard deviation error waveform.
  - 36. The method as claimed in claim 1, wherein a third error waveform of the orthogonal error waveforms is a mean error waveform and step e) comprises the steps of:
    - e1) determining whether more than one segment exists in the excitation waveform;
      
      e2) when more than one segment exists, upsampling the mean error waveform to a common vector length;
      
      e3) selecting a first codebook subset based on a degree-of-periodicity of the excitation waveform; and
      
      e4) encoding the mean error waveform using the first codebook subset, resulting in a characterized, encoded mean error waveform.
  - 37. The method as claimed in claim 1, wherein a fourth error waveform of the orthogonal error waveforms is a phase error waveform and step e) comprises the steps of:
    - e1) determining whether more than one segment exists in the excitation waveform;
      
      e2) when more than one segment exists, upsampling the phase error waveform to a common vector length;
      
      e3) selecting a first codebook subset based on a degree-of-periodicity of the excitation waveform; and
      
      e4) encoding the phase error waveform using the first codebook subset, resulting in a characterized, encoded phase error waveform.
  - 38. The method as claimed in claim 1, wherein step a) comprises the steps of:
    - a1) epoch-aligning the number of samples of input speech which includes multiple epochs, resulting in an epoch-aligned segment corresponding to one or more excitation epoch locations; and
      
      a2) performing the LPC analysis on the epoch-aligned segment.
  - 39. The method as claimed in claim 38, wherein step a) further comprises the steps of:
    - a1) low-pass filtering a segment of speech samples, resulting in filtered speech samples;
      
      a2) determining a waveform sense for each of the filtered speech samples, the speech samples, and a first excitation waveform;
      
      a3) applying the waveform sense to each of the filtered speech samples, the speech samples, and the first excitation waveform;
      
      a4) rectifying the filtered speech samples, the speech samples, and the first excitation waveform;
      
      a5) setting deviation factors for each of the filtered speech samples, the speech samples, and the first excitation waveform;
      
      a6) searching the filtered speech samples for first peaks at pitch intervals including a first deviation factor, resulting in filtered speech peak locations;
      
      a7) searching the speech samples for second peaks including a second deviation factor, resulting in speech peak locations;
      
      a8) searching the first excitation waveform for third peaks including a third deviation factor, resulting in excitation peak locations; and
      
      a9) assigning offsets to each of the excitation peak locations, resulting in the one or more excitation epoch locations.
  - 40. The method as claimed in claim 1, wherein the step of computing orthogonal error waveforms comprises a step of estimating receiver epoch locations which comprises the steps of:
    - d1) loading a target index into a buffer;
      
      d2) loading a source index into the buffer;
      
      d3) estimating a pitch using the source index, the target index, and a number of epochs of the excitation waveform;
      
      d4) setting an index pointer to the source index;
      
      d5) incrementing the index pointer by the pitch, producing a subsequent index pointer;
      
      d6) rounding the subsequent index pointer to a nearest integer;
      
      d7) storing the subsequent index pointer; and
      
      d8) repeating steps d5) through d7) until all the receiver epoch locations have been estimated.
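
The epoch location estimate of claim 40 can be sketched as a simple accumulation loop. The sketch assumes that the pitch is estimated as the distance between the source and target indices divided by the number of epochs, and that the unrounded pointer is carried forward so rounding errors do not accumulate. The claim leaves both details open, and the function name is illustrative.

```python
import math


def estimate_epoch_locations(source_index, target_index, num_epochs):
    """Steps d3) to d8): walk from the source index in pitch-sized steps."""
    if num_epochs <= 0:
        raise ValueError("num_epochs must be positive")
    # d3) assumed pitch estimate: average epoch spacing between the indices
    pitch = (target_index - source_index) / num_epochs
    pointer = float(source_index)  # d4) start at the source index
    locations = []
    for _ in range(num_epochs):
        pointer += pitch  # d5) advance by one pitch period
        locations.append(math.floor(pointer + 0.5))  # d6), d7) round and store
    return locations
```

For a source index of 10, a target index of 110, and three epochs, the estimated locations are 43, 77, and 110.
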
  - 41. The method as claimed in claim 1, wherein step e) comprises a step of encoding the target segment which comprises the steps of:
    - e1) downsampling the target segment when a size of the target segment exceeds a first number of samples;
      
      e2) energy normalizing the target segment;
      
      e3) performing a cyclic transform on the energy normalized target segment, resulting in a cyclically transformed segment;
      
      e4) performing a time-domain to frequency-domain transformation of the cyclically transformed segment, resulting in a frequency-domain representation;
      
      e5) selecting a codebook subset corresponding to a degree of periodicity of the excitation waveform;
      
      e6) encoding a subset of an inphase component of the frequency-domain representation; and
      
      e7) encoding a subset of a quadrature component of the frequency-domain representation.
  - 42. The method as claimed in claim 1, further comprising the step, performed before step d) of:
    - g) encoding a first set of parameters describing the input speech, wherein step d) computes the orthogonal error waveforms using the encoded first set of parameters.
  - 43. The method as claimed in claim 42, wherein step g) comprises a step of encoding the target segment which comprises the steps of:
    - g1) downsampling the target segment when a size of the target segment exceeds a first number of samples;
      
      g2) energy normalizing the target segment;
      
      g3) performing a cyclic transform on the energy normalized target segment, resulting in a cyclically transformed segment;
      
      g4) performing a time-domain to frequency-domain transformation of the cyclically transformed segment, resulting in a frequency-domain representation;
      
      g5) selecting a codebook subset corresponding to a degree of periodicity of the excitation waveform;
      
      g6) encoding a subset of an inphase component of the frequency-domain representation;
      
      g7) encoding a subset of a quadrature component of the frequency-domain representation;
      
      g8) computing a conjugate spectrum from the encoded inphase component and the encoded quadrature component, resulting in reconstructed inphase and quadrature vectors;
      
      g9) performing a frequency-domain to time-domain transformation on the reconstructed inphase and quadrature vectors, resulting in a cyclically shifted, energy normalized, quantized target;
      
      g10) performing an inverse cyclic transform on the cyclically shifted, energy normalized, quantized target, resulting in a quantized target; and
      
      g11) upsampling the quantized target to an original target length when step g1) was previously performed.
  - 44. The method as claimed in claim 1, wherein the parameters comprise a degree of periodicity, the method further comprising the step of calculating the degree of periodicity which comprises the steps of:
    - g) computing at least one feature which conveys the degree of periodicity of the samples of input speech;
      
      h) loading multi-layer perceptron (MLP) weights into memory;
      
      i) computing an MLP output of an MLP classifier using the MLP weights and the at least one feature; and
      
      j) computing the degree of periodicity by scalar quantizing the MLP output.
  - 45. The method as claimed in claim 44, wherein the at least one feature comprises a subframe autocorrelation coefficient, a subframe LPC gain, a subframe energy ratio, and a subframe energy ratio to prior subframe energies.
  - 46. The method as claimed in claim 1, wherein the parameters comprise a pitch, the method further comprising the step of calculating the pitch which comprises the steps of:
    - g) bandpass filtering the samples of input speech;
      
      h) computing multiple subframe autocorrelations of the filtered samples of input speech;
      
      i) selecting a maximum correlation subset from the multiple subframe autocorrelations;
      
      j) selecting an initial pitch estimate from the maximum correlation subset;
      
      k) searching for harmonic locations corresponding to the initial pitch estimate in the maximum correlation subset; and
      
      l) selecting a minimum harmonic location of the harmonic locations, the minimum harmonic location corresponding to the pitch.

47. A method for encoding speech comprising the steps of:
- a) computing at least one orthogonal model by extracting at least one excitation parameter from an excitation waveform, normalizing the excitation waveform by the at least one excitation parameter, and interpolating between elided parameters;
  
  b) computing at least one error waveform corresponding to each of the at least one orthogonal model and the elided parameters;
  
  c) encoding the at least one orthogonal model and the at least one error waveform; and
  
  d) creating a bitstream which includes encoded versions of the at least one orthogonal model and the at least one error waveform.

48. A method for encoding speech comprising the steps of:
- a) obtaining a number of samples of input speech;
  
  b) selecting a source segment of the input speech;
  
  c) computing a target segment as a portion of the input speech, wherein the target segment represents a fundamental period of the input speech;
  
  d) computing orthogonal error waveforms by computing at least one model, at least one model reference, and comparing the at least one model reference and the at least one model;
  
  e) encoding the orthogonal error waveforms and parameters describing the input speech; and
  
  f) creating a bitstream which includes encoded versions of the orthogonal error waveforms and the parameters.

49. A method for decoding a characterized, encoded standard deviation error waveform comprising the steps of:
- a) receiving an index representative of the characterized, encoded standard deviation error waveform;
  
  b) selecting a second codebook subset which corresponds to a first codebook subset which was used to encode a standard deviation error based on a degree-of-periodicity of an excitation waveform;
  
  c) decoding the characterized, encoded standard deviation error waveform using the second codebook subset, resulting in a characterized standard deviation error;
  
  d) determining whether more than one epoch exists; and
  
  e) if more than one epoch exists, downsampling the characterized standard deviation error to a number of samples equal to a number of epochs.

50. A method for decoding a characterized, encoded mean error waveform comprising the steps of:
- a) receiving an index representative of the characterized, encoded mean error waveform;
  
  b) selecting a second codebook subset which corresponds to a first codebook subset which was used to encode a mean error based on a degree-of-periodicity of an excitation waveform;
  
  c) decoding the characterized, encoded mean error waveform using the second codebook subset, resulting in a characterized mean error waveform;
  
  d) determining whether more than one epoch exists; and
  
  e) if more than one epoch exists, downsampling the characterized mean error waveform to a number of samples equal to a number of epochs.

51. A method for decoding a characterized, encoded phase error waveform comprising the steps of:
- a) receiving the characterized, encoded phase error waveform;
  
  b) selecting a second codebook subset which corresponds to a first codebook subset which was used to encode a phase error based on a degree-of-periodicity of an excitation waveform;
  
  c) decoding the characterized, encoded phase error waveform using the second codebook subset, resulting in a characterized phase error;
  
  d) determining whether more than one epoch exists; and
  
  e) if more than one epoch exists, downsampling the characterized phase error to a number of samples equal to a number of epochs.
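
Claims 49 through 51 share one decoding shape: pick the codebook subset that matches the degree of periodicity, look up the received index, and downsample the decoded vector to one sample per epoch when more than one epoch exists. The sketch below illustrates that shape together with a matching nearest-neighbor encoder. The codebook contents, the two periodicity classes, and the choice of evenly spaced samples for downsampling are all hypothetical and are not taken from the patent.

```python
# Hypothetical codebook subsets keyed by degree-of-periodicity class.
CODEBOOK_SUBSETS = {
    0: [[0.0, 0.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5, 0.5]],
    1: [[0.0, 0.0, 0.0, 0.0, 0.0], [1.0, 0.5, 0.0, -0.5, -1.0]],
}


def encode_error(vector, periodicity):
    """Encoder side: index of the nearest codevector in the selected subset."""
    subset = CODEBOOK_SUBSETS[periodicity]
    return min(range(len(subset)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(subset[i], vector)))


def downsample_to_epochs(vector, num_epochs):
    """Keep `num_epochs` evenly spaced samples (assumed downsampling rule)."""
    if num_epochs <= 1:
        return list(vector)  # the claims only downsample for multiple epochs
    last = len(vector) - 1
    return [vector[round(k * last / (num_epochs - 1))] for k in range(num_epochs)]


def decode_error(index, periodicity, num_epochs):
    """Steps b) to e): select the subset, decode, then downsample if needed."""
    decoded = list(CODEBOOK_SUBSETS[periodicity][index])
    return downsample_to_epochs(decoded, num_epochs)
```

Because the encoder and decoder derive the subset from the same degree-of-periodicity value, the index alone is enough to recover the codevector.
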

52. A method for synthesizing a speech waveform from information contained within a bitstream, the method comprising the steps of:
- a) receiving the bitstream having a degree-of-periodicity indicator, an encoded inphase error vector, and an encoded quadrature error vector;
  
  b) selecting a codebook subset based on the degree-of-periodicity indicator;
  
  c) decoding the encoded inphase error vector and the encoded quadrature error vector using the codebook subset, resulting in a decoded inphase error vector and a decoded quadrature error vector;
  
  d) cyclically repeating the decoded inphase error vector and the decoded quadrature error vector, resulting in a repeating inphase error vector and a repeating quadrature error vector;
  
  e) computing a conjugate spectrum of the repeating quadrature error vector and the repeating inphase error vector; and
  
  f) performing a frequency-domain to time-domain transformation of the conjugate spectrum, resulting in an ensemble error waveform.
- - 53. The method as claimed in claim 52, wherein step d) comprises the step of:
    - d1) cyclically repeating the decoded inphase error vector and the decoded quadrature error vector, including changing a sign of a repeated vector for each successive cycle.
  - 54. The method as claimed in claim 52, further comprising the step of:
    - g) applying a weighting function to the repeating inphase error vector and the repeating quadrature error vector.
  - 55. The method as claimed in claim 52, further comprising the step of:
    - g) adding scaled noise to the repeating inphase error vector and the repeating quadrature error vector.
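
Steps d) through f) of claim 52, with the sign alternation of claim 53, can be sketched as follows. The sketch assumes that the repeated inphase and quadrature values fill spectral bins 1 upward as real and imaginary parts, that the DC and Nyquist bins are zero, and that a direct inverse DFT is an acceptable stand-in for the frequency-domain to time-domain transformation. None of these details are stated in the claims, and the function names are illustrative.

```python
import cmath


def cyclic_repeat(vector, length, alternate_sign=False):
    """Repeat `vector` cyclically to `length` values; optionally flip the
    sign on every other cycle (claim 53)."""
    out = []
    for k in range(length):
        cycle, index = divmod(k, len(vector))
        sign = -1.0 if (alternate_sign and cycle % 2 == 1) else 1.0
        out.append(sign * vector[index])
    return out


def conjugate_spectrum(inphase, quadrature, length):
    """Build a conjugate-symmetric spectrum so its inverse DFT is real."""
    bins = len(inphase)
    if 2 * bins >= length:
        raise ValueError("need fewer than length / 2 positive-frequency bins")
    spectrum = [0j] * length
    for k in range(1, bins + 1):
        spectrum[k] = complex(inphase[k - 1], quadrature[k - 1])
        spectrum[length - k] = spectrum[k].conjugate()  # mirrored bin
    return spectrum


def inverse_dft(spectrum):
    """Direct O(N^2) inverse DFT, adequate for short vectors."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def synthesize_ensemble_error(inphase, quadrature, num_bins, length,
                              alternate_sign=False):
    """Repeat the decoded vectors, mirror them, and transform to time domain."""
    rep_i = cyclic_repeat(inphase, num_bins, alternate_sign)
    rep_q = cyclic_repeat(quadrature, num_bins, alternate_sign)
    spectrum = conjugate_spectrum(rep_i, rep_q, length)
    # Imaginary parts vanish up to rounding because of the conjugate symmetry.
    return [value.real for value in inverse_dft(spectrum)]
```

The conjugate symmetry is what makes the output a real waveform, which is why the spectrum is mirrored before the inverse transform.
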

56. A method for synthesizing speech comprising the steps of:
- a) decoding orthogonal error waveforms and parameters describing encoded speech;
  
  b) computing an excitation estimate from the decoded orthogonal error waveforms and the decoded parameters; and
  
  c) synthesizing speech from the excitation estimate and the decoded parameters.
- - 57. The method as claimed in claim 56, wherein step a) comprises the step of decoding a target segment which comprises the steps of:
    - a1) selecting a codebook subset corresponding to a degree of periodicity of the encoded speech;
      
      a2) decoding an inphase component of a frequency-domain representation of an encoded inphase component;
      
      a3) decoding a quadrature component of a frequency-domain representation of an encoded quadrature component;
      
      a4) computing a conjugate spectrum from the decoded inphase component and the decoded quadrature component, resulting in reconstructed inphase and quadrature vectors;
      
      a5) performing a frequency-domain to time-domain transformation on the reconstructed inphase and quadrature vectors, resulting in a cyclically shifted, energy normalized, quantized target;
      
      a6) performing an inverse cyclic transform on the cyclically shifted, energy normalized, quantized target, resulting in a quantized target; and
      
      a7) upsampling the quantized target to an original target length when the quantized target was downsampled during encoding.
  - 58. The method as claimed in claim 56, wherein step a) comprises a step of decoding an ensemble error which comprises the steps of:
    - a1) selecting a codebook subset corresponding to a degree of periodicity of the encoded speech;
      
      a2) decoding an inphase component of a frequency-domain representation of an encoded inphase component;
      
      a3) decoding a quadrature component of a frequency-domain representation of an encoded quadrature component;
      
      a4) performing modulo-F cyclic repetition on the decoded inphase component and the decoded quadrature component, resulting in a second inphase component and a second quadrature component;
      
      a5) computing a conjugate spectrum from the second inphase component and the second quadrature component, resulting in reconstructed inphase and quadrature vectors; and
      
      a6) performing a frequency-domain to time-domain transformation on the reconstructed inphase and quadrature vectors, resulting in the ensemble error.
  - 59. The method as claimed in claim 58, wherein step a4) comprises the steps of:
    - a4a) cyclically repeating the inphase component at a modulo-F interval, wherein F represents a characterization filter cutoff, resulting in contiguous successive inphase cycles;
      
      a4b) alternately changing signs of the contiguous successive inphase cycles;
      
      a4c) weighting the contiguous successive inphase cycles;
      
      a4d) cyclically repeating the quadrature component at the modulo-F interval, wherein F represents the characterization filter cutoff, resulting in contiguous successive quadrature cycles;
      
      a4e) alternately changing signs of the contiguous successive quadrature cycles; and
      
      a4f) weighting the contiguous successive quadrature cycles.
  - 60. The method as claimed in claim 59, further comprising the step of:
    - a4g) applying noise to the contiguous successive inphase cycles and the contiguous successive quadrature cycles.
  - 61. The method as claimed in claim 56, wherein step a) comprises a step of decoding an ensemble error which comprises the steps of:
    - a1) selecting a codebook subset corresponding to a degree of periodicity of the encoded speech;
      
      a2) decoding an inphase component of a frequency-domain representation of an encoded inphase component;
      
      a3) decoding a quadrature component of a frequency-domain representation of an encoded quadrature component;
      
      a4) computing a conjugate spectrum from the decoded inphase component and the decoded quadrature component, resulting in reconstructed inphase and quadrature vectors;
      
      a5) performing a frequency-domain to time-domain transformation on the reconstructed inphase and quadrature vectors, resulting in a spectral error model excitation waveform;
      
      a6) decoding spectral error model parameters;
      
      a7) performing a prediction filter which uses the spectral error model parameters and the spectral error model excitation waveform, resulting in the ensemble error;
      
      a8) performing a time-domain to frequency-domain transformation on the ensemble error, resulting in a second inphase component and a second quadrature component;
      
      a9) performing modulo-F cyclic repetition on the second inphase component and the second quadrature component, resulting in a third inphase component and a third quadrature component;
      
      a10) computing a second conjugate spectrum from the third inphase component and the third quadrature component; and
      
      a11) performing a frequency-domain to time-domain transformation on the third inphase component and the third quadrature component, resulting in the ensemble error.
  - 62. The method as claimed in claim 56, wherein step b) comprises the steps of:
    - b1) pitch normalizing a decoded target segment, resulting in a pitch normalized target;
      
      b2) correlating a source segment with the pitch normalized target, resulting in a cyclically shifted target;
      
      b3) ensemble interpolating between the source segment and the cyclically shifted target, resulting in intervening epochs corresponding to a number of epochs in an analysis segment minus one, resulting in an ensemble interpolated waveform;
      
      b4) phase shifting an ensemble error waveform so that the ensemble error waveform is aligned with the ensemble interpolated waveform;
      
      b5) applying the phase shifted ensemble error waveform to the ensemble interpolated waveform, resulting in a pitch normalized, energy normalized, shifted excitation waveform;
      
      b6) interpolating a standard deviation, resulting in a standard deviation model;
      
      b7) applying a standard deviation error to the standard deviation model, resulting in a second standard deviation model;
      
      b8) interpolating a mean, resulting in a mean model;
      
      b9) applying a mean error to the mean model, resulting in a second mean model;
      
      b10) phase shifting each epoch of the pitch normalized, energy normalized, shifted excitation waveform, resulting in a pitch normalized, energy normalized excitation waveform;
      
      b11) denormalizing a pitch of the pitch normalized, energy normalized excitation waveform, resulting in an energy normalized excitation waveform; and
      
      b12) energy denormalizing the energy normalized excitation waveform using the second standard deviation model and the second mean model, resulting in an excitation waveform.
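
A few of the steps of claim 62 can be sketched in code: the ensemble interpolation of step b3), the standard deviation and mean models of steps b6) through b9), and the energy denormalization of step b12). The claim does not fix the interpolation rule or how an error is applied to a model, so linear interpolation and additive errors are assumptions here, as are all function names. Pitch normalization, correlation, and phase shifting (steps b1, b2, b4, b5, b10, b11) are left out.

```python
def ensemble_interpolate(source, target, num_epochs):
    """Return the num_epochs - 1 epochs lying between source and target.

    Assumption: sample-by-sample linear interpolation between two
    equal-length (pitch normalized) segments.
    """
    assert len(source) == len(target)
    epochs = []
    for i in range(1, num_epochs):
        w = i / num_epochs
        epochs.append([(1.0 - w) * s + w * t for s, t in zip(source, target)])
    return epochs


def interpolate_scalar(start, end, count):
    """Linearly interpolate a per-epoch parameter, ending exactly at `end`."""
    return [start + (end - start) * (i + 1) / count for i in range(count)]


def apply_error(model, error):
    """Assumption: the decoded error is added to the interpolated model."""
    return [m + e for m, e in zip(model, error)]


def energy_denormalize(epochs, std_model, mean_model):
    """Undo zero-mean, unit-variance normalization epoch by epoch."""
    return [
        [x * sd + mu for x in epoch]
        for epoch, sd, mu in zip(epochs, std_model, mean_model)
    ]
```

For example, `ensemble_interpolate([0.0, 1.0], [1.0, 0.0], 4)` yields three intervening epochs, matching the "number of epochs in an analysis segment minus one" wording of step b3) when the analysis segment holds four epochs.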

63. A method for synthesizing speech comprising the steps of:
- a) decoding orthogonal error waveforms and parameters describing encoded speech; and
  
  b) computing a speech estimate from the decoded orthogonal error waveforms and the decoded parameters, resulting in synthesized speech.

64. A speech encoding apparatus comprising:
- means for generating an excitation waveform by performing a linear prediction coding (LPC) analysis on a number of samples of input speech and inverse filtering the samples of input speech;
  
  means for selecting a source segment of the excitation waveform, coupled to the means for generating the excitation waveform;
  
  means for computing a target segment as a representative portion of the excitation waveform, coupled to the means for selecting the source, wherein the target segment represents a fundamental period of the excitation waveform;
  
  means for computing orthogonal error waveforms, coupled to the means for computing the target, by computing at least one model, at least one model reference, and comparing the at least one model reference and the at least one model;
  
  means for encoding the orthogonal error waveforms and parameters describing the input speech, coupled to the means for computing the orthogonal error waveforms; and
  
  means for creating a bitstream which includes encoded versions of the orthogonal error waveforms and the parameters, coupled to the means for encoding.
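
The first element of claim 64, LPC analysis followed by inverse filtering to obtain an excitation waveform, can be sketched as below. The claim does not say how the LPC analysis is carried out; the autocorrelation method with the Levinson-Durbin recursion is a common choice and is assumed here, as are the function names and the zero initial filter state.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame of samples."""
    return [
        sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve for prediction-error filter coefficients a[0..order], a[0] = 1.

    Returns the coefficients and the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this stage
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def inverse_filter(x, a):
    """Excitation e[n] = sum_j a[j] * x[n - j], assuming zero history."""
    return [
        sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
        for n in range(len(x))
    ]


def excitation_from_speech(x, order):
    """LPC analysis on a frame, then inverse filtering of the same frame."""
    a, _ = levinson_durbin(autocorrelation(x, order), order)
    return inverse_filter(x, a)
```

Applied to a decaying exponential such as `[0.9 ** n for n in range(200)]` with `order=1`, the inverse filter removes the first-order resonance and leaves an excitation that is close to a single impulse.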

Current Assignee
Google Technology Holdings LLC (Alphabet Inc.)
Original Assignee
Motorola, Inc. (Motorola Solutions, Inc.)
Inventors
Bergstrom, Chad Scott, Gifford, Carl Steven, Abousleman, Glen Patrick, Pattison, Richard James
Primary Examiner(s)
Hudspeth, David R.
Assistant Examiner(s)
Smits, Talivaldis Ivars

Application Number
US08/651,172
Time in Patent Office
847 Days
Field of Search
395/2.23, 395/2.27, 395/2.28, 395/2.29, 395/2.31, 395/2.32, 395/2.33, 395/2.39, 704/214, 704/218, 704/219, 704/220, 704/222, 704/223, 704/224, 704/230, 704/262, 704/264
US Class Current
704/223
CPC Class Codes
G10L 19/125 Pitch excitation, e.g. pitc...
G10L 19/24 Variable rate codecs, e.g. ...
