Method and apparatus for speech excitation waveform coding using multiple error waveforms
Abstract
A method and apparatus (100) for pitch-epoch-synchronous source-filter speech encoding by means of error component modeling methods (310) which capture fundamental orthogonal (uncorrelated) basis elements of an excitation source waveform. A periodic waveform model (318) along with four orthogonal error waveforms, desirably including phase error (319), ensemble error (321), standard deviation error (323), and mean error (324) waveforms, are incorporated together to form a complete description of the excitation. These error waveforms (319, 321, 323, 324) represent those portions of the excitation that are not represented by the purely periodic model. By thus orthogonalizing the error components, the perceptual effect of each element is isolated from the composite set, and can thus be encoded separately. In addition to high-quality, fixed-rate operation, the identity-system capability and low complexity of the speech encoding method and apparatus make them applicable to variable-rate applications without changing underlying modeling methods.
64 Claims
1. A method for encoding speech comprising the steps of:
a) generating an excitation waveform by performing a linear prediction coding (LPC) analysis on a number of samples of input speech and inverse filtering the samples of input speech;
b) selecting a source segment of the excitation waveform;
c) computing a target segment as a representative portion of the excitation waveform, wherein the target segment represents a fundamental period of the excitation waveform;
d) computing orthogonal error waveforms by computing at least one model, at least one model reference, and comparing the at least one model reference and the at least one model;
e) encoding the orthogonal error waveforms and parameters describing the input speech; and
f) creating a bitstream which includes encoded versions of the orthogonal error waveforms and the parameters.
Dependent claims: 2-46.

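Step a) of claim 1 can be illustrated with a minimal sketch. It assumes the autocorrelation method with a Levinson-Durbin recursion for the LPC analysis and zero filter history before the first sample; the claim itself does not prescribe either choice, and the function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Return the autocorrelation values r[0..max_lag] of the sample list x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations (assumed method, not stated in the claim).

    Returns predictor coefficients a[1..order] such that x[n] is
    predicted by sum(a[k] * x[n - k]).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predicted input
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        k_i = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k_i
        for k in range(1, i):
            new_a[k] = a[k] - k_i * a[i - k]
        a = new_a
        err *= 1.0 - k_i * k_i
    return a[1:]


def inverse_filter(x, coeffs):
    """Excitation e[n] = x[n] - sum(a[k] * x[n - k]), with zero history."""
    out = []
    for n in range(len(x)):
        pred = sum(c * x[n - k]
                   for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(x[n] - pred)
    return out
```

For a decaying exponential input such as `[0.9 ** n for n in range(200)]`, a first-order analysis recovers a coefficient close to 0.9, and the excitation is essentially a single impulse at the first sample.
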
47. A method for encoding speech comprising the steps of:
a) computing at least one orthogonal model by extracting at least one excitation parameter from an excitation waveform, normalizing the excitation waveform by the at least one excitation parameter, and interpolating between elided parameters;
b) computing at least one error waveform corresponding to each of the at least one orthogonal model and the elided parameters;
c) encoding the at least one orthogonal model and the at least one error waveform; and
d) creating a bitstream which includes encoded versions of the at least one orthogonal model and the at least one error waveform.

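One possible reading of steps a) and b) of claim 47 is sketched below. It assumes the extracted parameter is the per-epoch mean, that normalization means subtracting that mean, that every parameter except the first and last is elided, and that interpolation is linear. None of these specifics appear in the claim, and all names are hypothetical.

```python
def epoch_means(excitation, epoch_bounds):
    """Mean of each epoch; epoch_bounds is a list of (start, end) index pairs."""
    return [sum(excitation[s:e]) / (e - s) for s, e in epoch_bounds]


def remove_means(excitation, epoch_bounds, means):
    """Normalize the excitation by subtracting each epoch's mean (assumed form)."""
    out = list(excitation)
    for (s, e), m in zip(epoch_bounds, means):
        for i in range(s, e):
            out[i] -= m
    return out


def interpolate_elided(values):
    """Model: keep only the first and last value, interpolate the rest linearly."""
    n = len(values)
    if n < 2:
        return list(values)
    first, last = values[0], values[-1]
    return [first + (last - first) * i / (n - 1) for i in range(n)]


def parameter_error(values):
    """Error waveform: actual per-epoch parameter minus its interpolated model."""
    model = interpolate_elided(values)
    return [v - m for v, m in zip(values, model)]
```

Under these assumptions the model plus the error waveform reproduces the original per-epoch parameters exactly, so nothing is lost before quantization.
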
48. A method for encoding speech comprising the steps of:
a) obtaining a number of samples of input speech;
b) selecting a source segment of the input speech;
c) computing a target segment as a portion of the input speech, wherein the target segment represents a fundamental period of the input speech;
d) computing orthogonal error waveforms by computing at least one model, at least one model reference, and comparing the at least one model reference and the at least one model;
e) encoding the orthogonal error waveforms and parameters describing the input speech; and
f) creating a bitstream which includes encoded versions of the orthogonal error waveforms and the parameters.

49. A method for decoding a characterized, encoded standard deviation error waveform comprising the steps of:
a) receiving an index representative of the characterized, encoded standard deviation error waveform;
b) selecting a second codebook subset which corresponds to a first codebook subset which was used to encode a standard deviation error based on a degree-of-periodicity of an excitation waveform;
c) decoding the characterized, encoded standard deviation error waveform using the second codebook subset, resulting in a characterized standard deviation error;
d) determining whether more than one epoch exists; and
e) if more than one epoch exists, downsampling the characterized standard deviation error to a number of samples equal to a number of epochs.

50. A method for decoding a characterized, encoded mean error waveform comprising the steps of:
a) receiving an index representative of the characterized, encoded mean error waveform;
b) selecting a second codebook subset which corresponds to a first codebook subset which was used to encode a mean error based on a degree-of-periodicity of an excitation waveform;
c) decoding the characterized, encoded mean error waveform using the second codebook subset, resulting in a characterized mean error waveform;
d) determining whether more than one epoch exists; and
e) if more than one epoch exists, downsampling the characterized mean error waveform to a number of samples equal to a number of epochs.

51. A method for decoding a characterized, encoded phase error waveform comprising the steps of:
a) receiving the characterized, encoded phase error waveform;
b) selecting a second codebook subset which corresponds to a first codebook subset which was used to encode a phase error based on a degree-of-periodicity of an excitation waveform;
c) decoding the characterized, encoded phase error waveform using the second codebook subset, resulting in a characterized phase error;
d) determining whether more than one epoch exists; and
e) if more than one epoch exists, downsampling the characterized phase error to a number of samples equal to a number of epochs.

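Claims 49, 50 and 51 share one decoding pattern: select a codebook subset from the degree of periodicity, look up the characterized vector, and reduce it to one sample per epoch when more than one epoch exists. The sketch below is illustrative only. It assumes the subsets are stored in a dictionary keyed by a periodicity label, that the received value is an index into the subset, and that downsampling picks evenly spaced samples; the claims do not specify any of these details, and the names are hypothetical.

```python
def select_subset(codebook_subsets, degree_of_periodicity):
    """Pick the decoder-side subset matching the one the encoder used."""
    return codebook_subsets[degree_of_periodicity]


def downsample_to_epochs(waveform, num_epochs):
    """Keep num_epochs evenly spaced samples (assumed downsampling rule)."""
    if num_epochs <= 1 or num_epochs >= len(waveform):
        return list(waveform)
    step = (len(waveform) - 1) / (num_epochs - 1)
    return [waveform[int(i * step + 0.5)] for i in range(num_epochs)]


def decode_error(index, degree_of_periodicity, codebook_subsets, num_epochs):
    """Decode one characterized error vector and match it to the epoch count."""
    subset = select_subset(codebook_subsets, degree_of_periodicity)
    characterized = subset[index]
    if num_epochs > 1:  # steps d) and e) of the claims
        return downsample_to_epochs(characterized, num_epochs)
    return list(characterized)
```

With a five-sample codebook entry and three epochs, this rule keeps the first, middle and last samples; with a single epoch the entry is returned unchanged.
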
52. A method for synthesizing a speech waveform from information contained within a bitstream, the method comprising the steps of:
a) receiving the bitstream having a degree-of-periodicity indicator, an encoded inphase error vector, and an encoded quadrature error vector;
b) selecting a codebook subset based on the degree-of-periodicity indicator;
c) decoding the encoded inphase error vector and the encoded quadrature error vector using the codebook subset, resulting in a decoded inphase error vector and a decoded quadrature error vector;
d) cyclically repeating the decoded inphase error vector and the decoded quadrature error vector, resulting in a repeating inphase error vector and a repeating quadrature error vector;
e) computing a conjugate spectrum of the repeating quadrature error vector and the repeating inphase error vector; and
f) performing a frequency-domain to time-domain transformation of the conjugate spectrum, resulting in an ensemble error waveform.
Dependent claims: 53-55.

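Steps d) through f) of claim 52 can be sketched as follows. The sketch assumes that the inphase vector supplies the real parts and the quadrature vector the imaginary parts of the positive-frequency bins, that "conjugate spectrum" means a conjugate-symmetric spectrum, that the DC and Nyquist bins are zero, and that the transformation is an inverse DFT. The claim does not state the bin layout, so these are assumptions, and the function names are hypothetical.

```python
import cmath


def cyclic_repeat(vec, length):
    """Repeat vec cyclically until it has the requested length."""
    return [vec[i % len(vec)] for i in range(length)]


def conjugate_symmetric_spectrum(inphase, quadrature, n):
    """Build an n-point spectrum with X[n - k] = conj(X[k]).

    DC (and Nyquist, for even n) are left at zero in this sketch.
    """
    half = (n - 1) // 2  # positive-frequency bins below Nyquist
    re = cyclic_repeat(inphase, half)
    im = cyclic_repeat(quadrature, half)
    spec = [0j] * n
    for k in range(1, half + 1):
        spec[k] = complex(re[k - 1], im[k - 1])
        spec[n - k] = spec[k].conjugate()
    return spec


def inverse_dft(spec):
    """Direct O(n^2) inverse DFT; adequate for a short illustrative example."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]
```

Because the spectrum is conjugate-symmetric, the inverse transform is real up to rounding error, which is what a time-domain ensemble error waveform requires.
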
56. A method for synthesizing speech comprising the steps of:
a) decoding orthogonal error waveforms and parameters describing encoded speech;
b) computing an excitation estimate from the decoded orthogonal error waveforms and the decoded parameters; and
c) synthesizing speech from the excitation estimate and the decoded parameters.
Dependent claims: 57-62.

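Step c) of claim 56 can be pictured as the counterpart of the inverse filtering in claim 1. The minimal sketch below assumes the decoded parameters include LPC predictor coefficients and that synthesis uses an all-pole filter with zero initial history. The claim does not say this explicitly, although the abstract describes a source-filter coder; the function name is hypothetical.

```python
def synthesis_filter(excitation, coeffs):
    """All-pole synthesis: s[n] = e[n] + sum(a[k] * s[n - k]).

    coeffs holds a[1..p]; samples before n = 0 are taken as zero.
    """
    out = []
    for n, e in enumerate(excitation):
        pred = sum(c * out[n - k]
                   for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(e + pred)
    return out
```

Feeding a unit impulse through a single coefficient of 0.5 gives the samples 1, 0.5, 0.25, 0.125, which is the impulse response of that one-pole filter.
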
63. A method for synthesizing speech comprising the steps of:
a) decoding orthogonal error waveforms and parameters describing encoded speech; and
b) computing a speech estimate from the decoded orthogonal error waveforms and the decoded parameters, resulting in synthesized speech.

64. A speech encoding apparatus comprising:
means for generating an excitation waveform by performing a linear prediction coding (LPC) analysis on a number of samples of input speech and inverse filtering the samples of input speech;
means for selecting a source segment of the excitation waveform, coupled to the means for generating the excitation waveform;
means for computing a target segment as a representative portion of the excitation waveform, coupled to the means for selecting the source, wherein the target segment represents a fundamental period of the excitation waveform;
means for computing orthogonal error waveforms, coupled to the means for computing the target, by computing at least one model, at least one model reference, and comparing the at least one model reference and the at least one model;
means for encoding the orthogonal error waveforms and parameters describing the input speech, coupled to the means for computing the orthogonal error waveforms; and
means for creating a bitstream which includes encoded versions of the orthogonal error waveforms and the parameters, coupled to the means for encoding.

Specification