Method for coding speech and music signals
Abstract
The present invention provides a transform coding method efficient for music signals that is suitable for use in a hybrid codec, whereby a common Linear Predictive (LP) synthesis filter is employed for both speech and music signals. The LP synthesis filter switches between a speech excitation generator and a transform excitation generator, in accordance with the coding of a speech or music signal, respectively. For coding speech signals, the conventional CELP technique may be used, while a novel asymmetrical overlap-add transform technique is applied for coding music signals. In performing the common LP synthesis filtering, interpolation of the LP coefficients is conducted for signals in overlap-add operation regions. The invention enables smooth transitions when the decoder switches between speech and music decoding modes.
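As a rough illustration of the switching idea in the abstract, the sketch below routes each decoded frame through one of two excitation generators and then through a single shared all-pole synthesis filter whose memory carries across mode changes. The frame layout (`mode`, `lpc`, `payload`), the function names, and the pass-through excitation generators are assumptions made for illustration only; the source does not specify them.

```python
def speech_excitation(frame):
    # Hypothetical placeholder: a real decoder would decode CELP parameters here.
    return list(frame["payload"])


def transform_excitation(frame):
    # Hypothetical placeholder: a real decoder would dequantize, inverse-transform
    # and overlap-add here.
    return list(frame["payload"])


def lp_synthesis(excitation, lpc, state):
    """All-pole filter 1/A(z) with A(z) = 1 + sum_k lpc[k] * z^-(k+1).

    `state` holds the most recent outputs, newest first, so the filter
    memory survives a switch between speech and music frames.
    """
    out = []
    for x in excitation:
        y = x - sum(a * s for a, s in zip(lpc, state))
        state = ([y] + state)[:len(lpc)]
        out.append(y)
    return out, state


def decode_stream(frames, order):
    """Decode a list of frames, switching the filter input per frame."""
    state = [0.0] * order
    signal = []
    for frame in frames:
        if frame["mode"] == "speech":
            excitation = speech_excitation(frame)
        else:
            excitation = transform_excitation(frame)
        # The same synthesis filter (and the same memory) serves both modes.
        y, state = lp_synthesis(excitation, frame["lpc"], state)
        signal.extend(y)
    return signal
```

Because the filter state is shared, the impulse response started in a speech frame keeps decaying through the following music frame, which is one way to picture the smooth transition the abstract describes.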
7 Claims
1. A method for decoding a portion of a coded signal, the portion comprising a coded speech signal or a coded music signal, the method comprising the steps of:
determining whether the portion of the coded signal corresponds to a coded speech signal or to a coded music signal;
providing the portion of the coded signal to a speech excitation generator if it is determined that the portion of the coded signal corresponds to a coded speech signal, wherein an excitation signal is generated in keeping with a linear predictive procedure;
providing the portion of the coded signal to a transform excitation generator if it is determined that the portion of the coded signal corresponds to a coded music signal, wherein an excitation signal is generated in keeping with a transform coding procedure, wherein the coded music signal is formed according to an asymmetrical overlap-add transform method comprising the steps of:
receiving a music superframe consisting of a sequence of input music signals;
generating a residual signal and a plurality of linear predictive coefficients for the music superframe according to a linear predictive principle;
applying an asymmetrical overlap-add window to the residual signal of the superframe to produce a windowed signal;
performing a discrete cosine transformation on the windowed signal to obtain a set of discrete cosine transformation coefficients;
calculating dynamic bit allocation information according to the input music signals or the linear predictive coefficients; and
quantifying the discrete cosine transformation coefficients according to the dynamic bit allocation information; and
switching the input of a common linear predictive synthesis filter between the output of the speech excitation generator and the output of the transform excitation generator, whereby the common linear predictive synthesis filter provides as output a reconstructed signal corresponding to the input excitation.
2. The method of claim 1, further comprising the steps of:
creating the asymmetrical overlap-add window by:
modifying a first sub-series of elements of a present superframe in accordance with a last sub-series of elements of a previous superframe; and
modifying a last sub-series of elements of the present superframe in accordance with a first sub-series of elements of a subsequent superframe; and
multiplying the window by the present superframe in the time domain.
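One possible reading of the window construction just recited is sketched below: the first samples of the present superframe are tapered over a region shared with the previous superframe, the last samples are tapered over a region shared with the subsequent one, and the two regions may differ in length, which makes the window asymmetrical. The sine and cosine ramp shapes and the names `asymmetric_window` and `apply_window` are assumptions for illustration; the claim text does not fix the ramp shape.

```python
import math


def asymmetric_window(length, rise, fall):
    """Window that ramps up over `rise` samples and down over `fall` samples.

    The ramps are sine/cosine quarter waves (an assumed shape), so a falling
    ramp and a rising ramp of equal length satisfy sin^2 + cos^2 = 1 when
    they overlap. Samples between the two ramps are left at 1.0.
    """
    if rise + fall > length:
        raise ValueError("overlap regions must fit inside the superframe")
    w = [1.0] * length
    for n in range(rise):
        # Region shared with the previous superframe.
        w[n] = math.sin(math.pi * (n + 0.5) / (2 * rise))
    for n in range(fall):
        # Region shared with the subsequent superframe.
        w[length - fall + n] = math.cos(math.pi * (n + 0.5) / (2 * fall))
    return w


def apply_window(superframe, rise, fall):
    """Multiply the superframe by the window, sample by sample, in the time domain."""
    w = asymmetric_window(len(superframe), rise, fall)
    return [s * wn for s, wn in zip(superframe, w)]
```

For example, `asymmetric_window(8, 2, 4)` rises over two samples, stays flat for two, and falls over four.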
3. The method of claim 2, further comprising the step of:
conducting an interpolation of a set of linear predictive coefficients.
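The interpolation step can be pictured as blending the previous and current coefficient sets across the overlap region. The sketch below uses plain linear interpolation of the coefficients themselves; that choice, the function name `interpolate_lpc`, and the `steps` parameter are assumptions for brevity, since the source does not say in which domain or with which weights the interpolation is conducted.

```python
def interpolate_lpc(prev_lpc, curr_lpc, steps):
    """Return `steps` intermediate coefficient sets between two LP sets.

    The weights are evenly spaced and exclude both endpoints, so with
    steps=1 the result is the midpoint of the two sets.
    """
    sets = []
    for i in range(1, steps + 1):
        t = i / (steps + 1)
        sets.append([(1 - t) * p + t * c for p, c in zip(prev_lpc, curr_lpc)])
    return sets
```

Each returned set would then drive the common synthesis filter for one slice of the overlap region.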
4. A computer readable medium having instructions thereon for performing steps for decoding a portion of a coded signal, the portion comprising a coded speech signal or a coded music signal, the steps comprising:
determining whether the portion of the coded signal corresponds to a coded speech signal or to a coded music signal;
providing the portion of the coded signal to a speech excitation generator if it is determined that the portion of the coded signal corresponds to a coded speech signal, wherein an excitation signal is generated in keeping with a linear predictive procedure;
providing the portion of the coded signal to a transform excitation generator if it is determined that the portion of the coded signal corresponds to a coded music signal, wherein an excitation signal is generated in keeping with a transform coding procedure, wherein the coded music signal is formed according to an asymmetrical overlap-add transform method comprising the steps of:
receiving a music superframe consisting of a sequence of input music signals;
generating a residual signal and a plurality of linear predictive coefficients for the music superframe according to a linear predictive principle;
applying an asymmetrical overlap-add window to the residual signal of the superframe to produce a windowed signal;
performing a discrete cosine transformation on the windowed signal to obtain a set of discrete cosine transformation coefficients;
calculating dynamic bit allocation information according to the input music signals or the linear predictive coefficients; and
quantifying the discrete cosine transformation coefficients according to the dynamic bit allocation information; and
switching the input of a common linear predictive synthesis filter between the output of the speech excitation generator and the output of the transform excitation generator, whereby the common linear predictive synthesis filter provides as output a reconstructed signal corresponding to the input excitation.
5. The computer readable medium of claim 4, the steps further comprising:
creating the asymmetrical overlap-add window by:
modifying a first sub-series of elements of a present superframe in accordance with a last sub-series of elements of a previous superframe; and
modifying a last sub-series of elements of the present superframe in accordance with a first sub-series of elements of a subsequent superframe; and
multiplying the window by the present superframe in the time domain.
6. An apparatus for processing a superframe signal, wherein the superframe signal comprises a sequence of speech signals or music signals, the apparatus comprising:
a speech/music classifier for classifying the superframe as being a speech superframe or music superframe;
a speech/music encoder for encoding the speech or music superframe and providing a plurality of encoded signals, wherein the speech/music encoder comprises a music encoder employing a transform coding method to produce an excitation signal for reconstructing the music superframe using a linear predictive synthesis filter, wherein the music encoder further comprises:
a linear predictive analysis module for analyzing the music superframe and generating a set of linear predictive coefficients;
a linear predictive coefficients quantization module for quantifying the linear predictive coefficients;
an inverse linear predictive filter for receiving the linear predictive coefficients and the music superframe and providing a residual signal;
an asymmetrical overlap-add windowing module for windowing the residual signal and producing a windowed signal;
a discrete cosine transformation module for transforming the windowed signal to a set of discrete cosine transformation coefficients;
a dynamic bit allocation module for providing bit allocation information based on at least one of the input signal or the linear predictive coefficients; and
a discrete cosine transformation coefficients quantization module for quantifying the discrete cosine transformation coefficients according to the bit allocation information; and
a speech/music decoder for decoding the encoded signals, comprising:
a transform decoder that performs an inverse of the transform coding method for decoding the encoded music signals; and
a linear predictive synthesis filter for generating a reconstructed signal according to a set of linear predictive coefficients, wherein the filter is usable for the reproduction of both of music and speech signals.
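The music-encoder modules recited in this claim can be chained as in the sketch below: inverse LP filtering to get a residual, windowing, a DCT, bit allocation, and quantization. Several things here are assumptions for illustration and are not taken from the source: the LP coefficients are passed in rather than estimated and quantized, the filter memory at the start of the superframe is zero, the DCT is an unnormalised type II, the allocation is a simple greedy rule driven by the LP spectral envelope, and the quantizer is a uniform one with a hypothetical `step` parameter.

```python
import cmath
import math


def lp_residual(frame, lpc):
    """Inverse LP filter A(z): e[n] = x[n] + sum_k lpc[k] * x[n-k-1] (zero history)."""
    res = []
    for n, x in enumerate(frame):
        pred = sum(a * frame[n - k - 1] for k, a in enumerate(lpc) if n - k - 1 >= 0)
        res.append(x + pred)
    return res


def dct2(x):
    """Unnormalised DCT-II: X[k] = sum_n x[n] * cos(pi * (n + 0.5) * k / N)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
            for k in range(N)]


def lp_envelope(lpc, n_bins):
    """Power envelope 1/|A(e^jw)|^2 sampled at the DCT bin centre frequencies."""
    env = []
    for k in range(n_bins):
        w = math.pi * (k + 0.5) / n_bins
        a = 1 + sum(c * cmath.exp(-1j * w * (i + 1)) for i, c in enumerate(lpc))
        env.append(1.0 / abs(a) ** 2)
    return env


def allocate_bits(envelope, total_bits):
    """Greedy allocation: each bit goes to the bin with the most remaining energy.

    Giving a bin one bit divides its remaining energy by 4 (about 6 dB).
    """
    energy = list(envelope)
    bits = [0] * len(energy)
    for _ in range(total_bits):
        k = max(range(len(energy)), key=energy.__getitem__)
        bits[k] += 1
        energy[k] /= 4.0
    return bits


def quantize(coeffs, bits, step=1.0):
    """Uniform quantizer; a bin with b bits keeps integers in [-2^(b-1), 2^(b-1) - 1]."""
    out = []
    for c, b in zip(coeffs, bits):
        if b == 0:
            out.append(0)
            continue
        limit = 2 ** (b - 1)
        out.append(max(-limit, min(limit - 1, round(c / step))))
    return out


def encode_music_superframe(frame, lpc, window, total_bits):
    """Residual -> window -> DCT -> bit allocation -> quantized coefficients."""
    residual = lp_residual(frame, lpc)
    windowed = [r * w for r, w in zip(residual, window)]
    coeffs = dct2(windowed)
    bits = allocate_bits(lp_envelope(lpc, len(coeffs)), total_bits)
    return quantize(coeffs, bits), bits
```

Deriving the allocation from the LP coefficients rather than from the DCT coefficients means a decoder holding the same coefficients could recompute the same allocation without extra side information; this is a design choice of the sketch, not something the claim text states.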
7. An apparatus for processing a superframe signal, wherein the superframe signal comprises a sequence of speech signals or music signals, the apparatus comprising:
a speech/music classifier for classifying the superframe as being a speech superframe or music superframe;
a speech/music encoder for encoding the speech or music superframe and providing a plurality of encoded signals, wherein the speech/music encoder comprises a music encoder employing a transform coding method to produce an excitation signal for reconstructing the music superframe using a linear predictive synthesis filter; and
a speech/music decoder for decoding the encoded signals, comprising:
a transform decoder that performs an inverse of the transform coding method for decoding the encoded music signals, wherein the transform decoder further comprises:
a dynamic bit allocation module for providing bit allocation information;
an inverse quantization model for transferring quantified discrete cosine transformation coefficients into a set of discrete cosine transformation coefficients;
a discrete cosine inverse transformation module for transforming the discrete cosine transformation coefficients into a time-domain signal;
an asymmetrical overlap-add windowing module for windowing the time-domain signal and producing a windowed signal; and
an overlap-add module for modifying the windowed signal based on the asymmetrical windows; and
a linear predictive synthesis filter for generating a reconstructed signal according to a set of linear predictive coefficients, wherein the filter is usable for the reproduction of both of music and speech signals.
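The transform-decoder modules recited in this claim can be sketched as the mirror image of an encoder: inverse quantization, an inverse DCT, windowing, and overlap-add with the tail saved from the previous block. The uniform dequantizer, its hypothetical `step` parameter, the unnormalised DCT-II convention being inverted, and the way the tail is handed from block to block are all assumptions for illustration; the bit allocation module is omitted, and the resulting excitation would still have to pass through the linear predictive synthesis filter.

```python
import math


def dequantize(indices, step=1.0):
    """Map integer quantizer indices back to coefficient values."""
    return [q * step for q in indices]


def idct2(X):
    """Inverse of the unnormalised DCT-II.

    x[n] = (2/N) * (X[0]/2 + sum_{k>=1} X[k] * cos(pi * (n + 0.5) * k / N))
    """
    N = len(X)
    return [
        (X[0] / 2 + sum(X[k] * math.cos(math.pi * (n + 0.5) * k / N)
                        for k in range(1, N))) * 2 / N
        for n in range(N)
    ]


def overlap_add(prev_tail, block, next_overlap):
    """Add the saved tail onto the head of `block`; hold back a new tail.

    Returns (finished samples, tail to add onto the next block). The head
    overlap is len(prev_tail) and the tail overlap is next_overlap, and the
    two may differ, which is what makes the scheme asymmetrical.
    """
    out = list(block)
    for n, t in enumerate(prev_tail):
        out[n] += t
    if next_overlap:
        return out[:-next_overlap], out[-next_overlap:]
    return out, []


def decode_music_block(indices, window, prev_tail, next_overlap, step=1.0):
    """Dequantize -> inverse DCT -> window -> overlap-add."""
    time_signal = idct2(dequantize(indices, step))
    windowed = [s * w for s, w in zip(time_signal, window)]
    return overlap_add(prev_tail, windowed, next_overlap)
```

A caller would keep the returned tail and pass it as `prev_tail` when decoding the next block.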
Specification