Method for coding speech and music signals

US 20030004711A1
Filed: 06/26/2001
Published: 01/02/2003
Est. Priority Date: 06/26/2001
Status: Active Grant

Abstract

The present invention provides a transform coding method efficient for music signals that is suitable for use in a hybrid codec, whereby a common Linear Predictive (LP) synthesis filter is employed for both speech and music signals. The LP synthesis filter switches between a speech excitation generator and a transform excitation generator, in accordance with the coding of a speech or music signal, respectively. For coding speech signals, the conventional CELP technique may be used, while a novel asymmetrical overlap-add transform technique is applied for coding music signals. In performing the common LP synthesis filtering, interpolation of the LP coefficients is conducted for signals in overlap-add operation regions. The invention enables smooth transitions when the decoder switches between speech and music decoding modes.
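
The switching arrangement described in the abstract can be illustrated with a minimal Python sketch. It assumes a one-bit mode flag (1 for music, 0 for speech) and models the two excitation generators as plain callables. The names `LPSynthesisFilter` and `decode_portion` and the flag convention are hypothetical and are not taken from the patent. The only point shown is that a single all-pole synthesis filter keeps its state while its input is switched between generators.

```python
from typing import Callable, List, Sequence

Generator = Callable[[Sequence[float]], List[float]]


class LPSynthesisFilter:
    """All-pole filter 1/A(z), with A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p."""

    def __init__(self, order: int) -> None:
        # Past output samples, most recent first; shared by both modes.
        self.history: List[float] = [0.0] * order

    def process(self, excitation: Sequence[float], lpc: Sequence[float]) -> List[float]:
        out = []
        for e in excitation:
            y = e - sum(a * h for a, h in zip(lpc, self.history))
            self.history = [y] + self.history[:-1]
            out.append(y)
        return out


def decode_portion(mode_bit: int, payload: Sequence[float], lpc: Sequence[float],
                   synth: LPSynthesisFilter,
                   speech_gen: Generator, transform_gen: Generator) -> List[float]:
    """Route the payload to one excitation generator, then to the common filter."""
    # Assumed convention: mode_bit == 1 selects the transform (music) path.
    generator = transform_gen if mode_bit == 1 else speech_gen
    excitation = generator(payload)
    return synth.process(excitation, lpc)
```

Because `synth` is passed in rather than created per call, the filter memory carries over when consecutive portions use different modes, which is one ingredient of a smooth transition.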

15 Claims

1. A method for decoding a portion of a coded signal, the portion comprising a coded speech signal or a coded music signal, the method comprising the steps of:
- determining whether the portion of the coded signal corresponds to a coded speech signal or to a coded music signal;
  
  providing the portion of the coded signal to a speech excitation generator if it is determined that the portion of the coded signal corresponds to a coded speech signal, wherein an excitation signal is generated in keeping with a linear predictive procedure;
  
  providing the portion of the coded signal to a transform excitation generator if it is determined that the portion of the coded signal corresponds to a coded music signal, wherein an excitation signal is generated in keeping with a transform coding procedure;
  
  switching the input of a common linear predictive synthesis filter between the output of the speech excitation generator and the output of the transform excitation generator, whereby the common linear predictive synthesis filter provides as output a reconstructed signal corresponding to the input excitation.
- View Dependent Claims (2, 3, 4, 5, 7, 9)
- - 2. The method according to claim 1, wherein the coded music signal is formed according to an asymmetrical overlap-add transform method comprising the steps of:
    - receiving a music superframe consisting of a sequence of input music signals;
      
      generating a residual signal and a plurality of linear predictive coefficients for the music superframe according to a linear predictive principle;
      
      applying an asymmetrical overlap-add window to the residual signal of the superframe to produce a windowed signal;
      
      performing a discrete cosine transformation on the windowed signal to obtain a set of discrete cosine transformation coefficients;
      
      calculating dynamic bit allocation information according to the input music signals or the linear predictive coefficients; and
      
      quantifying the discrete cosine transformation coefficients according to the dynamic bit allocation information.
  - 3. The method according to claim 1 wherein the portion of the coded signal comprises a signal superframe of a size optimized for transform coding.
  - 4. The method of claim 2, wherein the superframe is comprised of a series of elements, and wherein the step of applying an asymmetrical overlap-add window further comprises the steps of:
    - creating the asymmetrical overlap-add window by:
      
      modifying a first sub-series of elements of a present superframe in accordance with a last sub-series of elements of a previous superframe; and
      
      modifying a last sub-series of elements of the present superframe in accordance with a first sub-series of elements of a subsequent superframe; and
      
      multiplying the window by the present superframe in the time domain.
  - 5. The method of claim 4, further comprising the step of:
    - conducting an interpolation of a set of linear predictive coefficients.
  - 7. The computer readable medium according to claim 5, wherein the coded music signal is formed according to an asymmetrical overlap-add transform method comprising the steps of:
    - receiving a music superframe consisting of a sequence of input music signals;
      
      generating a residual signal and a plurality of linear predictive coefficients for the music superframe according to a linear predictive principle;
      
      applying an asymmetrical overlap-add window to the residual signal of the superframe to produce a windowed signal;
      
      performing a discrete cosine transformation on the windowed signal to obtain a set of discrete cosine transformation coefficients;
      
      calculating dynamic bit allocation information according to the input music signals or the linear predictive coefficients; and
      
      quantifying the discrete cosine transformation coefficients according to the dynamic bit allocation information.
  - 9. The computer readable medium according to claim 7, wherein the superframe is comprised of a series of elements, and wherein the step of applying an asymmetrical overlap-add window further comprises the steps of:
    - creating the asymmetrical overlap-add window by:
      
      modifying a first sub-series of elements of a present superframe in accordance with a last sub-series of elements of a previous superframe; and
      
      modifying a last sub-series of elements of the present superframe in accordance with a first sub-series of elements of a subsequent superframe; and
      
      multiplying the window by the present superframe in the time domain.
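
The windowing steps recited in claims 2 and 4 (and mirrored in claims 7 and 9) can be sketched as below. The claims do not give the window shape, so this sketch assumes sine and cosine tapers. The rise length is chosen to match the fall length of the previous superframe's window, and the fall length to match the rise of the next one, so that overlapping windows are power complementary. The function names and parameters are hypothetical.

```python
import math
from typing import List, Sequence


def asymmetric_window(length: int, rise: int, fall: int) -> List[float]:
    """Window with a sine rise over `rise` samples and a cosine fall over `fall`.

    `rise` should equal the fall length of the previous superframe's window and
    `fall` the rise length of the next one; the two may differ (asymmetry).
    """
    if rise + fall > length:
        raise ValueError("overlap regions exceed the window length")
    w = [1.0] * length
    for n in range(rise):
        w[n] = math.sin(math.pi * (n + 0.5) / (2 * rise))
    for n in range(fall):
        w[length - fall + n] = math.cos(math.pi * (n + 0.5) / (2 * fall))
    return w


def apply_window(frame: Sequence[float], window: Sequence[float]) -> List[float]:
    """Multiply the window by the present superframe in the time domain."""
    return [x * w for x, w in zip(frame, window)]
```

With these tapers, the squared fall of one window and the squared rise of the next sum to one at every sample of the shared region, which is the usual condition when the same window is applied at both encoder and decoder.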

6. A computer readable medium having instructions thereon for performing steps for decoding a portion of a coded signal, the portion comprising a coded speech signal or a coded music signal, the steps comprising:
- determining whether the portion of the coded signal corresponds to a coded speech signal or to a coded music signal;
  
  providing the portion of the coded signal to a speech excitation generator if it is determined that the portion of the coded signal corresponds to a coded speech signal, wherein an excitation signal is generated in keeping with a linear predictive procedure;
  
  providing the portion of the coded signal to a transform excitation generator if it is determined that the portion of the coded signal corresponds to a coded music signal, wherein an excitation signal is generated in keeping with a transform coding procedure;
  
  switching the input of a common linear predictive synthesis filter between the output of the speech excitation generator and the output of the transform excitation generator, whereby the common linear predictive synthesis filter provides as output a reconstructed signal corresponding to the input excitation.
- View Dependent Claims (8, 10)
- - 8. The computer readable medium according to claim 6, wherein the portion of the coded signal comprises a signal superframe of a size optimized for transform coding.
  - 10. The computer readable medium according to claim 8, further comprising instructions for causing the step of conducting an interpolation of a set of linear predictive coefficients.
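
Claims 5 and 10 recite interpolating a set of linear predictive coefficients without specifying the scheme. A minimal sketch, assuming plain linear interpolation between the previous and current coefficient sets, is shown below. The function name and the `steps` parameter are hypothetical. Direct interpolation of the coefficients does not by itself guarantee a stable filter, and practical codecs commonly interpolate in a line spectral pair representation instead.

```python
from typing import List, Sequence


def interpolate_lpc(prev_lpc: Sequence[float], cur_lpc: Sequence[float],
                    steps: int) -> List[List[float]]:
    """Return `steps` coefficient sets moving linearly from prev_lpc to cur_lpc.

    The last set equals cur_lpc; earlier sets blend in the previous coefficients.
    """
    sets = []
    for k in range(1, steps + 1):
        t = k / steps
        sets.append([(1.0 - t) * p + t * c for p, c in zip(prev_lpc, cur_lpc)])
    return sets
```

Each returned set would be applied to one sub-block of the overlap-add region, so the synthesis filter changes gradually rather than jumping at a superframe boundary.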

11. An apparatus for coding a superframe signal, wherein the superframe signal comprises a sequence of speech signals or music signals, the apparatus comprising:
- a speech/music classifier for classifying the superframe as being a speech superframe or music superframe;
  
  a speech/music encoder for encoding the speech or music superframe and providing a plurality of encoded signals, wherein the speech/music encoder comprises a music encoder employing a transform coding method to produce an excitation signal for reconstructing the music superframe using a linear predictive synthesis filter; and
  
  a speech/music decoder for decoding the encoded signals, comprising:
  
  a transform decoder that performs an inverse of the transform coding method for decoding the encoded music signals; and
  
  a linear predictive synthesis filter for generating a reconstructed signal according to a set of linear predictive coefficients, wherein the filter is usable for the reproduction of both music and speech signals.
- View Dependent Claims (12, 13, 15)
- - 12. The apparatus of claim 11, wherein the speech/music classifier provides a mode bit indicating whether the superframe is music or speech.
  - 13. The apparatus of claim 11, wherein the speech/music encoder further comprises a speech encoder for encoding a speech superframe, wherein the speech encoder operates in accordance with a linear predictive principle.
  - 15. The apparatus of claim 11, wherein the transform decoder further comprises:
    - a dynamic bit allocation module for providing bit allocation information;
      
      an inverse quantization module for transferring quantified discrete cosine transformation coefficients into a set of discrete cosine transformation coefficients;
      
      a discrete cosine inverse transformation for transforming the discrete cosine transformation coefficients into a time-domain signal;
      
      an asymmetrical overlap-add windowing module for windowing the time-domain signal and producing a windowed signal; and
      
      an overlap-add module for modifying the windowed signal based on the asymmetrical windows.
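
The transform decoder chain of claim 15 (inverse quantization, inverse discrete cosine transformation, overlap-add) can be sketched in plain Python as follows. This is an illustration under assumptions: a uniform scalar quantizer with a single step size, an orthonormal DCT-II/DCT-III pair, and hypothetical function names. The dynamic bit allocation module is omitted, and windowing is assumed to have been applied to the frame before it reaches `overlap_add`.

```python
import math
from typing import List, Sequence, Tuple


def dct(x: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II (encoder side; included to show the round trip)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)))
    return out


def idct(c: Sequence[float]) -> List[float]:
    """Orthonormal DCT-III, the inverse of dct() above."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def dequantize(indices: Sequence[int], step: float) -> List[float]:
    """Map quantizer indices back to coefficient values (uniform step assumed)."""
    return [q * step for q in indices]


def overlap_add(prev_tail: Sequence[float], windowed: Sequence[float],
                next_overlap: int) -> Tuple[List[float], List[float]]:
    """Add the saved tail of the previous frame to the head of this one.

    Returns the finished output samples and the tail to save for the next call.
    """
    head = [a + b for a, b in zip(prev_tail, windowed[:len(prev_tail)])]
    body = list(windowed[len(prev_tail):len(windowed) - next_overlap])
    tail = list(windowed[len(windowed) - next_overlap:])
    return head + body, tail
```

The length of `prev_tail` and the value of `next_overlap` may differ, which is where the asymmetry of the windows shows up in the overlap-add step.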

14. The apparatus of claim 11, wherein the music encoder further comprises:
- a linear predictive analysis module for analyzing the music superframe and generating a set of linear predictive coefficients;
  
  a linear predictive coefficients quantization module for quantifying the linear predictive coefficients;
  
  an inverse linear predictive filter for receiving the linear predictive coefficients and the music superframe and providing a residual signal;
  
  an asymmetrical overlap-add windowing module for windowing the residual signal and producing a windowed signal;
  
  a discrete cosine transformation module for transforming the windowed signal to a set of discrete cosine transformation coefficients;
  
  a dynamic bit allocation module for providing bit allocation information based on at least one of the input signal or the linear predictive coefficients; and
  
  a discrete cosine transformation coefficients quantization module for quantifying the discrete cosine transformation coefficients according to the bit allocation information.
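
The first stages of the music encoder in claim 14 (linear predictive analysis followed by an inverse linear predictive filter that yields the residual) can be sketched as below. The claim does not say how the coefficients are computed, so this sketch assumes the autocorrelation method with the Levinson-Durbin recursion. All function names are hypothetical, and coefficient quantization, windowing, transformation, and bit allocation are left out.

```python
from typing import List, Sequence


def autocorrelation(x: Sequence[float], order: int) -> List[float]:
    """r[k] = sum over n of x[n] * x[n - k], for k = 0..order."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> List[float]:
    """Return a[0..order-1] for A(z) = 1 + a[0] z^-1 + ... + a[order-1] z^-order."""
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        acc = r[i + 1] + sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(i):
            new_a[j] = a[j] + k * a[i - 1 - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # remaining prediction error energy
    return a


def lp_residual(x: Sequence[float], lpc: Sequence[float]) -> List[float]:
    """Inverse LP filter A(z): e[n] = x[n] + sum_k a[k] * x[n - 1 - k]."""
    return [x[n] + sum(a * x[n - 1 - k] for k, a in enumerate(lpc) if n - 1 - k >= 0)
            for n in range(len(x))]


def lp_synthesis(e: Sequence[float], lpc: Sequence[float]) -> List[float]:
    """Synthesis filter 1/A(z): y[n] = e[n] - sum_k a[k] * y[n - 1 - k]."""
    y: List[float] = []
    for n in range(len(e)):
        y.append(e[n] - sum(a * y[n - 1 - k] for k, a in enumerate(lpc) if n - 1 - k >= 0))
    return y
```

Passing the unquantized residual through `lp_synthesis` with the same coefficients reproduces the input, which is why the decoder can use a synthesis filter of this form on the reconstructed excitation.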

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Koishida, Kazuhito; Cuperman, Vladimir; Gersho, Allen; Majidimehr, Amir H.

Granted Patent

US 6,658,383 B2
US Class Current

704/223
CPC Class Codes

G10L 19/0212   using orthogonal transforma...

G10L 19/04   using predictive techniques

G10L 19/18   Vocoders using multiple modes
