Method for coding speech and music signals

US 6,658,383 B2
Filed: 06/26/2001
Issued: 12/02/2003
Est. Priority Date: 06/26/2001
Status: Expired due to Term

Abstract

The present invention provides a transform coding method efficient for music signals that is suitable for use in a hybrid codec, whereby a common Linear Predictive (LP) synthesis filter is employed for both speech and music signals. The LP synthesis filter switches between a speech excitation generator and a transform excitation generator, in accordance with the coding of a speech or music signal, respectively. For coding speech signals, the conventional CELP technique may be used, while a novel asymmetrical overlap-add transform technique is applied for coding music signals. In performing the common LP synthesis filtering, interpolation of the LP coefficients is conducted for signals in overlap-add operation regions. The invention enables smooth transitions when the decoder switches between speech and music decoding modes.
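As a minimal sketch of the arrangement the abstract describes, the Python below feeds the output of either of two excitation generators into one shared all-pole synthesis filter. The function names, the direct-form filter, and the sign convention A(z) = 1 + sum a[k] z^-k are assumptions made for illustration and are not taken from the patent. The two generators are passed in as placeholder callables.

```python
from typing import Callable, List, Sequence


def lp_synthesis(excitation: Sequence[float], lpc: Sequence[float],
                 history: List[float]) -> List[float]:
    """All-pole filter 1/A(z): y[n] = e[n] - sum_k a[k] * y[n-k].

    `lpc` holds a[1..p]. `history` holds the last p outputs (newest last)
    and is updated in place, so the filter state survives a mode switch.
    """
    out = []
    for e in excitation:
        y = e - sum(a * history[-k] for k, a in enumerate(lpc, start=1))
        history.append(y)
        history.pop(0)
        out.append(y)
    return out


def decode_portion(portion: Sequence[float], is_music: bool,
                   speech_gen: Callable, transform_gen: Callable,
                   lpc: Sequence[float], history: List[float]) -> List[float]:
    """Select the excitation source, then run the common synthesis filter."""
    generator = transform_gen if is_music else speech_gen
    return lp_synthesis(generator(portion), lpc, history)
```

Because `history` is shared by both modes in this sketch, the first music sample after a speech portion is filtered with the memory left behind by the speech output, which is one way to picture why a common filter helps at mode boundaries.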

132 Citations

7 Claims

1. A method for decoding a portion of a coded signal, the portion comprising a coded speech signal or a coded music signal, the method comprising the steps of:
- determining whether the portion of the coded signal corresponds to a coded speech signal or to a coded music signal;
- providing the portion of the coded signal to a speech excitation generator if it is determined that the portion of the coded signal corresponds to a coded speech signal, wherein an excitation signal is generated in keeping with a linear predictive procedure;
- providing the portion of the coded signal to a transform excitation generator if it is determined that the portion of the coded signal corresponds to a coded music signal, wherein an excitation signal is generated in keeping with a transform coding procedure, wherein the coded music signal is formed according to an asymmetrical overlap-add transform method comprising the steps of:
  - receiving a music superframe consisting of a sequence of input music signals;
  - generating a residual signal and a plurality of linear predictive coefficients for the music superframe according to a linear predictive principle;
  - applying an asymmetrical overlap-add window to the residual signal of the superframe to produce a windowed signal;
  - performing a discrete cosine transformation on the windowed signal to obtain a set of discrete cosine transformation coefficients;
  - calculating dynamic bit allocation information according to the input music signals or the linear predictive coefficients; and
  - quantifying the discrete cosine transformation coefficients according to the dynamic bit allocation information; and
- switching the input of a common linear predictive synthesis filter between the output of the speech excitation generator and the output of the transform excitation generator, whereby the common linear predictive synthesis filter provides as output a reconstructed signal corresponding to the input excitation.
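The windowing and transform steps recited in claim 1 can be sketched as follows. The claim text here does not give the window shape, so the sine rise, flat middle, and cosine fall with independently chosen lengths are an assumption, as are all function and parameter names. The DCT is a plain, unnormalized DCT-II written out directly for clarity, not for speed.

```python
import math
from typing import List, Sequence


def asymmetric_window(rise_len: int, flat_len: int, fall_len: int) -> List[float]:
    """Hypothetical asymmetric window: sine rise, flat top, cosine fall.

    The rise and fall lengths may differ, which is what makes the window
    asymmetric in this sketch.
    """
    rise = [math.sin(0.5 * math.pi * (n + 0.5) / rise_len) for n in range(rise_len)]
    fall = [math.cos(0.5 * math.pi * (n + 0.5) / fall_len) for n in range(fall_len)]
    return rise + [1.0] * flat_len + fall


def dct_ii(x: Sequence[float]) -> List[float]:
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi * (n + 0.5) * k / N)."""
    size = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / size) for n in range(size))
            for k in range(size)]


def transform_residual(residual: Sequence[float], window: Sequence[float]) -> List[float]:
    """Window the LP residual, then take its DCT coefficients."""
    windowed = [r * w for r, w in zip(residual, window)]
    return dct_ii(windowed)
```

When a fall of one window and the rise of the next have the same length, the squared sine and cosine slopes sum to one at every overlapping sample, so applying the window once before the transform and once after the inverse transform leaves the overlap region at unit gain.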

2. The method of claim 1, wherein the superframe is comprised of a series of elements, and wherein the step of applying an asymmetrical overlap-add window further comprises the steps of:

3. The method of claim 2, further comprising the step of:
- conducting an interpolation of a set of linear predictive coefficients.
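The interpolation named in claim 3 could look like the sketch below, which moves linearly from the previous coefficient set to the current one over a number of sub-steps. The claim does not say in which domain the interpolation is done. Interpolating direct-form coefficients, as here, is an assumption chosen for brevity and does not guarantee a stable filter at the intermediate points; speech codecs commonly interpolate line spectral pairs instead. The function name and arguments are hypothetical.

```python
from typing import List, Sequence


def interpolate_lpc(prev_lpc: Sequence[float], curr_lpc: Sequence[float],
                    num_steps: int) -> List[List[float]]:
    """Return `num_steps` coefficient sets blending prev_lpc into curr_lpc.

    The last returned set equals curr_lpc; earlier sets are weighted
    averages that lean progressively less on prev_lpc.
    """
    sets = []
    for i in range(1, num_steps + 1):
        weight = i / num_steps
        sets.append([(1.0 - weight) * p + weight * c
                     for p, c in zip(prev_lpc, curr_lpc)])
    return sets
```

Each returned set would then drive the synthesis filter for one slice of the overlap region, so the filter response changes gradually instead of jumping at the superframe boundary.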

4. A computer readable medium having instructions thereon for performing steps for decoding a portion of a coded signal, the portion comprising a coded speech signal or a coded music signal, the steps comprising:
- determining whether the portion of the coded signal corresponds to a coded speech signal or to a coded music signal;
- providing the portion of the coded signal to a speech excitation generator if it is determined that the portion of the coded signal corresponds to a coded speech signal, wherein an excitation signal is generated in keeping with a linear predictive procedure;
- providing the portion of the coded signal to a transform excitation generator if it is determined that the portion of the coded signal corresponds to a coded music signal, wherein an excitation signal is generated in keeping with a transform coding procedure, wherein the coded music signal is formed according to an asymmetrical overlap-add transform method comprising the steps of:
  - receiving a music superframe consisting of a sequence of input music signals;
  - generating a residual signal and a plurality of linear predictive coefficients for the music superframe according to a linear predictive principle;
  - applying an asymmetrical overlap-add window to the residual signal of the superframe to produce a windowed signal;
  - performing a discrete cosine transformation on the windowed signal to obtain a set of discrete cosine transformation coefficients;
  - calculating dynamic bit allocation information according to the input music signals or the linear predictive coefficients; and
  - quantifying the discrete cosine transformation coefficients according to the dynamic bit allocation information; and
- switching the input of a common linear predictive synthesis filter between the output of the speech excitation generator and the output of the transform excitation generator, whereby the common linear predictive synthesis filter provides as output a reconstructed signal corresponding to the input excitation.
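The bit allocation and quantization steps can be illustrated with the sketch below. The claims state only that the allocation is derived from the input signal or the linear predictive coefficients, so the greedy rule used here (give each bit to the coefficient with the largest remaining envelope value, then divide that value by four because one extra bit lowers quantization noise power by about 6 dB) is an assumption, as are the uniform quantizer, the `full_scale` range, and every name in the code.

```python
import math
from typing import List, Sequence


def allocate_bits(envelope: Sequence[float], total_bits: int,
                  max_bits: int = 8) -> List[int]:
    """Greedy allocation driven by a per-coefficient spectral envelope."""
    bits = [0] * len(envelope)
    need = [float(e) for e in envelope]
    for _ in range(total_bits):
        open_slots = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not open_slots:
            break
        best = max(open_slots, key=lambda i: need[i])
        bits[best] += 1
        need[best] /= 4.0  # one more bit: roughly 6 dB less quantization noise
    return bits


def quantize(coeffs: Sequence[float], bits: Sequence[int],
             full_scale: float = 1.0) -> List[int]:
    """Uniform quantizer over [-full_scale, full_scale); 0 bits sends nothing."""
    indices = []
    for c, b in zip(coeffs, bits):
        if b == 0:
            indices.append(0)
            continue
        step = 2.0 * full_scale / (1 << b)
        half = 1 << (b - 1)
        indices.append(max(-half, min(half - 1, math.floor(c / step))))
    return indices


def dequantize(indices: Sequence[int], bits: Sequence[int],
               full_scale: float = 1.0) -> List[float]:
    """Map indices back to the midpoint of each quantization cell."""
    return [0.0 if b == 0 else (q + 0.5) * 2.0 * full_scale / (1 << b)
            for q, b in zip(indices, bits)]
```

If the envelope is computed from the quantized linear predictive coefficients, a decoder can rerun `allocate_bits` and obtain the same allocation without it being transmitted, which is consistent with a bit allocation module also appearing on the decoder side in claim 7.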

5. The computer readable medium according to claim 4, wherein the superframe is comprised of a series of elements, and wherein the step of applying an asymmetrical overlap-add window further comprises the steps of:

6. An apparatus for processing a superframe signal, wherein the superframe signal comprises a sequence of speech signals or music signals, the apparatus comprising:
- a speech/music classifier for classifying the superframe as being a speech superframe or music superframe;
- a speech/music encoder for encoding the speech or music superframe and providing a plurality of encoded signals, wherein the speech/music encoder comprises a music encoder employing a transform coding method to produce an excitation signal for reconstructing the music superframe using a linear predictive synthesis filter, wherein the music encoder further comprises:
  - a linear predictive analysis module for analyzing the music superframe and generating a set of linear predictive coefficients;
  - a linear predictive coefficients quantization module for quantifying the linear predictive coefficients;
  - an inverse linear predictive filter for receiving the linear predictive coefficients and the music superframe and providing a residual signal;
  - an asymmetrical overlap-add windowing module for windowing the residual signal and producing a windowed signal;
  - a discrete cosine transformation module for transforming the windowed signal to a set of discrete cosine transformation coefficients;
  - a dynamic bit allocation module for providing bit allocation information based on at least one of the input signal or the linear predictive coefficients; and
  - a discrete cosine transformation coefficients quantization module for quantifying the discrete cosine transformation coefficients according to the bit allocation information; and
- a speech/music decoder for decoding the encoded signals, comprising:
  - a transform decoder that performs an inverse of the transform coding method for decoding the encoded music signals; and
  - a linear predictive synthesis filter for generating a reconstructed signal according to a set of linear predictive coefficients, wherein the filter is usable for the reproduction of both of music and speech signals.
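The linear predictive analysis module and inverse filter of claim 6 can be sketched as below. The claim names the modules but not the algorithms, so the autocorrelation method with the Levinson-Durbin recursion is an assumption (it is a standard choice), and the function names and the convention A(z) = 1 + sum a[k] z^-k are hypothetical.

```python
from typing import List, Sequence


def autocorrelation(x: Sequence[float], order: int) -> List[float]:
    """r[k] = sum_n x[n] * x[n-k] for k = 0..order."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(order + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> List[float]:
    """Return a[1..order] of A(z) = 1 + sum_k a[k] z^-k from autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate input: keep remaining coefficients at 0
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a[1:]


def lp_residual(x: Sequence[float], lpc: Sequence[float]) -> List[float]:
    """Inverse LP filter A(z): e[n] = x[n] + sum_k a[k] * x[n-k], zero history."""
    return [x[n] + sum(a * x[n - k]
                       for k, a in enumerate(lpc, start=1) if n - k >= 0)
            for n in range(len(x))]
```

For a decaying exponential such as `x[n] = 0.9 ** n`, a first-order analysis recovers a coefficient close to -0.9 and the residual collapses to a near-impulse, which is the flattened signal the windowing and transform modules would then operate on.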

7. An apparatus for processing a superframe signal, wherein the superframe signal comprises a sequence of speech signals or music signals, the apparatus comprising:
- a speech/music classifier for classifying the superframe as being a speech superframe or music superframe;
- a speech/music encoder for encoding the speech or music superframe and providing a plurality of encoded signals, wherein the speech/music encoder comprises a music encoder employing a transform coding method to produce an excitation signal for reconstructing the music superframe using a linear predictive synthesis filter; and
- a speech/music decoder for decoding the encoded signals, comprising:
  - a transform decoder that performs an inverse of the transform coding method for decoding the encoded music signals, wherein the transform decoder further comprises:
    - a dynamic bit allocation module for providing bit allocation information;
    - an inverse quantization model for transferring quantified discrete cosine transformation coefficients into a set of discrete cosine transformation coefficients;
    - a discrete cosine inverse transformation module for transforming the discrete cosine transformation coefficients into a time-domain signal;
    - an asymmetrical overlap-add windowing module for windowing the time-domain signal and producing a windowed signal; and
    - an overlap-add module for modifying the windowed signal based on the asymmetrical windows; and
  - a linear predictive synthesis filter for generating a reconstructed signal according to a set of linear predictive coefficients, wherein the filter is usable for the reproduction of both of music and speech signals.
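The last three transform decoder modules in claim 7 (inverse transform, windowing, overlap-add) might be chained as in the sketch below. It assumes an unnormalized DCT-II on the encoder side, a caller-supplied window, and a carried-over tail whose length equals the falling slope of the window. None of these details come from the claim text, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def idct_ii(coeffs: Sequence[float]) -> List[float]:
    """Inverse of the unnormalized DCT-II X[k] = sum_n x[n] cos(pi (n+0.5) k / N)."""
    size = len(coeffs)
    return [coeffs[0] / size
            + (2.0 / size) * sum(coeffs[k] * math.cos(math.pi * (n + 0.5) * k / size)
                                 for k in range(1, size))
            for n in range(size)]


def overlap_add(windowed: Sequence[float], prev_tail: Sequence[float],
                fall_len: int) -> Tuple[List[float], List[float]]:
    """Add the previous tail to the frame start; hold back the new tail."""
    split = len(windowed) - fall_len
    body = list(windowed[:split])
    for i, t in enumerate(prev_tail):
        body[i] += t
    return body, list(windowed[split:])


def decode_music_frame(coeffs: Sequence[float], window: Sequence[float],
                       prev_tail: Sequence[float],
                       fall_len: int) -> Tuple[List[float], List[float]]:
    """Inverse transform, window, and overlap-add one frame of excitation."""
    time_signal = idct_ii(coeffs)
    windowed = [t * w for t, w in zip(time_signal, window)]
    return overlap_add(windowed, prev_tail, fall_len)
```

The returned body is the excitation that would be passed to the linear predictive synthesis filter, and the returned tail is kept for the next frame. The sketch requires `prev_tail` to be no longer than the body of the current frame.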

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Koishida, Kazuhito; Gersho, Allen; Majidimehr, Amir H.; Cuperman, Vladimir
Primary Examiner(s): Banks-Harold, Marsha D.
Assistant Examiner(s): Harper, V Paul
Application Number: US09/892,105
Publication Number: US 20030004711A1
Time in Patent Office: 889 Days
Field of Search: 704/278, 704/267, 704/262, 704/230, 704/229, 704/220, 704/211, 704/206, 704/201
US Class Current: 704/229
CPC Class Codes:
G10L 19/0212   using orthogonal transforma...
G10L 19/04   using predictive techniques
G10L 19/18   Vocoders using multiple modes
