MULTI-MODE AUDIO CODEC AND CELP CODING ADAPTED THEREFORE

US 20120253797A1
Filed: 04/18/2012
Published: 10/04/2012
Est. Priority Date: 10/20/2009
Status: Active Grant




Abstract

In an embodiment, bitstream elements of sub-frames are encoded differentially to a global gain value so that a change of the global gain value results in an adjustment of an output level of the decoded representation of the audio content. Concurrently, the differential coding saves bits. Even further, the differential coding enables the lowering of the burden of globally adjusting the gain of an encoded bitstream. In another embodiment, a global gain control across CELP coded frames and transform coded frames is achieved by co-controlling the gain of the codebook excitation of the CELP codec, along with a level of the transform or inverse transform of the transform coded frames. In another embodiment, the gain value determination in CELP coding is performed in the weighted domain of the excitation signal.
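The differential-coding idea in the first embodiment can be illustrated with a minimal sketch. It assumes gains are carried as integer indices on a logarithmic scale; the function names, the rounding rule for choosing the global value, and the `steps_per_octave` parameter are hypothetical and are not taken from the patent.

```python
def encode_gains(subframe_gain_indices):
    """Split per-sub-frame gain indices into one global value plus small deltas.

    Assumption: the global value is the rounded mean of the sub-frame indices.
    """
    global_gain = round(sum(subframe_gain_indices) / len(subframe_gain_indices))
    deltas = [g - global_gain for g in subframe_gain_indices]
    return global_gain, deltas


def decode_gains(global_gain, deltas):
    """Rebuild each sub-frame gain index from the global value and its delta."""
    return [global_gain + d for d in deltas]


def index_to_linear(index, steps_per_octave=4):
    """Map a logarithmic gain index to a linear factor (hypothetical step size)."""
    return 2.0 ** (index / steps_per_octave)


global_gain, deltas = encode_gains([100, 102, 98, 100])
# Raising only the global value shifts every sub-frame by the same amount,
# so the deltas never need to be re-encoded.
louder = decode_gains(global_gain + 4, deltas)
```

Because the deltas stay small, they can be written with fewer bits than full gain values, and a level change touches a single field per frame.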

33 Claims

1. A multi-mode audio decoder for providing a decoded representation of audio content on the basis of an encoded bitstream, the multi-mode audio decoder configured to decode a global gain value per frame of the encoded bitstream, wherein a first subset of the frames being coded in a first coding mode and a second subset of the frames being coded in a second coding mode, with each frame of the second subset being composed of more than one sub-frames, decode, per sub-frame of at least a subset of the sub-frames of the second subset of frames, a corresponding bitstream element differentially to the global gain value of the respective frame, and complete decoding the bitstream using the global gain value and the corresponding bitstream element in decoding the sub-frames of the at least subset of the sub-frames of the second subset of frames and the global gain value in decoding the first subset of frames, wherein the multi-mode audio decoder is configured such that a change of the global gain value of the frames within the encoded bitstream results in an adjustment of an output level of the decoded representation of the audio content.
- Dependent claims (2, 3, 4, 5, 6, 7, 19)
  - 2. The multi-mode audio decoder according to claim 1, wherein the first coding mode is a frequency domain coding mode, and the second coding mode is a linear prediction coding mode.
  - 3. The multi-mode audio decoder according to claim 2, wherein the multi-mode audio decoder is configured to, in completing the decoding of the encoded bitstream, decode the sub-frames of the at least subset of the sub-frames of the second subset of frames by using transformed excitation linear prediction decoding, and decode a disjoined subset of the sub-frames of the second subset of the frames by use of CELP.
  - 4. The multi-mode audio decoder according to claim 1, wherein the multi-mode audio decoder is configured to decode, per frame of the second subset of the frames, a further bitstream element revealing a decomposition of the respective frame into one or more sub-frames.
  - 5. The multi-mode audio decoder according to claim 1, wherein the frames of the second subset are of equal length, and the at least subset of the sub-frames of the second subset of frames exhibit a varying sample length selected from the group comprising 256, 512 and 1024 samples, and a disjoined subset of the sub-frames exhibit a sample length of 256 samples.
  - 6. The multi-mode audio decoder according to claim 1, wherein the multi-mode audio decoder is configured to decode the global gain value on a fixed number of bits and the bitstream element on a variable number of bits, the number depending on a sample length of the respective sub-frame.
  - 7. The multi-mode audio decoder according to claim 1, wherein the multi-mode audio decoder is configured to decode the global gain value on a fixed number of bits and to decode the bitstream element on a fixed number of bits.
  - 19. An SBR decoder comprising a core decoder for decoding a core-coder portion of a bitstream to acquire a core band signal according to claim 1 or claim 8 or claim 14, the SBR decoder configured to decode envelope energies for a spectral band to be replicated, from an SBR portion of the bitstream, and scaling the envelope energies according to an energy of the core band signal.
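
The parsing order described in claims 1 and 6 (a fixed-width global gain, then one differential element per sub-frame whose width depends on the sub-frame length) might look like the following sketch. The bit widths in `GLOBAL_GAIN_BITS` and `DELTA_BITS`, the offset-binary mapping of the delta, and the `BitReader` class are all hypothetical; the claims do not specify them.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


GLOBAL_GAIN_BITS = 8                     # hypothetical fixed width
DELTA_BITS = {256: 2, 512: 3, 1024: 4}   # hypothetical width per sub-frame length


def decode_frame_gains(reader, subframe_lengths):
    """Read the global gain, then one differential element per sub-frame."""
    global_gain = reader.read(GLOBAL_GAIN_BITS)
    gains = []
    for length in subframe_lengths:
        bits = DELTA_BITS[length]
        raw = reader.read(bits)
        delta = raw - (1 << (bits - 1))  # offset binary to signed (assumption)
        gains.append(global_gain + delta)
    return global_gain, gains


# One frame: global gain 100, sub-frames of 512, 256 and 256 samples.
frame_bytes = bytes([0x64, 0xB2])
global_gain, subframe_gains = decode_frame_gains(BitReader(frame_bytes), [512, 256, 256])
```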

8. A multi-mode audio decoder for providing a decoded representation of an audio content on the basis of an encoded bitstream, a first subset of frames of which is CELP coded and a second subset of frames of which is transform coded, the multi-mode audio decoder comprising:
- a CELP decoder configured to decode a current frame of the first subset, the CELP decoder comprising:
  
  an excitation generator configured to generate a current excitation of the current frame of the first subset by constructing a codebook excitation based on a past excitation and a codebook index of the current frame of the first subset within the encoded bitstream, and setting a gain of the codebook excitation based on a global gain value within the encoded bitstream; and
  
  a linear prediction synthesis filter configured to filter the current excitation based on linear prediction filter coefficients for the current frame of the first subset within the encoded bitstream;
  
  a transform decoder configured to decode a current frame of the second subset by constructing spectral information for the current frame of the second subset from the encoded bitstream and performing a spectral-to-time-domain transformation onto the spectral information to acquire a time-domain signal such that a level of the time-domain signal depends on the global gain value.
- Dependent claims (9, 10, 11, 12, 13)
  - 9. The multi-mode audio decoder according to claim 8, wherein the excitation generator is configured to, in generating the current excitation of the current frame of the first subset, construct an adaptive codebook excitation based on a past excitation and an adaptive codebook index of the current frame of the first subset within the encoded bitstream;
    - construct an innovation codebook excitation based on an innovation codebook index for the current frame of the first subset within the encoded bitstream;
      
      set, as the gain of the codebook excitation, a gain of the innovation codebook excitation based on the global gain value within the encoded bitstream; and
      
      combine the adaptive codebook excitation and the innovation codebook excitation to achieve the current excitation of the current frame of the first subset.
  - 10. The multi-mode audio decoder according to claim 8, wherein the transform decoder is configured such that the spectral information relates to a current excitation of the current frame of the second subset, and the transform decoder is further configured to, in decoding the current frame of the second subset, spectrally form the current excitation of the current frame of the second subset according to a linear prediction synthesis filter transfer function defined by linear prediction filter coefficients for the current frame of the second subset within the encoded bitstream so that the performance of the spectral-to-time-domain transformation onto the spectral information results in the decoded representation of the audio content.
  - 11. The multi-mode audio decoder according to claim 10, wherein the transform decoder is configured to perform the spectral formation by converting the linear prediction filter coefficients into a linear prediction spectrum and weighting the spectral information of the current excitation with the linear prediction spectrum.
  - 12. The multi-mode audio decoder according to claim 8, wherein the transform decoder is configured to scale the spectral information with the global gain value.
  - 13. The multi-mode audio decoder according to claim 8, wherein the transform decoder is configured to construct the spectral information for the current frame of the second subset by use of spectral transform coefficients within the encoded bitstream, and scale factors within the encoded bitstream for scaling the spectral transform coefficients in a spectral granularity of scale factor bands, with scaling the scale factors based on the global gain value, so as to achieve the decoded representation of the audio content.
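
The shared gain control of claim 8 can be sketched as two toy decoding paths that both take the same global gain. This is a simplification under stated assumptions: the global gain is treated as a linear factor, the CELP path omits the adaptive codebook, and `inverse_dct` is an unnormalized stand-in for whatever spectral-to-time-domain transformation the codec uses. All function names are hypothetical.

```python
import math


def lp_synthesis(excitation, lpc):
    """All-pole filter 1/A(z) with A(z) = 1 + sum_k lpc[k-1] * z^-k."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - k - 1 >= 0:
                acc -= a * out[n - k - 1]
        out.append(acc)
    return out


def celp_decode(codebook_vector, lpc, global_gain):
    """Scale the codebook excitation by the global gain, then synthesize."""
    excitation = [global_gain * c for c in codebook_vector]
    return lp_synthesis(excitation, lpc)


def inverse_dct(spectrum):
    """Toy, unnormalized inverse cosine transform (illustration only)."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * math.cos(math.pi / n * (t + 0.5) * k) for k in range(n))
        for t in range(n)
    ]


def transform_decode(coeffs, global_gain):
    """Scale the spectral information by the global gain, then transform."""
    return inverse_dct([global_gain * c for c in coeffs])


def rms(signal):
    return math.sqrt(sum(s * s for s in signal) / len(signal))


celp_ratio = rms(celp_decode([1.0, 0.0, -1.0, 0.0], [-0.5], 2.0)) / rms(
    celp_decode([1.0, 0.0, -1.0, 0.0], [-0.5], 1.0)
)
transform_ratio = rms(transform_decode([0.5, 1.0, 0.0, -0.25], 2.0)) / rms(
    transform_decode([0.5, 1.0, 0.0, -0.25], 1.0)
)
```

Both paths are linear in the gain, so doubling the global gain doubles the output level of CELP frames and transform frames alike.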

14. A CELP decoder comprising:
- an excitation generator configured to generate a current excitation for a current frame of a bitstream by constructing an adaptive codebook excitation based on a past excitation and an adaptive codebook index for the current frame within the bitstream;
  
  constructing an innovation codebook excitation based on an innovation codebook index for the current frame within the bitstream;
  
  computing an estimate of an energy of the innovation codebook excitation spectrally weighted by a weighted linear prediction synthesis filter constructed from linear prediction filter coefficients within the bitstream;
  
  setting a gain of the innovation codebook excitation based on a ratio between a global gain value within the bitstream and the estimated energy; and
  
  combining the adaptive codebook excitation and the innovation codebook excitation to achieve the current excitation; and
  
  a linear prediction synthesis filter configured to filter the current excitation based on the linear prediction filter coefficients.
- Dependent claims (15, 16, 17, 18)
  - 15. The CELP decoder according to claim 14, wherein the excitation generator is configured to, in constructing the adaptive codebook excitation, filter the past excitation with a filter depending on the adaptive codebook index.
  - 16. The CELP decoder according to claim 14, wherein the excitation generator is configured to construct the innovation codebook excitation such that the latter comprises a zero vector with a number of non-zero pulses, the number and positions of the non-zero pulses being indicated by the innovation codebook index.
  - 17. The CELP decoder according to claim 14, wherein the excitation generator is configured to, in computing the estimate of the energy of the innovation codebook excitation, filter the innovation codebook excitation with
  - 18. The CELP decoder according to claim 14, wherein the excitation generator is configured to, in combining the adaptive codebook excitation and the innovation codebook excitation, form a weighted sum of the adaptive codebook excitation weighted with a weighting factor depending on the adaptive codebook index, and the innovation codebook excitation weighted with the gain.
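
The gain-setting steps of claims 14, 16 and 18 can be sketched as follows. The sketch assumes that the weighted synthesis filter is 1/A(z/γ), that the global gain is a linear target level, and that the "ratio" is the global gain divided by the square root of the estimated energy; none of these specifics is confirmed by the claim text, and all function names are hypothetical.

```python
import math


def bandwidth_expand(lpc, gamma):
    """Coefficients of A(z/gamma) for A(z) = 1 + sum_k lpc[k-1] * z^-k."""
    return [a * gamma ** (k + 1) for k, a in enumerate(lpc)]


def all_pole_filter(x, coeffs):
    """Filter x through 1/(1 + sum_k coeffs[k-1] * z^-k)."""
    out = []
    for n, s in enumerate(x):
        acc = s
        for k, c in enumerate(coeffs):
            if n - k - 1 >= 0:
                acc -= c * out[n - k - 1]
        out.append(acc)
    return out


def pulses_to_vector(length, pulses):
    """Zero vector with signed unit pulses at the given positions (claim 16)."""
    vector = [0.0] * length
    for position, sign in pulses:
        vector[position] += sign
    return vector


def innovation_gain(innovation, lpc, global_gain, gamma=0.92):
    """Gain from the ratio of the global gain to the weighted-domain energy."""
    weighted = all_pole_filter(innovation, bandwidth_expand(lpc, gamma))
    energy = sum(s * s for s in weighted)
    return global_gain / math.sqrt(energy)


def build_excitation(adaptive, innovation, pitch_gain, gain):
    """Weighted sum of adaptive and innovation contributions (claim 18)."""
    return [pitch_gain * a + gain * c for a, c in zip(adaptive, innovation)]


lpc = [-0.9]
innovation = pulses_to_vector(8, [(1, 1.0), (5, -1.0)])
gain = innovation_gain(innovation, lpc, global_gain=3.0)
excitation = build_excitation([0.0] * 8, innovation, pitch_gain=0.5, gain=gain)
# Level of the scaled innovation, measured in the weighted domain.
scaled = all_pole_filter([gain * c for c in innovation], bandwidth_expand(lpc, 0.92))
weighted_level = math.sqrt(sum(s * s for s in scaled))
```

With this normalization the scaled innovation has a weighted-domain level equal to the global gain regardless of which pulses the codebook index selects.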

20. A multi-mode audio encoder configured to encode an audio content into an encoded bitstream with encoding a first subset of frames in a first coding mode and a second subset of frames in a second coding mode, wherein the second subset of frames is respectively composed of one or more sub-frames, wherein the multi-mode audio encoder is configured to determine and encode a global gain value per frame, and determine and encode, per sub-frames of at least a subset of the sub-frames of the second subset, a corresponding bitstream element differentially to the global gain value of the respective frame, wherein the multi-mode audio encoder is configured such that a change of the global gain value of the frames within the encoded bitstream results in an adjustment of an output level of a decoded representation of the audio content at the decoding side.

21. A multi-mode audio encoder for encoding an audio content into an encoded bitstream by CELP encoding a first subset of frames of the audio content and transform encoding a second subset of the frames, the multi-mode audio encoder comprising:
- a CELP encoder configured to encode a current frame of the first subset, the CELP encoder comprising a linear prediction analyzer configured to generate linear prediction filter coefficients for the current frame of the first subset and encode same into the encoded bitstream; and
  
  an excitation generator configured to determine a current excitation of the current frame of the first subset, which, when filtered by a linear prediction synthesis filter based on the linear prediction filter coefficients within the encoded bitstream, recovers the current frame of the first subset, defined by a past excitation and a codebook index for the current frame of the first subset and encoding the codebook index into the encoded bitstream; and
  
  a transform encoder configured to encode a current frame of the second subset by performing a time-to-spectral-domain transformation onto a time-domain signal for the current frame of the second subset to acquire spectral information and encode the spectral information into the encoded bitstream, wherein the multi-mode audio encoder is configured to encode a global gain value into the encoded bitstream, the global gain value depending on an energy of a version of the audio content of the current frame of the first subset, filtered with the linear prediction analysis filter depending on the linear prediction coefficients, or an energy of the time-domain signal.

22. A CELP encoder comprising a linear prediction analyzer configured to generate linear prediction filter coefficients for a current frame of an audio content and encode the linear prediction filter coefficients into a bitstream;
- an excitation generator configured to determine a current excitation of the current frame as a combination of an adaptive codebook excitation and an innovation codebook excitation, which, when filtered by a linear prediction synthesis filter based on the linear prediction filter coefficients, recovers the current frame, by constructing the adaptive codebook excitation defined by a past excitation and an adaptive codebook index for the current frame and encoding the adaptive codebook index into the bitstream; and
  
  constructing the innovation codebook excitation defined by an innovation codebook index for the current frame and encoding the innovation codebook index into the bitstream; and
  
  an energy determiner configured to determine an energy of a version of the audio content of the current frame filtered with a weighting filter, to acquire a global gain value and encoding the global gain value into the bitstream, the weighting filter constructed from the linear prediction filter coefficients.
- Dependent claims (23, 24, 25, 26)
  - 23. The CELP encoder according to claim 22, wherein the linear prediction analyzer is configured to determine the linear prediction filter coefficients by linear prediction analysis applied onto a windowed and, according to a predetermined pre-emphasis filter, pre-emphasized version of the audio content.
  - 24. The CELP encoder according to claim 22, wherein the excitation generator is configured to, in constructing the adaptive codebook excitation and the innovation codebook excitation, minimize a perceptual weighted distortion measure relative to the audio content.
  - 25. The CELP encoder according to claim 22, wherein the excitation generator is configured to, in constructing the adaptive codebook excitation and the innovation codebook excitation, minimize a perceptual weighted distortion measure relative to the audio content using a perceptual weighting filter W(z) = A(z/γ), wherein γ is a perceptual weighting factor and A(z) is 1/H(z), wherein H(z) is the linear prediction synthesis filter, and wherein the energy determiner is configured to use the perceptual weighting filter as a weighting filter.
  - 26. The CELP encoder according to claim 22, wherein the excitation generator is configured to perform an excitation update to acquire a past excitation of a next frame, by estimating an innovation codebook excitation energy estimate by filtering an innovation codebook vector defined by first information contained within the innovation codebook index with
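
The energy determiner of claims 22 and 25 can be sketched as an FIR weighting filter W(z) = A(z/γ) followed by an energy measurement. The mapping from energy to a global gain value is not given in the claims; the sketch uses the RMS level in the weighted domain as a stand-in and leaves quantization out. The function names and the default γ are hypothetical.

```python
import math


def weighting_filter(signal, lpc, gamma):
    """FIR filter W(z) = A(z/gamma), with A(z) = 1 + sum_k lpc[k-1] * z^-k."""
    weights = [a * gamma ** (k + 1) for k, a in enumerate(lpc)]
    out = []
    for n, s in enumerate(signal):
        acc = s
        for k, w in enumerate(weights):
            if n - k - 1 >= 0:
                acc += w * signal[n - k - 1]
        out.append(acc)
    return out


def global_gain_from_frame(frame, lpc, gamma=0.92):
    """RMS level of the frame in the weighted domain (assumed gain measure)."""
    weighted = weighting_filter(frame, lpc, gamma)
    energy = sum(s * s for s in weighted)
    return math.sqrt(energy / len(frame))


frame = [1.0, 0.0, 0.0, 0.0]
gain = global_gain_from_frame(frame, lpc=[-0.5], gamma=1.0)
gain_doubled = global_gain_from_frame([2.0 * s for s in frame], lpc=[-0.5], gamma=1.0)
```

Since the filter is linear, scaling the input frame scales the derived gain by the same factor.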

27. A multi-mode audio decoding method for providing a decoded representation of audio content on the basis of an encoded bitstream, the method comprising decoding a global gain value per frame of the encoded bitstream, wherein a first subset of the frames being coded in a first coding mode and a second subset of the frames being coded in a second coding mode, with each frame of the second subset being composed of more than one sub-frames, decoding, per sub-frame of at least a subset of the sub-frames of the second subset of frames, a corresponding bitstream element differentially to the global gain value of the respective frame, and completing decoding the bitstream using the global gain value and the corresponding bitstream element in decoding the sub-frames of the at least subset of the sub-frames of the second subset of frames and the global gain value in decoding the first subset of frames, wherein the multi-mode audio decoding method is performed such that a change of the global gain value of the frames within the encoded bitstream results in an adjustment of an output level of the decoded representation of the audio content.
- Dependent claims (33)
  - 33. A computer program comprising a program code for performing, when running on a computer, a method according to claim 27 or claim 28 or claim 29 or claim 30 or claim 31 or claim 32.

28. A multi-mode audio decoding method for providing a decoded representation of an audio content on the basis of an encoded bitstream, a first subset of frames of which is CELP coded and a second subset of frames of which is transform coded, the method comprising:
- CELP decoding a current frame of the first subset, the CELP decoding comprising:
  
  generating a current excitation of the current frame of the first subset by constructing a codebook excitation based on a past excitation and a codebook index of the current frame of the first subset within the encoded bitstream, and setting a gain of the codebook excitation based on a global gain value within the encoded bitstream; and
  
  filtering the current excitation based on linear prediction filter coefficients for the current frame of the first subset within the encoded bitstream;
  
  transform decoding a current frame of the second subset by constructing spectral information for the current frame of the second subset from the encoded bitstream and performing a spectral-to-time-domain transformation onto the spectral information to acquire a time-domain signal such that a level of the time-domain signal depends on the global gain value.

29. A CELP decoding method comprising:
- generating a current excitation for a current frame of a bitstream by constructing an adaptive codebook excitation based on a past excitation and an adaptive codebook index for the current frame within the bitstream;
  
  constructing an innovation codebook excitation based on an innovation codebook index for the current frame within the bitstream;
  
  computing an estimate of an energy of the innovation codebook excitation spectrally weighted by a weighted linear prediction synthesis filter constructed from linear prediction filter coefficients within the bitstream;
  
  setting a gain of the innovation codebook excitation based on a ratio between a global gain value within the bitstream and the estimated energy; and
  
  combining the adaptive codebook excitation and the innovation codebook excitation to achieve the current excitation; and
  
  filtering the current excitation based on the linear prediction filter coefficients by a linear prediction synthesis filter.

30. A multi-mode audio encoding method comprising encoding an audio content into an encoded bitstream with encoding a first subset of frames in a first coding mode and a second subset of frames in a second coding mode, wherein the second subset of frames is respectively composed of one or more sub-frames, wherein the multi-mode audio encoding method further comprises determining and encoding a global gain value per frame, and determine and encode, per sub-frames of at least a subset of the sub-frames of the second subset, a corresponding bitstream element differentially to the global gain value of the respective frame, wherein the multi-mode audio encoding method is performed such that a change of the global gain value of the frames within the encoded bitstream results in an adjustment of an output level of a decoded representation of the audio content at the decoding side.

31. A multi-mode audio encoding method for encoding an audio content into an encoded bitstream by CELP encoding a first subset of frames of the audio content and transform encoding a second subset of the frames, the multi-mode audio encoding method comprising:
- encoding a current frame of the first subset, the CELP encoder comprising performing linear prediction analysis to generate linear prediction filter coefficients for the current frame of the first subset and encode same into the encoded bitstream; and
  
  determining a current excitation of the current frame of the first subset, which, when filtered by a linear prediction synthesis filter based on the linear prediction filter coefficients within the encoded bitstream, recovers the current frame of the first subset, defined by a past excitation and a codebook index for the current frame of the first subset and encoding the codebook index into the encoded bitstream; and
  
  encoding a current frame of the second subset by performing a time-to-spectral-domain transformation onto a time-domain signal for the current frame of the second subset to acquire spectral information and encode the spectral information into the encoded bitstream, wherein the multi-mode audio encoding method further comprises encoding a global gain value into the encoded bitstream, the global gain value depending on an energy of a version of the audio content of the current frame of the first subset, filtered with the linear prediction analysis filter depending on the linear prediction coefficients, or an energy of the time-domain signal.

32. A CELP encoding method comprising performing linear prediction analysis to generate linear prediction filter coefficients for a current frame of an audio content and encode the linear prediction filter coefficients into a bitstream;
- determining a current excitation of the current frame as a combination of an adaptive codebook excitation and an innovation codebook excitation, which, when filtered by a linear prediction synthesis filter based on the linear prediction filter coefficients, recovers the current frame, by constructing the adaptive codebook excitation defined by a past excitation and an adaptive codebook index for the current frame and encoding the adaptive codebook index into the bitstream; and
  
  constructing the innovation codebook excitation defined by an innovation codebook index for the current frame and encoding the innovation codebook index into the bitstream; and
  
  determining an energy of a version of the audio content of the current frame filtered with a weighting filter, to acquire a global gain value and encoding the global gain value into the bitstream, the weighting filter constructed from the linear prediction filter coefficients.


Current Assignee: Fraunhofer Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.
Original Assignee: Fraunhofer Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.
Inventors: Geiger, Ralf; Fuchs, Guillaume; Multrus, Markus; Grill, Bernhard

Granted Patent: US 8,744,843 B2
US Class Current: 704/219
CPC Class Codes

G10L 19/03   Spectral prediction for pre...

G10L 19/04   using predictive techniques

G10L 19/083   the excitation function bei...

G10L 19/12   the excitation function bei...

G10L 19/20   using sound class specific ...

G10L 2019/0002   Codebook adaptations
