Audio encoder and decoder using a frequency domain processor with full-band gap filling and a time domain processor

US 10,332,535 B2
Filed: 01/24/2017
Issued: 06/25/2019
Est. Priority Date: 07/28/2014
Status: Active Grant

Abstract

An audio encoder for encoding an audio signal has: a first encoding processor for encoding a first audio signal portion in a frequency domain, having: a time frequency converter for converting the first audio signal portion into a frequency domain representation; an analyzer for analyzing the frequency domain representation to determine first spectral portions to be encoded with a first spectral resolution and second regions to be encoded with a second resolution; and a spectral encoder for encoding the first spectral portions with the first spectral resolution and encoding the second portions with the second resolution; a second encoding processor for encoding a second different audio signal portion in the time domain; a controller for analyzing and determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion is the second audio signal portion encoded in the time domain; and an encoded signal former for forming an encoded audio signal having a first encoded signal portion for the first audio signal portion and a second encoded signal portion for the second portion.

23 Claims

1. An audio encoder for encoding an audio signal to generate an encoded audio signal, comprising:
- a first encoding processor for encoding a first audio signal portion in a frequency domain, wherein the first encoding processor comprises:
  
  a time frequency converter for converting the first audio signal portion into a frequency domain representation comprising spectral lines up to a maximum frequency of the first audio signal portion;
  
  an analyzer for analyzing the frequency domain representation up to the maximum frequency to determine first spectral portions to be encoded with a first spectral resolution and second spectral portions to be encoded with a second spectral resolution, the second spectral resolution being lower than the first spectral resolution, wherein the analyzer is configured to determine a first spectral portion from the first spectral portions, the first spectral portion being placed, with respect to frequency, between two second spectral portions from the second spectral portions;
  
  a spectral encoder for encoding the first spectral portions with the first spectral resolution and for encoding the second spectral portions with the second spectral resolution, wherein the spectral encoder comprises a parametric coder for calculating spectral envelope information comprising the second spectral resolution from the second spectral portions;
  
  a second encoding processor for encoding a second different audio signal portion in the time domain, wherein the second encoding processor comprises:
  
  a sampling rate converter for converting the second audio signal portion to a lower sampling rate representation, the lower sampling rate being lower than a sampling rate of the audio signal, wherein the lower sampling rate representation does not comprise a high band of the audio signal;
  
  a time domain low band encoder for time domain encoding the lower sampling rate representation; and
  
  a time domain bandwidth extension encoder for parametrically encoding the high band of the audio signal;
  
  a controller configured for analyzing the audio signal and for determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion of the audio signal is the second audio signal portion encoded in the time domain; and
  
  an encoded signal former for forming the encoded audio signal comprising a first encoded signal portion for the first audio signal portion and a second encoded signal portion for the second audio signal portion, wherein the analyzer is configured to perform a temporal tile shaping or temporal noise shaping analysis or an operation of setting to zero spectral values in the second spectral portions, wherein the first encoding processor is configured to perform a shaping of spectral values of the first spectral portions using prediction coefficients derived from the first audio signal portion, and wherein the first encoding processor is furthermore configured to perform a quantization and entropy coding operation of shaped spectral values of the first spectral portions, and wherein spectral values of the second spectral portions are set to zero, the audio encoder further comprising a cross-processor, wherein the cross-processor comprises:
  
  a noise shaper for shaping quantized spectral values of the first spectral portions using LPC coefficients derived from the first audio signal portion;
  
  a spectral decoder for decoding the spectrally shaped spectral portions of the first spectral portion with a high spectral resolution and for synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation;
  
  a frequency-time converter for converting the decoded spectral representation into the time domain to acquire a decoded first audio signal portion, wherein a sampling rate associated with the decoded first audio signal portion is different than a sampling rate of the audio signal, and a sampling rate associated with an output signal of the frequency-time converter is different from a sampling rate of an audio signal input into the time-frequency-converter, wherein at least one of the first encoding processor, the time frequency converter, the analyzer, the spectral encoder, the second encoding processor, the sampling rate converter, the time domain low band encoder, the time domain bandwidth extension encoder, the controller, the encoded signal former, the cross-processor, the noise shaper, the spectral decoder and the frequency-time converter is implemented, at least in part, by a hardware element of the audio encoder.
  - 2. The audio encoder of claim 1, further comprising:
    - a preprocessor configured for preprocessing the first audio signal portion and the second audio signal portion, wherein the preprocessor comprises:
      
      a prediction analyzer for determining prediction coefficients; and
      
      wherein the second encoding processor comprises:
      
      a prediction coefficient quantizer for generating a quantized version of the prediction coefficients; and
      
      an entropy coder for generating an encoded version of the quantized prediction coefficients, wherein the encoded signal former is configured for introducing the encoded version into the encoded audio signal.
  - 3. The audio encoder of claim 1, wherein a preprocessor comprises a resampler for resampling the audio signal to a sampling rate of the second encoding processor; and
    - wherein a prediction analyzer is configured to determine the prediction coefficients using a resampled audio signal, or wherein the preprocessor further comprises a long term prediction analysis stage for determining one or more long term prediction parameters for the first audio signal portion.
  - 4. The audio encoder of claim 1, further comprising a cross-processor for calculating, from an encoded spectral representation of the first audio signal portion, initialization data of the second encoding processor, so that the second encoding processor is initialized to encode the second audio signal portion immediately following the first audio signal portion in time in the audio signal.
  - 5. The audio encoder of claim 4, wherein the cross-processor comprises:
    - a spectral decoder for calculating a decoded version of the first encoded signal portion;
      
      a delay stage for feeding a delayed version of the decoded version into a de-emphasis stage of the second encoding processor for initialization;
      
      a weighted prediction coefficient analysis filtering block for filtering and feeding a filter output into a codebook determinator of the second encoding processor for initialization;
      
      an analysis filtering stage for filtering the decoded version or a pre-emphasized version and for feeding a filter residual into an adaptive codebook determinator of the second encoding processor for initialization;
      
      or a pre-emphasis filter for filtering the decoded version and for feeding a delayed or pre-emphasized version to a synthesis filtering stage of the second encoding processor for initialization.
  - 6. The audio encoder of claim 1, wherein the analyzer is configured to perform a temporal tile shaping or temporal noise shaping analysis or an operation of setting to zero spectral values in the second spectral portions, wherein the first encoding processor is configured to perform a shaping of spectral values of the first spectral portions using prediction coefficients derived from the first audio signal portion, and wherein the first encoding processor is furthermore configured to perform a quantization and entropy coding operation of shaped spectral values of the first spectral portions, and wherein spectral values of the second spectral portions are set to zero.
  - 7. The audio encoder of claim 1, wherein the second encoding processor comprises at least one block of the following group of blocks:
    - a prediction analysis filter;
      
      an adaptive codebook stage;
      
      an innovative codebook stage;
      
      an estimator for estimating an innovative codebook entry;
      
      an ACELP/gain coding stage;
      
      a prediction synthesis filtering stage;
      
      a de-emphasis stage; and
      
      a bass post-filter analysis stage.
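
As an illustration of the analyzer and parametric coder recited in claim 1, the following minimal sketch keeps spectral lines above a threshold as first spectral portions, sets the remaining (second) spectral portions to zero, and computes one energy value per band as lower-resolution envelope information. The threshold rule, the band layout, and the names `analyze_and_encode`, `band_edges`, and `threshold` are assumptions made for illustration only; the claims do not prescribe them.

```python
from typing import List, Tuple


def analyze_and_encode(spectrum: List[float],
                       band_edges: List[int],
                       threshold: float) -> Tuple[List[float], List[float]]:
    """Hypothetical split into first and second spectral portions.

    Lines with magnitude >= threshold are kept at full resolution
    (first spectral portions). All other lines are set to zero and
    are described only by one energy value per band (second portions).
    """
    # First spectral portions survive; second spectral portions become zero.
    core = [x if abs(x) >= threshold else 0.0 for x in spectrum]

    # Envelope: energy of the zeroed lines, one value per band.
    envelope = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        energy = sum(x * x for x in spectrum[lo:hi] if abs(x) < threshold)
        envelope.append(energy)
    return core, envelope


spectrum = [0.1, 0.2, 4.0, 0.1, 0.3, 0.2, 5.0, 0.1]
core, envelope = analyze_and_encode(spectrum, [0, 4, 8], threshold=1.0)
```

In this toy input the line at index 2 is a first spectral portion that lies, with respect to frequency, between zeroed second spectral portions on either side, which is the arrangement claim 1 describes.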

8. An audio decoder for decoding an encoded audio signal to obtain a decoded audio signal, comprising:
- a first decoding processor for decoding a first encoded audio signal portion in a frequency domain, the first decoding processor comprising:
  
  a spectral decoder for decoding first spectral portions with a high spectral resolution and for synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation, wherein the spectral decoder is configured to generate the first decoded spectral representation so that a first spectral portion is placed with respect to frequency between two second spectral portions; and
  
  a frequency-time converter for converting the decoded spectral representation into a time domain to acquire a decoded first audio signal portion;
  
  a second decoding processor for decoding a second encoded audio signal portion in the time domain to acquire a decoded second audio signal portion, wherein the second decoding processor comprises:
  
  a time domain low band decoder for decoding to obtain a low band time domain signal;
  
  an upsampler for upsampling the low band time domain signal to obtain an upsampled low band time domain signal;
  
  a time domain bandwidth extension decoder for synthesizing a high band of a time domain output signal; and
  
  a mixer for mixing a synthesized high band of the time domain output signal and the upsampled low band time domain signal;
  
  a combiner for combining the decoded first audio signal portion and the decoded second audio signal portion to acquire the decoded audio signal; and
  
  a cross-processor for calculating, from the decoded spectral representation of the first encoded audio signal portion, initialization data of the second decoding processor, so that the second decoding processor is initialized to decode the second encoded audio signal portion following in time the first audio signal portion in the encoded audio signal, wherein the cross-processor further comprises:
  
  an additional frequency-time converter operating at a lower sampling rate than the frequency-time converter of the first decoding processor to acquire a further decoded first signal portion in the time domain, wherein the signal output by the additional frequency-time converter operating at the lower sampling rate comprises a second sampling rate being lower than a first sampling rate associated with an output of the frequency-time converter of the first decoding processor, wherein the additional frequency-time converter operating at the lower sampling rate comprises:
  
  a selector for selecting a low portion of a spectrum input into the additional frequency-time converter operating at the lower sampling rate in accordance with a ratio of the first sampling rate and the second sampling rate, the ratio being smaller than 1;
  
  a transform processor comprising a transform length being smaller than a transform length of the frequency-time converter of the first decoding processor; and
  
  a synthesis windower using a window comprising a smaller number of coefficients compared to a window used by the frequency-time converter of the first decoding processor, wherein at least one of the first decoding processor, the spectral decoder, the frequency-time converter, the second decoding processor, the time domain low band decoder, the upsampler, the time domain bandwidth extension decoder, the mixer, the combiner, the cross-processor, the additional frequency-time converter, the selector, the transform processor, and the synthesis windower is implemented, at least in part, by a hardware element of the audio decoder.
  - 9. The audio decoder of claim 8, wherein the upsampler comprises an analysis filterbank operating at a first time domain low band decoder sampling rate and a synthesis filterbank operating at a second output sampling rate being higher than the first time domain low band decoder sampling rate.
  - 10. The audio decoder of claim 8, wherein the time domain low band decoder comprises a decoder and a synthesis filter for filtering a residual signal using synthesis filter coefficients, wherein the time domain bandwidth extension decoder is configured to upsample the residual signal and to process an upsampled residual signal using a non-linear operation to acquire a high band residual signal, and to spectrally shape the high band residual signal to acquire the synthesized high band.
  - 11. The audio decoder of claim 8, wherein the first decoding processor comprises an adaptive long term prediction post-filter for post-filtering the decoded first audio signal portion, wherein the adaptive long term prediction post-filter is controlled by one or more long term prediction parameters comprised in the encoded audio signal.
  - 12. The audio decoder of claim 8, further comprising:
    - a cross-processor for calculating, from the decoded spectral representation of the first encoded audio signal portion, initialization data of the second decoding processor, so that the second decoding processor is initialized to decode the second encoded audio signal portion following in time the first audio signal portion in the encoded audio signal.
  - 13. The audio decoder of claim 8, wherein the second decoding processor comprises at least one block of the group of blocks comprising:
    - an ACELP for decoding gains and an innovative codebook;
      
      an adaptive codebook synthesis stage;
      
      an ACELP post-processor;
      
      a prediction synthesis filter; and
      
      a de-emphasis stage.
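
The additional frequency-time converter of claim 8 (selector, shorter transform, shorter synthesis window) can be pictured with the minimal sketch below. It assumes an inverse MDCT as the transform and a sine window, neither of which is named in the claim, and it omits scaling and overlap-add. The function names and the example rates are hypothetical.

```python
import math
from typing import List


def select_low_portion(spectrum: List[float],
                       first_rate: int, second_rate: int) -> List[float]:
    """Keep the low part of the spectrum per the ratio second_rate / first_rate."""
    if not 0 < second_rate < first_rate:
        raise ValueError("second_rate must be positive and below first_rate")
    keep = len(spectrum) * second_rate // first_rate
    return spectrum[:keep]


def imdct(coeffs: List[float]) -> List[float]:
    """Naive, unscaled inverse MDCT: N coefficients give 2N time samples."""
    n = len(coeffs)
    return [
        sum(c * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for k, c in enumerate(coeffs))
        for t in range(2 * n)
    ]


def sine_window(length: int) -> List[float]:
    """Sine synthesis window (an assumed window shape)."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def low_rate_frequency_time(spectrum: List[float],
                            first_rate: int, second_rate: int) -> List[float]:
    """Selector, shorter transform, and shorter window in sequence."""
    low = select_low_portion(spectrum, first_rate, second_rate)
    samples = imdct(low)                 # shorter transform length
    window = sine_window(len(samples))   # fewer window coefficients
    return [s * w for s, w in zip(samples, window)]


frame = [1.0] + [0.0] * 15
low_rate_frame = low_rate_frequency_time(frame, 32000, 16000)
```

With a 16-line spectrum and a rate ratio of 1/2, the selector keeps 8 lines, so the shorter transform yields 16 samples where a full-rate transform of the same frame would yield 32.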

14. A method of encoding an audio signal to generate an encoded audio signal, comprising:
- first encoding a first audio signal portion in a frequency domain, wherein the first encoding comprises:
  
  converting the first audio signal portion into a frequency domain representation comprising spectral lines up to a maximum frequency of the first audio signal portion;
  
  analyzing the frequency domain representation up to the maximum frequency to determine first spectral portions to be encoded with a first spectral resolution and second spectral portions to be encoded with a second spectral resolution, the second spectral resolution being lower than the first spectral resolution, wherein the analyzing determines a first spectral portion from the first spectral portions, the first spectral portion being placed, with respect to frequency, between two second spectral portions from the second spectral portions;
  
  encoding the first spectral portions with the first spectral resolution and encoding the second spectral portions with the second spectral resolution, wherein the encoding the second spectral portion comprises calculating, from the second spectral portions, spectral envelope information comprising the second spectral resolution;
  
  second encoding a second different audio signal portion in the time domain wherein the second encoding comprises:
  
  converting the second audio signal portion to a lower sampling rate representation, the lower sampling rate being lower than a sampling rate of the audio signal, wherein the lower sampling rate representation does not comprise a high band of the audio signal;
  
  time domain encoding the lower sampling rate representation; and
  
  parametrically encoding the high band of the audio signal;
  
  analyzing the audio signal and determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion of the audio signal is the second audio signal portion encoded in the time domain; and
  
  forming the encoded audio signal comprising a first encoded signal portion for the first audio signal portion and a second encoded signal portion for the second audio signal portion, wherein the analyzing the frequency domain representation comprises performing a temporal tile shaping or temporal noise shaping analysis or an operation of setting to zero spectral values in the second spectral portions, wherein the first encoding comprises performing a shaping of spectral values of the first spectral portions using prediction coefficients derived from the first audio signal portion and performing a quantization and entropy coding operation of shaped spectral values of the first spectral portions, and wherein spectral values of the second spectral portions are set to zero, wherein the method further comprises a cross-processing procedure, wherein the cross-processing procedure comprises:
  
  shaping quantized spectral values of the first spectral portions using LPC coefficients derived from the first audio signal portion;
  
  decoding the spectrally shaped spectral portions of the first spectral portion with a high spectral resolution and synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation;
  
  converting the decoded spectral representation into the time domain to acquire a decoded first audio signal portion, wherein a sampling rate associated with the decoded first audio signal portion is different than a sampling rate of the audio signal, and a sampling rate associated with an output signal of the converting the decoded spectral representation is different from a sampling rate of an audio signal input into the converting, wherein one or more of the first encoding, the converting the first audio signal portion, the analyzing, the encoding the first spectral portions, the second encoding, the converting the second audio signal portion, the time domain encoding, the parametrically encoding, the analyzing the audio signal and the determining, the cross-processing procedure, the shaping, the decoding the spectrally shaped spectral portions, the synthesizing, the converting the decoded spectral representation, and the forming is implemented, at least in part, by one or more hardware elements of an audio signal processing device.

15. A method of decoding an encoded audio signal to obtain a decoded audio signal, comprising:
- first decoding a first encoded audio signal portion in a frequency domain, the first decoding comprising:
  
  decoding first spectral portions with a high spectral resolution and synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation, wherein decoding comprises generating the decoded spectral representation so that a first spectral portion is placed with respect to frequency between two second spectral portions; and
  
  converting the decoded spectral representation into a time domain to acquire a decoded first audio signal portion;
  
  second decoding a second encoded audio signal portion in the time domain to acquire a decoded second audio signal portion, wherein the second decoding comprises:
  
  decoding to obtain a low band time domain signal;
  
  upsampling the low band time domain signal to obtain an upsampled low band time domain signal;
  
  synthesizing a high band of a time domain output signal; and
  
  mixing a synthesized high band of the time domain output signal and the upsampled low band time domain signal;
  
  combining the decoded audio signal portion and the decoded second spectral portion to acquire the decoded audio signal; and
  
  a cross-processing procedure for calculating, from the decoded spectral representation of the first encoded audio signal portion, initialization data of the second decoding, so that the second decoding is initialized to decode the second encoded audio signal portion following in time the first audio signal portion in the encoded audio signal, wherein the cross-processing procedure comprises:
  
  performing an additional frequency-time conversion operating at a lower sampling rate than the converting of the first decoding to acquire a further decoded first signal portion in the time domain, wherein the signal output by the additional frequency-time conversion operating at the lower sampling rate comprises a second sampling rate being lower than a first sampling rate associated with an output of the converting of the first decoding, wherein the additional frequency-time conversion operating at the lower sampling rate comprises:
  
  selecting a low portion of a spectrum input into the additional frequency-time conversion operating at the lower sampling rate in accordance with a ratio of the first sampling rate and the second sampling rate, the ratio being smaller than 1;
  
  performing a transform processing comprising a transform length being smaller than a transform length of the converting of the first decoding; and
  
  performing a synthesis windowing using a window comprising a smaller number of coefficients compared to a window used by the converting of the first decoding, wherein one or more of the first decoding, the decoding the first spectral portions with the high spectral resolution, the converting, the second decoding, the decoding to obtain the low band time domain signal, the upsampling, the synthesizing, the mixing, the combining, the cross-processing procedure, the performing an additional frequency-time conversion, the selecting, the performing a transform processing and the performing a synthesis windowing is implemented, at least in part, by one or more hardware elements of an audio signal processing device.
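
The upsampling and mixing steps of the second decoding in claim 15 can be sketched as follows. Linear interpolation stands in for the upsampler purely for brevity (claim 9 recites a filterbank-based upsampler instead), the synthesized high band is taken as a given input, and mixing is assumed to be sample-wise addition. All function names are hypothetical.

```python
from typing import List


def upsample_linear(low_band: List[float], factor: int) -> List[float]:
    """Upsample by an integer factor using linear interpolation (assumed method)."""
    out = []
    for i, x in enumerate(low_band):
        # Hold the last sample when there is no successor to interpolate toward.
        nxt = low_band[i + 1] if i + 1 < len(low_band) else x
        for j in range(factor):
            out.append(x + (nxt - x) * j / factor)
    return out


def mix(upsampled_low: List[float], high_band: List[float]) -> List[float]:
    """Mix by sample-wise addition; both inputs must share one sampling rate."""
    if len(upsampled_low) != len(high_band):
        raise ValueError("signals must have the same length")
    return [a + b for a, b in zip(upsampled_low, high_band)]


low_band = [0.0, 1.0, 0.0]                           # decoded low band
upsampled = upsample_linear(low_band, 2)             # twice the sampling rate
synth_high = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]       # stand-in for the synthesized high band
output = mix(upsampled, synth_high)
```

The length check in `mix` reflects the reason the low band is upsampled first: the two signals can only be added once they share the output sampling rate.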

16. A non-transitory digital storage medium having stored thereon a computer program for performing, when running on a computer, a method of encoding an audio signal to generate an encoded signal, the method comprising:
- first encoding a first audio signal portion in a frequency domain, wherein the first encoding comprises:
  
  converting the first audio signal portion into a frequency domain representation comprising spectral lines up to a maximum frequency of the first audio signal portion;
  
  analyzing the frequency domain representation up to the maximum frequency to determine first spectral portions to be encoded with a first spectral resolution and second spectral portions to be encoded with a second spectral resolution, the second spectral resolution being lower than the first spectral resolution, wherein the analyzing determines a first spectral portion from the first spectral portions, the first spectral portion being placed, with respect to frequency, between two second spectral portions from the second spectral portions;
  
  encoding the first spectral portions with the first spectral resolution and encoding the second spectral portions with the second spectral resolution, wherein the encoding the second spectral portion comprises calculating, from the second spectral portions, spectral envelope information comprising the second spectral resolution;
  
  second encoding a second different audio signal portion in the time domain wherein the second encoding comprises:
  
  converting the second audio signal portion to a lower sampling rate representation, the lower sampling rate being lower than a sampling rate of the audio signal, wherein the lower sampling rate representation does not comprise a high band of the audio signal;
  
  time domain encoding the lower sampling rate representation; and
  
  parametrically encoding the high band of the audio signal;
  
  analyzing the audio signal and determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion of the audio signal is the second audio signal portion encoded in the time domain; and
  
  forming the encoded audio signal comprising a first encoded signal portion for the first audio signal portion and a second encoded signal portion for the second audio signal portion, wherein the analyzing the frequency domain representation comprises performing a temporal tile shaping or temporal noise shaping analysis or an operation of setting to zero spectral values in the second spectral portions, wherein the first encoding comprises performing a shaping of spectral values of the first spectral portions using prediction coefficients derived from the first audio signal portion and performing a quantization and entropy coding operation of shaped spectral values of the first spectral portions, and wherein spectral values of the second spectral portions are set to zero, wherein the method further comprises a cross-processing procedure, wherein the cross-processing procedure comprises:
  
  shaping quantized spectral values of the first spectral portions using LPC coefficients derived from the first audio signal portion;
  
  decoding the spectrally shaped spectral portions of the first spectral portion with a high spectral resolution and synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation;
  
  converting the decoded spectral representation into the time domain to acquire a decoded first audio signal portion, wherein a sampling rate associated with the decoded first audio signal portion is different than a sampling rate of the audio signal, and a sampling rate associated with an output signal of the converting the decoded spectral representation is different from a sampling rate of an audio signal input into the converting.

17. A non-transitory digital storage medium having stored thereon a computer program for performing, when running on a computer, a method of decoding an encoded audio signal to obtain a decoded audio signal, the method comprising:
- first decoding a first encoded audio signal portion in a frequency domain, the first decoding comprising;
  
  decoding first spectral portions with a high spectral resolution and synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation, wherein decoding comprises generating the decoded spectral representation so that a first spectral portion is placed with respect to frequency between two second spectral portions; and
  
  converting the decoded spectral representation into a time domain to acquire a decoded first audio signal portion;
  
  second decoding a second encoded audio signal portion in the time domain to acquire a decoded second audio signal portion, wherein the second decoding comprises;
  
  decoding to obtain a low band time domain signal;
  
  upsampling the low band time domain signal to obtain an upsampled low band time domain signal;
  
  synthesizing a high band of a time domain output signal; and
  
  mixing a synthesized high band of the time domain output signal and the upsampled low band time domain signal;
  
  combining the decoded first audio signal portion and the decoded second audio signal portion to acquire the decoded audio signal; and
  
  a cross-processing procedure for calculating, from the decoded spectral representation of the first encoded audio signal portion, initialization data of the second decoding, so that the second decoding is initialized to decode the second encoded audio signal portion following in time the first audio signal portion in the encoded audio signal,
  
  wherein the cross-processing procedure comprises;
  
  performing an additional frequency-time conversion operating at a lower sampling rate than the converting of the first decoding to acquire a further decoded first signal portion in the time domain,
  
  wherein the signal output by the additional frequency-time conversion operating at the lower sampling rate comprises a second sampling rate being lower than a first sampling rate associated with an output of the converting of the first decoding,
  
  wherein the additional frequency-time conversion operating at the lower sampling rate comprises;
  
  selecting a low portion of a spectrum input into the additional frequency-time conversion operating at the lower sampling rate in accordance with a ratio of the first sampling rate and the second sampling rate, the ratio being smaller than 1;
  
  performing a transform processing comprising a transform length being smaller than a transform length of the converting of the first decoding; and
  
  performing a synthesis windowing using a window comprising a smaller number of coefficients compared to a window used by the converting of the first decoding.
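
Illustrative note (not part of the claims): the three steps of the additional frequency-time conversion (selecting a low portion of the spectrum, a shorter transform, a shorter synthesis window) can be sketched in Python. Several things here are assumptions: the ratio is taken as second rate divided by first rate, the transform is a naive inverse MDCT without normalization, the window is a sine window, and overlap-add between blocks is left out. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def select_low_portion(
    spectrum: Sequence[float], first_rate: int, second_rate: int
) -> List[float]:
    """Keep the lowest len(spectrum) * second_rate / first_rate lines."""
    if not 0 < second_rate < first_rate:
        raise ValueError("second_rate must be positive and below first_rate")
    keep = len(spectrum) * second_rate // first_rate
    return list(spectrum[:keep])


def inverse_mdct(coeffs: Sequence[float]) -> List[float]:
    """Naive inverse MDCT: n coefficients give 2 * n time samples."""
    n = len(coeffs)
    if n == 0:
        return []
    out = []
    for t in range(2 * n):
        phase = math.pi / n * (t + 0.5 + n / 2)
        out.append(
            sum(c * math.cos(phase * (k + 0.5)) for k, c in enumerate(coeffs))
        )
    return out


def sine_window(length: int) -> List[float]:
    """Sine window; a shorter transform needs fewer window coefficients."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def downsampled_synthesis(
    spectrum: Sequence[float], first_rate: int, second_rate: int
) -> List[float]:
    """Select the low lines, transform with the shorter length, then window."""
    low = select_low_portion(spectrum, first_rate, second_rate)
    block = inverse_mdct(low)
    window = sine_window(len(block))
    return [s * w for s, w in zip(block, window)]
```

For 16 spectral lines at a first rate of 32000 Hz and a second rate of 16000 Hz, the sketch keeps 8 lines and produces a 16-sample windowed block, where the full-rate conversion would produce 32 samples. The time domain signal at the lower rate is obtained directly, without a separate resampling stage.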

18. An audio encoder for encoding an audio signal to generate an encoded audio signal, comprising:
- a first encoding processor for encoding a first audio signal portion in a frequency domain, wherein the first encoding processor comprises;
  
  a time frequency converter for converting the first audio signal portion into a frequency domain representation comprising spectral lines up to a maximum frequency of the first audio signal portion;
  
  an analyzer for analyzing the frequency domain representation up to the maximum frequency to determine first spectral portions to be encoded with a first spectral resolution and second spectral portions to be encoded with a second spectral resolution, the second spectral resolution being lower than the first spectral resolution, wherein the analyzer is configured to determine a first spectral portion from the first spectral portions, the first spectral portion being placed, with respect to frequency, between two second spectral portions from the second spectral portions;
  
  a spectral encoder for encoding the first spectral portions with the first spectral resolution and for encoding the second spectral portions with the second spectral resolution, wherein the spectral encoder comprises a parametric coder for calculating spectral envelope information comprising the second spectral resolution from the second spectral portions;
  
  a second encoding processor for encoding a second different audio signal portion in the time domain, wherein the second encoding processor comprises;
  
  a sampling rate converter for converting the second audio signal portion to a lower sampling rate representation, the lower sampling rate being lower than a sampling rate of the audio signal, wherein the lower sampling rate representation does not comprise a high band of the audio signal;
  
  a time domain low band encoder for time domain encoding the lower sampling rate representation; and
  
  a time domain bandwidth extension encoder for parametrically encoding the high band of the audio signal;
  
  a controller configured for analyzing the audio signal and for determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion of the audio signal is the second audio signal portion encoded in the time domain; and
  
  an encoded signal former for forming the encoded audio signal comprising a first encoded signal portion for the first audio signal portion and a second encoded signal portion for the second audio signal portion,
  
  wherein the second encoding processor comprises an associated second sampling rate, wherein the first encoding processor has associated therewith a first sampling rate being higher than the second sampling rate,
  
  wherein the audio encoder further comprises a cross-processor for calculating, from an encoded spectral representation of the first audio signal portion, initialization data of the second encoding processor, wherein the cross-processor comprises
  
  a frequency-time converter for generating a time domain signal at the second sampling rate,
  
  wherein the frequency-time converter comprises;
  
  a selector for selecting a low portion of a spectrum input into the frequency time converter in accordance with a ratio of the first sampling rate and the second sampling rate, the ratio being smaller than 1,
  
  a transform processor comprising a transform length being smaller than a transform length of the time-frequency converter; and
  
  a synthesis windower for windowing using a window comprising a smaller number of window coefficients compared to a window used by the time frequency converter,
  
  wherein at least one of the first encoding processor, the time frequency converter, the analyzer, the spectral encoder, the second encoding processor, the sampling rate converter, the time domain low band encoder, the time domain bandwidth extension encoder, the controller, the encoded signal former, the cross-processor, the frequency-time converter, the selector, the transform processor, and the synthesis windower is implemented, at least in part, by a hardware element of the audio encoder.

19. A method of encoding an audio signal to generate an encoded audio signal, comprising:
- first encoding a first audio signal portion in a frequency domain, wherein the first encoding comprises;
  
  converting the first audio signal portion into a frequency domain representation comprising spectral lines up to a maximum frequency of the first audio signal portion;
  
  analyzing the frequency domain representation up to the maximum frequency to determine first spectral portions to be encoded with a first spectral resolution and second spectral portions to be encoded with a second spectral resolution, the second spectral resolution being lower than the first spectral resolution, wherein the analyzing determines a first spectral portion from the first spectral portions, the first spectral portion being placed, with respect to frequency, between two second spectral portions from the second spectral portions;
  
  encoding the first spectral portions with the first spectral resolution and encoding the second spectral portions with the second spectral resolution, wherein the encoding the second spectral portion comprises calculating, from the second spectral portions, spectral envelope information comprising the second spectral resolution;
  
  second encoding a second different audio signal portion in the time domain wherein the second encoding comprises;
  
  converting the second audio signal portion to a lower sampling rate representation, the lower sampling rate being lower than a sampling rate of the audio signal, wherein the lower sampling rate representation does not comprise a high band of the audio signal;
  
  time domain encoding the lower sampling rate representation; and
  
  parametrically encoding the high band of the audio signal;
  
  analyzing the audio signal and determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion of the audio signal is the second audio signal portion encoded in the time domain; and
  
  forming the encoded audio signal comprising a first encoded signal portion for the first audio signal portion and a second encoded signal portion for the second audio signal portion,
  
  wherein the second encoding comprises an associated second sampling rate, wherein the first encoding has associated therewith a first sampling rate being higher than the second sampling rate,
  
  wherein the method further comprises a cross-processing procedure for calculating, from an encoded spectral representation of the first audio signal portion, initialization data of the second encoding processor, wherein the cross-processing procedure comprises
  
  generating a time domain signal at the second sampling rate, the generating the time domain signal at the second sampling rate comprising;
  
  selecting a low portion of a spectrum input into the generating in accordance with a ratio of the first sampling rate and the second sampling rate, the ratio being smaller than 1,
  
  performing a transform comprising a transform length being smaller than a transform length used in converting the first audio signal portion; and
  
  windowing using a window comprising a smaller number of window coefficients compared to a window used by the converting the first audio signal portion,
  
  wherein one or more of the first encoding, the converting the first audio signal portion, the analyzing, the encoding the first spectral portions, the second encoding, the converting the second audio signal portion, the time domain encoding, the parametrically encoding, the analyzing the audio signal and the determining, the forming, the cross-processing procedure, the generating a time domain signal, the selecting, the performing a transform, and the windowing is implemented, at least in part, by one or more hardware elements of an audio signal processing device.
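
Illustrative note (not part of the claims): the second encoding above converts the signal to a lower sampling rate that no longer contains the high band, and describes the high band only parametrically. The sketch below shows this split in a deliberately crude form. A two-tap Haar filter pair is swapped in for a real sampling rate converter, a single mean-energy value stands in for the bandwidth extension parameters, and the time domain coding of the low band itself is not modelled. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def encode_time_domain(signal: Sequence[float]) -> Tuple[List[float], float]:
    """Return a half-rate low band and one high band energy parameter."""
    # Pair up neighbouring samples; a trailing odd sample is dropped.
    pairs = list(zip(signal[0::2], signal[1::2]))

    # Averaging each pair low-pass filters and halves the sampling rate.
    low = [(a + b) / 2 for a, b in pairs]

    # The pair difference carries what the half-rate signal can no longer hold.
    high = [(a - b) / 2 for a, b in pairs]

    # Only a coarse parameter of the high band is kept, not its waveform.
    high_energy = sum(h * h for h in high) / len(high) if high else 0.0
    return low, high_energy
```

The design point the sketch tries to convey is that the low band is coded as a waveform at the reduced rate, while the high band costs only a few parameters.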

20. A non-transitory digital storage medium having stored thereon a computer program for performing, when running on a computer, a method of encoding an audio signal to generate an encoded audio signal, the method comprising:
- first encoding a first audio signal portion in a frequency domain, wherein the first encoding comprises;
  
  converting the first audio signal portion into a frequency domain representation comprising spectral lines up to a maximum frequency of the first audio signal portion;
  
  analyzing the frequency domain representation up to the maximum frequency to determine first spectral portions to be encoded with a first spectral resolution and second spectral portions to be encoded with a second spectral resolution, the second spectral resolution being lower than the first spectral resolution, wherein the analyzing determines a first spectral portion from the first spectral portions, the first spectral portion being placed, with respect to frequency, between two second spectral portions from the second spectral portions;
  
  encoding the first spectral portions with the first spectral resolution and encoding the second spectral portions with the second spectral resolution, wherein the encoding the second spectral portion comprises calculating, from the second spectral portions, spectral envelope information comprising the second spectral resolution;
  
  second encoding a second different audio signal portion in the time domain wherein the second encoding comprises;
  
  converting the second audio signal portion to a lower sampling rate representation, the lower sampling rate being lower than a sampling rate of the audio signal, wherein the lower sampling rate representation does not comprise a high band of the audio signal;
  
  time domain encoding the lower sampling rate representation; and
  
  parametrically encoding the high band of the audio signal;
  
  analyzing the audio signal and determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion of the audio signal is the second audio signal portion encoded in the time domain; and
  
  forming the encoded audio signal comprising a first encoded signal portion for the first audio signal portion and a second encoded signal portion for the second audio signal portion,
  
  wherein the second encoding comprises an associated second sampling rate, wherein the first encoding has associated therewith a first sampling rate being higher than the second sampling rate,
  
  wherein the method further comprises a cross-processing procedure for calculating, from an encoded spectral representation of the first audio signal portion, initialization data of the second encoding processor, wherein the cross-processing procedure comprises
  
  generating a time domain signal at the second sampling rate, the generating the time domain signal at the second sampling rate comprising;
  
  selecting a low portion of a spectrum input into the generating in accordance with a ratio of the first sampling rate and the second sampling rate, the ratio being smaller than 1,
  
  performing a transform comprising a transform length being smaller than a transform length used in converting the first audio signal portion; and
  
  windowing using a window comprising a smaller number of window coefficients compared to a window used by the converting the first audio signal portion.

21. An audio decoder for decoding an encoded audio signal to obtain a decoded audio signal, comprising:
- a first decoding processor for decoding a first encoded audio signal portion in a frequency domain, the first decoding processor comprising;
  
  a spectral decoder for decoding first spectral portions with a high spectral resolution and for synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation, wherein the spectral decoder is configured to generate the decoded spectral representation so that a first spectral portion is placed with respect to frequency between two second spectral portions; and
  
  a frequency-time converter for converting the decoded spectral representation into a time domain to acquire a decoded first audio signal portion;
  
  a second decoding processor for decoding a second encoded audio signal portion in the time domain to acquire a decoded second audio signal portion, wherein the second decoding processor comprises;
  
  a time domain low band decoder for decoding to obtain a low band time domain signal;
  
  an upsampler for upsampling the low band time domain signal to obtain an upsampled low band time domain signal;
  
  a time domain bandwidth extension decoder for synthesizing a high band of a time domain output signal; and
  
  a mixer for mixing a synthesized high band of the time domain output signal and the upsampled low band time domain signal;
  
  a combiner for combining the decoded first audio signal portion and the decoded second audio signal portion to acquire the decoded audio signal; and
  
  a cross-processor for calculating, from the decoded spectral representation of the first encoded audio signal portion, initialization data of the second decoding processor, so that the second decoding processor is initialized to decode the second encoded audio signal portion following in time the first audio signal portion in the encoded audio signal;
  
  wherein the cross-processor comprises;
  
  a delay stage for delaying a further decoded first signal portion and for feeding a delayed version of the further decoded first signal portion into a de-emphasis stage of the second decoding processor for initialization;
  
  a pre-emphasis filter and a delay stage for filtering and delaying the further decoded first signal portion and for feeding a delay stage output into a prediction synthesis filter of the second decoding processor for initialization;
  
  a prediction analysis filter for generating a prediction residual signal from the further decoded first spectral portion or a pre-emphasized further decoded first signal portion and for feeding the prediction residual signal into a codebook synthesizer of the second decoding processor;
  
  or a switch for feeding the further decoded first signal portion or an output of the de-emphasis stage of the second decoding processor into an analysis stage of a resampler of the second decoding processor for initialization,
  
  wherein at least one of the first decoding processor, the spectral decoder, the frequency-time converter, the second decoding processor, the time domain low band decoder, the upsampler, the time domain bandwidth extension decoder, the mixer, the combiner, the cross-processor, the delay stage, the pre-emphasis filter, the prediction analysis filter, and the switch is implemented, at least in part, by a hardware element of the audio decoder.
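
Illustrative note (not part of the claims): the cross-processor alternatives above derive filter memories for the time domain decoder from the further decoded first signal portion. The sketch below shows one way such memories could be computed. The first-order pre-emphasis form, the placeholder default for `beta`, the sign convention of the prediction coefficients, and the dictionary layout are all assumptions for illustration, and the delay stages are reduced to taking the most recent samples.

```python
from typing import Dict, List, Sequence, Union


def pre_emphasis(signal: Sequence[float], beta: float = 0.68) -> List[float]:
    """First-order pre-emphasis y[n] = x[n] - beta * x[n-1] (beta is a placeholder)."""
    out, prev = [], 0.0
    for x in signal:
        out.append(x - beta * prev)
        prev = x
    return out


def prediction_residual(signal: Sequence[float], lpc: Sequence[float]) -> List[float]:
    """Analysis filter A(z) = 1 + sum(a_i * z^-i): e[n] = x[n] + sum(a_i * x[n-i])."""
    out = []
    for n, x in enumerate(signal):
        acc = x
        for i, a in enumerate(lpc, start=1):
            if n - i >= 0:
                acc += a * signal[n - i]
        out.append(acc)
    return out


def initialization_data(
    decoded: Sequence[float], lpc: Sequence[float], memory_len: int, beta: float
) -> Dict[str, Union[float, List[float]]]:
    """Collect hypothetical state memories for the time domain decoder."""
    if not decoded:
        raise ValueError("decoded signal portion must not be empty")
    emphasized = pre_emphasis(decoded, beta)
    residual = prediction_residual(emphasized, lpc)
    return {
        # Last output sample the de-emphasis stage would have produced.
        "deemphasis_memory": decoded[-1],
        # Most recent pre-emphasized samples, one per prediction coefficient.
        "synthesis_memory": emphasized[-len(lpc):] if lpc else [],
        # Most recent residual samples, standing in for codebook memory.
        "codebook_memory": residual[-memory_len:] if memory_len > 0 else [],
    }
```

The purpose of such memories is that the time domain decoder starts the following signal portion as if it had decoded the preceding one itself, which helps avoid discontinuities at the switch.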

22. A method of decoding an encoded audio signal to obtain a decoded audio signal, comprising:
- first decoding a first encoded audio signal portion in a frequency domain, the first decoding comprising;
  
  decoding first spectral portions with a high spectral resolution and synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation, wherein decoding comprises generating the decoded spectral representation so that a first spectral portion is placed with respect to frequency between two second spectral portions; and
  
  converting the decoded spectral representation into a time domain to acquire a decoded first audio signal portion;
  
  second decoding a second encoded audio signal portion in the time domain to acquire a decoded second audio signal portion, wherein the second decoding comprises;
  
  decoding to obtain a low band time domain signal;
  
  upsampling the low band time domain signal to obtain an upsampled low band time domain signal;
  
  synthesizing a high band of a time domain output signal; and
  
  mixing a synthesized high band of the time domain output signal and the upsampled low band time domain signal;
  
  combining the decoded audio signal portion and the decoded second spectral portion to acquire the decoded audio signal; and
  
  performing a cross-processing procedure for calculating, from the decoded spectral representation of the first encoded audio signal portion, initialization data of the second decoding, so that the second decoding is initialized to decode the second encoded audio signal portion following in time the first audio signal portion in the encoded audio signal;
  
  wherein the cross-processing procedure comprises;
  
  delaying a further decoded first signal portion and feeding a delayed version of the further decoded first signal portion into a de-emphasis stage of the second decoding for initialization;
  
  filtering and delaying the further decoded first signal portion and feeding a delayed output into a prediction synthesis filter of the second decoding for initialization;
  
  generating a prediction residual signal from the further decoded first spectral portion or a pre-emphasized further decoded first signal portion and feeding the prediction residual signal into a codebook synthesizer of the second decoding;
  
  or feeding the further decoded first signal portion or an output of a de-emphasis stage of the second decoding into an analysis stage of a resampler of the second decoding for initialization,
  
  wherein one or more of the first decoding, the decoding the first spectral portions with the high spectral resolution, the converting, the second decoding, the decoding to obtain the low band time domain signal, the upsampling, the synthesizing, the mixing, the combining, the cross-processing procedure, the delaying a further decoded first signal portion and feeding a delayed version of the further decoded first signal portion, the filtering and delaying the further decoded first signal portion and feeding a delayed output, the generating a prediction residual signal from the further decoded first spectral portion or a pre-emphasized further decoded first signal portion and feeding the prediction residual signal, and the feeding the further decoded first signal portion or an output of a de-emphasis stage of the second decoding is implemented, at least in part, by one or more hardware elements of an audio signal processing device.
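
Illustrative note (not part of the claims): the second decoding above upsamples the low band, synthesizes a high band, and mixes the two. The sketch below follows that order of steps with simple stand-ins: linear interpolation replaces a real resampler, and spectral folding (multiplying by an alternating sign) scaled by a single hypothetical `gain` parameter replaces a real bandwidth extension decoder.

```python
from typing import List, Sequence


def upsample(low: Sequence[float], factor: int) -> List[float]:
    """Linear interpolation between neighbouring samples (stand-in resampler)."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    out = []
    for i, x in enumerate(low):
        nxt = low[i + 1] if i + 1 < len(low) else x
        for j in range(factor):
            out.append(x + (nxt - x) * j / factor)
    return out


def synthesize_high_band(upsampled: Sequence[float], gain: float) -> List[float]:
    """Mirror low band content into the high band by alternating the sign."""
    return [gain * x * (-1.0 if n % 2 else 1.0) for n, x in enumerate(upsampled)]


def decode_time_domain(low: Sequence[float], factor: int, gain: float) -> List[float]:
    """Upsample, synthesize the high band, then mix sample by sample."""
    up = upsample(low, factor)
    high = synthesize_high_band(up, gain)
    return [a + b for a, b in zip(up, high)]
```

Mixing is a plain addition here because both signals already share the output sampling rate after the upsampling step.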

23. A non-transitory digital storage medium having stored thereon a computer program for performing, when running on a computer, a method of decoding an encoded audio signal to obtain a decoded audio signal, the method comprising:
- first decoding a first encoded audio signal portion in a frequency domain, the first decoding comprising;
  
  decoding first spectral portions with a high spectral resolution and synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation, wherein decoding comprises generating the decoded spectral representation so that a first spectral portion is placed with respect to frequency between two second spectral portions; and
  
  converting the decoded spectral representation into a time domain to acquire a decoded first audio signal portion;
  
  second decoding a second encoded audio signal portion in the time domain to acquire a decoded second audio signal portion, wherein the second decoding comprises;
  
  decoding to obtain a low band time domain signal;
  
  upsampling the low band time domain signal to obtain an upsampled low band time domain signal;
  
  synthesizing a high band of a time domain output signal; and
  
  mixing a synthesized high band of the time domain output signal and the upsampled low band time domain signal;
  
  combining the decoded first audio signal portion and the decoded second audio signal portion to acquire the decoded audio signal and
  
  performing a cross-processing procedure for calculating, from the decoded spectral representation of the first encoded audio signal portion, initialization data of the second decoding, so that the second decoding is initialized to decode the second encoded audio signal portion following in time the first audio signal portion in the encoded audio signal;
  
  wherein the cross-processing procedure comprises;
  
  delaying a further decoded first signal portion and feeding a delayed version of the further decoded first signal portion into a de-emphasis stage of the second decoding for initialization;
  
  filtering and delaying the further decoded first signal portion and feeding a delayed output into a prediction synthesis filter of the second decoding for initialization;
  
  generating a prediction residual signal from the further decoded first spectral portion or a pre-emphasized further decoded first signal portion and feeding the prediction residual signal into a codebook synthesizer of the second decoding;
  
  or feeding the further decoded first signal portion or an output of a de-emphasis stage of the second decoding into an analysis stage of a resampler of the second decoding for initialization.
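
Illustrative note (not part of the claims): the first decoding above synthesizes the second spectral portions from a parametric representation and at least one decoded first spectral portion. A minimal sketch of that idea follows. It assumes the parametric representation is one mean-energy value per band and that the source lines are simply tiled across the gaps; the names `source`, `bands`, and `envelope` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fill_gaps(
    core: Sequence[float],
    is_first: Sequence[bool],
    bands: Sequence[Tuple[int, int]],
    envelope: Sequence[float],
    source: Sequence[float],
) -> List[float]:
    """Fill zeroed lines with scaled copies of decoded first-portion lines."""
    if not source:
        raise ValueError("source must contain at least one decoded line")
    out = list(core)
    for (start, stop), target in zip(bands, envelope):
        gaps = [k for k in range(start, stop) if not is_first[k]]
        if not gaps:
            continue
        # Tile the decoded source lines across the gap positions.
        raw = [source[i % len(source)] for i in range(len(gaps))]
        energy = sum(x * x for x in raw) / len(raw)
        # Scale so the filled lines match the transmitted mean energy.
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        for k, x in zip(gaps, raw):
            out[k] = gain * x
    return out
```

Lines belonging to first spectral portions are left untouched, so a decoded first spectral portion can remain between two synthesized second spectral portions.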


Current Assignee
Fraunhofer Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.
Original Assignee
Fraunhofer Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.
Inventors
Disch, Sascha; Dietz, Martin; Multrus, Markus; Fuchs, Guillaume; Ravelli, Emmanuel; Neusinger, Matthias; Schnell, Markus; Schubert, Benjamin; Grill, Bernhard
Primary Examiner(s)
Wozniak, James S

Application Number

US15/414,427
Publication Number

US 20170256267A1
Time in Patent Office

882 Days
Field of Search

704205, 704211, 704219, 704227, 704E19013
US Class Current
CPC Class Codes

G10L 19/02   using spectral analysis, e....

G10L 19/028   Noise substitution, i.e. su...

G10L 19/032   Quantisation or dequantisat...

G10L 19/04   using predictive techniques

G10L 19/06   Determination or coding of ...

G10L 19/18   Vocoders using multiple modes

G10L 19/20   using sound class specific ...

G10L 19/24   Variable rate codecs, e.g. ...

G10L 19/265   Pre-filtering, e.g. high fr...

G10L 21/038   using band spreading techni...

