Audio encoder and decoder using a frequency domain processor with full-band gap filling and a time domain processor
First Claim
1. An audio encoder for encoding an audio signal to generate an encoded audio signal, comprising:
a first encoding processor for encoding a first audio signal portion in a frequency domain, wherein the first encoding processor comprises:
a time frequency converter for converting the first audio signal portion into a frequency domain representation comprising spectral lines up to a maximum frequency of the first audio signal portion;
an analyzer for analyzing the frequency domain representation up to the maximum frequency to determine first spectral portions to be encoded with a first spectral resolution and second spectral portions to be encoded with a second spectral resolution, the second spectral resolution being lower than the first spectral resolution, wherein the analyzer is configured to determine a first spectral portion from the first spectral portions, the first spectral portion being placed, with respect to frequency, between two second spectral portions from the second spectral portions;
a spectral encoder for encoding the first spectral portions with the first spectral resolution and for encoding the second spectral portions with the second spectral resolution, wherein the spectral encoder comprises a parametric coder for calculating spectral envelope information comprising the second spectral resolution from the second spectral portions;
a second encoding processor for encoding a second different audio signal portion in the time domain, wherein the second encoding processor comprises:
a sampling rate converter for converting the second audio signal portion to a lower sampling rate representation, the lower sampling rate being lower than a sampling rate of the audio signal, wherein the lower sampling rate representation does not comprise a high band of the audio signal;
a time domain low band encoder for time domain encoding the lower sampling rate representation; and
a time domain bandwidth extension encoder for parametrically encoding the high band of the audio signal;
a controller configured for analyzing the audio signal and for determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion of the audio signal is the second audio signal portion encoded in the time domain; and
an encoded signal former for forming the encoded audio signal comprising a first encoded signal portion for the first audio signal portion and a second encoded signal portion for the second audio signal portion,
wherein the analyzer is configured to perform a temporal tile shaping or temporal noise shaping analysis or an operation of setting to zero spectral values in the second spectral portions, wherein the first encoding processor is configured to perform a shaping of spectral values of the first spectral portions using prediction coefficients derived from the first audio signal portion, and wherein the first encoding processor is furthermore configured to perform a quantization and entropy coding operation of shaped spectral values of the first spectral portions, and wherein spectral values of the second spectral portions are set to zero,
the audio encoder further comprising a cross-processor, wherein the cross-processor comprises:
a noise shaper for shaping quantized spectral values of the first spectral portions using LPC coefficients derived from the first audio signal portion;
a spectral decoder for decoding the spectrally shaped spectral portions of the first spectral portion with a high spectral resolution and for synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation;
a frequency-time converter for converting the decoded spectral representation into the time domain to acquire a decoded first audio signal portion, wherein a sampling rate associated with the decoded first audio signal portion is different than a sampling rate of the audio signal, and a sampling rate associated with an output signal of the frequency-time converter is different from a sampling rate of an audio signal input into the time-frequency-converter,
wherein at least one of the first encoding processor, the time frequency converter, the analyzer, the spectral encoder, the second encoding processor, the sampling rate converter, the time domain low band encoder, the time domain bandwidth extension encoder, the controller, the encoded signal former, the cross-processor, the noise shaper, the spectral decoder and the frequency-time converter is implemented, at least in part, by a hardware element of the audio encoder.
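The analyzer and parametric coder recited above can be pictured with a minimal sketch. It is not the claimed implementation: the magnitude threshold used to pick first spectral portions, the fixed band size, and the names `analyze_spectrum`, `threshold` and `band_size` are all assumptions made for illustration. The sketch keeps first spectral portions line by line, sets second spectral portions to zero, and describes the zeroed lines with one energy value per band, which is a coarser resolution than one value per line.

```python
from typing import List, Tuple


def analyze_spectrum(
    spectrum: List[float], threshold: float, band_size: int
) -> Tuple[List[float], List[bool], List[float]]:
    """Split a spectrum into first (kept) and second (parametric) portions.

    Hypothetical selection rule: a line counts as a first spectral portion
    when its magnitude reaches `threshold`; all other lines are second
    spectral portions.
    """
    is_first = [abs(x) >= threshold for x in spectrum]
    # First portions keep their per-line values; second portions are set to zero.
    core = [x if keep else 0.0 for x, keep in zip(spectrum, is_first)]
    # Coarse envelope: one energy value per band, from second-portion lines only.
    envelope = []
    for start in range(0, len(spectrum), band_size):
        band = range(start, min(start + band_size, len(spectrum)))
        envelope.append(sum(spectrum[k] ** 2 for k in band if not is_first[k]))
    return core, is_first, envelope


# Illustrative input: one strong line (index 3) surrounded by weak lines.
spectrum = [0.1, 0.2, 0.1, 5.0, 0.2, 0.1, 0.3, 0.1]
core, is_first, envelope = analyze_spectrum(spectrum, threshold=1.0, band_size=4)
```

In this toy input the strong line at index 3 is a first spectral portion lying, with respect to frequency, between two second spectral portions (indices 0 to 2 and 4 to 7), which is the arrangement the analyzer limitation describes.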
Abstract
An audio encoder for encoding an audio signal has: a first encoding processor for encoding a first audio signal portion in a frequency domain, having: a time frequency converter for converting the first audio signal portion into a frequency domain representation; an analyzer for analyzing the frequency domain representation to determine first spectral portions to be encoded with a first spectral resolution and second regions to be encoded with a second resolution; and a spectral encoder for encoding the first spectral portions with the first spectral resolution and encoding the second portions with the second resolution; a second encoding processor for encoding a second different audio signal portion in the time domain; a controller for analyzing and determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion is the second audio signal portion encoded in the time domain; and an encoded signal former for forming an encoded audio signal having a first encoded signal portion for the first audio signal portion and a second encoded signal portion for the second portion.
23 Claims
8. An audio decoder for decoding an encoded audio signal to obtain a decoded audio signal, comprising:
a first decoding processor for decoding a first encoded audio signal portion in a frequency domain, the first decoding processor comprising; a spectral decoder for decoding first spectral portions with a high spectral resolution and for synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation, wherein the spectral decoder is configured to generate the first decoded spectral representation so that a first spectral portion is placed with respect to frequency between two second spectral portions; and a frequency-time converter for converting the decoded spectral representation into a time domain to acquire a decoded first audio signal portion; a second decoding processor for decoding a second encoded audio signal portion in the time domain to acquire a decoded second audio signal portion, wherein the second decoding processor comprises; a time domain low band decoder for decoding to obtain a low band time domain signal; an upsampler for upsampling the low band time domain signal to obtain an upsampled low band time domain signal; a time domain bandwidth extension decoder for synthesizing a high band of a time domain output signal; and a mixer for mixing a synthesized high band of the time domain output signal and the upsampled low band time domain signal; a combiner for combining the decoded first audio signal portion and the decoded second audio signal portion to acquire the decoded audio signal; and a cross-processor for calculating, from the decoded spectral representation of the first encoded audio signal portion, initialization data of the second decoding processor, so that the second decoding processor is initialized to decode the second encoded audio signal portion following in time the first audio signal portion in the encoded audio signal, wherein the cross-processor further comprises; an additional frequency-time converter 
operating at a lower sampling rate than the frequency-time converter of the first decoding processor to acquire a further decoded first signal portion in the time domain, wherein the signal output by the additional frequency-time converter operating at the lower sampling rate comprises a second sampling rate being lower than a first sampling rate associated with an output of the frequency-time converter of the first decoding processor, wherein the additional frequency-time converter operating at the lower sampling rate comprises: a selector for selecting a low portion of a spectrum input into the additional frequency-time converter operating at the lower sampling rate in accordance with a ratio of the first sampling rate and the second sampling rate, the ratio being smaller than 1; a transform processor comprising a transform length being smaller than a transform length of the frequency-time converter of the first decoding processor; and a synthesis windower using a window comprising a smaller number of coefficients compared to a window used by the frequency-time converter of the first decoding processor, wherein at least one of the first decoding processor, the spectral decoder, the frequency-time converter, the second decoding processor, the time domain low band decoder, the upsampler, the time domain bandwidth extension decoder, the mixer, the combiner, the cross-processor, the additional frequency-time converter, the selector, the transform processor, and the synthesis windower is implemented, at least in part, by a hardware element of the audio decoder.
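The selector, transform processor and synthesis windower of the additional frequency-time converter can be illustrated with a short sketch. Everything specific here is an assumption rather than something the claim states: the transform is taken to be an MDCT, the window a sine window, the ratio is computed as second rate over first rate so that it is smaller than 1, and output scaling and overlap-add between frames are left out. The function names are hypothetical.

```python
import math
from typing import List


def sine_window(length: int) -> List[float]:
    """Sine synthesis window with `length` coefficients (an assumed window shape)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def imdct(coeffs: List[float]) -> List[float]:
    """Naive, unscaled inverse MDCT: N coefficients give 2N time samples."""
    n = len(coeffs)
    return [
        sum(
            c * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for k, c in enumerate(coeffs)
        )
        for t in range(2 * n)
    ]


def low_rate_synthesis(
    spectrum: List[float], first_rate: int, second_rate: int
) -> List[float]:
    """Convert the low portion of a spectrum to time at the lower sampling rate."""
    if not 0 < second_rate < first_rate:
        raise ValueError("second_rate must be positive and below first_rate")
    # Selector: keep only the low portion, sized by the sampling-rate ratio.
    keep = len(spectrum) * second_rate // first_rate
    low_portion = spectrum[:keep]
    # Transform processor: shorter transform, since fewer coefficients go in.
    block = imdct(low_portion)
    # Synthesis windower: window with correspondingly fewer coefficients.
    window = sine_window(len(block))
    return [w * x for w, x in zip(window, block)]


# 8 spectral lines at 32 kHz; the 16 kHz branch keeps 4 lines and yields 8 samples.
out = low_rate_synthesis([1.0] * 8, first_rate=32000, second_rate=16000)
```

With these toy numbers the full-rate path would transform 8 coefficients into a 16-sample windowed block, whereas the low-rate path transforms 4 coefficients into an 8-sample block, so the signal is produced directly at the lower rate without a separate time-domain resampler.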
14. A method of encoding an audio signal to generate an encoded audio signal, comprising:
first encoding a first audio signal portion in a frequency domain, wherein the first encoding comprises: converting the first audio signal portion into a frequency domain representation comprising spectral lines up to a maximum frequency of the first audio signal portion; analyzing the frequency domain representation up to the maximum frequency to determine first spectral portions to be encoded with a first spectral resolution and second spectral portions to be encoded with a second spectral resolution, the second spectral resolution being lower than the first spectral resolution, wherein the analyzing determines a first spectral portion from the first spectral portions, the first spectral portion being placed, with respect to frequency, between two second spectral portions from the second spectral portions; encoding the first spectral portions with the first spectral resolution and encoding the second spectral portions with the second spectral resolution, wherein the encoding the second spectral portion comprises calculating, from the second spectral portions, spectral envelope information comprising the second spectral resolution; second encoding a second different audio signal portion in the time domain wherein the second encoding comprises: converting the second audio signal portion to a lower sampling rate representation, the lower sampling rate being lower than a sampling rate of the audio signal, wherein the lower sampling rate representation does not comprise a high band of the audio signal; time domain encoding the lower sampling rate representation; and parametrically encoding the high band of the audio signal; analyzing the audio signal and determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion of the audio signal is the second audio signal portion encoded in the time domain; and forming the encoded audio signal comprising a first encoded signal portion for the first audio 
signal portion and a second encoded signal portion for the second audio signal portion, wherein the analyzing the frequency domain representation comprises performing a temporal tile shaping or temporal noise shaping analysis or an operation of setting to zero spectral values in the second spectral portions, wherein the first encoding comprises performing a shaping of spectral values of the first spectral portions using prediction coefficients derived from the first audio signal portion and performing a quantization and entropy coding operation of shaped spectral values of the first spectral portions, and wherein spectral values of the second spectral portions are set to zero, wherein the method further comprises a cross-processing procedure, wherein the cross-processing procedure comprises; shaping quantized spectral values of the first spectral portions using LPC coefficients derived from the first audio signal portion; decoding the spectrally shaped spectral portions of the first spectral portion with a high spectral resolution and synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation; converting the decoded spectral representation into the time domain to acquire a decoded first audio signal portion, wherein a sampling rate associated with the decoded first audio signal portion is different than a sampling rate of the audio signal, and a sampling rate associated with an output signal of the converting the decoded spectral representation is different from a sampling rate of an audio signal input into the converting, wherein one or more of the first encoding, the converting the first audio signal portion, the analyzing, the encoding the first spectral portions, the second encoding, the converting the second audio signal portion, the time domain encoding, the parametrically encoding, the analyzing the audio signal and the 
determining, the cross-processing procedure, the shaping, the decoding the spectrally shaped spectral portions, the synthesizing, the converting the decoded spectral representation, and the forming is implemented, at least in part, by one or more hardware elements of an audio signal processing device.
15. A method of decoding an encoded audio signal to obtain a decoded audio signal, comprising:
first decoding a first encoded audio signal portion in a frequency domain, the first decoding comprising; decoding first spectral portions with a high spectral resolution and synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation, wherein decoding comprises generating the decoded spectral representation so that a first spectral portion is placed with respect to frequency between two second spectral portions; and converting the decoded spectral representation into a time domain to acquire a decoded first audio signal portion; second decoding a second encoded audio signal portion in the time domain to acquire a decoded second audio signal portion, wherein the second decoding comprises; decoding to obtain a low band time domain signal; upsampling the low band time domain signal to obtain an upsampled low band time domain signal; synthesizing a high band of a time domain output signal; and mixing a synthesized high band of the time domain output signal and the upsampled low band time domain signal; combining the decoded audio signal portion and the decoded second spectral portion to acquire the decoded audio signal; and a cross-processing procedure for calculating, from the decoded spectral representation of the first encoded audio signal portion, initialization data of the second decoding, so that the second decoding is initialized to decode the second encoded audio signal portion following in time the first audio signal portion in the encoded audio signal, wherein the cross-processing procedure comprises; performing an additional frequency-time conversion operating at a lower sampling rate than the converting of the first decoding to acquire a further decoded first signal portion in the time domain, wherein the signal output by the additional frequency-time conversion operating at the lower sampling rate comprises a second 
sampling rate being lower than a first sampling rate associated with an output of the converting of the first decoding, wherein the additional frequency-time conversion operating at the lower sampling rate comprises; selecting a low portion of a spectrum input into the additional frequency-time conversion operating at the lower sampling rate in accordance with a ratio of the first sampling rate and the second sampling rate, the ratio being smaller than 1; performing a transform processing comprising a transform length being smaller than a transform length of the converting of the first decoding; and performing a synthesis windowing using a window comprising a smaller number of coefficients compared to a window used by the converting of the first decoding, wherein one or more of the first decoding, the decoding the first spectral portions with the high spectral resolution, the converting, the second decoding, the decoding to obtain the low band time domain signal, the upsampling, the synthesizing, the mixing, the combining, the cross-processing procedure, the performing an additional frequency-time conversion, the selecting, the performing a transform processing and the performing a synthesis windowing is implemented, at least in part, by one or more hardware elements of an audio signal processing device.
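The upsampling and mixing steps of the second decoding can be sketched as follows. This is only an illustration under stated assumptions: the claim does not say how the upsampling is done, so linear interpolation by an integer factor stands in for a proper interpolation filter, the synthesized high band is a made-up sequence, and the names `upsample` and `mix` are hypothetical.

```python
from typing import List


def upsample(low_band: List[float], factor: int) -> List[float]:
    """Upsample by an integer factor using linear interpolation (assumed method)."""
    out = []
    for i, x in enumerate(low_band):
        # Hold the last sample when there is no next sample to interpolate toward.
        nxt = low_band[i + 1] if i + 1 < len(low_band) else x
        for j in range(factor):
            out.append(x + (nxt - x) * j / factor)
    return out


def mix(upsampled_low: List[float], synthesized_high: List[float]) -> List[float]:
    """Mix the two bands by sample-wise addition at the output sampling rate."""
    if len(upsampled_low) != len(synthesized_high):
        raise ValueError("both signals must have the same length")
    return [a + b for a, b in zip(upsampled_low, synthesized_high)]


low = [0.0, 1.0, 0.0]                        # decoded low band at the lower rate
up = upsample(low, 2)                        # same content at twice the rate
high = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]     # placeholder synthesized high band
time_domain_output = mix(up, high)
```

The point of the sketch is the ordering: the low band is first brought to the output sampling rate, and only then combined with the synthesized high band, so both inputs to the mixer share one time grid.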
16. A non-transitory digital storage medium having stored thereon a computer program for performing, when running on a computer, a method of encoding an audio signal to generate an encoded signal, the method comprising:
first encoding a first audio signal portion in a frequency domain, wherein the first encoding comprises; converting the first audio signal portion into a frequency domain representation comprising spectral lines up to a maximum frequency of the first audio signal portion; analyzing the frequency domain representation up to the maximum frequency to determine first spectral portions to be encoded with a first spectral resolution and second spectral portions to be encoded with a second spectral resolution, the second spectral resolution being lower than the first spectral resolution, wherein the analyzing determines a first spectral portion from the first spectral portions, the first spectral portion being placed, with respect to frequency, between two second spectral portions from the second spectral portions; encoding the first spectral portions with the first spectral resolution and encoding the second spectral portions with the second spectral resolution, wherein the encoding the second spectral portion comprises calculating, from the second spectral portions, spectral envelope information comprising the second spectral resolution; second encoding a second different audio signal portion in the time domain wherein the second encoding comprises; converting the second audio signal portion to a lower sampling rate representation, the lower sampling rate being lower than a sampling rate of the audio signal, wherein the lower sampling rate representation does not comprise a high band of the audio signal; time domain encoding the lower sampling rate representation; and parametrically encoding the high band of the audio signal; analyzing the audio signal and determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion of the audio signal is the second audio signal portion encoded in the time domain; and forming the encoded audio signal comprising a first encoded signal portion for the first audio signal 
portion and a second encoded signal portion for the second audio signal portion wherein the analyzing the frequency domain representation comprises performing a temporal tile shaping or temporal noise shaping analysis or an operation of setting to zero spectral values in the second spectral portions, wherein the first encoding comprises performing a shaping of spectral values of the first spectral portions using prediction coefficients derived from the first audio signal portion and performing a quantization and entropy coding operation of shaped spectral values of the first spectral portions, and wherein spectral values of the second spectral portions are set to zero, wherein the method further comprises a cross-processing procedure, wherein the cross-processing procedure comprises; shaping quantized spectral values of the first spectral portions using LPC coefficients derived from the first audio signal portion; decoding the spectrally shaped spectral portions of the first spectral portion with a high spectral resolution and synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation; converting the decoded spectral representation into the time domain to acquire a decoded first audio signal portion, wherein a sampling rate associated with the decoded first audio signal portion is different than a sampling rate of the audio signal, and a sampling rate associated with an output signal of the converting the decoded spectral representation is different from a sampling rate of an audio signal input into the converting.
17. A non-transitory digital storage medium having stored thereon a computer program for performing, when running on a computer, a method of decoding an encoded audio signal to obtain a decoded audio signal, the method comprising:
first decoding a first encoded audio signal portion in a frequency domain, the first decoding comprising; decoding first spectral portions with a high spectral resolution and synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation, wherein decoding comprises generating the decoded spectral representation so that a first spectral portion is placed with respect to frequency between two second spectral portions; and converting the decoded spectral representation into a time domain to acquire a decoded first audio signal portion; second decoding a second encoded audio signal portion in the time domain to acquire a decoded second audio signal portion, wherein the second decoding comprises; decoding to obtain a low band time domain signal; upsampling the low band time domain signal to obtain an upsampled low band time domain signal; synthesizing a high band of a time domain output signal; and mixing a synthesized high band of the time domain output signal and the upsampled low band time domain signal; combining the decoded first audio signal portion and the decoded second audio signal portion to acquire the decoded audio signal; and a cross-processing procedure for calculating, from the decoded spectral representation of the first encoded audio signal portion, initialization data of the second decoding, so that the second decoding is initialized to decode the second encoded audio signal portion following in time the first audio signal portion in the encoded audio signal, wherein the cross-processing procedure comprises; performing an additional frequency-time conversion operating at a lower sampling rate than the converting of the first decoding to acquire a further decoded first signal portion in the time domain, wherein the signal output by the additional frequency-time conversion operating at the lower sampling rate comprises a 
second sampling rate being lower than a first sampling rate associated with an output of the converting of the first decoding, wherein the additional frequency-time conversion operating at the lower sampling rate comprises; selecting a low portion of a spectrum input into the additional frequency-time conversion operating at the lower sampling rate in accordance with a ratio of the first sampling rate and the second sampling rate, the ratio being smaller than 1; performing a transform processing comprising a transform length being smaller than a transform length of the converting of the first decoding; and performing a synthesis windowing using a window comprising a smaller number of coefficients compared to a window used by the converting of the first decoding.
18. An audio encoder for encoding an audio signal to generate an encoded audio signal, comprising:
a first encoding processor for encoding a first audio signal portion in a frequency domain, wherein the first encoding processor comprises; a time frequency converter for converting the first audio signal portion into a frequency domain representation comprising spectral lines up to a maximum frequency of the first audio signal portion; an analyzer for analyzing the frequency domain representation up to the maximum frequency to determine first spectral portions to be encoded with a first spectral resolution and second spectral portions to be encoded with a second spectral resolution, the second spectral resolution being lower than the first spectral resolution, wherein the analyzer is configured to determine a first spectral portion from the first spectral portions, the first spectral portion being placed, with respect to frequency, between two second spectral portions from the second spectral portions; a spectral encoder for encoding the first spectral portions with the first spectral resolution and for encoding the second spectral portions with the second spectral resolution, wherein the spectral encoder comprises a parametric coder for calculating spectral envelope information comprising the second spectral resolution from the second spectral portions; a second encoding processor for encoding a second different audio signal portion in the time domain, wherein the second encoding processor comprises; a sampling rate converter for converting the second audio signal portion to a lower sampling rate representation, the lower sampling rate being lower than a sampling rate of the audio signal, wherein the lower sampling rate representation does not comprise a high band of the audio signal; a time domain low band encoder for time domain encoding the lower sampling rate representation; and a time domain bandwidth extension encoder for parametrically encoding the high band of the audio signal; a controller configured for analyzing the audio signal and for determining, 
which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion of the audio signal is the second audio signal portion encoded in the time domain; and an encoded signal former for forming the encoded audio signal comprising a first encoded signal portion for the first audio signal portion and a second encoded signal portion for the second audio signal portion, wherein the second encoding processor comprises an associated second sampling rate, wherein the first encoding processor has associated therewith a first sampling rate being higher than the second sampling rate, wherein the audio encoder further comprises a cross-processor for calculating, from an encoded spectral representation of the first audio signal portion, initialization data of the second encoding processor, wherein the cross-processor comprises a frequency-time converter for generating a time domain signal at the second sampling rate, wherein the frequency-time converter comprises; a selector for selecting a low portion of a spectrum input into the frequency time converter in accordance with a ratio of the first sampling rate and the second sampling rate, the ratio being smaller than 1, a transform processor comprising a transform length being smaller than a transform length of the time-frequency converter; and a synthesis windower for windowing using a window comprising a smaller number of window coefficients compared to a window used by the time frequency converter, wherein at least one of the first encoding processor, the time frequency converter, the analyzer, the spectral encoder, the second encoding processor, the sampling rate converter, the time domain low band encoder, the time domain bandwidth extension encoder, the controller, the encoded signal former, the cross-processor, the frequency-time converter, the selector, the transform processor, and the synthesis windower is implemented, at least in part, by a hardware element of the audio 
encoder.
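For illustration only, the frequency-time converter of the cross-processor recited above (selector, shorter transform, shorter synthesis window) can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the ratio is read as second sampling rate divided by first sampling rate, the transform is taken to be a naive inverse MDCT, the window is a sine window, and amplitude rescaling is omitted. The function names and the sampling rates in the usage note are hypothetical.

```python
import math


def select_low_portion(spectrum, fs_first, fs_second):
    """Selector: keep only the lowest spectral lines, scaled by the rate ratio."""
    if not 0 < fs_second < fs_first:
        raise ValueError("second sampling rate must be positive and below the first")
    ratio = fs_second / fs_first  # assumed reading of the claimed ratio (< 1)
    keep = int(round(len(spectrum) * ratio))
    return spectrum[:keep]


def sine_window(length):
    """Assumed synthesis window shape; the claim does not name one."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def imdct(coeffs):
    """Naive inverse MDCT: N coefficients -> 2N time samples (O(N^2))."""
    n = len(coeffs)
    return [
        sum(c * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for k, c in enumerate(coeffs)) / n
        for t in range(2 * n)
    ]


def low_rate_block(spectrum, fs_first, fs_second):
    """Windowed time domain block at the second (lower) sampling rate."""
    low = select_low_portion(spectrum, fs_first, fs_second)
    window = sine_window(2 * len(low))  # fewer coefficients than the full-rate window
    return [w * x for w, x in zip(window, imdct(low))]
```

With 16 spectral lines and assumed rates of 32000 Hz and 16000 Hz, `select_low_portion` keeps 8 lines and `low_rate_block` returns 16 windowed samples, so both the transform length and the window are half the size used at the first sampling rate. Overlap-add across blocks is left out.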
19. A method of encoding an audio signal to generate an encoded audio signal, comprising:
first encoding a first audio signal portion in a frequency domain, wherein the first encoding comprises; converting the first audio signal portion into a frequency domain representation comprising spectral lines up to a maximum frequency of the first audio signal portion; analyzing the frequency domain representation up to the maximum frequency to determine first spectral portions to be encoded with a first spectral resolution and second spectral portions to be encoded with a second spectral resolution, the second spectral resolution being lower than the first spectral resolution, wherein the analyzing determines a first spectral portion from the first spectral portions, the first spectral portion being placed, with respect to frequency, between two second spectral portions from the second spectral portions; encoding the first spectral portions with the first spectral resolution and encoding the second spectral portions with the second spectral resolution, wherein the encoding the second spectral portion comprises calculating, from the second spectral portions, spectral envelope information comprising the second spectral resolution; second encoding a second different audio signal portion in the time domain wherein the second encoding comprises; converting the second audio signal portion to a lower sampling rate representation, the lower sampling rate being lower than a sampling rate of the audio signal, wherein the lower sampling rate representation does not comprise a high band of the audio signal; time domain encoding the lower sampling rate representation; and parametrically encoding the high band of the audio signal; analyzing the audio signal and determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion of the audio signal is the second audio signal portion encoded in the time domain; and forming the encoded audio signal comprising a first encoded signal portion for the first audio signal 
portion and a second encoded signal portion for the second audio signal portion, wherein the second encoding comprises an associated second sampling rate, wherein the first encoding has associated therewith a first sampling rate being higher than the second sampling rate, wherein the method further comprises a cross-processing procedure for calculating, from an encoded spectral representation of the first audio signal portion, initialization data of the second encoding processor, wherein the cross-processing procedure comprises generating a time domain signal at the second sampling rate, the generating the time domain signal at the second sampling rate comprising; selecting a low portion of a spectrum input into the generating in accordance with a ratio of the first sampling rate and the second sampling rate, the ratio being smaller than 1, performing a transform comprising a transform length being smaller than a transform length used in converting the first audio signal portion; and windowing using a window comprising a smaller number of window coefficients compared to a window used by the converting the first audio signal portion, wherein one or more of the first encoding, the converting the first audio signal portion, the analyzing, the encoding the first spectral portions, the second encoding, the converting the second audio signal portion, the time domain encoding, the parametrically encoding, the analyzing the audio signal and the determining, the forming, the cross-processing procedure, the generating a time domain signal, the selecting, the performing a transform, and the windowing is implemented, at least in part, by one or more hardware elements of an audio signal processing device.
20. A non-transitory digital storage medium having stored thereon a computer program for performing, when running on a computer, a method of encoding an audio signal to generate an encoded audio signal, the method comprising:
first encoding a first audio signal portion in a frequency domain, wherein the first encoding comprises; converting the first audio signal portion into a frequency domain representation comprising spectral lines up to a maximum frequency of the first audio signal portion; analyzing the frequency domain representation up to the maximum frequency to determine first spectral portions to be encoded with a first spectral resolution and second spectral portions to be encoded with a second spectral resolution, the second spectral resolution being lower than the first spectral resolution, wherein the analyzing determines a first spectral portion from the first spectral portions, the first spectral portion being placed, with respect to frequency, between two second spectral portions from the second spectral portions; encoding the first spectral portions with the first spectral resolution and encoding the second spectral portions with the second spectral resolution, wherein the encoding the second spectral portion comprises calculating, from the second spectral portions, spectral envelope information comprising the second spectral resolution; second encoding a second different audio signal portion in the time domain wherein the second encoding comprises; converting the second audio signal portion to a lower sampling rate representation, the lower sampling rate being lower than a sampling rate of the audio signal, wherein the lower sampling rate representation does not comprise a high band of the audio signal; time domain encoding the lower sampling rate representation; and parametrically encoding the high band of the audio signal; analyzing the audio signal and determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion of the audio signal is the second audio signal portion encoded in the time domain; and forming the encoded audio signal comprising a first encoded signal portion for the first audio signal 
portion and a second encoded signal portion for the second audio signal portion, wherein the second encoding comprises an associated second sampling rate, wherein the first encoding has associated therewith a first sampling rate being higher than the second sampling rate, wherein the method further comprises a cross-processing procedure for calculating, from an encoded spectral representation of the first audio signal portion, initialization data of the second encoding processor, wherein the cross-processing procedure comprises generating a time domain signal at the second sampling rate, the generating the time domain signal at the second sampling rate comprising; selecting a low portion of a spectrum input into the generating in accordance with a ratio of the first sampling rate and the second sampling rate, the ratio being smaller than 1, performing a transform comprising a transform length being smaller than a transform length used in converting the first audio signal portion; and windowing using a window comprising a smaller number of window coefficients compared to a window used by the converting the first audio signal portion.
21. An audio decoder for decoding an encoded audio signal to obtain a decoded audio signal, comprising:
a first decoding processor for decoding a first encoded audio signal portion in a frequency domain, the first decoding processor comprising; a spectral decoder for decoding first spectral portions with a high spectral resolution and for synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation, wherein the spectral decoder is configured to generate the decoded spectral representation so that a first spectral portion is placed with respect to frequency between two second spectral portions; and a frequency-time converter for converting the decoded spectral representation into a time domain to acquire a decoded first audio signal portion; a second decoding processor for decoding a second encoded audio signal portion in the time domain to acquire a decoded second audio signal portion, wherein the second decoding processor comprises; a time domain low band decoder for decoding to obtain a low band time domain signal; an upsampler for upsampling the low band time domain signal to obtain an upsampled low band time domain signal; a time domain bandwidth extension decoder for synthesizing a high band of a time domain output signal; and a mixer for mixing a synthesized high band of the time domain output signal and the upsampled low band time domain signal; a combiner for combining the decoded first audio signal portion and the decoded second audio signal portion to acquire the decoded audio signal; and a cross-processor for calculating, from the decoded spectral representation of the first encoded audio signal portion, initialization data of the second decoding processor, so that the second decoding processor is initialized to decode the second encoded audio signal portion following in time the first audio signal portion in the encoded audio signal; wherein the cross-processor comprises; a delay stage for delaying a further decoded first 
signal portion and for feeding a delayed version of the further decoded first signal portion into a de-emphasis stage of the second decoding processor for initialization; a pre-emphasis filter and a delay stage for filtering and delaying the further decoded first signal portion and for feeding a delay stage output into a prediction synthesis filter of the second decoding processor for initialization; a prediction analysis filter for generating a prediction residual signal from the further decoded first spectral portion or a pre-emphasized further decoded first signal portion and for feeding the prediction residual signal into a codebook synthesizer of the second decoding processor;
or a switch for feeding the further decoded first signal portion or an output of the de-emphasis stage of the second decoding processor into an analysis stage of a resampler of the second decoding processor for initialization, wherein at least one of the first decoding processor, the spectral decoder, the frequency-time converter, the second decoding processor, the time domain low band decoder, the upsampler, the time domain bandwidth extension decoder, the mixer, the combiner, the cross-processor, the delay stage, the pre-emphasis filter, the prediction analysis filter, and the switch is implemented, at least in part, by a hardware element of the audio decoder.
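For illustration only, the initialization data produced by the cross-processor recited above can be sketched as follows. This is a minimal sketch under stated assumptions: the pre-emphasis factor, the prediction coefficients, and all function names are hypothetical, the delay stages are modeled simply by taking the most recent samples, and the resampler switch is left out. The claim lists these elements as alternatives; the sketch computes three of them side by side only to show what each one feeds.

```python
def pre_emphasis(signal, alpha=0.68):
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1] (alpha is illustrative)."""
    out = []
    prev = 0.0
    for x in signal:
        out.append(x - alpha * prev)
        prev = x
    return out


def prediction_residual(signal, lpc):
    """Prediction analysis filter A(z) = 1 + sum_k lpc[k] * z^-(k+1)."""
    out = []
    for n, x in enumerate(signal):
        acc = x
        for k, a in enumerate(lpc):
            if n - k - 1 >= 0:
                acc += a * signal[n - k - 1]
        out.append(acc)
    return out


def cross_process(decoded_tail, lpc, alpha=0.68):
    """Derive time domain decoder states from the end of the decoded first portion."""
    emphasized = pre_emphasis(decoded_tail, alpha)
    start = max(0, len(emphasized) - len(lpc))
    return {
        # previous output sample of the de-emphasis stage
        "deemphasis_memory": decoded_tail[-1],
        # last `order` pre-emphasized samples for the prediction synthesis filter
        "synthesis_memory": emphasized[start:],
        # residual handed to the codebook synthesizer
        "codebook_excitation": prediction_residual(emphasized, lpc),
    }
```

Called with `decoded_tail=[1.0, 1.0, 1.0]`, `lpc=[-1.0]` and `alpha=0.5`, the sketch yields a de-emphasis memory of `1.0`, a synthesis memory of `[0.5]` and an excitation of `[1.0, -0.5, 0.0]`.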
22. A method of decoding an encoded audio signal to obtain a decoded audio signal, comprising:
first decoding a first encoded audio signal portion in a frequency domain, the first decoding comprising; decoding first spectral portions with a high spectral resolution and synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation, wherein decoding comprises generating the decoded spectral representation so that a first spectral portion is placed with respect to frequency between two second spectral portions; and converting the decoded spectral representation into a time domain to acquire a decoded first audio signal portion; second decoding a second encoded audio signal portion in the time domain to acquire a decoded second audio signal portion, wherein the second decoding comprises; decoding to obtain a low band time domain signal; upsampling the low band time domain signal to obtain an upsampled low band time domain signal; synthesizing a high band of a time domain output signal; and mixing a synthesized high band of the time domain output signal and the upsampled low band time domain signal; combining the decoded audio signal portion and the decoded second spectral portion to acquire the decoded audio signal; and performing a cross-processing procedure for calculating, from the decoded spectral representation of the first encoded audio signal portion, initialization data of the second decoding, so that the second decoding is initialized to decode the second encoded audio signal portion following in time the first audio signal portion in the encoded audio signal; wherein the cross-processing procedure comprises; delaying a further decoded first signal portion and feeding a delayed version of the further decoded first signal portion into a de-emphasis stage of the second decoding for initialization; filtering and delaying the further decoded first signal portion and feeding a delayed output into a prediction synthesis filter 
of the second decoding for initialization; generating a prediction residual signal from the further decoded first spectral portion or a pre-emphasized further decoded first signal portion and feeding the prediction residual signal into a codebook synthesizer of the second decoding;
or feeding the further decoded first signal portion or an output of a de-emphasis stage of the second decoding into an analysis stage of a resampler of the second decoding for initialization, wherein one or more of the first decoding, the decoding the first spectral portions with the high spectral resolution, the converting, the second decoding, the decoding to obtain the low band time domain signal, the upsampling, the synthesizing, the mixing, the combining, the cross-processing procedure, the delaying a further decoded first signal portion and feeding a delayed version of the further decoded first signal portion, the filtering and delaying the further decoded first signal portion and feeding a delayed output, the generating a prediction residual signal from the further decoded first spectral portion or a pre-emphasized further decoded first signal portion and feeding the prediction residual signal, and the feeding the further decoded first signal portion or an output of a de-emphasis stage of the second decoding is implemented, at least in part, by one or more hardware elements of an audio signal processing device.
23. A non-transitory digital storage medium having stored thereon a computer program for performing, when running on a computer, a method of decoding an encoded audio signal to obtain a decoded audio signal, the method comprising:
first decoding a first encoded audio signal portion in a frequency domain, the first decoding comprising; decoding first spectral portions with a high spectral resolution and synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation, wherein decoding comprises generating the decoded spectral representation so that a first spectral portion is placed with respect to frequency between two second spectral portions; and converting the decoded spectral representation into a time domain to acquire a decoded first audio signal portion; second decoding a second encoded audio signal portion in the time domain to acquire a decoded second audio signal portion, wherein the second decoding comprises; decoding to obtain a low band time domain signal; upsampling the low band time domain signal to obtain an upsampled low band time domain signal; synthesizing a high band of a time domain output signal; and mixing a synthesized high band of the time domain output signal and the upsampled low band time domain signal; combining the decoded first audio signal portion and the decoded second audio signal portion to acquire the decoded audio signal and performing a cross-processing procedure for calculating, from the decoded spectral representation of the first encoded audio signal portion, initialization data of the second decoding, so that the second decoding is initialized to decode the second encoded audio signal portion following in time the first audio signal portion in the encoded audio signal; wherein the cross-processing procedure comprises; delaying a further decoded first signal portion and feeding a delayed version of the further decoded first signal portion into a de-emphasis stage of the second decoding for initialization; filtering and delaying the further decoded first signal portion and feeding a delayed output into a prediction synthesis 
filter of the second decoding for initialization; generating a prediction residual signal from the further decoded first spectral portion or a pre-emphasized further decoded first signal portion and feeding the prediction residual signal into a codebook synthesizer of the second decoding;
or feeding the further decoded first signal portion or an output of a de-emphasis stage of the second decoding into an analysis stage of a resampler of the second decoding for initialization.
Specification