METHOD AND DEVICE FOR APPLYING DYNAMIC RANGE COMPRESSION TO A HIGHER ORDER AMBISONICS SIGNAL

Abstract
A method for performing DRC on a HOA signal comprises transforming the HOA signal to the spatial domain, analyzing the transformed HOA signal, and obtaining, from results of said analyzing, gain factors that are usable for dynamic compression. The gain factors can be transmitted together with the HOA signal. When applying the DRC, the HOA signal is transformed to the spatial domain, the gain factors are extracted and multiplied with the transformed HOA signal in the spatial domain, wherein a gain compensated transformed HOA signal is obtained. The gain compensated transformed HOA signal is transformed back into the HOA domain, wherein a gain compensated HOA signal is obtained. The DRC may be applied in the QMF filter bank domain.
3 Claims
 1. A method for dynamic range compression (DRC), the method comprising:
receiving a reconstructed Higher Order Ambisonics (HOA) audio signal representation;
transforming the reconstructed HOA audio signal into a spatial domain based on W_{DSHT}=D_{DSHT}C, wherein D_{DSHT} is an inverse Discrete Spherical Harmonics Transform (DSHT) matrix, wherein C is a block of τ HOA samples, and wherein W is a block of spatial samples matching an input time granularity of a Quadrature Mirror Filter (QMF) bank;
applying a DRC gain value g(n, m) corresponding to a time frequency tile (n, m) based on \check{w}_{DRC}(n, m)=diag(g(n, m)) ŵ_{DSHT}(n, m), wherein ŵ_{DSHT}(n, m) is a vector of spatial channels for the time frequency tile (n, m); and
rendering to loudspeaker channels based on w(n, m)=D D_{DSHT}^{−1} \check{w}_{DRC}(n, m), wherein the D_{DSHT}^{−1} matrix is an inverse of the D_{DSHT} matrix and D is a HOA rendering matrix, wherein the D_{DSHT}^{−1} and the D_{DSHT} matrices are optimized for DRC purposes.
 2. An apparatus for dynamic range compression (DRC), the apparatus comprising:
a receiver for receiving a reconstructed Higher Order Ambisonics (HOA) audio signal representation;
an audio decoder configured to:
transform the reconstructed HOA audio signal into a spatial domain based on W_{DSHT}=D_{DSHT}C, wherein D_{DSHT} is an inverse Discrete Spherical Harmonics Transform (DSHT) matrix, wherein C is a block of τ HOA samples, and wherein W is a block of spatial samples matching an input time granularity of a Quadrature Mirror Filter (QMF) bank;
apply a DRC gain value g(n, m) corresponding to a time frequency tile (n, m) based on \check{w}_{DRC}(n, m)=diag(g(n, m)) ŵ_{DSHT}(n, m), wherein ŵ_{DSHT}(n, m) is a vector of spatial channels for the time frequency tile (n, m); and
render to loudspeaker channels based on w(n, m)=D D_{DSHT}^{−1} \check{w}_{DRC}(n, m), wherein the D_{DSHT}^{−1} matrix is an inverse of the D_{DSHT} matrix and D is a HOA rendering matrix, wherein the D_{DSHT}^{−1} and the D_{DSHT} matrices are optimized for DRC purposes.
Specification
This application is a division of U.S. patent application Ser. No. 15/891,326, filed Feb. 7, 2018, which is a division of U.S. patent application Ser. No. 15/127,775, filed Sep. 20, 2016, now U.S. Pat. No. 9,936,321, which is the U.S. National Stage of International Application No. PCT/EP2015/056206, filed Mar. 24, 2015, which claims priority to European Application No. 14305559.8, filed Apr. 15, 2014 and European Patent Application No. 14305423.7, filed Mar. 24, 2014, each of which is incorporated by reference in its entirety.
This invention relates to a method and a device for performing Dynamic Range Compression (DRC) on an Ambisonics signal, and in particular on a Higher Order Ambisonics (HOA) signal.
The purpose of Dynamic Range Compression (DRC) is to reduce the dynamic range of an audio signal. A time-varying gain factor is applied to the audio signal. Typically, this gain factor depends on the amplitude envelope of the signal used for controlling the gain. The mapping is in general non-linear. Large amplitudes are mapped to smaller ones while faint sounds are often amplified. Typical use scenarios are noisy environments, late night listening, small speakers or mobile headphone listening.
A common concept for streaming or broadcasting Audio is to generate the DRC gains before transmission and apply these gains after receiving and decoding. The principle of using DRC, i.e. how DRC is usually applied to an audio signal, is shown in
For 3D audio, different gains can be applied to loudspeaker channels that represent different spatial positions. These positions then need to be known at the sending side in order to be able to generate a matching set of gains. This is usually only possible for idealized conditions, while in realistic cases the number of speakers and their placement vary in many ways. The placement is influenced more by practical considerations than by specifications. Higher Order Ambisonics (HOA) is an audio format that allows for flexible rendering. A HOA signal is composed of coefficient channels that do not directly represent sound levels. Therefore, DRC cannot be simply applied to HOA based signals.
The present invention solves at least the problem of how DRC can be applied to HOA signals. A HOA signal is analyzed in order to obtain one or more gain coefficients. In one embodiment, at least two gain coefficients are obtained, and the analysis of the HOA signal comprises a transformation into the spatial domain (iDSHT). The one or more gain coefficients are transmitted together with the original HOA signal. A special indication can be transmitted to indicate if all gain coefficients are equal. This is the case in a so-called simplified mode, whereas at least two different gain coefficients are used in a non-simplified mode. At the decoder, the one or more gains can (but need not) be applied to the HOA signal. The user has a choice whether or not to apply the one or more gains. An advantage of the simplified mode is that it requires considerably fewer computations, since only one gain factor is used, and since the gain factor can be applied to the coefficient channels of the HOA signal directly in the HOA domain, so that the transform into the spatial domain and the subsequent transform back into the HOA domain can be skipped. In the simplified mode, the gain factor is obtained by analysis of only the zeroth order coefficient channel of the HOA signal.
According to one embodiment of the invention, a method for performing DRC on a HOA signal comprises transforming the HOA signal to the spatial domain (by an inverse DSHT), analyzing the transformed HOA signal and obtaining, from results of said analyzing, gain factors that are usable for dynamic range compression. In further steps, the obtained gain factors are multiplied (in the spatial domain) with the transformed HOA signal, wherein a gain compressed transformed HOA signal is obtained. Finally, the gain compressed transformed HOA signal is transformed back into the HOA domain (by a DSHT), i.e. coefficient domain, wherein a gain compressed HOA signal is obtained.
Further, according to one embodiment of the invention, a method for performing DRC in a simplified mode on a HOA signal comprises analyzing the HOA signal and obtaining from results of said analyzing a gain factor that is usable for dynamic range compression. In further steps, upon evaluation of the indication, the obtained gain factor is multiplied with coefficient channels of the HOA signal (in the HOA domain), wherein a gain compressed HOA signal is obtained. Also upon evaluation of the indication, it can be determined that a transformation of the HOA signal can be skipped. The indication to indicate simplified mode, i.e. that only one gain factor is used, can be set implicitly, e.g. if only simplified mode can be used due to hardware or other restrictions, or explicitly, e.g. upon user selection of either simplified or non-simplified mode.
Further, according to one embodiment of the invention, a method for applying DRC gain factors to a HOA signal comprises receiving a HOA signal, an indication and gain factors, determining that the indication indicates non-simplified mode, transforming the HOA signal into the spatial domain (using an inverse DSHT), wherein a transformed HOA signal is obtained, multiplying the gain factors with the transformed HOA signal, wherein a dynamic range compressed transformed HOA signal is obtained, and transforming the dynamic range compressed transformed HOA signal back into the HOA domain (i.e. coefficient domain) (using a DSHT), wherein a dynamic range compressed HOA signal is obtained. The gain factors can be received together with the HOA signal or separately.
Further, according to one embodiment of the invention, a method for applying a DRC gain factor to a HOA signal comprises receiving a HOA signal, an indication and a gain factor, determining that the indication indicates simplified mode, and upon said determining multiplying the gain factor with the HOA signal, wherein a dynamic range compressed HOA signal is obtained. The gain factors can be received together with the HOA signal or separately.
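The two decoder-side variants described above can be sketched as a single function. This is a minimal NumPy sketch and not the decoder of the disclosure: the function name `apply_drc`, the boolean `simplified` that stands in for the transmitted indication, and the first-order matrix `D_L` used in the usage lines (a tetrahedral layout, chosen only because it is easy to write down) are assumptions made for illustration.

```python
import numpy as np

def apply_drc(B, gains, simplified, D_L=None):
    """Apply DRC gains to a HOA block B of shape ((N+1)^2, tau).

    simplified=True : one gain factor, multiplied directly in the HOA domain.
    simplified=False: one gain per spatial channel; transform to the spatial
                      domain (iDSHT), multiply, transform back (DSHT).
    """
    if simplified:
        g1 = float(np.atleast_1d(gains)[0])
        return g1 * B
    W = D_L @ B                               # HOA domain -> spatial domain
    W_drc = np.asarray(gains)[:, None] * W    # diag(g) @ W without forming diag(g)
    return np.linalg.solve(D_L, W_drc)        # spatial domain -> HOA domain

# Usage with an illustrative first-order (N=1) transform matrix.
D_L = np.array([[1, 1, 1, 1],
                [1, -1, -1, 1],
                [1, 1, -1, -1],
                [1, -1, 1, -1]], dtype=float) / 4.0
B = np.random.default_rng(0).standard_normal((4, 16))

B_simple = apply_drc(B, 0.5, simplified=True)
B_full = apply_drc(B, np.full(4, 0.5), simplified=False, D_L=D_L)
B_mixed = apply_drc(B, np.array([1.0, 0.8, 0.6, 0.9]), simplified=False, D_L=D_L)
```

With identical gains the non-simplified path gives the same result as the simplified one, which is why the transforms can be skipped when the indication signals simplified mode.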
In one embodiment, the invention provides a computer readable medium having executable instructions to cause a computer to perform a method for applying DRC gain factors to a HOA signal, comprising steps as described above.
In one embodiment, the invention provides a computer readable medium having executable instructions to cause a computer to perform a method for performing DRC on a HOA signal, comprising steps as described above.
In one embodiment, methods, apparatus and computer readable media may be configured to perform the following methods for dynamic range compression (DRC). The methods may apply DRC in a Quadrature Mirror Filter (QMF) filter bank domain. This may include receiving a Higher Order Ambisonics (HOA) audio representation and a gain value g(n, m) corresponding to a time frequency tile (n, m) and applying the gain value and a Discrete Spherical Harmonics Transform (DSHT) matrix to the HOA audio representation. The gain value is applied based on \check{w}_{DRC}(n, m)=diag(g(n, m)) ŵ_{DSHT}(n, m), where ŵ_{DSHT}(n, m) is a vector of spatial channels for the time frequency tile (n, m), and the vector ŵ_{DSHT}(n, m) is determined based on an application of the DSHT matrix to the HOA audio representation. The method may further combine the DSHT matrix and rendering to loudspeaker channels based on w(n, m)=D D_{DSHT}^{−1} \check{w}_{DRC}(n, m), wherein D_{DSHT}^{−1} is an inverse of the DSHT matrix and D is a HOA rendering matrix.
Advantageous embodiments of the invention are disclosed in the dependent claim, the following description and the figures.
Exemplary embodiments of the invention are described with reference to the accompanying drawings:
The present invention describes how DRC can be applied to HOA. This is conventionally not easy because HOA is a sound field description.
On the decoding or receiving side, as shown in
In the following, the assumptions and definitions used are explained. The assumptions are that the HOA renderer is energy preserving, i.e. N3D normalized Spherical Harmonics are used, and the energy of a single directional signal coded inside the HOA representation is maintained after rendering. How to achieve this energy preserving HOA rendering is described e.g. in WO2015/007889A (PD130040).
Definitions of used terms are as follows.
B∈ℝ^{(N+1)^{2}×τ} denotes a block of τ HOA samples, B=[b(1), b(2), . . . , b(t), . . . , b(τ)], with vector b(t)=[b_{1}, b_{2}, . . . , b_{o}, . . . , b_{(N+1)^{2}}]^{T}=[B_{0}^{0}, B_{1}^{−1}, . . . , B_{n}^{m}, . . . , B_{N}^{N}]^{T}, which contains the Ambisonics coefficients in ACN order (vector index o=n^{2}+n+m+1, with coefficient order index n and coefficient degree index m). N denotes the HOA truncation order. The number of higher order coefficients in b is (N+1)^{2}. The sample index for one block of data is t. τ usually ranges from one sample to 64 samples or more.
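The ACN index rule o=n^{2}+n+m+1 can be illustrated with a few lines of Python. The function names `acn_index` and `order_degree` are illustrative helpers, not terms from the disclosure; the index is 1-based, as in the definition above.

```python
import math

def acn_index(n, m):
    """1-based vector index o = n^2 + n + m + 1 for order n and degree m."""
    if not -n <= m <= n:
        raise ValueError("degree m must satisfy -n <= m <= n")
    return n * n + n + m + 1

def order_degree(o):
    """Inverse mapping: recover (n, m) from the 1-based index o."""
    n = math.isqrt(o - 1)
    m = o - 1 - n * n - n
    return n, m

N = 3
indices = [acn_index(n, m) for n in range(N + 1) for m in range(-n, n + 1)]
```

For a truncation order N the indices run from 1 to (N+1)^{2} without gaps, which matches the stated number of coefficients in b.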
The zeroth order signal \vec{b}_{o}=[b_{1}(1), b_{1}(2), . . . , b_{1}(τ)] is the first row of B. D∈ℝ^{L×(N+1)^{2}} denotes an energy preserving rendering matrix that renders a block of HOA samples to a block of L loudspeaker channels in the spatial domain: W=DB, with W∈ℝ^{L×τ}. This is the assumed procedure of the HOA renderer in
D_{L}∈ℝ^{(N+1)^{2}×(N+1)^{2}} denotes a rendering matrix related to L_{L}=(N+1)^{2} channels which are positioned on a sphere in a very regular manner, in a way that all neighboring positions share the same distance. D_{L} is well-conditioned and its inverse D_{L}^{−1} exists. Thus, both define a pair of transformation matrices (DSHT, Discrete Spherical Harmonics Transform):
W_{L}=D_{L}B, B=D_{L}^{−1}W_{L}
g is a vector of L_{L}=(N+1)^{2} DRC gain values. Gain values are assumed to be applied to a block of τ samples and are assumed to be smooth from block to block. For transmission, gain values that share the same value can be combined into gain groups. If only a single gain group is used, a single DRC gain value, here indicated by g_{1}, is applied to all τ samples of all speaker channels.
For every HOA truncation order N, an ideal L_{L}=(N+1)^{2 }virtual speaker grid and related rendering matrix D_{L }are defined. The virtual speaker positions sample spatial areas surrounding a virtual listener. The grids for N=1 to 6 are shown in
The HOA signal is converted to the spatial domain by W_{L}=D_{L}B. Up to L_{L}=(N+1)^{2 }DRC gains g_{l }are created by analyzing these signals. If the content is a combination of HOA and Audio Objects (AO), AO signals such as e.g. dialog tracks may be used for side chaining. This is shown in
A variable number of 1 to L_{L}=(N+1)^{2} gain values related to a block of τ samples is transmitted. Gain values can be assigned to channel groups for transmission. In an embodiment, all equal gains are combined in one channel group to minimize transmission data. If a single gain is transmitted, it is related to all L_{L} channels. The channel group gain values g_{l_{g}} and their number are transmitted. The usage of channel groups is signaled, so that the receiver or decoder can apply the gain values correctly.
The receiver/decoder can determine the number of transmitted coded gain values, decode 51 related information and assign 52-55 the gains to L_{L}=(N+1)^{2} channels. If only one gain value (one channel group) is transmitted, it can be directly applied 52 to the HOA signal (B_{DRC}=g_{1}B), as shown in
If two or more gains are transmitted, the channel group gains are assigned to L channel gains g=[g_{1}, . . . , g_{L}] each.
For the virtual regular loudspeaker grid, the loudspeaker signals with the DRC gains applied are computed by
Ŵ_{L}=diag(g)·W_{L}.
The resulting modified HOA representation is then computed by
B_{DRC}=D_{L}^{−1}Ŵ_{L}.
This can be simplified, as shown in
G=D_{L}^{−1 }diag(g)D_{L},
with G∈ℝ^{(N+1)^{2}×(N+1)^{2}}. The gain matrix is applied directly to the HOA coefficients in a gain assignment block 54: B_{DRC}=GB.
This is more efficient in terms of computational operations needed for (N+1)^{2}<τ. That is, this solution has an advantage over conventional solutions because the decoding is much simpler and requires considerably less processing. The reason is that no matrix operations are required; instead, the gain values can be applied directly, e.g. multiplied with the HOA coefficients in the gain assignment block 54.
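The equivalence between the two-step route (to the spatial domain, gains, back to the HOA domain) and the single gain matrix G can be checked numerically. The following is a minimal NumPy sketch; the first-order matrix `D_L` (tetrahedral layout), the random block `B` and the gain values are assumptions chosen for illustration only.

```python
import numpy as np

# Illustrative invertible transform matrix for N=1, i.e. (N+1)^2 = 4 channels.
D_L = np.array([[1, 1, 1, 1],
                [1, -1, -1, 1],
                [1, 1, -1, -1],
                [1, -1, 1, -1]], dtype=float) / 4.0
D_L_inv = np.linalg.inv(D_L)

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 64))          # block of tau = 64 HOA samples
g = np.array([1.0, 0.8, 0.6, 0.9])        # one DRC gain per spatial channel

# Two-step route: spatial domain, gains, back to the HOA domain.
W_hat = np.diag(g) @ (D_L @ B)
B_two_step = D_L_inv @ W_hat

# Single gain matrix, computed once per gain vector and applied to the block.
G = D_L_inv @ np.diag(g) @ D_L
B_one_step = G @ B

# Identical gains collapse G to a scaled identity (simplified mode).
G_single = D_L_inv @ np.diag(np.full(4, 0.5)) @ D_L
```

Because G has only (N+1)^{2}×(N+1)^{2} entries, building it once and applying it to all τ samples of the block replaces two matrix products per block with one.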
In one embodiment, an even more efficient way of applying the gain matrix is to modify the renderer matrix in a renderer matrix modification block 57 by \hat{D}=DG, so that the DRC is applied and the HOA signal is rendered in one step: W=\hat{D}B. This is shown in
In the following, calculation of ideal DSHT (Discrete Spherical Harmonics Transform) matrices for DRC is described. Such DSHT matrices are particularly optimized for usage in DRC and are different from DSHT matrices used for other purpose, e.g. data rate compression.
The requirements for the ideal rendering and encoding matrices D_{L }and D_{L}^{−1 }related to an ideal spherical layout are derived below. Finally, these requirements are the following:
(1) the rendering matrix D_{L }must be invertible, that is, D_{L}^{−1 }needs to exist;
(2) the sum of amplitudes in the spatial domain should be reflected as the zeroth order HOA coefficients after spatial to HOA domain transform, and should be preserved after a subsequent transform to the spatial domain (amplitude requirement); and
(3) the energy of the spatial signal should be preserved when transforming to the HOA domain and back to the spatial domain (energy preservation requirement).
Even for ideal rendering layouts, requirements (2) and (3) seem to be in contradiction to each other. When using a simple approach to derive the DSHT transform matrices, such as those known from the prior art, only one or the other of requirements (2) and (3) can be fulfilled without error. Fulfilling one of the requirements (2) and (3) without error results in errors exceeding 3 dB for the other one. This usually leads to audible artifacts. A method to overcome this problem is described in the following.
First, an ideal spherical layout with L=(N+1)^{2} is selected. The L directions of the (virtual) speaker positions are given by Ω_{l} and the related mode matrix is denoted as Ψ_{L}=[φ(Ω_{1}), . . . , φ(Ω_{l}), . . . , φ(Ω_{L})]^{T}. Each φ(Ω_{l}) is a mode vector containing the spherical harmonics of the direction Ω_{l}. L quadrature gains related to the spherical layout positions are assembled in vector q. These quadrature gains rate the spherical area around such positions and all sum up to a value of 4π, related to the surface of a sphere with a radius of one.
A first prototype rendering matrix \tilde{D}_{L} is derived by
\tilde{D}_{L}=diag(q)Ψ_{L}/L.
Note that the division by L can be omitted due to a later normalization step (see below).
Second, a compact singular value decomposition is performed: \tilde{D}_{L}=USV^{T}, and a second prototype matrix is derived by
\hat{\tilde{D}}_{L}=UV^{T}.
Third, the prototype matrix is normalized:
Ď_{L}=\hat{\tilde{D}}_{L}/∥\hat{\tilde{D}}_{L}∥_{k},
where k denotes the matrix norm type. Two matrix norm types show equally good performance. Either the k=1 norm or the Frobenius norm should be used. This matrix fulfills the requirement 3 (energy preservation).
Fourth, in the last step the amplitude error to fulfill requirement 2 is substituted: Row vector e is calculated by
e=−(1/L)(1_{L}^{T}Ď_{L}−[1,0,0, . . . ,0]),
where [1,0,0, . . . ,0] is a row vector of (N+1)^{2} elements, all zero except for the first element with a value of one. 1_{L}^{T}Ď_{L} denotes the sum of the row vectors of Ď_{L}. The rendering matrix D_{L} is now derived by substituting the amplitude error:
D_{L}=Ď_{L}+[e^{T},e^{T},e^{T}, . . . ]^{T},
where vector e is added to every row of Ď_{L}. This matrix fulfills requirement 2 and requirement 3. The first row elements of D_{L}^{−1 }all become one.
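The four steps above can be sketched in NumPy as follows. This is a sketch under stated assumptions rather than the construction of the disclosure: the prototype is taken to be diag(q)Ψ_{L} (quadrature gains times mode matrix, division by L omitted), the error vector is taken as e=−(1/L)(1_{L}^{T}Ď_{L}−[1,0, . . . ,0]), and the first-order tetrahedral layout with real N3D harmonics [1, √3·y, √3·z, √3·x] and equal quadrature gains π is an illustrative stand-in for the tabulated positions. The function name `drc_dsht_matrix` is hypothetical.

```python
import numpy as np

def drc_dsht_matrix(psi, q, norm_type="fro"):
    """Build D_L from mode matrix psi (L x (N+1)^2) and quadrature gains q (L,)."""
    L = psi.shape[0]
    d_tilde = np.diag(q) @ psi                 # step 1: prototype (assumed form)
    u, _, vt = np.linalg.svd(d_tilde, full_matrices=False)
    d_hat = u @ vt                             # step 2: drop the singular values
    d_check = d_hat / np.linalg.norm(d_hat, norm_type)   # step 3: normalize
    target = np.zeros(psi.shape[1])
    target[0] = 1.0                            # the row vector [1, 0, ..., 0]
    e = -(d_check.sum(axis=0) - target) / L    # step 4: amplitude error
    return d_check + e                         # broadcasting adds e to every row

# Illustrative N=1 layout: four tetrahedral directions (unit vectors).
xyz = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
PSI_L = np.column_stack([np.ones(4),
                         np.sqrt(3) * xyz[:, 1],    # Y_1^-1 ~ y
                         np.sqrt(3) * xyz[:, 2],    # Y_1^0  ~ z
                         np.sqrt(3) * xyz[:, 0]])   # Y_1^1  ~ x
Q = np.full(4, np.pi)                               # gains sum to 4*pi

D_L = drc_dsht_matrix(PSI_L, Q)
D_L_k1 = drc_dsht_matrix(PSI_L, Q, norm_type=1)
D_L_inv = np.linalg.inv(D_L)
```

For this layout the column sums of D_L equal [1, 0, 0, 0] and the first row of the inverse consists of ones, in line with the statement that the first row elements of D_{L}^{−1} all become one.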
In the following, detailed requirements for DRC are explained.
First, applying L_{L} identical gains with a value of g_{1} in the spatial domain is equivalent to applying the gain g_{1} to the HOA coefficients:
D_{L}^{−1}diag(g)W_{L}=D_{L}^{−1}g_{1}ID_{L}B=g_{1}D_{L}^{−1}D_{L}B=g_{1}B
This leads to the requirement: D_{L}^{−1 }D_{L}=I, which means that L=(N+1)^{2 }and D_{L}^{−1 }needs to exist (trivial).
Second, analyzing the sum signal in the spatial domain is equal to analyzing the zeroth order HOA component. DRC analyzers use the signal's energy as well as its amplitude. Thus, the sum signal is related to amplitude and energy.
The signal model of HOA: B=Ψ_{e}X_{s}, where X_{s}∈ℝ^{S×τ} is a matrix of S directional signals and Ψ_{e}=[φ(Ω_{1}), . . . , φ(Ω_{s}), . . . , φ(Ω_{S})] is a N3D mode matrix related to the directions Ω_{1}, . . . , Ω_{S}. The mode vector φ(Ω_{s})=[Y_{0}^{0}(Ω_{s}), Y_{1}^{−1}(Ω_{s}), . . . , Y_{N}^{N}(Ω_{s})]^{T} is assembled out of Spherical Harmonics. In N3D notation the zeroth order component Y_{0}^{0}(Ω_{s})=1 is independent of the direction.
The zeroth order component HOA signal needs to become the sum of the directional signals \vec{b}_{o}=[b_{1}(1), b_{1}(2), . . . , b_{1}(τ)]=1_{S}^{T}X_{s} to reflect the correct amplitude of the summation signal. 1_{S} is a vector assembled out of S elements with a value of 1.
The energy of the directional signals is preserved in this mix because \vec{b}_{o}\vec{b}_{o}^{T}=1_{S}^{T}X_{s}X_{s}^{T}1_{S}. This would simplify to Σ_{s=1}^{S}Σ_{t=1}^{τ}X_{s,t}^{2}=∥X_{s}∥_{fro}^{2} if the signals X_{s} are not correlated.
The sum of amplitudes in spatial domain is given by 1_{L}^{T}W_{L}=1_{L}^{T}D_{L }Ψ_{e }X_{s}=1_{L}^{T }M_{L}X_{s }with HOA panning matrix M_{L}=D_{L }Ψ_{e}.
This becomes \vec{b}_{o}=1_{S}^{T}X_{s} for 1_{L}^{T}M_{L}=1_{L}^{T}D_{L}Ψ_{e}=1_{S}^{T}. The latter requirement can be compared to the sum of amplitudes requirement sometimes used in panning methods like VBAP. Empirically it can be seen that this can be achieved in good approximation for very symmetric spherical speaker setups with D_{L}=Ψ_{e}^{−1}, because there we find: 1_{L}^{T}D_{L}≈[1,0,0, . . . ,0]⇒1_{L}^{T}D_{L}Ψ_{e}=[Y_{0}^{0}(Ω_{1}), . . . , Y_{0}^{0}(Ω_{S})]=1_{S}^{T}. The amplitude requirement can then be reached within the necessary accuracy.
This also ensures that the energy requirement for the sum signal can be met:
The energy sum in the spatial domain is given by: 1_{L}^{T}W_{L}W_{L}^{T}1_{L}=1_{L}^{T}M_{L}X_{s}X_{s}^{T}M_{L}^{T}1_{L}, which would become in good approximation 1_{S}^{T}X_{s}X_{s}^{T}1_{S}, provided that an ideal symmetric speaker setup exists.
This leads to the requirement: 1_{L}^{T}D_{L}≅[1,0,0, . . . ,0], and in addition from the signal model we can conclude that the top row of D_{L}^{−1} needs to be [1,1,1,1, . . . ] (i.e. a vector of length L with “one” elements) in order that the re-encoded order zero signal maintains amplitude and energy.
Third, energy preservation is a prerequisite: The energy of signal x_{s}∈ℝ^{1×τ} should be preserved after conversion to HOA and spatial rendering to loudspeakers, independent of the signal's direction Ω_{s}. This leads to ∥D_{L}φ(Ω_{s})∥_{2}^{2}=1. This can be achieved by modelling D_{L} from rotation matrices and a diagonal gain matrix: D_{L}=UV^{T}diag(a) (the dependency on the direction (Ω_{s}) was removed for clarity): ∥D_{L}φ∥_{2}^{2}=φ^{T}D_{L}^{T}D_{L}φ=φ^{T}diag(a)VU^{T}UV^{T}diag(a)φ=φ^{T}diag(a)^{2}φ=Σ_{o=1}^{(N+1)^{2}}a_{o}^{2}φ_{o}^{2}≡1
For Spherical harmonics φ_{o}^{2}=(Y_{n}^{m}(Ω_{s}))^{2}=1, so all gains a_{o}^{2} related to ∥D_{L}∥_{fro}^{2}=Σ_{o=1}^{(N+1)^{2}}a_{o}^{2}=1 would satisfy the equation. If all gains are selected equal, this leads to a_{o}^{2}=(N+1)^{−2}.
The requirement VV^{T}=1 can be achieved for L≥(N+1)^{2} and only be approximated for L<(N+1)^{2}.
This leads to the requirement: D_{L}^{T }D_{L}=diag(a)^{2 }with Σ_{o=1}^{(N+1)}^{2 }a_{o}^{2}=1.
As an example, a case with ideal spherical positions (for HOA orders N=1 to N=3) is described in the following (Tabs. 1-3). Ideal spherical positions for further HOA orders (N=4 to N=6) are described further below (Tabs. 4-6). All the below-mentioned positions are derived from modified positions published in [1]. The method to derive these positions and related quadrature/cubature gains was published in [2]. In these tables, the azimuth is measured counter-clockwise from the frontal direction related to the listening position and the inclination is measured from the z-axis, with an inclination of 0 being above the listening position.
The term numerical quadrature is often abbreviated to quadrature and is largely synonymous with numerical integration, especially as applied to 1-dimensional integrals. Numerical integration over more than one dimension is called cubature herein.
Typical application scenarios to apply DRC gains to HOA signals are shown in
In the following, further details of the disclosed solution are described.
DRC for HOA Content
DRC is applied to the HOA signal before rendering, or may be combined with rendering. DRC for HOA can be applied in the time domain or in the QMF filter bank domain.
For DRC in the time domain, the DRC decoder provides (N+1)^{2} gain values g_{drc}=[g_{1}, . . . , g_{(N+1)^{2}}]^{T} according to the number of HOA coefficient channels of the HOA signal c. N is the HOA order.
DRC gains are applied to the HOA signals according to:
c_{drc}=D_{L}^{−1 }diag(g_{drc})D_{L}c
where c is a vector of one time sample of HOA coefficients (c∈ℝ^{(N+1)^{2}×1}), and D_{L}∈ℝ^{(N+1)^{2}×(N+1)^{2}} and its inverse D_{L}^{−1} are matrices related to a Discrete Spherical Harmonics Transform (DSHT) optimized for DRC purposes.
In one embodiment, the computational load can be decreased by (N+1)^{4} operations per sample by including the rendering step and calculating the loudspeaker signals directly: w_{drc}=(D D_{L}^{−1})(diag(g_{drc})D_{L})c, where D is the rendering matrix and (D D_{L}^{−1}) can be pre-computed.
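The combined time-domain step can be sketched as follows. This is a minimal NumPy illustration: the rendering matrix `D` is a random placeholder for five loudspeakers (a real renderer would be energy preserving), `D_L` is an illustrative first-order matrix, and the function name `render_with_drc` is hypothetical.

```python
import numpy as np

D_L = np.array([[1, 1, 1, 1],
                [1, -1, -1, 1],
                [1, 1, -1, -1],
                [1, -1, 1, -1]], dtype=float) / 4.0   # illustrative N=1 matrix

rng = np.random.default_rng(0)
D = rng.standard_normal((5, 4))      # placeholder renderer, 5 loudspeakers
A = D @ np.linalg.inv(D_L)           # (D D_L^-1), pre-computed once

def render_with_drc(c, g_drc):
    """Loudspeaker sample w_drc = (D D_L^-1) (diag(g_drc) D_L) c."""
    return A @ (g_drc * (D_L @ c))   # elementwise product replaces diag(g_drc)

c = rng.standard_normal(4)                    # one time sample of HOA coefficients
g_drc = np.array([1.0, 0.7, 0.9, 0.5])

w_drc = render_with_drc(c, g_drc)

# Reference: apply DRC in the HOA domain first, then render separately.
c_drc = np.linalg.inv(D_L) @ np.diag(g_drc) @ D_L @ c
w_reference = D @ c_drc

# Simplified mode: identical gains reduce to scaling before rendering.
w_simple = render_with_drc(c, np.full(4, 0.5))
```

Folding D_{L}^{−1} into the renderer removes one (N+1)^{2}×(N+1)^{2} matrix-vector product per sample, which corresponds to the (N+1)^{4} operations mentioned above.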
If all gains g_{1}, . . . , g_{(N+1)^{2}} have the same value of g_{drc}, as in the simplified mode, a single gain group has been used to transmit the coder DRC gains. This case can be flagged by the DRC decoder, because in this case the calculation in the spatial filter is not needed, so that the calculation simplifies to:
c_{drc}=g_{drc}c.
The above describes how to obtain and apply the DRC gain values. In the following, the calculation of DSHT matrices for DRC is described.
In the following, D_{L} is renamed to D_{DSHT}. The matrices to determine the spatial filter D_{DSHT} and its inverse D_{DSHT}^{−1} are calculated as follows:
A set of spherical positions Ω_{1}, . . . , Ω_{l}, . . . , Ω_{(N+1)^{2}} with Ω_{l}=[θ_{l}, ϕ_{l}]^{T} and related quadrature (cubature) gains q∈ℝ^{(N+1)^{2}×1} are selected, indexed by the HOA order N from Tables 1-4. A mode matrix Ψ_{DSHT} related to these positions is calculated as described above. That is, the mode matrix Ψ_{DSHT} comprises mode vectors according to Ψ_{DSHT}=[φ(Ω_{1}), . . . , φ(Ω_{l}), . . . , φ(Ω_{(N+1)^{2}})] with each φ(Ω_{l}) being a mode vector that contains spherical harmonics of a predefined direction Ω_{l} with Ω_{l}=[θ_{l}, ϕ_{l}]^{T}. The predefined direction depends on the HOA order N, according to Tabs. 1-6 (exemplarily for 1≤N≤6). A first prototype matrix is calculated by
\tilde{D}_{1}=diag(q)Ψ_{DSHT}^{T}/(N+1)^{2}
(the division by (N+1)^{2} can be skipped due to a subsequent normalization). A compact singular value decomposition \tilde{D}_{1}=USV^{T} is performed and a new prototype matrix is calculated by \hat{\tilde{D}}_{2}=UV^{T}. This matrix is normalized by
Ď_{2}=\hat{\tilde{D}}_{2}/∥\hat{\tilde{D}}_{2}∥_{k}.
A row vector e is calculated by
e=−(1/(N+1)^{2})(1_{L}^{T}Ď_{2}−[1,0,0, . . . ,0]),
where [1,0,0, . . . ,0] is a row vector of (N+1)^{2} elements, all zero except for the first element with a value of one. 1_{L}^{T}Ď_{2} denotes the sum of the rows of Ď_{2}. The optimized DSHT matrix D_{DSHT} is now derived by: D_{DSHT}=Ď_{2}+[e^{T}, e^{T}, e^{T}, . . . ]^{T}. It has been found that, if −e is used instead of e, the invention provides slightly worse but still usable results.
For DRC in the QMF filter bank domain, the following applies.
The DRC decoder provides a gain value g_{ch}(n, m) for every time frequency tile n, m for (N+1)^{2} spatial channels. The gains for time slot n and frequency band m are arranged in g(n, m)∈ℝ^{(N+1)^{2}×1}.
Multiband DRC is applied in the QMF Filter bank domain. The processing steps are shown in
To minimize the computational complexity, the DSHT and rendering to loudspeaker channels are combined: w(n, m)=D D_{DSHT}^{−1} \check{w}_{DRC}(n, m), where D denotes the HOA rendering matrix. The QMF signals then can be fed to the mixer for further processing.
If only a single gain group for DRC has been used this should be flagged by the DRC decoder because again computational simplifications are possible. In this case the gains in vector g(n, m) all share the same value of g_{DRC}(n, m). The QMF filter bank can be directly applied to the HOA signal and the gain g_{DRC}(n, m) can be multiplied in filter bank domain.
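The per-tile processing in the filter bank domain can be sketched as below. No QMF bank is implemented here: random complex arrays stand in for the QMF-domain signals, and the sketch relies only on the linearity of the filter bank. The array layout (time slot n, band m, channel), the placeholder renderer `D`, the first-order matrix `D_DSHT` and the function name `apply_drc_qmf` are assumptions made for illustration.

```python
import numpy as np

D_DSHT = np.array([[1, 1, 1, 1],
                   [1, -1, -1, 1],
                   [1, 1, -1, -1],
                   [1, -1, 1, -1]], dtype=float) / 4.0   # illustrative N=1 matrix

rng = np.random.default_rng(1)
n_slots, n_bands, n_ch, n_spk = 8, 4, 4, 5
D = rng.standard_normal((n_spk, n_ch))        # placeholder HOA rendering matrix
A = D @ np.linalg.inv(D_DSHT)                 # combined D D_DSHT^-1

def apply_drc_qmf(w_hat, g, A):
    """w(n, m) = A diag(g(n, m)) w_hat(n, m) for every time frequency tile."""
    w_drc = g * w_hat                         # diag(g(n, m)) per tile
    return np.einsum("li,nmi->nml", A, w_drc)

# Stand-in for HOA coefficients in the filter bank domain.
C_qmf = (rng.standard_normal((n_slots, n_bands, n_ch))
         + 1j * rng.standard_normal((n_slots, n_bands, n_ch)))
w_hat = np.einsum("ij,nmj->nmi", D_DSHT, C_qmf)   # spatial channels per tile

g = rng.uniform(0.5, 1.0, size=(n_slots, n_bands, n_ch))
w = apply_drc_qmf(w_hat, g, A)

# Single gain group: one gain per tile, applied directly to the HOA tiles.
g_single = rng.uniform(0.5, 1.0, size=(n_slots, n_bands, 1))
w_full_path = apply_drc_qmf(w_hat, np.broadcast_to(g_single, w_hat.shape), A)
w_shortcut = np.einsum("lj,nmj->nml", D, g_single * C_qmf)
```

When all channel gains of a tile are equal, the spatial transform cancels and the result matches multiplying the HOA tiles by g_{DRC}(n, m) before rendering, which is the simplification described above.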
As has become apparent in view of the above, in one embodiment the invention relates to a method for applying Dynamic Range Compression gain factors to a HOA signal, the method comprising steps of receiving a HOA signal and one or more gain factors, transforming 40 the HOA signal into the spatial domain, wherein an iDSHT is used with a transform matrix obtained from spherical positions of virtual loudspeakers and quadrature gains q, and wherein a transformed HOA signal is obtained, multiplying the gain factors with the transformed HOA signal, wherein a dynamic range compressed transformed HOA signal is obtained, and transforming the dynamic range compressed transformed HOA signal back into the HOA domain being a coefficient domain and using a Discrete Spherical Harmonics Transform (DSHT), wherein a dynamic range compressed HOA signal is obtained.
Further, the transform matrix is computed according to D_{DSHT}=Ď_{2}+[e^{T}, e^{T}, e^{T}, . . . ]^{T}, wherein Ď_{2}=\hat{\tilde{D}}_{2}/∥\hat{\tilde{D}}_{2}∥_{k} is a normalized version of \hat{\tilde{D}}_{2}=UV^{T} with U, V obtained from \tilde{D}_{1}=diag(q)Ψ_{DSHT}=USV^{T}, with Ψ_{DSHT} being the transposed mode matrix of spherical harmonics related to the used spherical positions of virtual loudspeakers, and e^{T} being a transposed version of e=−(1/(N+1)^{2})(1_{L}^{T}Ď_{2}−[1,0,0, . . . ,0]).
Further, in one embodiment the invention relates to a device for applying DRC gain factors to a HOA signal, the device comprising a processor or one or more processing elements adapted for receiving a HOA signal and one or more gain factors, transforming 40 the HOA signal into the spatial domain, wherein an iDSHT is used with a transform matrix obtained from spherical positions of virtual loudspeakers and quadrature gains q, and wherein a transformed HOA signal is obtained, multiplying the gain factors with the transformed HOA signal, wherein a dynamic range compressed transformed HOA signal is obtained, and transforming the dynamic range compressed transformed HOA signal back into the HOA domain being a coefficient domain and using a Discrete Spherical Harmonics Transform (DSHT), wherein a dynamic range compressed HOA signal is obtained. Further, the transform matrix is computed according to D_{DSHT}=Ď_{2}+[e^{T}, e^{T}, e^{T}, . . . ]^{T }wherein
Ď_{2}=\hat{\tilde{D}}_{2}/∥\hat{\tilde{D}}_{2}∥_{k} is a normalized version of \hat{\tilde{D}}_{2}=UV^{T} with U, V obtained from \tilde{D}_{1}=diag(q)Ψ_{DSHT}=USV^{T}, with Ψ_{DSHT} being the transposed mode matrix of the spherical harmonics related to the used spherical positions of virtual loudspeakers, and e^{T} being a transposed version of e=−(1/(N+1)^{2})(1_{L}^{T}Ď_{2}−[1,0,0, . . . ,0]).
Further, in one embodiment the invention relates to a computer readable storage medium having computer executable instructions that when executed on a computer cause the computer to perform a method for applying Dynamic Range Compression gain factors to a Higher Order Ambisonics (HOA) signal, the method comprising receiving a HOA signal and one or more gain factors, transforming 40 the HOA signal into the spatial domain, wherein an iDSHT is used with a transform matrix obtained from spherical positions of virtual loudspeakers and quadrature gains q, and wherein a transformed HOA signal is obtained, multiplying the gain factors with the transformed HOA signal, wherein a dynamic range compressed transformed HOA signal is obtained, and transforming the dynamic range compressed transformed HOA signal back into the HOA domain, being a coefficient domain, using a Discrete Spherical Harmonics Transform (DSHT), wherein a dynamic range compressed HOA signal is obtained. Further, the transform matrix is computed according to D_{DSHT}=Ď_{2}+[e^{T}, e^{T}, e^{T}, . . . ]^{T}, wherein Ď_{2} is a normalized version of {circumflex over ({tilde over (D)})}_{2}=UV^{T} with U, V obtained from [equation not reproduced], with Ψ_{DSHT} being the transposed mode matrix of spherical harmonics related to the used spherical positions of virtual loudspeakers, and e^{T} being a transposed version of [equation not reproduced].
Further, in one embodiment the invention relates to a method for performing DRC on a HOA signal, the method comprising steps of setting or determining a mode, the mode being either a simplified mode or a nonsimplified mode, in the nonsimplified mode, transforming the HOA signal to the spatial domain, wherein an inverse DSHT is used, in the nonsimplified mode, analyzing the transformed HOA signal, and in the simplified mode, analyzing the HOA signal, obtaining, from results of said analyzing, one or more gain factors that are usable for dynamic range compression, wherein only one gain factor is obtained in the simplified mode and wherein two or more different gain factors are obtained in the nonsimplified mode, in the simplified mode multiplying the obtained gain factor with the HOA signal, wherein a gain compressed HOA signal is obtained, in the nonsimplified mode, multiplying the obtained gain factors with the transformed HOA signal, wherein a gain compressed transformed HOA signal is obtained, and transforming the gain compressed transformed HOA signal back into the HOA domain, wherein a gain compressed HOA signal is obtained.
In one embodiment, the method further comprises steps of receiving an indication indicating either a simplified mode or a nonsimplified mode, selecting a nonsimplified mode if said indication indicates nonsimplified mode, and selecting a simplified mode if said indication indicates simplified mode, wherein the steps of transforming the HOA signal into the spatial domain and transforming the dynamic range compressed transformed HOA signal back into the HOA domain are performed only in the nonsimplified mode, and wherein in the simplified mode only one gain factor is multiplied with the HOA signal.
In one embodiment, the method further comprises steps of, in the simplified mode analyzing the HOA signal, and in the nonsimplified mode analyzing the transformed HOA signal, then obtaining, from results of said analyzing, one or more gain factors that are usable for dynamic range compression, wherein in the nonsimplified mode two or more different gain factors are obtained and in the simplified mode only one gain factor is obtained, wherein in the simplified mode a gain compressed HOA signal is obtained by said multiplying the obtained gain factor with the HOA signal, and wherein in the nonsimplified mode said gain compressed transformed HOA signal is obtained by multiplying the obtained two or more gain factors with the transformed HOA signal, and wherein in the nonsimplified mode said transforming the HOA signal to the spatial domain uses an inverse DSHT.
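The two modes described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis the text prescribes: the function names `toy_gain` and `perform_drc`, the peak-based gain rule with its `threshold`, and the random orthogonal stand-in for the transform matrix D_L are all hypothetical. A square, invertible D_L is assumed, so that the DSHT is simply its inverse.

```python
import numpy as np

def toy_gain(x, threshold=0.5):
    """Hypothetical analysis: one gain that pulls the peak of x down to threshold."""
    peak = np.max(np.abs(x))
    return 1.0 if peak <= threshold else threshold / peak

def perform_drc(B, D_L, simplified, threshold=0.5):
    """B: HOA block of shape ((N+1)^2, tau); D_L: square inverse-DSHT matrix.

    Returns (gains, gain compressed HOA block).
    """
    if simplified:
        # Simplified mode: analyze the HOA signal itself and obtain one gain.
        g = np.array([toy_gain(B, threshold)])
        return g, g[0] * B
    # Nonsimplified mode: inverse DSHT to the spatial domain first.
    W = D_L @ B
    # One gain per spatial channel (two or more gains in total).
    g = np.array([toy_gain(w, threshold) for w in W])
    W_drc = g[:, None] * W
    # DSHT back to the HOA domain, written as a linear solve with D_L.
    return g, np.linalg.solve(D_L, W_drc)

rng = np.random.default_rng(0)
# Stand-in for a real DSHT matrix: any invertible square matrix works here.
D_L, _ = np.linalg.qr(rng.standard_normal((4, 4)))
B = rng.standard_normal((4, 16))
g_simple, B_simple = perform_drc(B, D_L, simplified=True)
g_full, B_full = perform_drc(B, D_L, simplified=False)
```

The simplified branch never touches D_L, which is the point of that mode: a single gain commutes with any linear transform, so the spatial-domain round trip can be skipped.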
In one embodiment, the HOA signal is divided into frequency subbands, and the gain factor(s) is (are) obtained and applied to each frequency subband separately, with individual gains per subband. In one embodiment, the steps of analyzing the HOA signal (or transformed HOA signal), obtaining one or more gain factors, multiplying the obtained gain factor(s) with the HOA signal (or transformed HOA signal), and transforming the gain compressed transformed HOA signal back into the HOA domain are applied to each frequency subband separately, with individual gains per subband. It is noted that the sequential order of dividing the HOA signal into frequency subbands and transforming the HOA signal to the spatial domain can be swapped, and/or the sequential order of synthesizing the subbands and transforming the gain compressed transformed HOA signals back into the HOA domain can be swapped, independently from each other.
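A rough sketch of per-subband gains, and of why the order of band splitting and spatial transform can be swapped, is given below. The FFT-mask band split in `split_bands` is only a toy stand-in for a real filter bank such as a QMF bank, and all names, shapes, and the random test data are assumptions made for illustration.

```python
import numpy as np

def split_bands(X, num_bands):
    """Toy band split: partition the rFFT bins into contiguous groups.

    X has time on its last axis; the result has shape (num_bands, ..., T).
    """
    T = X.shape[-1]
    spec = np.fft.rfft(X, axis=-1)
    edges = np.linspace(0, spec.shape[-1], num_bands + 1).astype(int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spec)
        masked[..., lo:hi] = spec[..., lo:hi]
        bands.append(np.fft.irfft(masked, n=T, axis=-1))
    return np.stack(bands)

def subband_drc(B, D_L, gains):
    """gains has shape (num_bands, L): individual gains per subband and channel."""
    W_bands = split_bands(D_L @ B, gains.shape[0])   # (bands, L, T)
    W_drc = gains[:, :, None] * W_bands              # gain per band and channel
    B_bands = np.linalg.inv(D_L) @ W_drc             # back to HOA, band by band
    return B_bands.sum(axis=0)                       # synthesize the subbands

rng = np.random.default_rng(1)
D_L, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # stand-in transform matrix
B = rng.standard_normal((4, 32))
gains = rng.uniform(0.5, 1.0, size=(3, 4))
B_drc = subband_drc(B, D_L, gains)

# Both the band split and the spatial transform are linear, so they commute.
split_then_transform = D_L @ split_bands(B, 3)
transform_then_split = split_bands(D_L @ B, 3)
```

Because the band split acts along time and the transform matrix acts across channels, the two results in the last lines agree up to rounding, which is consistent with the statement that the sequential order can be swapped.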
In one embodiment, the method further comprises, before the step of multiplying the gain factors, a step of transmitting the transformed HOA signal together with the obtained gain factors and the number of these gain factors.
In one embodiment, the transform matrix is computed from a mode matrix Ψ_{DSHT }and corresponding quadrature gains, wherein the mode matrix Ψ_{DSHT }comprises mode vectors according to Ψ_{DSHT}=[φ(Ω_{1}), . . . , φ(Ω_{l}), . . . , φ(Ω_{(N+1)^{2}})] with each φ(Ω_{l}) being a mode vector containing spherical harmonics of a predefined direction Ω_{l }with Ω_{l}=[θ_{l}, ϕ_{l}]^{T}. The predefined directions depend on the HOA order N.
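As a small illustration of how such a mode matrix is assembled, the sketch below builds it for HOA order N=1, giving (N+1)^{2}=4 mode vectors. Several things here are assumptions not taken from the text: real-valued spherical harmonics with N3D normalization, θ read as inclination and ϕ as azimuth, the coefficient ordering, and the tetrahedral direction set (the text's own positions are tabulated for N=4, 5 and 6 only).

```python
import numpy as np

def mode_vector_order1(theta, phi):
    """Real spherical harmonics up to order 1 for one direction.

    Assumes N3D normalization, inclination theta, azimuth phi, and the
    ordering (n, m) = (0, 0), (1, -1), (1, 0), (1, 1).
    """
    s = np.sqrt(3.0)
    return np.array([
        1.0,
        s * np.sin(theta) * np.sin(phi),
        s * np.cos(theta),
        s * np.sin(theta) * np.cos(phi),
    ])

def mode_matrix_order1(directions):
    """Stack one mode vector per direction as columns: shape (4, L)."""
    return np.column_stack([mode_vector_order1(t, p) for t, p in directions])

# Hypothetical virtual loudspeaker layout: the four vertices of a tetrahedron.
xyz = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3.0)
directions = [(np.arccos(z), np.arctan2(y, x)) for x, y, z in xyz]
Psi = mode_matrix_order1(directions)
```

For this particular layout the mode vectors are mutually orthogonal, so Psi times its transpose is a scaled identity. Less regular layouts generally lose that property, which is presumably where the quadrature gains come in.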
In one embodiment, the HOA signal B is transformed into the spatial domain to obtain a transformed HOA signal W_{DSHT}, and the transformed HOA signal W_{DSHT }is multiplied with the gain values diag(g) sample wise according to W_{DSHT}=diag(g) D_{L}B, and the method comprises a further step of transforming the transformed HOA signal to a different second spatial domain according to W_{2}={circumflex over (D)} W_{DSHT}, where {circumflex over (D)} is precalculated in an initialization phase according to {circumflex over (D)}=D D_{L}^{−1 }and where D is a rendering matrix that transforms a HOA signal into the different second spatial domain.
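The benefit of the precalculated combined matrix D D_{L}^{−1} can be checked numerically. In the sketch below the matrix sizes, the random stand-ins for D_L and the rendering matrix D, and the variable names are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
num_coeffs, num_out, tau = 4, 5, 8

# Stand-ins: an invertible D_L and an arbitrary rendering matrix D.
D_L, _ = np.linalg.qr(rng.standard_normal((num_coeffs, num_coeffs)))
D = rng.standard_normal((num_out, num_coeffs))

# Initialization phase: combine back-transform and rendering once.
D_hat = D @ np.linalg.inv(D_L)

# Per block: spatial transform, sample-wise gains, then a single matrix product.
B = rng.standard_normal((num_coeffs, tau))
g = rng.uniform(0.5, 1.0, size=num_coeffs)
W_dsht = g[:, None] * (D_L @ B)      # W_DSHT = diag(g) D_L B
W_2 = D_hat @ W_dsht                 # second spatial domain

# Reference path: go back to the HOA domain first, then render.
W_2_ref = D @ np.linalg.solve(D_L, W_dsht)
```

Both paths give the same output, but the combined matrix saves one matrix product per block at run time.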
In one embodiment, at least if (N+1)^{2}<τ, with N being the HOA order and τ being a DRC block size, the method further comprises steps of transforming 53 the gain vector to the HOA domain according to G=D_{L}^{−1 }diag(g) D_{L}, with G being a gain matrix and D_{L }being a DSHT matrix defining said DSHT, and applying the gain matrix G to the HOA coefficients of the HOA signal B according to B_{DRC}=GB, wherein the DRC compressed HOA signal B_{DRC }is obtained.
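A short sketch of the gain matrix follows, assuming a square invertible D_L; the helper name `gain_matrix` and the random test data are hypothetical. It checks that applying G in the HOA domain matches the explicit path through the spatial domain.

```python
import numpy as np

def gain_matrix(D_L, g):
    """G = D_L^{-1} diag(g) D_L, using a linear solve instead of an explicit inverse."""
    return np.linalg.solve(D_L, g[:, None] * D_L)

rng = np.random.default_rng(3)
num_coeffs, tau = 4, 16                       # here (N+1)^2 = 4 < tau = 16
D_L, _ = np.linalg.qr(rng.standard_normal((num_coeffs, num_coeffs)))
B = rng.standard_normal((num_coeffs, tau))
g = rng.uniform(0.5, 1.0, size=num_coeffs)

G = gain_matrix(D_L, g)
B_drc = G @ B                                 # B_DRC = G B

# Reference: transform, apply gains per spatial channel, transform back.
B_ref = np.linalg.solve(D_L, g[:, None] * (D_L @ B))
```

One plausible reading of the condition (N+1)^{2}<τ is that building G once per block only pays off when the block holds more samples than there are HOA coefficients; that interpretation is not spelled out in the text.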
In one embodiment, at least if L<τ, with L being the number of output channels and τ being a DRC block size, the method further comprises steps of applying the gain matrix G to the renderer matrix D according to {circumflex over (D)}=DG, wherein a dynamic range compressed renderer matrix {circumflex over (D)} is obtained, and rendering the HOA signal with the dynamic range compressed renderer matrix.
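Folding the gains into the renderer can be sketched in the same way. The function name `drc_renderer`, the matrix sizes, and the random stand-ins are assumptions; the gain matrix is formed as G=D_{L}^{−1} diag(g) D_{L}.

```python
import numpy as np

def drc_renderer(D, D_L, g):
    """Return the dynamic range compressed renderer matrix D G."""
    G = np.linalg.solve(D_L, g[:, None] * D_L)   # G = D_L^{-1} diag(g) D_L
    return D @ G

rng = np.random.default_rng(4)
num_coeffs, num_out, tau = 4, 2, 16              # L = 2 output channels, L < tau
D_L, _ = np.linalg.qr(rng.standard_normal((num_coeffs, num_coeffs)))
D = rng.standard_normal((num_out, num_coeffs))
B = rng.standard_normal((num_coeffs, tau))
g = rng.uniform(0.5, 1.0, size=num_coeffs)

D_hat = drc_renderer(D, D_L, g)
speakers = D_hat @ B                             # render with compression included

# Reference: compress in the spatial domain, return to HOA, then render.
speakers_ref = D @ np.linalg.solve(D_L, g[:, None] * (D_L @ B))
```

With the gains absorbed into the renderer, the per-sample work reduces to one product with a matrix of L rows, and the HOA signal itself is never modified.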
In one embodiment the invention relates to a method for applying DRC gain factors to a HOA signal, the method comprising steps of receiving a HOA signal together with an indication and one or more gain factors, the indication indicating either a simplified mode or a nonsimplified mode, wherein only one gain factor is received if the indication indicates the simplified mode, selecting either a simplified mode or a nonsimplified mode according to said indication, in the simplified mode multiplying the gain factor with the HOA signal, wherein a dynamic range compressed HOA signal is obtained, and in the nonsimplified mode transforming the HOA signal into the spatial domain, wherein a transformed HOA signal is obtained, multiplying the gain factors with the transformed HOA signals, wherein dynamic range compressed transformed HOA signals are obtained, and transforming the dynamic range compressed transformed HOA signals back into the HOA domain, wherein a dynamic range compressed HOA signal is obtained.
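On the receiving side, the mode selection might look like the sketch below. The function name `apply_received_drc`, the boolean form of the indication, and the error handling are assumptions for illustration; the text does not define a concrete bitstream or API.

```python
import numpy as np

def apply_received_drc(B, gains, simplified, D_L=None):
    """Apply received DRC gains to the HOA block B according to the indication."""
    gains = np.atleast_1d(np.asarray(gains, dtype=float))
    if simplified:
        # Simplified mode: exactly one gain, applied directly to the HOA signal.
        if gains.size != 1:
            raise ValueError("simplified mode expects exactly one gain factor")
        return gains[0] * B
    if D_L is None:
        raise ValueError("nonsimplified mode needs the transform matrix D_L")
    W = D_L @ B                              # into the spatial domain
    W_drc = gains[:, None] * W               # one gain per spatial channel
    return np.linalg.solve(D_L, W_drc)       # back into the HOA domain

B = np.arange(8.0).reshape(4, 2)
D_L = np.eye(4) + 0.1 * np.ones((4, 4))      # hypothetical invertible transform
out_simple = apply_received_drc(B, [0.5], simplified=True)
out_full = apply_received_drc(B, [0.5, 0.6, 0.7, 0.8], simplified=False, D_L=D_L)
```

If all channel gains happen to be equal, the nonsimplified branch reduces to the simplified one, so a single transmitted gain loses nothing in that case.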
Further, in one embodiment the invention relates to a device for performing DRC on a HOA signal, the device comprising a processor or one or more processing elements adapted for setting or determining a mode, the mode being either a simplified mode or a nonsimplified mode, in the nonsimplified mode transforming the HOA signal to the spatial domain, wherein an inverse DSHT is used, in the nonsimplified mode analyzing the transformed HOA signal, while in the simplified mode analyzing the HOA signal, obtaining, from results of said analyzing, one or more gain factors that are usable for dynamic range compression, wherein only one gain factor is obtained in the simplified mode and wherein two or more different gain factors are obtained in the nonsimplified mode, in the simplified mode multiplying the obtained gain factor with the HOA signal, wherein a gain compressed HOA signal is obtained, and in the nonsimplified mode multiplying the obtained gain factors with the transformed HOA signal, wherein a gain compressed transformed HOA signal is obtained, and transforming the gain compressed transformed HOA signal back into the HOA domain, wherein a gain compressed HOA signal is obtained.
In one embodiment for nonsimplified mode only, a device for performing DRC on a HOA signal comprises a processor or one or more processing elements adapted for transforming the HOA signal to the spatial domain, analyzing the transformed HOA signal, obtaining, from results of said analyzing, gain factors that are usable for dynamic range compression, multiplying the obtained factors with the transformed HOA signals, wherein gain compressed transformed HOA signals are obtained, and transforming the gain compressed transformed HOA signals back into the HOA domain, wherein gain compressed HOA signals are obtained. In one embodiment, the device further comprises a transmission unit for transmitting, before multiplying the obtained gain factor or gain factors, the HOA signal together with the obtained gain factor or gain factors.
Also, here it is noted that the sequential order of dividing the HOA signal into frequency subbands and transforming the HOA signal to the spatial domain can be swapped, and the sequential order of synthesizing the subbands and transforming the gain compressed transformed HOA signals back into the HOA domain can be swapped, independently from each other.
Further, in one embodiment the invention relates to a device for applying DRC gain factors to a HOA signal, the device comprising a processor or one or more processing elements adapted for receiving a HOA signal together with an indication and one or more gain factors, the indication indicating either a simplified mode or a nonsimplified mode, wherein only one gain factor is received if the indication indicates the simplified mode, setting the device to either a simplified mode or a nonsimplified mode, according to said indication, in the simplified mode, multiplying the gain factor with the HOA signal, wherein a dynamic range compressed HOA signal is obtained; and in the nonsimplified mode, transforming the HOA signal into the spatial domain, wherein a transformed HOA signal is obtained, multiplying the gain factors with the transformed HOA signals, wherein dynamic range compressed transformed HOA signals are obtained, and transforming the dynamic range compressed transformed HOA signals back into the HOA domain, wherein a dynamic range compressed HOA signal is obtained.
In one embodiment, the device further comprises a transmission unit for transmitting, before multiplying the obtained factors, the HOA signals together with the obtained gain factors. In one embodiment, the HOA signal is divided into frequency subbands, and the steps of analyzing the transformed HOA signal, obtaining gain factors, multiplying the obtained factors with the transformed HOA signals and transforming the gain compressed transformed HOA signals back into the HOA domain are applied to each frequency subband separately, with individual gains per subband.
In one embodiment of the device for applying DRC gain factors to a HOA signal, the HOA signal is divided into a plurality of frequency subbands, and obtaining one or more gain factors, multiplying the obtained gain factors with the HOA signals or the transformed HOA signals, and in the nonsimplified mode transforming the gain compressed transformed HOA signals back into the HOA domain are applied to each frequency subband separately, with individual gains per subband.
Further, in one embodiment where only the nonsimplified mode is used, the invention relates to a device for applying DRC gain factors to a HOA signal, the device comprising a processor or one or more processing elements adapted for receiving a HOA signal together with gain factors, transforming the HOA signal into the spatial domain (using iDSHT), wherein a transformed HOA signal is obtained, multiplying the gain factors with the transformed HOA signal, wherein a dynamic range compressed transformed HOA signal is obtained, and transforming the dynamic range compressed transformed HOA signal back into the HOA domain (i.e. coefficient domain) (using DSHT), wherein a dynamic range compressed HOA signal is obtained.
The following tables Tab. 4-6 list spherical positions of virtual loudspeakers for HOA of order N with N=4, 5 or 6.
While there has been shown, described, and pointed out fundamental novel features of the present invention as applied to preferred embodiments thereof, it will be understood that various omissions and substitutions and changes in the apparatus and method described, in the form and details of the devices disclosed, and in their operation, may be made by those skilled in the art without departing from the spirit of the present invention. It is expressly intended that all combinations of those elements that perform substantially the same function in substantially the same way to achieve the same results are within the scope of the invention. Substitutions of elements from one described embodiment to another are also fully intended and contemplated.
It will be understood that the present invention has been described purely by way of example, and modifications of detail can be made without departing from the scope of the invention. Each feature disclosed in the description and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination. Features may, where appropriate, be implemented in hardware, software, or a combination of the two.
 [1] “Integration nodes for the sphere”, Jörg Fliege 2010, online accessed 2010 Oct. 5 http://www.mathematik.unidortmund.de/lsx/research/projects/fliege/nodes/nodes.html
 [2] “A two-stage approach for computing cubature formulae for the sphere”, Jörg Fliege and Ulrike Maier, Technical report, Fachbereich Mathematik, Universität Dortmund, 1999