BANDWIDTH EXTENSION METHOD, BANDWIDTH EXTENSION APPARATUS, PROGRAM, INTEGRATED CIRCUIT, AND AUDIO DECODING APPARATUS

Abstract
Provided is a bandwidth extension method which allows reduction of computation amount in bandwidth extension and suppression of deterioration of quality in the bandwidth to be extended. In the bandwidth extension method: a low frequency bandwidth signal is transformed into a QMF domain to generate a first low frequency QMF spectrum; pitch-shifted signals are generated by applying different shifting factors on the low frequency bandwidth signal; a high frequency QMF spectrum is generated by time-stretching the pitch-shifted signals in the QMF domain; the high frequency QMF spectrum is modified; and the modified high frequency QMF spectrum is combined with the first low frequency QMF spectrum.
26 Claims
 1-21. (canceled)
 22. A bandwidth extension method for producing a full bandwidth signal from a low frequency bandwidth signal, the low frequency bandwidth signal being an audio signal, said method comprising:
transforming the low frequency bandwidth signal into a quadrature mirror filter bank (QMF) domain to generate a first low frequency QMF spectrum; generating a low order harmonic patch by time-stretching the low frequency bandwidth signal by transforming the low frequency bandwidth signal into a second low frequency QMF spectrum having finer frequency resolution than the first low frequency QMF spectrum; generating signals that are pitch shifted, by applying different shift coefficients to the low order harmonic patch, and generating a high frequency QMF spectrum from the signals; modifying the high frequency QMF spectrum to satisfy a high frequency energy condition; and generating the full bandwidth signal by combining the modified high frequency QMF spectrum with the first low frequency QMF spectrum.
 23. A bandwidth extension apparatus that produces a full bandwidth signal from a low frequency bandwidth signal, the low frequency bandwidth signal being an audio signal, said bandwidth extension apparatus comprising:
a first transform circuit configured to transform the low frequency bandwidth signal into a quadrature mirror filter bank (QMF) domain to generate a first low frequency QMF spectrum; a low order harmonic patch generation circuit configured to generate a low order harmonic patch by time-stretching the low frequency bandwidth signal by transforming the low frequency bandwidth signal into a second low frequency QMF spectrum having finer frequency resolution than the first low frequency QMF spectrum; a high frequency generation circuit configured to (i) generate signals that are pitch shifted, by applying different shift coefficients to the low order harmonic patch, and (ii) generate a high frequency QMF spectrum from the signals; a spectrum modification circuit configured to modify the high frequency QMF spectrum to satisfy a high frequency energy condition; and a full bandwidth generation circuit configured to generate the full bandwidth signal by combining the modified high frequency QMF spectrum with the first low frequency QMF spectrum.
 24. A non-transitory computer-readable recording medium on which a program for producing a full bandwidth signal from a low frequency bandwidth signal is recorded, the low frequency bandwidth signal being an audio signal, the program causing a computer to execute:
transforming the low frequency bandwidth signal into a quadrature mirror filter bank (QMF) domain to generate a first low frequency QMF spectrum; generating a low order harmonic patch by time-stretching the low frequency bandwidth signal by transforming the low frequency bandwidth signal into a second low frequency QMF spectrum having finer frequency resolution than the first low frequency QMF spectrum; generating signals that are pitch shifted, by applying different shift coefficients to the low order harmonic patch, and generating a high frequency QMF spectrum from the signals; modifying the high frequency QMF spectrum to satisfy a high frequency energy condition; and generating the full bandwidth signal by combining the modified high frequency QMF spectrum with the first low frequency QMF spectrum.
 25. An integrated circuit that produces a full bandwidth signal from a low frequency bandwidth signal, the low frequency bandwidth signal being an audio signal, said bandwidth extension apparatus comprising:
a first transform circuit configured to transform the low frequency bandwidth signal into a quadrature mirror filter bank (QMF) domain to generate a first low frequency QMF spectrum; a low order harmonic patch generation circuit configured to generate a low order harmonic patch by transforming the low frequency bandwidth signal into a second low frequency QMF spectrum having finer frequency resolution than the first low frequency QMF spectrum; a high frequency generation circuit configured to (i) generate signals that are pitch shifted, by applying different shift coefficients to the low order harmonic patch, and (ii) generate a high frequency QMF spectrum from the signals; a spectrum modification circuit configured to modify the high frequency QMF spectrum to satisfy a high frequency energy condition; and a full bandwidth generation circuit configured to generate the full bandwidth signal by combining the modified high frequency QMF spectrum with the first low frequency QMF spectrum.
 26. An audio decoding apparatus comprising:
a separation circuit configured to separate a coded low frequency bandwidth signal from coded information; a decoding circuit configured to decode the coded low frequency bandwidth signal; a transform circuit configured to transform the low frequency bandwidth signal generated through the decoding by said decoding circuit, into a quadrature mirror filter bank (QMF) domain to generate a first low frequency QMF spectrum; a low order harmonic patch generation circuit configured to generate a low order harmonic patch by time-stretching the low frequency bandwidth signal by transforming the low frequency bandwidth signal into a second low frequency QMF spectrum having finer frequency resolution than the first low frequency QMF spectrum; a high frequency generation circuit configured to (i) generate signals that are pitch shifted, by applying different shift coefficients to the low order harmonic patch, and (ii) generate a high frequency QMF spectrum from the signals; a spectrum modification circuit configured to modify the high frequency QMF spectrum to satisfy a high frequency energy condition; and a full bandwidth generation circuit configured to generate the full bandwidth signal by combining the modified high frequency QMF spectrum with the first low frequency QMF spectrum.
1 Specification
The present invention relates to a bandwidth extension method for extending a frequency bandwidth of an audio signal.
Audio bandwidth extension (BWE) technology is typically used in modern audio codecs to efficiently code wideband audio signals at low bit rates. Its principle is to use a parametric representation of the original high frequency (HF) content to synthesize an approximation of the HF from the lower frequency (LF) data.
In the decoder, the LF part is first decoded (107). To approximate the original HF part, the decoded LF part is transformed (108) to the frequency domain, and the resulting LF spectrum is modified (109) to generate an HF spectrum under the guidance of decoded HF parameters. The HF spectrum is further refined (110) by post-processing, also under the guidance of decoded HF parameters. The refined HF spectrum is converted (111) to the time domain and combined with the delayed (112) LF part. As a result, the final reconstructed wideband audio signal is output.
Note that in the BWE technology, one important step is to generate an HF spectrum from the LF spectrum (109). There are a few ways to realize it, such as copying the LF portion to the HF location, nonlinear processing, or upsampling.
The best-known audio codec that uses such a BWE technology is MPEG-4 HE-AAC, where the BWE technology is specified as spectral band replication (SBR) and the HF part is generated by simply copying the LF portion within the QMF representation to the HF spectral location.
Such a spectral copying operation, also called patching, is simple and has proved efficient in most cases. However, at very low bitrates (e.g. <20 kbit/s mono), where only small LF part bandwidths are feasible, such SBR technology can lead to undesired auditory artifacts such as roughness and unpleasant timbre (for example, see Non-Patent Literature (NPL) 1).
Therefore, to avoid such artifacts resulting from the mirroring or copying operation in low bitrate coding scenarios, the standard SBR technology is enhanced and extended with the following main changes (for example, see NPL 2):
 (1) modifying the patching algorithm from a copying pattern to a phase vocoder driven patching pattern;
 (2) increasing the adaptive time resolution for post-processing parameters.
As a result of the first modification (aforementioned (1)), by spreading the LF spectrum with multiple integer factors, the harmonic continuity in the HF is ensured intrinsically. In particular, no unwanted roughness sensation due to beating effects can emerge at the border between low frequency and high frequency and between different high frequency parts (for example, see NPL 1).
The second modification (aforementioned (2)) allows the refined HF spectrum to adapt better to signal fluctuations in the replicated frequency bands.
As the new patching preserves harmonic relations, it is named harmonic bandwidth extension (HBE). The advantages of the prior-art HBE over standard SBR have also been experimentally confirmed for low bit rate audio coding (for example, see NPL 1).
Note that the above two modifications only affect the HF spectrum generator (109); the remaining processes in HBE are identical to those in SBR.
As shown in
The above HF spectrum generator has a high computation amount. The computation amount mainly comes from the time stretching operation, realized by a series of Short Time Fourier Transforms (STFT) and Inverse Short Time Fourier Transforms (ISTFT) adopted in phase vocoders, and from the succeeding QMF operation applied to the time-stretched HF part.
A general introduction to the phase vocoder and the QMF transform is given below.
A phase vocoder is a well-known technique that uses frequency-domain transformations to implement a time-stretching effect, that is, to modify a signal's temporal evolution while its local spectral characteristics are kept unchanged. Its basic principle is described below.
The audio is divided into overlapping blocks, and these blocks are respaced so that the hop size (the time interval between successive blocks) is not the same at the input and at the output, as illustrated in
As shown in
Following the above principle, most classic phase vocoders adopt short time Fourier transform (STFT) as the frequency domain transform, and involve an explicit sequence of analysis, modification and resynthesis for time stretching.
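The block respacing principle described above can be sketched as follows. This is only an illustrative sketch: the function name `respace_blocks` and its parameters are hypothetical, and it deliberately omits the windowing and phase modification that a real phase vocoder needs, so it shows only how a different output hop size changes the signal duration.

```python
def respace_blocks(signal, block_len, hop_in, hop_out):
    """Cut `signal` into overlapping blocks taken every `hop_in` samples and
    overlap-add them every `hop_out` samples (no windowing, no phase fix)."""
    starts = list(range(0, len(signal) - block_len + 1, hop_in))
    out = [0.0] * ((len(starts) - 1) * hop_out + block_len)
    for i, start in enumerate(starts):
        for j in range(block_len):
            # Block i is read at `start` and written at i * hop_out.
            out[i * hop_out + j] += signal[start + j]
    return out


# Hypothetical example: input hop 1, output hop 2 (stretch factor of about 2).
stretched = respace_blocks([1.0] * 16, block_len=4, hop_in=1, hop_out=2)
```

With an input hop of 1 and an output hop of 2, the 13 blocks taken from the 16 input samples are spread over 28 output samples.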
The QMF banks transform time domain representations to joint time-frequency domain representations (and vice versa), and are typically used in parametric coding schemes such as spectral band replication (SBR), parametric stereo coding (PS), and spatial audio coding (SAC). A characteristic of these filter banks is that the complex-valued frequency (subband) domain signals are effectively oversampled by a factor of two. This enables post-processing operations on the subband domain signals without introducing aliasing distortion.
In more detail, given a real-valued discrete time signal x(n), with the analysis QMF bank, the complex-valued subband domain signals s_k(n) are obtained through (Equation 2) below.
In (Equation 2), p(n) represents a low-pass prototype filter impulse response of order L−1, α represents a phase parameter, M represents the number of bands, and k represents the subband index (k=0, 1, . . . , M−1).
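(Equation 2) itself is not reproduced in this text. As a reading aid only, a commonly used form of the complex-exponential modulated analysis QMF bank that is consistent with the symbols defined above is sketched here; it is an assumption based on standard QMF formulations and may differ in detail from the original (Equation 2).

```latex
s_k(n) = \sum_{l=0}^{L-1} x(M \cdot n - l)\, p(l)\,
         \exp\!\left\{ j \frac{\pi}{M} \left(k + \tfrac{1}{2}\right) (l + \alpha) \right\},
\qquad k = 0, 1, \ldots, M-1
```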
Note that, like the STFT, the QMF transform is also a joint time-frequency transform. That is, it provides both the frequency content of a signal and the change in frequency content over time, where the frequency content is represented by frequency subbands and the timeline is represented by time slots.
In detail, as illustrated in
[NPL 1] Frederik Nagel and Sascha Disch, ‘A harmonic bandwidth extension method for audio codecs’, IEEE Int. Conf. on Acoustics, Speech and Signal Proc., 2009.
[NPL 2] Max Neuendorf, et al., ‘A novel scheme for low bitrate unified speech and audio coding - MPEG RM0’, in 126th AES Convention, Munich, Germany, May 2009.
A problem associated with the prior-art HBE technology is the high computation amount. The traditional phase vocoder adopted by HBE for stretching the signal has a high computation amount because it applies successive FFTs (fast Fourier transforms) and IFFTs (inverse fast Fourier transforms), and the succeeding QMF transform, applied to the time-stretched signal, increases the computation amount further. Furthermore, in general, attempting to reduce the computation amount leads to the potential problem of quality degradation.
Thus, the present invention was conceived in view of the aforementioned problem and has as an object to provide a bandwidth extension method capable of reducing the computation amount in bandwidth extension as well as suppressing quality deterioration in the extended bandwidth.
In order to achieve the aforementioned object, the bandwidth extension method according to an aspect of the present invention is a bandwidth extension method for producing a full bandwidth signal from a low frequency bandwidth signal, the method including: transforming the low frequency bandwidth signal into a quadrature mirror filter bank (QMF) domain to generate a first low frequency QMF spectrum; generating pitch-shifted signals by applying different shifting factors on the low frequency bandwidth signal; generating a high frequency QMF spectrum by time-stretching the pitch-shifted signals in a QMF domain; modifying the high frequency QMF spectrum to satisfy high frequency energy and tonality conditions; and generating the full bandwidth signal by combining the modified high frequency QMF spectrum with the first low frequency QMF spectrum.
Accordingly, the high frequency QMF spectrum is generated by time-stretching the pitch-shifted signals in the QMF domain. Therefore, it is possible to avoid the conventional complex processing (successively repeated FFTs and IFFTs, and the subsequent QMF transform) for generating the high frequency QMF spectrum, and thus the computation amount can be reduced. Note that, like the STFT, the QMF transform itself provides joint time-frequency resolution; thus, the QMF transform replaces the series of STFTs and ISTFTs. In addition, in the bandwidth extension method according to an aspect of the present invention, since the pitch-shifted signals are generated by applying mutually different shift coefficients instead of only one shift coefficient, and time stretching is performed on these signals, it is possible to suppress deterioration of quality of the high frequency QMF spectrum.
Furthermore, the generating of a high frequency QMF spectrum includes: transforming the pitch-shifted signals into a QMF domain to generate QMF spectra; stretching the QMF spectra along a temporal dimension with different stretching factors to generate harmonic patches; time-aligning the harmonic patches; and summing up the time-aligned harmonic patches.
Furthermore, the stretching includes: calculating the amplitude and phase of a QMF spectrum among the QMF spectra; manipulating the phase to produce a new phase; and combining the amplitude with the new phase to generate a new set of QMF coefficients.
Furthermore, in the manipulating, the new phase is produced on the basis of an original phase of a whole set of QMF coefficients.
Furthermore, in the manipulating, manipulation is performed repeatedly for sets of QMF coefficients, and in the combining, new sets of QMF coefficients are generated.
Furthermore, in the manipulating, a different manipulation is performed depending on a QMF subband index.
Furthermore, in the combining, the new sets of QMF coefficients are overlap-added to generate the QMF coefficients corresponding to a temporally extended audio signal.
Specifically, the time stretching in the bandwidth extension method according to an aspect of the present invention imitates the STFT-based stretching method by modifying the phases of input QMF blocks and overlap-adding the modified QMF blocks with a different hop size. In terms of computation amount, compared to the successive FFTs and IFFTs in the STFT-based method, such time stretching has a lower computation amount because it involves only one QMF analysis transform. Therefore, it is possible to further reduce the computation amount in bandwidth extension.
Furthermore, in order to achieve the aforementioned object, the bandwidth extension method in another aspect of the present invention is a bandwidth extension method for producing a full bandwidth signal from a low frequency bandwidth signal, the method including: transforming the low frequency bandwidth signal into a quadrature mirror filter bank (QMF) domain to generate a first low frequency QMF spectrum; generating a low order harmonic patch by time-stretching the low frequency bandwidth signal in a QMF domain; generating signals that are pitch shifted, by applying different shift coefficients to the low order harmonic patch, and generating a high frequency QMF spectrum from the signals; modifying the high frequency QMF spectrum to satisfy high frequency energy and tonality conditions; and generating the full bandwidth signal by combining the modified high frequency QMF spectrum with the first low frequency QMF spectrum.
Accordingly, the high frequency QMF spectrum is generated by time-stretching and pitch-shifting the low frequency bandwidth signal in the QMF domain. Therefore, it is possible to avoid the conventional complex processing (successively repeated FFTs and IFFTs, and the subsequent QMF transform) for generating the high frequency QMF spectrum, and thus the computation amount can be reduced. In addition, since the pitch-shifted signals are generated by applying mutually different shift coefficients instead of only one shift coefficient, and the high frequency QMF spectrum is generated from these signals, it is possible to suppress deterioration of quality of the high frequency QMF spectrum. Furthermore, since the high frequency QMF spectrum is generated from the low order harmonic patch, it is possible to further suppress deterioration of quality of the high frequency QMF spectrum.
It should be noted that, in the bandwidth extension method according to another aspect of the present invention, the pitch shifting also operates in the QMF domain. This is done in order to decompose each LF QMF subband of the low order patch into multiple sub-subbands for higher frequency resolution, and then to map those sub-subbands into high QMF subbands to generate the high order patch spectrum.
Furthermore, the generating of a low order harmonic patch includes: transforming the low frequency bandwidth signal into a second low frequency QMF spectrum; bandpassing the second low frequency QMF spectrum; and stretching the bandpassed second low frequency QMF spectrum along a temporal dimension.
Furthermore, the second low frequency QMF spectrum has finer frequency resolution than the first low frequency QMF spectrum.
Furthermore, the generating of signals includes: bandpassing the low order harmonic patch to generate bandpassed patches; mapping each of the bandpassed patches into high frequency to generate high order harmonic patches; and summing up the high order harmonic patches with the low order harmonic patch.
Furthermore, the bandpassing of the low order harmonic patch includes: splitting each QMF subband in each of the bandpassed patches into multiple sub-subbands; mapping the sub-subbands to high frequency QMF subbands; and combining results of the sub-subband mapping.
Furthermore, the mapping of the sub-subbands to high frequency subbands includes: dividing the sub-subbands of each of the QMF subbands into a stop band part and a pass band part; computing transposed center frequencies of the sub-subbands on the pass band part with a patch order dependent factor; mapping the sub-subbands on the pass band part into high frequency QMF subbands according to the center frequencies; and mapping the sub-subbands on the stop band part into high frequency QMF subbands according to the sub-subbands of the pass band part.
It should be noted that, in the bandwidth extension method according to the present invention, the process operations (steps) described above may be combined in any manner.
Such a bandwidth extension method as that according to the present invention is a low computation amount HBE technology which uses a computation-amount-reduced HF spectrum generator, the HF spectrum generator being the part that contributes the highest computation amount to HBE. To reduce the computation amount, in the bandwidth extension method according to an aspect of the present invention, a new QMF-based phase vocoder that performs time stretching in the QMF domain with a low computation amount is used. Furthermore, in the bandwidth extension method according to another aspect of the present invention, to avoid the possible quality problems associated with the solution, a new pitch shifting algorithm is used that generates high order harmonic patches from a low order patch in the QMF domain.
It is the object of this invention to design a QMF-based patch in which time-stretching, or both time-stretching and frequency-extending, can be performed in the QMF domain and, further, to develop a low computation amount HBE technology driven by a QMF-based phase vocoder.
It should be noted that the present invention can be realized, not only as such a bandwidth extension method, but also as a bandwidth extension apparatus and an integrated circuit that extend the frequency bandwidth of an audio signal using the bandwidth extension method, as a program for causing a computer to extend a frequency bandwidth using the bandwidth extension method, and as a recording medium on which the program is recorded.
The bandwidth extension method in the present invention provides a new harmonic bandwidth extension (HBE) technology. The core of the technology is to perform time stretching, or both time stretching and pitch shifting, in the QMF domain, rather than in the traditional FFT domain and time domain, respectively. Compared to the prior-art HBE technology, the bandwidth extension method in the present invention can provide good sound quality and significantly reduce the computation amount.
The following embodiments are merely illustrative of the principles of various inventive steps. It is understood that variations of the details described herein will be apparent to others skilled in the art.
Hereinafter, an HBE scheme (harmonic bandwidth extension method) and a decoder (audio decoder or audio decoding apparatus) using the same, in the present invention, shall be described.
This bandwidth extension method is a bandwidth extension method for producing a full bandwidth signal from a low frequency bandwidth signal, the method including: transforming the low frequency bandwidth signal into a quadrature mirror filter bank (QMF) domain to generate a first low frequency QMF spectrum (hereafter referred to as the first transform step); generating pitch-shifted signals by applying different shifting factors on the low frequency bandwidth signal (hereafter referred to as the pitch shift step); generating a high frequency QMF spectrum by time-stretching the pitch-shifted signals in a QMF domain (hereafter referred to as the high frequency generation step); modifying the high frequency QMF spectrum to satisfy high frequency energy and tonality conditions (hereafter referred to as the spectrum modification step); and generating the full bandwidth signal by combining the modified high frequency QMF spectrum with the first low frequency QMF spectrum (hereafter referred to as the full bandwidth generation step).
It should be noted that the first transform step (S11) is performed by a T-F transform unit 1406 to be described later, and the pitch shift step (S12) is performed by sampling units 504 to 506 and a time resampling unit 1403 to be described later. In addition, the high frequency generation step (S13) is performed by QMF transform units 507 to 509, phase vocoders 510 to 512, a QMF transform unit 1404, and a time-stretching unit 1405 to be described later. Furthermore, the full bandwidth generation step (S15) is performed by an addition unit 1410 to be described later.
Furthermore, the high frequency generation step includes: transforming the pitch-shifted signals into a QMF domain to generate QMF spectra (hereafter referred to as the second transform step); stretching the QMF spectra along a temporal dimension with different stretching factors to generate harmonic patches (hereafter referred to as the harmonic patch generation step); time-aligning the harmonic patches (hereafter referred to as the alignment step); and summing up the time-aligned harmonic patches (hereafter referred to as the sum-up step).
It should be noted that the second transform step is performed by the QMF transform units 507 to 509 and the QMF transform unit 1404, and the harmonic patch generation step is performed by the phase vocoders 510 to 512 and the time-stretching unit 1405. Furthermore, the alignment step is performed by delay alignment units 513 to 515 to be described later, and the sum-up step is performed by an addition unit 516 to be described later.
In an HBE scheme in the present embodiment, an HF spectrum generator in HBE technology is designed with the pitch shifting processes in the time domain, succeeded by the vocoder driven time stretching processes in the QMF domain.
A given LF bandwidth input is first bandpassed (501 to 503) and resampled (504 to 506) to generate its HF bandwidth portions. Those HF bandwidth portions are transformed (507 to 509) into the QMF domain, and the resulting QMF outputs are time stretched (510 to 512) with stretching factors that are two times the corresponding resampling factors. The stretched HF spectra are delay aligned (513 to 515) to compensate for the potentially different delay contributions from the resampling process and summed up (516) to generate the final HF spectrum. It should be noted that each of the numerals 501 to 516 in parentheses above denotes a constituent element of the HF spectrum generator.
Comparing the scheme in the present embodiment with the prior-art scheme (
In the decoder, the bitstream is first demultiplexed (1401), and the LF part of the signal is then decoded (1402). To approximate the original HF part, the decoded LF part (low frequency bandwidth signal) is resampled (1403) in the time domain to generate an HF part, the resulting HF part is transformed (1404) into the QMF domain, the resulting HF QMF spectrum is stretched (1405) along the temporal direction, and the stretched HF spectrum is further refined (1408) by post-processing under the guidance of decoded HF parameters. Meanwhile, the decoded LF part is also transformed (1406) into the QMF domain. Finally, the refined HF spectrum is combined (1410) with the delayed (1407) LF spectrum to produce the full bandwidth QMF spectrum. The resulting full bandwidth QMF spectrum is converted (1409) back to the time domain to output the decoded wideband audio signal. It should be noted that each of the numerals 1401 to 1410 in parentheses above denotes a constituent element of the decoder.
In the time stretching process of the HBE scheme in the present embodiment, a time-stretched version of an audio signal can be generated by a QMF transform, phase manipulations, and an inverse QMF transform. Specifically, the harmonic patch generation step includes: calculating the amplitude and phase of a QMF spectrum among the QMF spectra (hereafter referred to as the calculation step); manipulating the phase to produce a new phase (hereafter referred to as the phase manipulation step); and combining the amplitude with the new phase to generate a new set of QMF coefficients (hereafter referred to as the QMF coefficient generation step). It should be noted that each of the calculation step, the phase manipulation step, and the QMF coefficient generation step is performed by a module 702 to be described later.
$\tilde{X}(m,n) = r(m,n) \cdot \exp\bigl(j \cdot \tilde{a}(m,n)\bigr)$ (Equation 3)
Finally, the new set of QMF coefficients are transformed (703) into a new audio signal, corresponding to the original audio signal with modified time scale.
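The combination in (Equation 3) of an unchanged amplitude with a manipulated phase can be sketched for a single QMF coefficient as follows. The function name `replace_phase` is hypothetical, and how the new phase is obtained is left to the phase manipulation step; this sketch only shows the polar recombination.

```python
import cmath


def replace_phase(coeff, new_phase):
    """Keep the amplitude r of a complex QMF coefficient and attach a new
    phase, following the form r * exp(j * new_phase) of (Equation 3)."""
    r = abs(coeff)  # amplitude r(m, n)
    return r * cmath.exp(1j * new_phase)


# Hypothetical example: a coefficient of amplitude 5 rotated to phase pi/2.
rotated = replace_phase(3 + 4j, cmath.pi / 2)
```

The amplitude of `rotated` is still 5, while its phase is the supplied value.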
The QMF-based time stretching algorithm in the HBE scheme in the present embodiment imitates the STFT-based stretching algorithm: 1) the modification stage uses the instantaneous frequency concept to modify phases; 2) to reduce the computation amount, the overlap-adding is performed in the QMF domain using the additivity property of the QMF transform.
Below is the detailed description of the time stretching algorithm in the HBE scheme in the present embodiment.
Assuming there are 2L real-valued time domain samples x(n) to be stretched with a stretch factor s, after the QMF analysis stage there are 2L complex QMF coefficients, composed of 2L/M time slots and M subbands.
Note that, as in the STFT-based stretching method, the transformed QMF coefficients are optionally subjected to analysis windowing before the phase manipulation. In this invention, this can be realized in either the time domain or the QMF domain.
In the time domain, a time domain signal can be naturally windowed as in (Equation 4) below.
x(n)=x(n)·h(mod(n,L)) (Equation 4)
The mod(.) in (Equation 4) denotes the modulo operation.
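The time domain windowing of (Equation 4) can be sketched as follows. The function name `window_time` and the Hann window used in the example are assumptions for illustration; the text does not specify the shape of h(n).

```python
import math


def window_time(x, h):
    """Apply x(n) * h(mod(n, L)) from (Equation 4), where L = len(h).
    The window h is repeated every L samples along the signal."""
    L = len(h)
    return [x[n] * h[n % L] for n in range(len(x))]


# Hypothetical example: 2L = 16 samples windowed with a length-8 Hann window.
L = 8
hann = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / L) for n in range(L)]
windowed = window_time([1.0] * (2 * L), hann)
```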
In the QMF domain, the equivalent operation can be realized by:
1) Transforming the analysis window h(n) (with length of L) into QMF domain to produce H(v,k) with L/M time slots and M subbands.
2) Simplifying the QMF representation of the window as shown in (Equation 5) below.
Here, v=0, . . . , L/M−1.
3) Performing the analysis windowing in the QMF domain by X(m,k)=X(m,k)·H_{0}(w), where w=mod(m,L/M) (it should be noted that mod(.) denotes the modulo operation).
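Step 3) above can be sketched as follows, assuming the simplified window H_{0}(v) has already been obtained. Since (Equation 5) is not reproduced in this text, the values of `H0` in the example are purely hypothetical placeholders, and the function name `window_qmf` is likewise an assumption.

```python
def window_qmf(X, H0):
    """Apply X(m, k) * H0(mod(m, L/M)) to every time slot m and subband k.
    X is a list of time slots, each a list of M complex coefficients;
    H0 holds one window value per time slot of a block (L/M values)."""
    slots_per_block = len(H0)
    return [
        [coeff * H0[m % slots_per_block] for coeff in row]
        for m, row in enumerate(X)
    ]


# Hypothetical example: 4 time slots, M = 2 subbands, L/M = 2 window values.
X = [[1 + 1j, 2 + 0j], [1 + 1j, 2 + 0j], [1 + 1j, 2 + 0j], [1 + 1j, 2 + 0j]]
Xw = window_qmf(X, [0.5, 1.0])
```

Every subband in a given time slot is scaled by the same window value, which is what makes this per-slot form cheaper than a full two-dimensional window.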
Furthermore, in the HBE scheme in the present embodiment, in the phase manipulation step, the new phase is produced on the basis of an original phase of a whole set of QMF coefficients. Specifically, in the present embodiment, as a detailed realization of the time stretching, phase manipulation is performed on a QMF block basis.
These original QMF coefficients can be treated as L+1 overlapped QMF blocks with hop size of 1 time slot and block length of L/M time slots, as illustrated in (a) in
To avoid phase-jumping effects, each original QMF block is modified to generate a new QMF block with modified phases, and the phases of the new QMF blocks should be continuous at the point μ·s for the overlapping (μ)th and (μ+1)th new QMF blocks, which is equivalent to being continuous at the joint points μ·M·s (μ∈N) in the time domain.
Furthermore, in the HBE scheme in the present embodiment, in the phase manipulation step, manipulation is performed repeatedly for sets of QMF coefficients, and in the QMF coefficient generation step, new sets of QMF coefficients are generated. In this case, the phases are modified on the block basis following the below criteria.
Assuming the original phases are φ(k) for the given QMF coefficients X(u,k), for u=0, . . . , 2L/M−1 and k=0, . . . , M−1. Each original QMF block is sequentially modified to a new QMF block, as illustrated in (b) in
In the following, ψ_{u}^{(n)}(k) represents phase information of the nth new QMF block for n=1, . . . , L/M, u=0, . . . , L/M−1 and k=0, 1, . . . , M−1. These new phases, depending on whether the new block is respaced or not, are designed as follows.
Assume that the 1^{st} new QMF block X^{(1)}(u,k) (u=0, . . . , L/M−1) is not respaced, so that the new phase information ψ_{u}^{(1)}(k) is identical to φ_{u}(k). That is, ψ_{u}^{(1)}(k)=φ_{u}(k) for u=0, . . . , L/M−1 and k=0, 1, . . . , M−1.
For the 2^{nd} new QMF block X^{(2)}(u,k) (u=0, . . . , L/M−1), it is respaced with a hop size of s time slots (e.g. 2 time slots, as illustrated in
Furthermore, since the phases for the 1^{st} time slot are changed, the remaining phases are adjusted accordingly to preserve the original instantaneous frequencies. That is, ψ_{u}^{(2)}(k)=ψ_{u−1}^{(2)}(k)+Δφ_{u+1}(k) for u=1, . . . , L/M−1, where Δφ_{u}(k)=φ_{u}(k)−φ_{u−1}(k) represents the original instantaneous frequencies for the original QMF block.
For the succeeding synthesis blocks, the same phase modification rules are applied. That is, for the mth new QMF block (m=3, . . . , L/M), its phases ψ_{u}^{(m)}(k) are decided as shown below.
ψ_{0}^{(m)}(k)=ψ_{0}^{(m−1)}(k)+s·Δφ_{m−1}(k)
ψ_{u}^{(m)}(k)=ψ_{u−1}^{(m)}(k)+Δφ_{m+u−1}(k) for u=1, . . . , L/M−1.
Combined with the original block amplitude information, the above new phases yield L/M new blocks.
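The block-wise phase rules above can be illustrated with a short Python sketch for a single subband k. It is a sketch under stated assumptions rather than the embodiment itself: the first-slot rule for the 2nd block is assumed to follow the same form as the rule given for m=3, . . . , L/M, no phase unwrapping is applied, and the function and variable names are hypothetical.

```python
def new_block_phases(phi, num_blocks, s):
    """Return psi[n][u], the phases of new block n+1 at time slot u.

    phi        : original phases of one subband, phi[u], u = 0..2*num_blocks-1.
    num_blocks : L/M, used both as block length and as number of new blocks.
    s          : hop size of the new blocks in time slots.
    """
    B = num_blocks
    # Original instantaneous frequencies: dphi[u] = phi[u] - phi[u-1] for u >= 1.
    dphi = [0.0] + [phi[u] - phi[u - 1] for u in range(1, len(phi))]

    # The 1st new block is not respaced: psi^(1)_u = phi_u.
    psi = [list(phi[:B])]

    for m in range(2, B + 1):
        # First slot: psi^(m)_0 = psi^(m-1)_0 + s * dphi_(m-1).
        # (Assumed to hold for m = 2 as well.)
        block = [psi[-1][0] + s * dphi[m - 1]]
        # Remaining slots keep the original instantaneous frequencies:
        # psi^(m)_u = psi^(m)_(u-1) + dphi_(m+u-1).
        for u in range(1, B):
            block.append(block[u - 1] + dphi[m + u - 1])
        psi.append(block)
    return psi
```

For a stationary sinusoid whose phase advances by a constant ω per time slot, the resulting phase of block n+1 at slot u equals ω·(n·s+u), so the phases line up at the overlap positions n·s+u used later in the overlap-add.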
Here, in the HBE scheme in the present embodiment, in the phase manipulation step, a different manipulation is performed depending on a QMF subband index. Specifically, the above phase modification method can be designed differently for QMF odd subbands and even subbands, respectively.
This is based on the fact that, for a tonal signal, the instantaneous frequency in the QMF domain is associated with the phase difference, Δφ(n,k)=φ(n,k)−φ(n−1,k), in different ways for odd and even subbands.
In more detail, it is found that the instantaneous frequency ω(n,k) can be determined through (Equation 6) below.
In (Equation 6), princ arg(α) denotes the principal angle of α, defined by (Equation 7) below.
princ arg(α)=mod(α+π,−2π)+π (Equation 7)
In the equation, mod(a,b) denotes a modulo b.
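(Equation 7) can be checked numerically with the following Python sketch. It relies on the fact that Python's `%` operator on floats returns a remainder with the sign of the divisor, which is the convention (Equation 7) needs; the function name `princarg` is chosen here for illustration.

```python
import math

def princarg(alpha):
    """Principal angle of alpha per (Equation 7): mod(alpha + pi, -2*pi) + pi.

    With a negative divisor, Python's % yields a remainder in (-2*pi, 0],
    so the returned angle lies in (-pi, pi].
    """
    return (alpha + math.pi) % (-2.0 * math.pi) + math.pi
```

Note that `math.fmod` follows the sign of the dividend instead and would not give the same result here.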
As a result, for example, in the above phase modification method, the phase difference could be elaborated as in (Equation 8) below.
Furthermore, in the HBE scheme in the present embodiment, in the QMF coefficient generation step, the new sets of QMF coefficients are overlap-added to generate the QMF coefficients corresponding to a temporally-extended audio signal. Specifically, in order to reduce the computation amount, the QMF synthesis operation is not directly applied to each individual new QMF block. Instead, it is applied to the overlap-added results of those new QMF blocks.
Note that, as in the STFT-based stretching method, the new QMF coefficients are optionally subject to synthesis windowing before the overlap-adding. In the present embodiment, like the analysis windowing process, the synthesis windowing can be realized as shown below.
X^{(n+1)}(u,k)=X^{(n+1)}(u,k)·H_{0}(w), where w=mod(u, L/M)
Then, because of the additivity of the QMF transform, all the new L/M blocks can be overlap-added, with a hop size of s time slots, prior to the QMF synthesis. The overlap-added results Y(u,k) can be obtained through the equation below.
Y(ns+u,k)=Y(ns+u,k)+X^{(n+1)}(u,k) (Equation 9)
Here, n=0, . . . , L/M−1, u=1, . . . , L/M, and k=0, . . . , M−1.
The final audio signal can be generated by applying the QMF synthesis to Y(u,k), which corresponds to the original signal with a modified time scale.
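The accumulation in (Equation 9) can be sketched in Python as below. The sketch indexes the time slots of each block from u=0 and sizes the output buffer to hold all blocks; both are choices made for this illustration, and the function name is hypothetical.

```python
def overlap_add_blocks(blocks, s):
    """Overlap-add QMF blocks with a hop of s time slots, as in (Equation 9).

    blocks : blocks[n][u][k] holds X^(n+1)(u,k) for block n, slot u, subband k.
    s      : hop size in time slots.
    Returns Y, where Y[n*s + u][k] accumulates the contribution of every block.
    """
    num_blocks = len(blocks)
    block_len = len(blocks[0])
    num_bands = len(blocks[0][0])
    out_len = (num_blocks - 1) * s + block_len
    Y = [[0j] * num_bands for _ in range(out_len)]
    for n, block in enumerate(blocks):
        for u, slot in enumerate(block):
            for k, coeff in enumerate(slot):
                Y[n * s + u][k] += coeff
    return Y
```

Only one QMF synthesis is then needed on `Y`, instead of one synthesis per block.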
Comparing the QMF-based stretching method in the HBE scheme in the present embodiment with the prior-art STFT-based stretching method, it is worth noting that the inherent time resolution of the QMF transform, which can only be obtained with a series of STFT transforms in the prior-art STFT-based stretching method, helps to significantly reduce the computation amount.
The following computation amount analysis shows a rough computation amount comparison result by only considering the computation amount contributed from transforms.
Assuming that the computation amount of an STFT of size L is log_{2}(L)·L and that the computation amount of a QMF analysis transform is about twice that of an FFT, the transform computation amount involved in the prior-art HF spectrum generator is approximated as shown below.
L/R_{a}·2·L·log_{2}(L)·(T−1)+(2L)log_{2}(2L)≈2(L/R_{a}·(T−1)+1)·L·log_{2}(L) (Equation 10)
By comparison, the transform computation amount involved in the HF spectrum generator in the present embodiment is approximated as shown in (Equation 11) below.
For example, assuming L=1024 and R_{a}=128, the above computation amount comparison can be made concrete as in Table 1.
Hereinafter, a second embodiment of the HBE scheme (harmonic bandwidth extension method) and a decoder (audio decoder or audio decoding apparatus) using the same shall be described in detail.
Note that, by adopting the QMF-based time stretching method, the HBE technology has a much lower computation amount. On the other hand, adopting the QMF-based time stretching method also brings two possible problems that risk degrading the sound quality.
Firstly, there is a quality degradation problem for high order patches. Assume that an HF spectrum is composed of (T−1) patches with corresponding stretching factors of 2, 3, . . . , T. Because the QMF-based time stretching is block based, the reduced number of overlap-add operations in a high order patch degrades the stretching effect.
Compared to (a), it can be seen that although the center frequency is correctly shifted in (b), the resulting output also includes some other frequency components with non-negligible amplitude. This may result in undesired noise in the stretched output.
Secondly, there is a possible quality degradation problem for transient signals. Such a quality degradation problem may have three potential contribution sources.
The first contribution source is that the transient component may be lost during the resampling. Assuming a transient signal with a Dirac impulse located at an even sample, for a 4^{th} order patch with decimation by a factor of 2, such a Dirac impulse disappears in the resampled signal. As a result, the resulting HF spectrum has incomplete transient components.
The second contribution source is the misaligned transient components among different patches. Because the patches have different resampling factors, a Dirac impulse located at a specified position may have several components located at different time slots in the QMF domain.
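The loss of a Dirac impulse under decimation by 2 can be reproduced with a few lines of Python. This is a simplified illustration that assumes plain subsampling (no anti-aliasing filter) and counts sample positions from 1; the helper name `decimate_by_two` is hypothetical.

```python
def decimate_by_two(x):
    """Keep every second sample (the 1st, 3rd, 5th, ... in 1-based counting)."""
    return x[::2]

# Dirac impulse at the 4th sample (an even position in 1-based counting).
impulse_even = [0, 0, 0, 1, 0, 0, 0, 0]
# Dirac impulse at the 3rd sample (an odd position), for comparison.
impulse_odd = [0, 0, 1, 0, 0, 0, 0, 0]

lost = decimate_by_two(impulse_even)  # the impulse is dropped entirely
kept = decimate_by_two(impulse_odd)   # the impulse survives
```

In this toy case the decimated signal `lost` contains no trace of the transient, which is the situation described for the 4^{th} order patch.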
The third contribution source is that the energies of transient components are spread unevenly among different patches. As shown in
To overcome the above quality degradation problems, an enhanced HBE technology is desired. However, an overly complicated solution also increases the computation amount. In the present embodiment, a QMF-based pitch shifting method is used to avoid the possible quality degradation problems while maintaining the low computation amount advantage.
As described in detail below, in the HBE scheme (harmonic bandwidth extension method) in the present embodiment, the HF spectrum generator is designed with both time stretching and pitch shifting processes in the QMF domain. Furthermore, a decoder (audio decoder or audio decoding apparatus) using the HBE scheme in the present embodiment shall also be described below.
This bandwidth extension method is a bandwidth extension method for producing a full bandwidth signal from a low frequency bandwidth signal, the method including: transforming the low frequency bandwidth signal into a quadrature mirror filter bank (QMF) domain to generate a first low frequency QMF spectrum (hereafter referred to as the first transform step); generating a low order harmonic patch by time-stretching the low frequency bandwidth signal in a QMF domain (hereafter referred to as the low order harmonic patch generation step); generating signals that are pitch shifted, by applying different shift coefficients to the low order harmonic patch, and generating a high frequency QMF spectrum from the signals (hereafter referred to as the high frequency generation step); modifying the high frequency QMF spectrum to satisfy high frequency energy and tonality conditions (hereafter referred to as the spectrum modification step); and generating the full bandwidth signal by combining the modified high frequency QMF spectrum with the first low frequency QMF spectrum (hereafter referred to as the full bandwidth generation step).
It should be noted that the first transform step is performed by a TF transform unit 1508 to be described later, and the low order harmonic patch generation step is performed by a QMF transform unit 1503, a time-stretching unit 1504, a QMF transform unit 601, and a phase vocoder 603 to be described later. In addition, the high frequency generation step is performed by a pitch-shifting unit 1506, bandpass units 604 and 605, frequency extension units 606 and 607, and delay alignment units 608 to 610 to be described later. Furthermore, the spectrum modification step is performed by an HF post-processing unit 1507 to be described later, and the full bandwidth generation step is performed by an addition unit 1512.
Furthermore, the low order harmonic patch generation step includes: transforming the low frequency bandwidth signal into a second low frequency QMF spectrum (hereafter referred to as the second transform step); bandpassing the second low frequency QMF spectrum (hereafter referred to as the bandpass step); and stretching the bandpassed second low frequency QMF spectrum along a temporal dimension (hereafter referred to as the stretching step).
It should be noted that the second transform step is performed by the QMF transform unit 601 and the QMF transform unit 1503, the bandpass step is performed by a bandpass unit 602 to be discussed later, and the stretching step is performed by the phase vocoder 603 and the time-stretching unit 1504.
Furthermore, the second low frequency QMF spectrum has finer frequency resolution than the first low frequency QMF spectrum.
Furthermore, the high frequency generation step includes: bandpassing the low order harmonic patch to generate bandpassed patches (hereafter referred to as the patch generation step); mapping each of the bandpassed patches into high frequency to generate high order harmonic patches (hereafter referred to as the high order generation step); and summing up the high order harmonic patches with the low order harmonic patch (hereafter referred to as the sum-up step).
It should be noted that the patch generation step is performed by the bandpass units 604 and 605, the high order generation step is performed by the frequency extension units 606 and 607, and the sum-up step is performed by an addition unit 611 to be discussed later.
A given LF bandwidth input is first transformed (601) into the QMF domain, and its bandpassed (602) QMF spectrum is time stretched (603) to double length. The stretched QMF spectrum is bandpassed (604˜605) to produce (T−2) band-limited spectra. The resulting band-limited spectra are translated (606˜607) into higher frequency bandwidth spectra. Those HF spectra are delay aligned (608˜610) to compensate for the potentially different delay contributions from the spectrum translation process and summed up (611) to generate the final HF spectrum. It should be noted that each of the numerals 601 to 611 in parentheses above denotes a constituent element of the HF spectrum generator.
Note that comparing to the QMF transform (108 in
Comparing the HBE scheme in the present embodiment with the priorart scheme (
With the decoder, the bitstream is first demultiplexed (1501), and the LF part of the signal is then decoded (1502). To approximate the original HF part, the decoded LF part (low frequency bandwidth signal) is transformed (1503) into the QMF domain to generate an LF QMF spectrum. The resulting LF QMF spectrum is stretched (1504) along the temporal direction to generate a low order HF patch. The low order HF patch is pitch shifted (1506) to generate high order patches. The resulting high order patches are combined with the delayed (1505) low order HF patch to generate an HF spectrum, and the HF spectrum is further refined (1507) by post-processing, under the guidance of some decoded HF parameters. Meanwhile, the decoded LF part is also transformed (1508) into the QMF domain. In the end, the refined HF spectrum is combined with the delayed (1509) LF spectrum to produce (1512) a full bandwidth QMF spectrum. The resulting full bandwidth QMF spectrum is converted (1510) back to the time domain to output the decoded wideband audio signal. It should be noted that each of the numerals 1501 to 1512 denotes a constituent element of the decoder.
A QMF-based pitch shifting algorithm (frequency extending method in QMF domain) for the pitch-shifting unit 1506 in the HBE scheme in the present embodiment is designed by decomposing the LF QMF subbands into plural sub-subbands, transposing those sub-subbands into HF subbands, and combining the resulting HF subbands to generate an HF spectrum. Specifically, the high order generation step includes: splitting each QMF subband in each of the bandpassed patches into multiple sub-subbands (hereafter referred to as the splitting step); mapping the sub-subbands to high frequency QMF subbands (hereafter referred to as the mapping step); and combining results of the sub-subband mapping (hereafter referred to as the combining step).
It should be noted that the splitting step corresponds to step 1 (901˜903) to be described later, the mapping step corresponds to steps 2 and 3 (904˜909) to be described later, and the combining step corresponds to step 4 (910) to be described later.
For step 1, a few methods are available to decompose a QMF subband into multiple sub-subbands in order to obtain better frequency resolution, for example, the so-called Mth band filters adopted in the MPEG Surround codec. In this preferred embodiment of the invention, the subband decomposition is realized by applying an additional exponentially modulated filter bank, defined by (Equation 12) below.
Here, q=−Q, −Q+1, . . . , 0, 1, . . . , Q−1 and n=0, 1, . . . , N (where n_{0 }is an integer constant, N is the order of filter bank).
By adopting the above filter bank, a given subband signal, say, the kth subband signal x(n,k), is decomposed into 2Q sub-subband signals according to (Equation 13) below.
y_{q}^{k}(n)=conv(x(n,k), g_{q}(n)) (Equation 13)
Here, q=−Q, −Q+1, . . . , 0, 1, . . . , Q−1. In the equation, ‘conv(.)’ denotes the convolution function.
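The decomposition of (Equation 13) can be sketched in Python as follows. The filter definition used here is an assumption made only for illustration: an oddly stacked, exponentially modulated bank of the form g_q(n)=p(n)·exp(jπ/Q·(q+0.5)·(n−n_0)) with a hypothetical real prototype p(n) of length N+1. The actual filters are those of (Equation 12), and all function names are illustrative.

```python
import cmath
import math

def conv(x, g):
    """Full linear convolution of two (possibly complex) sequences."""
    y = [0j] * (len(x) + len(g) - 1)
    for i, xi in enumerate(x):
        for j, gj in enumerate(g):
            y[i + j] += xi * gj
    return y

def modulated_filters(prototype, Q, n0):
    """ASSUMED form of the bank: g_q(n) = p(n) * exp(j*pi/Q*(q+0.5)*(n-n0))."""
    return {
        q: [p * cmath.exp(1j * math.pi / Q * (q + 0.5) * (n - n0))
            for n, p in enumerate(prototype)]
        for q in range(-Q, Q)
    }

def split_subband(x_k, prototype, Q, n0):
    """Decompose one subband signal x(n,k) into 2Q sub-subband signals,
    y_q^k(n) = conv(x(n,k), g_q(n)), for q = -Q, ..., Q-1."""
    filters = modulated_filters(prototype, Q, n0)
    return {q: conv(x_k, g) for q, g in filters.items()}
```

Each of the 2Q outputs has length len(x_k)+N because the sketch uses full convolution; a practical implementation would typically trim or delay-compensate the result.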
With such an additional complex transform, the frequency spectrum of one subband is further split into 2Q sub-frequency spectra. From the frequency resolution point of view, if the QMF transform has M bands, its associated subband frequency resolution is η/M and its sub-subband frequency resolution is refined to η/(2Q·M).
In addition, the overall system shown in (Equation 14) is time-invariant, that is, free of aliasing, in spite of the use of downsampling and upsampling.
Note that the above additional filter bank is oddly stacked (the factor q+0.5), which means there are no sub-subbands centered around the DC value. Rather, for an even Q number, the center frequencies of the sub-subbands are symmetric around zero.
For step 2, the center frequency scaling can be simplified by considering the oversampling characteristics of the complex QMF transform.
Note that in the complex QMF domain, as the pass bands of adjacent subbands overlap each other, a frequency component in the overlap zone would appear in both subbands (See International Patent Application Publication No. WO 2006048814).
As a result, the frequency scaling can be reduced to half the computation amount by only calculating frequencies for those sub-subbands residing on the pass band, that is, the positive frequency part for an even subband or the negative frequency part for an odd subband.
In more detail, the k_{LF}th subband is split into 2Q sub-subbands. In other words, x(n,k_{LF}) is divided as shown in (Equation 15) below.
y_{q}^{k_{LF}}(n) (Equation 15)
Subsequently, in order to produce the tth order patch, the center frequencies of those sub-subbands are scaled using (Equation 16) below.
Here, q=−Q, −Q+1, . . . , −1 when k_{LF }is odd, or q=0, 1, . . . , Q−1 when k_{LF }is even.
For step 3, mapping the sub-subbands into HF subbands also needs to take into account the characteristics of the complex QMF transform. In the present embodiment, such a mapping process is carried out in two steps: first, all sub-subbands on the pass band are straightforwardly mapped into HF subbands; second, based on the above mapping result, all sub-subbands on the stop band are mapped into HF subbands. Specifically, the mapping step includes: dividing the sub-subbands of each of the QMF subbands into a stop band part and a pass band part (hereafter referred to as the division step); computing transposed center frequencies of the sub-subbands on the pass band part with a patch order dependent factor (hereafter referred to as the frequency computation step); mapping the sub-subbands on the pass band part into high frequency QMF subbands according to the center frequencies (hereafter referred to as the first mapping step); and mapping the sub-subbands on the stop band part into high frequency QMF subbands according to the sub-subbands of the pass band part (hereafter referred to as the second mapping step).
To understand the above point, it is helpful to review the relationship between a pair of positive and negative frequencies of the same signal component and their associated subband indices.
As mentioned above, in the complex QMF domain, a sinusoid spectrum has both a positive and a negative frequency. Specifically, the sinusoidal spectrum has one of those frequencies in the pass band of one QMF subband and the other in the stop band of an adjacent subband. Considering that the QMF transform is an oddly-stacked transform, such a pair of signal components can be illustrated in
Here, the grey area denotes the stop band of a subband. For an arbitrary sinusoid signal (in solid line) on the pass band of a subband, its aliasing part (in dashed line) is located in the stop band of the adjacent subband (the paired two frequency components are associated by a line with double arrows).
Consider a sinusoidal signal with frequency f_{0}, as shown in (Equation 17) below.
The pass band component of the sinusoidal signal with the above-described frequency f_{0} resides on the kth subband if (Equation 18) below is satisfied.
In addition, its stop band component resides on the k^{˜}th subband if (Equation 19) below is satisfied.
If a subband is decomposed into 2Q sub-subbands, the above relation is elaborated with higher frequency resolution as shown in
(Equation 20)
Therefore, in the present embodiment, in order to map the sub-subbands on the stop band into HF subbands, it is necessary to associate them with the mapping results for those sub-subbands on the pass band. The motivation for this operation is to make sure that the frequency pairs for LF components remain paired when they are shifted upward into HF components.
For this purpose, firstly, it is straightforward to map the sub-subbands on the pass band into HF subbands. By considering the center frequencies of the frequency-scaled sub-subbands and the frequency resolution of the QMF transform, the mapping function can be described by m(k,q) as shown in (Equation 21) below.
Here, q=−Q, −Q+1, . . . , −1 if k_{LF} is odd, or q=0, 1, . . . , Q−1 if k_{LF} is even. Here, the operator shown in (Equation 22) below denotes a rounding operation that gives the nearest integer to x towards minus infinity.
⌊x⌋ (Equation 22)
In addition, due to the upward scaling (t/2>1), it is possible that one HF subband has plural sub-subband mapping sources. That is, it is possible that m(k,q_{1})=m(k,q_{2}) or m(k_{1},q_{1})=m(k_{2},q_{2}). Therefore, an HF subband could be a combination of multiple sub-subbands of LF subbands, as shown in (Equation 23).
Here, q=−Q, −Q+1, . . . , −1 if k_{LF }is odd, or q=0, 1, . . . , Q−1 if k_{LF }is even.
Secondly, following the aforementioned relationship between frequency pairs and subband indices, the mapping function for those sub-subbands on the stop band can be established as follows.
Considering an LF subband k_{LF}, the mapping functions of the sub-subbands on its pass band are already decided by the 1^{st} step as m(k_{LF},−Q), m(k_{LF},−Q+1), . . . , m(k_{LF},−1) for odd k_{LF} and m(k_{LF},0), m(k_{LF},1), . . . , m(k_{LF},Q−1) for even k_{LF}. The stop band part associated with the pass band can then be mapped according to (Equation 24) below.
Here, ‘condition a’ refers to when k_{LF }is even and (Equation 25) below is even, or when k_{LF }is odd and (Equation 26) below is even.
In addition, as described above, (Equation 27) below denotes a rounding operation that gives the nearest integer to x towards minus infinity.
⌊x⌋ (Equation 27)
The resulting HF subband is the combination of all associated LF sub-subbands, as shown in (Equation 28) below.
Here, q=−Q, −Q+1, . . . , −1 if k_{LF }is even, or q=0, 1, . . . , Q−1 if k_{LF }is odd.
In the end, all mapping results on the pass band and stop band are combined to form the HF subband, as shown in (Equation 29) below.
x(n,k_{HF})=x_{pass}(n,k_{HF})+x_{stop}(n,k_{HF}) (Equation 29)
Note that the above pitch shifting method in the QMF domain addresses both the high frequency quality degradation problem and the possible transient handling problem.
Firstly, all patches now have the same stretching factor, the smallest one, which greatly reduces the high frequency noise (coming from the incorrect signal components generated during time stretching). Secondly, all contribution sources for transient degradation are avoided. That is, there is no time domain resampling process, and the same stretching factor is used for all patches, which inherently eliminates the possibility of misalignment.
In addition, it should be noted that the present embodiment has some downside in terms of frequency resolution. Due to the adoption of sub-subband filtering, the frequency resolution is refined from η/M to η/(2Q·M), but it is still coarser than the fine frequency resolution of time domain resampling (η/L). Nevertheless, considering that the human ear is less sensitive to high frequency signal components, the pitch shifted result produced by the present embodiment proves to be perceptually no different from that produced by the resampling method.
Apart from the above, compared with the HBE scheme in the first embodiment, the HBE scheme in the present embodiment also provides the bonus of a further reduced computation amount, because only one low order patch needs the time stretching operation.
Again, such a computation amount reduction can be roughly analyzed by only considering the computation amount contributed from transforms.
Following the assumptions in aforementioned computation amount analysis, the transform computation amount involved in the HF spectrum generator in the present embodiment is approximated as shown below.
2·(2L/2)·log_{2}(2L/2)=2·L·log_{2}(L) (Equation 30)
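As a rough numerical illustration, (Equation 10) and (Equation 30) can be evaluated side by side in Python. The values L=1024 and R_{a}=128 follow the earlier example; the patch count T=3 is an arbitrary choice made for this sketch and does not come from the specification, and the function names are hypothetical.

```python
import math

def prior_art_transform_cost(L, Ra, T):
    """Left-hand side of (Equation 10): prior-art STFT-based HF spectrum generator."""
    return (L / Ra) * 2 * L * math.log2(L) * (T - 1) + (2 * L) * math.log2(2 * L)

def prior_art_transform_cost_approx(L, Ra, T):
    """Right-hand side of (Equation 10)."""
    return 2 * ((L / Ra) * (T - 1) + 1) * L * math.log2(L)

def second_embodiment_transform_cost(L):
    """(Equation 30): transform cost when only the low order patch is stretched."""
    return 2 * (2 * L / 2) * math.log2(2 * L / 2)

L, Ra, T = 1024, 128, 3  # T = 3 is an illustrative assumption
prior = prior_art_transform_cost(L, Ra, T)               # 350208.0
prior_approx = prior_art_transform_cost_approx(L, Ra, T)  # 348160.0
proposed = second_embodiment_transform_cost(L)            # 20480.0
```

Under these assumptions the transform cost of (Equation 30) is roughly one seventeenth of the prior-art figure, and it does not grow with T because the higher order patches are obtained by pitch shifting rather than by additional time stretching.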
Therefore, Table 1 can be updated as follows.
The present invention is a new HBE technology for low bit rate audio coding. Using this technology, a wideband signal can be reconstructed based on a low frequency bandwidth signal by generating its high frequency (HF) part via time stretching and frequency extending the low frequency (LF) part in the QMF domain. Compared to the prior art HBE technology, the present invention provides comparable sound quality and a much lower computation amount. Such a technology can be deployed in applications such as mobile phones and teleconferencing, where the audio codec operates at a low bit rate with a low computation amount.
It should be noted that each of the function blocks in the block diagrams (
Although an LSI is referred to here, there are instances where the designations IC, system LSI, super LSI, and ultra LSI are used due to the difference in the degree of integration.
In addition, the means for circuit integration is not limited to an LSI, and implementation with a dedicated circuit or a general-purpose processor is also available. It is also acceptable to use a Field Programmable Gate Array (FPGA) that allows programming after the LSI has been manufactured, and a reconfigurable processor in which connections and settings of circuit cells within the LSI are reconfigurable.
Furthermore, if integrated circuit technology that replaces LSI appears through progress in semiconductor technology or other derived technology, that technology can naturally be used to carry out integration of the function blocks.
Furthermore, among the respective function blocks, the unit which stores data to be coded or decoded may be made into a separate structure without being included in the single chip.
The present invention relates to a new harmonic bandwidth extension (HBE) technology for low bit rate audio coding. With the technology, a wideband signal can be reconstructed based on a low frequency bandwidth signal by generating its high frequency (HF) part via time stretching and frequency-extending the low frequency (LF) part in the QMF domain. Compared to the prior art HBE technology, the present invention provides comparable sound quality and a much lower computation amount. Such a technology can be deployed in applications such as mobile phones and teleconferencing, where the audio codec operates at a low bit rate with a low computation amount.
501 to 503, 602, 604, 605 Bandpass unit
504 to 506 Sampling unit
507 to 509, 601, 1404, 1503 QMF transform unit
510 to 512, 603 Phase vocoder
513 to 515, 608 to 610, 1407, 1505, 1509 Delay alignment unit
516, 611, 1410, 1511, 1512 Addition unit
606, 607 Frequency extension unit
1401, 1501 Demultiplex unit
1402, 1502 Decoding unit
1403 Time resampling unit
1405, 1504 Time-stretching unit
1406, 1508 TF transform unit
1409, 1510 Inverse TF transform unit
1506 Pitch-shifting unit