Scalable compressed audio bit stream and codec using a hierarchical filterbank and multichannel joint coding
First Claim
1. A method of encoding an input signal, comprising:
using a hierarchical filterbank (HFB) to decompose an input signal into a multi-resolution time/frequency representation;
extracting tonal components at multiple frequency resolutions from the time/frequency representation;
extracting residual components from the time/frequency representation;
ranking the components based on their relative contribution to decoded signal quality;
quantizing and encoding the components; and
eliminating a sufficient number of the lowest ranked encoded components to form a scaled bit stream having a data rate less than or approximately equal to a desired data rate.
Abstract
A method for compressing audio input signals to form a master bit stream that can be scaled to form a scaled bit stream having an arbitrarily prescribed data rate. A hierarchical filterbank decomposes the input signal into a multi-resolution time/frequency representation from which the encoder can efficiently extract both tonal and residual components. The components are ranked and then quantized with reference to the same masking function or different psychoacoustic criteria. The selected tonal components are suitably encoded using differential coding extended to multichannel audio. The time-sample and scale factor components that make up the residual components are encoded using joint channel coding (JCC) extended to multichannel audio. A decoder uses an inverse hierarchical filterbank to reconstruct the audio signals from the tonal and residual components in the scaled bit stream.
68 Citations
45 Claims
1. A method of encoding an input signal, comprising:
using a hierarchical filterbank (HFB) to decompose an input signal into a multi-resolution time/frequency representation;
extracting tonal components at multiple frequency resolutions from the time/frequency representation;
extracting residual components from the time/frequency representation;
ranking the components based on their relative contribution to decoded signal quality;
quantizing and encoding the components; and
eliminating a sufficient number of the lowest ranked encoded components to form a scaled bit stream having a data rate less than or approximately equal to a desired data rate.
Dependent claims: 2-21
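The ranking and elimination steps of claim 1 can be illustrated with a minimal Python sketch. The `Component` type, its `importance` score, and the per-frame `bit_budget` are hypothetical names introduced only for illustration; the claim does not specify how the ranking is computed or how the desired data rate maps to bits per frame.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical encoded component (tonal or residual) of one frame."""
    name: str
    importance: float  # assumed score: contribution to decoded signal quality
    bits: int          # size of the already quantized and encoded component


def scale_frame(components, bit_budget):
    """Drop the lowest ranked components until the frame fits the budget."""
    # Highest ranked first, so the lowest ranked component is always last.
    kept = sorted(components, key=lambda c: c.importance, reverse=True)
    total = sum(c.bits for c in kept)
    while kept and total > bit_budget:
        total -= kept.pop().bits
    return kept
```

Because the components are encoded before they are ranked out, scaling in this sketch only deletes entries and never re-quantizes anything, which matches the claim's wording of eliminating encoded components.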
22. A method of encoding an audio input signal, comprising:
decomposing an audio input signal into a multi-resolution time/frequency representation;
extracting tonal components at each frequency resolution;
removing the tonal components from the time/frequency representation to form a residual signal;
extracting residual components from the residual signal;
grouping the tonal components into at least one frequency sub-domain;
grouping the residual components into at least one residual sub-domain;
ranking the sub-domains based on psychoacoustic importance;
ranking the components within each sub-domain based on psychoacoustic importance;
quantizing and encoding the components within each sub-domain; and
eliminating a sufficient number of the low ranking components from the lowest ranked sub-domains to form a scaled bit stream having a data rate less than or approximately equal to a desired data rate.
Dependent claims: 23, 24
25. A scalable bit stream encoder for encoding an input audio signal and forming a scalable bit stream, comprising:
a hierarchical filterbank (HFB) that decomposes the input audio signal into transform coefficients at successively lower frequency resolution levels and back into time-domain sub-band samples at successively finer time scales at successive iterations;
a tone encoder that (a) extracts tonal components from the transform coefficients at each iteration, quantizes and stores them in a tone list, (b) removes the tonal components from the input audio signal to pass a residual signal to the next iteration of the HFB and (c) ranks all of the extracted tonal components based on their relative contribution to decoded signal quality;
a residual encoder that applies a final inverse transform with relatively lower frequency resolution than the final iteration of the HFB to the final residual signal to extract the residual components and ranks the residual components based on their relative contribution to decoded signal quality;
a bit stream formatter that assembles the tonal and residual components on a frame-by-frame basis to form a master bit stream; and
a scaler that eliminates a sufficient number of the lowest ranked encoded components from each frame of the master bit stream to form a scaled bit stream having a data rate less than or approximately equal to a desired data rate.
Dependent claims: 26-30
31. A method of reconstructing a time-domain output signal from an encoded bit stream, comprising:
receiving a scaled bit stream having a predetermined data rate within a given range as a sequence of frames, each frame containing at least one of the following: (a) a plurality of quantized tonal components representing frequency domain content at different frequency resolutions of the input signal, (b) quantized residual time-sample components representing the time-domain residual formed from the difference between the reconstructed tonal components and the input signal, and (c) scale factor grids representing signal energies of the residual signal, which at least partially span a frequency range of the input signal;
receiving information for each frame about the position of the quantized components and/or grids within the frequency range;
parsing the frames of the scaled bit stream into the components and grids;
decoding any tonal components to form transform coefficients;
decoding any time-sample components and any grids;
multiplying the time-sample components by grid elements to form time-domain samples; and
applying an inverse hierarchical filterbank to the transform coefficients and time-domain samples to reconstruct a time-domain output signal.
Dependent claims: 32-38
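The step of multiplying time-sample components by grid elements can be sketched as follows. The layout is an assumption made only for this illustration: the decoded time samples are held as a (sub-band, time) array, the scale factor grid is a coarser (sub-band group, time block) array, and each grid element applies uniformly to the block of samples it covers. The claim itself does not fix the grid resolution or layout, and `apply_scale_grid` is a hypothetical name.

```python
import numpy as np


def apply_scale_grid(time_samples, grid):
    """Scale decoded residual time samples by a coarser scale factor grid."""
    time_samples = np.asarray(time_samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    bands, length = time_samples.shape
    grid_bands, grid_blocks = grid.shape
    if bands % grid_bands or length % grid_blocks:
        raise ValueError("grid must evenly tile the time-sample array")
    # Expand each grid element over the block of samples it covers (assumed
    # piecewise-constant gain), then multiply element by element.
    gains = np.repeat(grid, bands // grid_bands, axis=0)
    gains = np.repeat(gains, length // grid_blocks, axis=1)
    return time_samples * gains
```

For example, a 2x2 grid applied to 4 sub-bands of 6 samples gives each grid element a 2x3 block of samples to scale.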
39. A decoder for reconstructing a time-domain output audio signal from an encoded bit stream, comprising:
a bit stream parser for parsing each frame of a scaled bit stream into its audio components, each frame containing at least one of the following: (a) a plurality of quantized tonal components representing frequency domain content at different frequency resolutions of the input signal, (b) quantized residual time-sample components representing the time-domain residual formed from the difference between the reconstructed tonal components and the input signal, and (c) scale factor grids representing the signal energies of the residual signal;
a residual decoder for decoding any time-sample components and any grids to reconstruct time samples;
a tonal decoder for decoding any tonal components to form transform coefficients; and
an inverse hierarchical filterbank that reconstructs the output signal by transforming the time samples into residual transform coefficients, combining them with the transform coefficients for a set of the tonal components at a low frequency resolution and inverse transforming the combined transform coefficients to form a partially reconstructed output signal, and repeating the steps on this partially reconstructed output signal with the transform coefficients for another set of tonal components at the next highest frequency resolution until the output audio signal is reconstructed.
Dependent claims: 40
41. A method of hierarchically filtering an input signal to achieve a nearly arbitrary time/frequency decomposition, comprising the steps of:
(a) buffering samples of the input signal into frames of N samples;
(b) multiplying the N samples in each frame by an N-sample window function;
(c) applying an N-point transform to produce N/2 transform coefficients;
(d) dividing the N/2 residual transform coefficients into P groups of Mi coefficients, such that the sum of the Mi coefficients is N/2;
(e) for each of P groups, applying a (2*Mi)-point inverse transform to the transform coefficients to produce (2*Mi) sub-band samples from each group;
(f) in each sub-band i, multiplying the (2*Mi) sub-band samples by a (2*Mi)-point window function;
(g) in each sub-band i, overlapping with Mi previous samples and adding corresponding values to produce Mi new samples for each sub-band; and
(h) repeating steps (a)-(g) on one or more of the sub-bands of Mi new samples using successively smaller transform sizes N until the desired time/transform resolution is achieved.
Dependent claims: 42-44
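Steps (b) through (g) of claim 41 can be sketched for a single level of the hierarchy. The claim names only an "N-point transform" and a "window function"; this sketch assumes an MDCT (which maps N windowed samples to N/2 coefficients) and a sine window. The names `HFBLevel`, `mdct_matrix` and `sine_window` are hypothetical. The frame buffering of step (a), with a hop of N/2 samples, is left to the caller, and the transforms are written as plain matrix products for clarity rather than speed.

```python
import numpy as np


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return np.sin(np.pi * (np.arange(length) + 0.5) / length)


def mdct_matrix(m):
    """(m x 2m) MDCT basis: 2m samples in, m coefficients out."""
    n = np.arange(2 * m)[None, :]
    k = np.arange(m)[:, None]
    return np.sqrt(2.0 / m) * np.cos(np.pi / m * (n + 0.5 + m / 2.0) * (k + 0.5))


class HFBLevel:
    """One level of the filterbank (assumed MDCT and sine windows)."""

    def __init__(self, n, group_sizes):
        if sum(group_sizes) != n // 2:
            raise ValueError("group sizes Mi must sum to N/2")
        self.n = n
        self.groups = list(group_sizes)
        self.window = sine_window(n)
        self.basis = mdct_matrix(n // 2)
        self.sub = [(mdct_matrix(m), sine_window(2 * m)) for m in self.groups]
        self.tails = [np.zeros(m) for m in self.groups]  # Mi previous samples

    def analyze(self, frame):
        """Map one frame of N samples to P sub-bands of Mi new samples each."""
        frame = np.asarray(frame, dtype=float)
        if frame.shape != (self.n,):
            raise ValueError("frame must hold exactly N samples")
        coeffs = self.basis @ (self.window * frame)      # steps (b), (c)
        subbands, start = [], 0
        for i, m in enumerate(self.groups):              # step (d)
            sub_basis, sub_window = self.sub[i]
            group = coeffs[start:start + m]
            y = sub_window * (sub_basis.T @ group)       # steps (e), (f)
            subbands.append(self.tails[i] + y[:m])       # step (g)
            self.tails[i] = y[m:]
            start += m
        return subbands
```

Step (h) would then feed the samples of a chosen sub-band, buffered into shorter frames, into another `HFBLevel` with a smaller N.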
45. A method of hierarchically reconstructing time samples of an input signal, in which each input frame contains Mi time samples in each of P sub-bands, comprising performing the following steps:
(a) in each sub-band i, buffering and concatenating the Mi previous samples with the current Mi samples to produce 2*Mi new samples;
(b) in each sub-band i, multiplying the 2*Mi sub-band samples by a 2*Mi point window function;
(c) applying a (2*Mi)-point transform to the windowed sub-band samples to produce Mi transform coefficients for each sub-band i;
(d) concatenating the Mi transform coefficients for each sub-band i to form a single group of N/2 coefficients;
(e) applying an N-point inverse transform to the concatenated coefficients to produce a frame of N samples;
(f) multiplying each frame of N samples by an N-sample window function to produce N windowed samples;
(g) overlap adding the resulting windowed samples to produce N/2 new output samples at the given sub-band level; and
(h) repeating steps (a) through (g) until all sub-bands have been processed and the N original time samples are reconstructed.
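Steps (a) through (g) of claim 45 can be sketched for one level of the inverse hierarchy. As an assumption not stated in the claim, the transform is taken to be an MDCT with sine windows; `InverseHFBLevel`, `mdct_matrix` and `sine_window` are hypothetical names. Transforms are written as plain matrix products for readability.

```python
import numpy as np


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return np.sin(np.pi * (np.arange(length) + 0.5) / length)


def mdct_matrix(m):
    """(m x 2m) MDCT basis; its transpose serves as the inverse transform."""
    n = np.arange(2 * m)[None, :]
    k = np.arange(m)[:, None]
    return np.sqrt(2.0 / m) * np.cos(np.pi / m * (n + 0.5 + m / 2.0) * (k + 0.5))


class InverseHFBLevel:
    """One level of the inverse filterbank (assumed MDCT and sine windows)."""

    def __init__(self, n, group_sizes):
        if sum(group_sizes) != n // 2:
            raise ValueError("group sizes Mi must sum to N/2")
        self.n = n
        self.groups = list(group_sizes)
        self.window = sine_window(n)
        self.basis = mdct_matrix(n // 2)
        self.sub = [(mdct_matrix(m), sine_window(2 * m)) for m in self.groups]
        self.prev = [np.zeros(m) for m in self.groups]  # Mi previous samples
        self.tail = np.zeros(n // 2)                    # overlap from last frame

    def synthesize(self, subband_frames):
        """Map P sub-band frames of Mi samples to N/2 new output samples."""
        parts = []
        for i, current in enumerate(subband_frames):
            current = np.asarray(current, dtype=float)
            sub_basis, sub_window = self.sub[i]
            buffered = np.concatenate([self.prev[i], current])  # step (a)
            parts.append(sub_basis @ (sub_window * buffered))   # steps (b), (c)
            self.prev[i] = current.copy()
        coeffs = np.concatenate(parts)                          # step (d)
        y = self.window * (self.basis.T @ coeffs)               # steps (e), (f)
        out = self.tail + y[: self.n // 2]                      # step (g)
        self.tail = y[self.n // 2:]
        return out
```

Under these assumptions, pairing this stage with a forward stage that uses the same transform, windows and grouping should return the input delayed by one hop of N/2 samples, once the first two calls have filled the overlap buffers. Step (h) would apply the same procedure at each remaining level of the hierarchy.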
Specification