LOW-COMPLEXITY, LOW-DELAY, SCALABLE AND EMBEDDED SPEECH AND AUDIO CODING WITH ADAPTIVE FRAME LOSS CONCEALMENT
2 Assignments
0 Petitions
Abstract
A high-quality, low-complexity, low-delay, scalable and embedded system and method are disclosed for coding speech and general audio signals. The invention is particularly suitable for Internet Protocol (IP)-based multimedia communications. Adaptive transform coding, such as the Modified Discrete Cosine Transform (MDCT), is used with multiple small-size transforms in each signal frame to reduce coding delay and computational complexity. In a preferred embodiment, for a chosen sampling rate of the input signal, one or more output sampling rates may be decoded with varying degrees of complexity. The scalable and embedded coding approach underlying the invention supports multiple sampling rates and bit rates. In addition, a novel adaptive frame loss concealment approach reduces the distortion caused by packet loss in communications over IP networks.
131 Citations
43 Claims
1. A system for processing audio signals, comprising:
(a) a frame extractor for dividing an input audio signal into a plurality of signal frames corresponding to successive time intervals;
(b) a transform processor for performing transform computation of a signal in at least one signal frame, said transform processor generating a transform signal having one or more (NB) bands;
(c) a quantizer providing quantized values associated with the transform signal in said NB bands;
(d) an output processor for forming an output bit stream corresponding to an encoded version of the input signal; and
(e) a decoder capable of reconstructing from the output bit stream at least two replicas of the input signal, each replica having a different sampling rate, without using downsampling. (Dependent claims 2-10.)
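The encoder-side elements (a) to (c) of claim 1 can be illustrated with a minimal Python sketch. It assumes a non-overlapping frame split with zero-padding of the last frame, uses an orthonormal DCT-II purely as a stand-in for the transform processor (the abstract names the MDCT as one option), and quantizes each of the NB bands with a uniform scalar quantizer. The helper names, band edges and step sizes are hypothetical and do not come from the patent.

```python
import numpy as np

def extract_frames(x, frame_size):
    """(a) Frame extractor: split x into consecutive, non-overlapping frames.
    The last partial frame is zero-padded (an assumption for this sketch)."""
    x = np.asarray(x, dtype=float)
    n_frames = -(-len(x) // frame_size)          # ceiling division
    padded = np.zeros(n_frames * frame_size)
    padded[:len(x)] = x
    return padded.reshape(n_frames, frame_size)

def dct2(frame):
    """(b) Stand-in transform: orthonormal DCT-II via an explicit basis matrix."""
    n = len(frame)
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    basis[0] /= np.sqrt(2.0)                     # DC row scaling for orthonormality
    return basis @ frame

def quantize_bands(coeffs, band_edges, step_sizes):
    """(c) Uniform scalar quantizer with one (hypothetical) step size per band."""
    return [np.round(coeffs[lo:hi] / step).astype(int)
            for lo, hi, step in zip(band_edges[:-1], band_edges[1:], step_sizes)]

# Example: 100 samples -> 4 frames of 32 samples, NB = 4 bands per frame.
x = np.sin(2 * np.pi * 0.05 * np.arange(100))
frames = extract_frames(x, 32)
band_edges = [0, 4, 8, 16, 32]                   # hypothetical band boundaries
coeffs = dct2(frames[0])
q = quantize_bands(coeffs, band_edges, [0.1, 0.1, 0.2, 0.4])
```

Because the stand-in transform is orthonormal, the coefficient energy equals the frame energy, so per-band step sizes translate directly into per-band distortion.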
11. A method for processing audio signals, comprising:
dividing an input audio signal into frames corresponding to successive time intervals;
for each frame performing at least two relatively short-size transform computations;
extracting one set of side information about the frame from said at least two relatively short-size transform computations;
encoding information about the frame, said encoded information comprising the side information and transform coefficients from said at least two transform computations; and
reconstructing the audio signal based on the encoded information. (Dependent claims 12-25.)
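Claim 11 splits each frame into several short transforms but sends one set of side information per frame. The sketch below assumes a sine-windowed MDCT with hop size M and NTPF transforms per frame, and treats the per-band log2 average power, pooled over all NTPF transforms, as that side information. Function names, sizes and the pooling rule are illustrative assumptions; the direct matrix MDCT costs O(M²) per transform and is chosen for clarity, not speed.

```python
import numpy as np

def sine_window(M):
    n = np.arange(2 * M)
    return np.sin(np.pi * (n + 0.5) / (2 * M))    # satisfies w[n]^2 + w[n+M]^2 = 1

def mdct_basis(M):
    n = np.arange(2 * M)[None, :]
    k = np.arange(M)[:, None]
    return np.cos(np.pi / M * (n + 0.5 + M / 2) * (k + 0.5))   # shape (M, 2M)

def mdct(block, M):
    return mdct_basis(M) @ (sine_window(M) * block)

def imdct(coeffs, M):
    return sine_window(M) * (2.0 / M) * (mdct_basis(M).T @ coeffs)

def analyze_frame(history, frame, M, ntpf):
    """Run NTPF short MDCTs (hop M) over one frame; history = previous M samples.
    Returns T with T[k, m] = coefficient k of transform m."""
    buf = np.concatenate([history, frame])
    return np.stack([mdct(buf[m * M : m * M + 2 * M], M) for m in range(ntpf)], axis=1)

def frame_side_info(T, band_edges):
    """One set of side information per frame: log2 average power of each band,
    pooled over all NTPF transforms (an assumed choice of side information)."""
    return np.array([np.log2(np.mean(T[lo:hi, :] ** 2))
                     for lo, hi in zip(band_edges[:-1], band_edges[1:])])

def synthesize(T_frames, M):
    """Inverse-transform every short MDCT in order and overlap-add with hop M."""
    blocks = [T[:, m] for T in T_frames for m in range(T.shape[1])]
    out = np.zeros(M * (len(blocks) + 1))
    for i, c in enumerate(blocks):
        out[i * M : i * M + 2 * M] += imdct(c, M)
    return out

# Example: M = 32 coefficients per transform, NTPF = 4 transforms per frame.
M, NTPF, n_frames = 32, 4, 5
x = np.random.default_rng(0).standard_normal(n_frames * NTPF * M)
history, T_frames = np.zeros(M), []
for f in range(n_frames):
    frame = x[f * NTPF * M : (f + 1) * NTPF * M]
    T_frames.append(analyze_frame(history, frame, M, NTPF))
    history = frame[-M:]
side_info = frame_side_info(T_frames[0], [0, 4, 8, 16, 32])
y = synthesize(T_frames, M)      # y[n + M] reconstructs x[n] where two transforms overlap
```

Because the sine window satisfies the Princen-Bradley condition, overlap-add cancels the time-domain aliasing, so every sample covered by two transforms is reconstructed exactly (before any quantization), with a delay of M samples.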
26. A method for adaptive frame loss concealment in processing of audio signals divided into frames corresponding to successive time intervals, where for each input frame one or more transform domain computations are performed over partially overlapping windows covering the audio signal, and output synthesis is performed using an overlap-and-add method, the method comprising:
in a sequence of received frames identifying a frame as missing;
analyzing the immediately preceding frame to determine an optimum time lag for waveform signal extrapolation;
based on the determined optimum time lag performing waveform signal extrapolation to synthesize a first portion of the missing frame, said synthesis using information already available as part of the preceding frame to minimize discontinuities at the frame boundary; and
performing waveform signal extrapolation in the remaining portion of the missing frame. (Dependent claims 27-33.)
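The two steps of claim 26 can be sketched as follows, assuming a normalized cross-correlation search for the time lag and simple periodic repetition for the extrapolation. The claim does not say how the "information already available as part of the preceding frame" is used; here it is modeled, hypothetically, as a linear cross-fade from an `overlap_tail` array (for example, samples pending from the last overlap-add) into the extrapolated waveform. The lag range and correlation window length are placeholder values.

```python
import numpy as np

def best_lag(history, min_lag, max_lag, win_len):
    """Return the lag maximizing <target, cand> / ||cand||, where target is the last
    win_len samples of history and cand is the segment `lag` samples earlier."""
    target = history[-win_len:]
    best, best_score = min_lag, -np.inf
    for lag in range(min_lag, max_lag + 1):
        cand = history[len(history) - win_len - lag : len(history) - lag]
        energy = np.dot(cand, cand)
        if energy > 0.0:
            score = np.dot(target, cand) / np.sqrt(energy)
            if score > best_score:
                best, best_score = lag, score
    return best

def extrapolate(history, lag, n_out):
    """Periodic waveform extrapolation: repeat the last `lag` samples."""
    reps = -(-n_out // lag)                     # ceiling division
    return np.tile(history[-lag:], reps)[:n_out]

def conceal_frame(history, overlap_tail, frame_len, min_lag=20, max_lag=60, win_len=40):
    """Synthesize a missing frame from the preceding decoded samples."""
    lag = best_lag(history, min_lag, max_lag, win_len)
    out = extrapolate(history, lag, frame_len)
    # First portion: hypothetical cross-fade from samples already available from the
    # preceding frame (overlap_tail) into the extrapolated waveform, to avoid a jump.
    n = len(overlap_tail)
    fade = (np.arange(n) + 0.5) / n
    out[:n] = (1.0 - fade) * overlap_tail + fade * out[:n]
    return out, lag

# Example: a 40-sample-period tone; the next 16 samples stand in for overlap_tail.
t = np.arange(400)
history = np.sin(2 * np.pi * t / 40)
tail = np.sin(2 * np.pi * np.arange(400, 416) / 40)
frame, lag = conceal_frame(history, tail, frame_len=80)
```

Dividing by the candidate's norm rather than by both norms is the usual choice for lag search: by the Cauchy-Schwarz inequality the score peaks when the candidate segment is a positive multiple of the target.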
34. A method for scalable processing of audio signals sampled at a first sampling rate and divided into frames corresponding to successive time intervals, where for each input frame one or more relatively short-size transform domain computations are performed over windows covering portions of the audio signal, comprising:
receiving transform domain coefficients corresponding to said one or more transform domain computations; and
directly reconstructing the audio signal at a second sampling rate lower than the first sampling rate using an inverse transform operating only on a portion of the received transform domain coefficients, without downsampling. (Dependent claims 35-37.)
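Claim 34 decodes a lower sampling rate directly from a subset of the coefficients. A minimal sketch, assuming sine-windowed MDCT blocks of size M at rate fs: keep the lowest M/factor coefficients, scale them by 1/factor to account for the shorter transform, and run an M/factor-point inverse MDCT with overlap-add. The output is then at fs/factor without ever forming the full-rate signal or applying a decimation filter. The scaling rule and the function names are assumptions for illustration.

```python
import numpy as np

def sine_window(M):
    n = np.arange(2 * M)
    return np.sin(np.pi * (n + 0.5) / (2 * M))

def mdct_basis(M):
    n = np.arange(2 * M)[None, :]
    k = np.arange(M)[:, None]
    return np.cos(np.pi / M * (n + 0.5 + M / 2) * (k + 0.5))

def mdct(block, M):
    return mdct_basis(M) @ (sine_window(M) * block)

def imdct(coeffs, M):
    return sine_window(M) * (2.0 / M) * (mdct_basis(M).T @ coeffs)

def decode_reduced_rate(blocks, M, factor):
    """Reconstruct at fs/factor: keep the lowest M//factor coefficients of every
    M-point MDCT block, scale by 1/factor, and overlap-add shorter IMDCTs.
    No full-rate signal and no decimation filter are ever computed."""
    Mr = M // factor
    out = np.zeros(Mr * (len(blocks) + 1))
    for i, X in enumerate(blocks):
        out[i * Mr : i * Mr + 2 * Mr] += imdct(X[:Mr] / factor, Mr)
    return out

# Example: encode a low-frequency tone at fs, decode directly at fs/2.
M, B, f0 = 128, 8, 0.01                        # f0 in cycles per full-rate sample
x = np.sin(2 * np.pi * f0 * np.arange((B + 1) * M))
blocks = [mdct(x[i * M : i * M + 2 * M], M) for i in range(B)]
y = decode_reduced_rate(blocks, M, 2)
n = np.arange(len(y))
ref = np.sin(2 * np.pi * f0 * (2 * n + 0.5))   # half-rate sample n ~ full-rate time 2n + 0.5
inner = slice(M // 2, B * M // 2)              # samples covered by two transforms
err = np.max(np.abs(y[inner] - ref[inner]))
```

With this MDCT phase convention, half-rate output sample n lines up with full-rate time 2n + 0.5, which is why the reference tone is evaluated there. Content above the new Nyquist frequency is discarded along with the upper coefficients.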
38. A coding method for use in processing of audio signals divided into frames corresponding to successive time intervals, where for each input frame at least one transform domain computation is performed, and the transform coefficients are divided into NB bands, the method comprising:
computing a base-2 logarithm of the average power of the transform coefficients in each of the NB bands to obtain a log-gain array $LG(i)$, $i = 0, \ldots, NB-1$;
encoding information about each frame based on the log-gain array LG(i), said encoded information comprising the transform coefficients, where the encoding step comprises:
computing a quantized log-gain array $LGQ(i)$, $i = 0, \ldots, NB-1$; and
converting the quantized log-gain coefficients of the array LGQ(i) into a linear-gain domain using the following steps:
(1) providing a table containing all possible values of the linear gain g(0) corresponding to the number of bits allocated to LGQ(0);
(2) finding the value of g(0) using table lookup;
(3) from the second band onward, applying the formula
$$g(i) = g(i-1)\cdot 2^{DLGQ(i)/2}, \qquad i = 1, 2, \ldots, NB-1,$$
with $DLGQ(i) = LGQ(i) - LGQ(i-1)$, to compute all linear gains recursively using a single multiplication per linear gain, each of the quantities $2^{DLGQ(i)/2}$ being found by table lookup; and
decoding said encoded information about each frame to reconstruct the input audio signal. (Dependent claim 39.)
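The log-gain steps of claim 38 can be sketched as follows. The sketch assumes a uniform log-gain step of 1 (one log2 power unit, about 3 dB), closed-loop differential coding of LGQ(i), and table ranges chosen arbitrarily for illustration. The decoder obtains g(0) from one table and each later gain with a single multiplication by a tabulated 2^(DLGQ(i)/2), which yields g(i) = 2^(LGQ(i)/2) without evaluating any exponential at run time.

```python
import numpy as np

STEP = 1.0                  # hypothetical log-gain step, in log2-power units (~3.01 dB)
LG0_MIN, LG0_MAX = -8, 23   # hypothetical 5-bit range for LGQ(0) / STEP
DMAX = 7                    # hypothetical range [-DMAX, DMAX] for DLGQ(i) / STEP

# Tables computed once: every possible g(0), and every possible 2^(DLGQ(i)/2).
G0_TABLE = 2.0 ** (np.arange(LG0_MIN, LG0_MAX + 1) * STEP / 2)
DTABLE = 2.0 ** (np.arange(-DMAX, DMAX + 1) * STEP / 2)

def log_gains(T, band_edges):
    """LG(i): base-2 log of the average power of the coefficients in band i."""
    return np.array([np.log2(np.mean(T[lo:hi] ** 2))
                     for lo, hi in zip(band_edges[:-1], band_edges[1:])])

def quantize_log_gains(LG):
    """Encoder: table index for LGQ(0) plus integer codes for DLGQ(i), i >= 1.
    Closed-loop: each difference is taken from the previously quantized value."""
    q = int(np.clip(np.round(LG[0] / STEP), LG0_MIN, LG0_MAX))
    idx0, codes = q - LG0_MIN, []
    for lg in LG[1:]:
        d = int(np.clip(np.round(lg / STEP) - q, -DMAX, DMAX))
        codes.append(d)
        q += d
    return idx0, codes

def linear_gains(idx0, codes):
    """Decoder: g(0) by table lookup, then g(i) = g(i-1) * 2^(DLGQ(i)/2),
    one multiplication per band with the factor also taken from a table."""
    g = [G0_TABLE[idx0]]
    for d in codes:
        g.append(g[-1] * DTABLE[d + DMAX])
    return np.array(g)

# Example with NB = 4 bands of 8 coefficients each.
rng = np.random.default_rng(1)
T = rng.standard_normal(32) * np.repeat([8.0, 4.0, 2.0, 0.5], 8)
LG = log_gains(T, [0, 8, 16, 24, 32])
idx0, codes = quantize_log_gains(LG)
g = linear_gains(idx0, codes)
LGQ = STEP * (LG0_MIN + idx0 + np.concatenate([[0], np.cumsum(codes)]))
```

Because LG(i) is a log of power, the corresponding linear (amplitude) gain is 2^(LGQ(i)/2); the recursion simply accumulates the tabulated ratios between neighboring bands.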
40. An embedded coding method for use in processing of an audio signal divided into frames corresponding to successive time intervals, where for each input frame at least one transform domain computation is performed and the resulting transform coefficients are divided into NB bands, each band having at least one transform coefficient, the method comprising:
for a pre-specified first bit rate, providing a first output bit stream which comprises information about transform coefficients in $M_1 \le NB$ bands and information about the average power in the $M_1$ bands, and wherein bit allocation is determined based on a target signal-to-noise ratio (TSNR) in the NB bands, said first output bit stream being sufficient to reconstruct a representation of the audio signal;
for at least a second pre-specified bit rate higher than the first bit rate, providing an output bit stream embedding said first output bit stream and further comprising information about transform coefficients in $M_2$ bands, where $M_1 \le M_2 \le NB$, and information about the average power in the $M_2$ bands, and wherein bit allocation is determined based on the difference between the TSNR in the NB bands and a value determined by the number of bits allocated to each band at the next-lower bit rate; and
reconstructing a representation of the input signal using an embedded bit stream corresponding to the desired bit rate. (Dependent claim 41.)
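One way to realize the embedded bit allocation of claim 40 is a greedy loop that repeatedly gives one more bit per coefficient to the eligible band whose remaining shortfall (TSNR minus the SNR already delivered by its current bits) is largest. The sketch assumes the common high-rate rule of thumb of about 6.02 dB per bit per coefficient, measures each layer's budget in bits, and starts each higher layer from the allocation of the next-lower layer so that the lower-rate bit stream stays embedded. The TSNR values, band sizes and budgets are made up for the example.

```python
import numpy as np

DB_PER_BIT = 6.02   # assumed SNR gain per extra bit per coefficient

def allocate_layer(tsnr, prev_bits, band_sizes, n_active, budget):
    """Greedy allocation for one embedded layer.
    tsnr: target SNR per band (dB); prev_bits: bits/coefficient per band at the
    next-lower rate; only bands 0..n_active-1 may receive bits in this layer;
    budget: extra bits available for this layer. Returns bits/coefficient per band."""
    bits = prev_bits.copy()
    while True:
        shortfall = tsnr - DB_PER_BIT * bits    # TSNR minus SNR already delivered
        shortfall[n_active:] = -np.inf
        b = int(np.argmax(shortfall))
        if shortfall[b] <= 0 or band_sizes[b] > budget:
            break
        bits[b] += 1                             # one more bit for each coefficient
        budget -= band_sizes[b]
    return bits

# Example: NB = 5 bands; layer 1 uses M1 = 3 bands, layer 2 uses M2 = 5 bands.
tsnr = np.array([30.0, 24.0, 18.0, 12.0, 6.0])   # hypothetical TSNR values in dB
sizes = np.array([4, 4, 8, 8, 16])               # coefficients per band
bits1 = allocate_layer(tsnr, np.zeros(5, dtype=int), sizes, n_active=3, budget=40)
bits2 = allocate_layer(tsnr, bits1, sizes, n_active=5, budget=60)
```

Since the second layer only ever adds bits on top of `bits1`, a decoder receiving just the first layer's bits can still parse them with the first-layer allocation.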
42. A system for embedded coding of audio signals comprising:
(a) a frame extractor for dividing an input signal into a plurality of signal frames corresponding to successive time intervals;
(b) means for providing transform-domain representations of the signal in each frame;
(c) means for providing a first encoded data stream corresponding to a user-specified transform-domain representation, which first encoded data stream contains information sufficient to reconstruct a representation of the input signal;
(d) means for providing one or more secondary encoded data streams comprising additional information in the transform-domain representation of the signal; and
(e) means for providing an embedded output signal based at least on said first encoded data portion and said one or more secondary encoded data portions of the user-selected transform representation.
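The embedding property of claim 42 can be illustrated at the byte level: if the first (base) data stream comes first and each secondary stream is appended in order, any prefix that ends on a layer boundary is itself a decodable lower-rate stream. The byte-aligned layers and the helper names below are hypothetical; a real bit stream would normally be packed at bit granularity.

```python
from itertools import accumulate

def build_embedded_frame(layers):
    """Concatenate the base layer and the secondary layers in order.
    Returns the payload and the cumulative end offset of each layer."""
    return b"".join(layers), list(accumulate(len(layer) for layer in layers))

def truncate_to_layers(payload, offsets, n_layers):
    """Keep only the first n_layers layers: still a valid, lower-rate stream."""
    return payload[:offsets[n_layers - 1]]

# Example with hypothetical byte-aligned layers for one frame.
layers = [b"BASE", b"ENH1", b"ENH2++"]
payload, offsets = build_embedded_frame(layers)
low_rate = truncate_to_layers(payload, offsets, 1)
mid_rate = truncate_to_layers(payload, offsets, 2)
```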
43. A method for processing audio signals, comprising:
dividing an input audio signal into frames corresponding to successive time intervals;
for each frame performing at least two relatively short-size transform computations to obtain a two-dimensional output transform coefficient array $T(k,m)$ defined as
$$T(k,m), \qquad k = 0, 1, 2, \ldots, M-1, \quad m = 0, 1, \ldots, NTPF-1,$$
where M is the number of transform coefficients in each transform and NTPF is the number of transforms per frame;
extracting one set of side information about the frame from said at least two relatively short-size transform computations;
encoding information about the frame, said encoded information comprising the side information and the transform coefficients T(k,m) from said at least two transform computations, wherein said transform coefficients are divided into NB frequency bands, and further wherein bit allocation is done by:
(a) constructing an approximation of the signal spectrum envelope using the log-gains of the coefficients in the NB bands;
(b) estimating a noise masking threshold function on the basis of the constructed approximation;
(c) mapping the signal-to-masking-threshold ratio to target signal-to-noise ratio (TSNR) values; and
(d) performing bit allocation based on the mapping in (c); and
reconstructing the audio signal based on the encoded information.
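Steps (a) to (d) of claim 43 can be sketched as below. The spectral envelope is the per-band log-gain converted to dB and pooled over the NTPF columns of T(k,m). The masking threshold uses a deliberately simple, hypothetical spreading rule (a fixed offset plus a linear decay per band, taking the maximum over contributing bands). The signal-to-mask ratio is mapped to TSNR by clipping at 0 dB, and bits per coefficient are set to ceil(TSNR / 6.02). All constants are assumptions, not values from the patent.

```python
import numpy as np

DB_PER_LOG2 = 10 * np.log10(2.0)   # one log2 power unit = about 3.01 dB
MASK_OFFSET_DB = 12.0              # hypothetical masking offset
SPREAD_DB_PER_BAND = 15.0          # hypothetical spreading slope
DB_PER_BIT = 6.02                  # assumed SNR gain per bit per coefficient

def band_log_gains(T, band_edges):
    """Log2 average power per band, pooled over all NTPF columns of T(k, m)."""
    return np.array([np.log2(np.mean(T[lo:hi, :] ** 2))
                     for lo, hi in zip(band_edges[:-1], band_edges[1:])])

def masking_threshold_db(env_db):
    """Threshold in band i = max over bands j of env_db[j] - offset - slope*|i - j|."""
    i = np.arange(len(env_db))
    spread = SPREAD_DB_PER_BAND * np.abs(i[:, None] - i[None, :])
    return np.max(env_db[None, :] - MASK_OFFSET_DB - spread, axis=1)

def allocate_bits(T, band_edges, max_bits=6):
    env_db = DB_PER_LOG2 * band_log_gains(T, band_edges)                  # (a) envelope
    thr_db = masking_threshold_db(env_db)                                  # (b) threshold
    tsnr = np.maximum(env_db - thr_db, 0.0)                                # (c) SMR -> TSNR
    bits = np.clip(np.ceil(tsnr / DB_PER_BIT), 0, max_bits).astype(int)   # (d) bits/coef
    return env_db, thr_db, tsnr, bits

# Example: M = 16 coefficients per transform, NTPF = 2, NB = 4 bands of 4 coefficients.
amplitudes = np.repeat([100.0, 1.0, 10.0, 10.0], 4)
T = np.tile(amplitudes[:, None], (1, 2))     # T[k, m], shape (M, NTPF)
env_db, thr_db, tsnr, bits = allocate_bits(T, [0, 4, 8, 12, 16])
```

In the example, the weak second band lies 13 dB below the threshold spread from the strong first band, so it is fully masked and receives no bits.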
Specification