Tonal analysis for perceptual audio coding using a compressed spectral representation
First Claim
1. A method for performing perceptual audio encoding on an input audio signal, the method comprising:
(a) sampling the input audio signal to generate multiple sampled frames;
(b) performing a first frequency transformation of each sampled frame into a frequency domain representation of the sample frame;
(c) applying a magnitude compression operation to the frequency domain representation of each sampled frame to form a magnitude-compressed representation of the sample frame;
(d) performing a second frequency transformation of the magnitude-compressed representation of each sampled frame to form a compressed spectral representation of the sample frame;
(e) determining tonality of each sampled frame from a peak magnitude and an average magnitude of the compressed spectral representation of the sampled frame to distinguish tone-like components in the input audio signal from noise-like components in the input audio signal;
(f) selecting a masked threshold for each sampled frame corresponding to the determined tonality of the sampled frame, wherein masked thresholds selected for the tone-like components in the input audio signal are different from masked thresholds selected for the noise-like components in the input audio signal; and
(g) performing perceptual audio encoding on the sampled frames based on the selected masked thresholds to compress the tone-like features in the input audio signal at a different level of compression from the noise-like features in the input audio signal.
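Steps (a) through (e) of the claim can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the patented implementation: the frame length, hop size, Hann window, logarithm as the magnitude compression, inverse FFT as the second transformation (which yields a real cepstrum), and the `min_q` cutoff that skips the lowest quefrencies are all choices made here for illustration, and the function names are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Step (a): split a 1-D signal into overlapping frames (sizes assumed)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def compressed_spectrum(frame, eps=1e-10):
    """Steps (b)-(d): FFT, log-magnitude compression, second transform."""
    window = np.hanning(len(frame))            # assumed analysis window
    spectrum = np.fft.rfft(frame * window)     # (b) first frequency transformation
    log_mag = np.log(np.abs(spectrum) + eps)   # (c) magnitude compression (log assumed)
    return np.fft.irfft(log_mag)               # (d) second transformation: real cepstrum

def peak_to_average(cepstrum, min_q=32):
    """Step (e): peak magnitude divided by average magnitude.

    The first min_q coefficients are skipped (an assumption) because they
    mostly describe the smooth spectral envelope rather than fine structure.
    """
    mags = np.abs(cepstrum[min_q : len(cepstrum) // 2])
    return mags.max() / mags.mean()
```

A harmonic-rich frame has a nearly periodic log spectrum, which concentrates energy at a few cepstral coefficients and yields a high ratio; a noise-like frame spreads energy across the cepstrum and yields a lower one. The ratio value at which a frame counts as tone-like is not fixed by this sketch.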
Abstract
The present invention provides an apparatus, method and tangible medium storing instructions for determining tonality of an input audio signal, for selection of corresponding masked thresholds for use in perceptual audio coding. In the various embodiments, the input audio signal is sampled and transformed using a compressed spectral operation to form a compressed spectral representation, such as a cepstral representation. A peak magnitude and an average magnitude of the compressed spectral representation are determined. Depending upon the ratio of peak-to-average magnitudes, a masked threshold is selected having a corresponding degree of tonality, and is used to determine a plurality of quantization levels and a plurality of bit allocations to perceptually encode the input audio signal with a distortion spectrum beneath a level of just noticeable distortion (JND). The invention also includes other methods and variations for selecting substantially tone-like or substantially noise-like masked thresholds for perceptual encoding of the input audio signal.
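The selection step described in the abstract, from peak-to-average ratio to masked threshold to bit allocation, might look like the sketch below. Every constant is a placeholder assumed for illustration and none comes from the patent: the two ratio bounds, the two masking offsets in dB, the linear interpolation between them, and the rule of thumb of roughly 6.02 dB of signal-to-mask ratio per quantizer bit. The function names are likewise hypothetical.

```python
import math

# Illustrative assumptions only; the patent text above gives no numeric values.
RATIO_NOISE = 4.0       # peak-to-average ratio treated as fully noise-like
RATIO_TONE = 20.0       # peak-to-average ratio treated as fully tone-like
OFFSET_NOISE_DB = 6.0   # masker-to-threshold offset for a noise-like masker
OFFSET_TONE_DB = 18.0   # larger offset (lower threshold) for a tone-like masker

def tonality_from_ratio(ratio):
    """Map a peak-to-average ratio to a tonality value clamped to [0, 1]."""
    t = (ratio - RATIO_NOISE) / (RATIO_TONE - RATIO_NOISE)
    return min(1.0, max(0.0, t))

def masked_threshold_db(band_energy_db, tonality):
    """Blend the tone-like and noise-like offsets according to tonality."""
    offset = tonality * OFFSET_TONE_DB + (1.0 - tonality) * OFFSET_NOISE_DB
    return band_energy_db - offset

def bits_for_band(band_energy_db, threshold_db):
    """Bits needed to keep quantization noise at or below the threshold,
    assuming about 6.02 dB of signal-to-noise ratio per bit."""
    smr_db = band_energy_db - threshold_db
    return max(0, math.ceil(smr_db / 6.02))
```

With these placeholder numbers, a band judged tone-like receives a lower masked threshold and therefore more bits than a noise-like band of the same energy, which is the sense in which the two kinds of content are compressed at different levels.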
43 Claims
1. A method for performing perceptual audio encoding on an input audio signal, the method comprising:
(a) sampling the input audio signal to generate multiple sampled frames;
(b) performing a first frequency transformation of each sampled frame into a frequency domain representation of the sample frame;
(c) applying a magnitude compression operation to the frequency domain representation of each sampled frame to form a magnitude-compressed representation of the sample frame;
(d) performing a second frequency transformation of the magnitude-compressed representation of each sampled frame to form a compressed spectral representation of the sample frame;
(e) determining tonality of each sampled frame from a peak magnitude and an average magnitude of the compressed spectral representation of the sampled frame to distinguish tone-like components in the input audio signal from noise-like components in the input audio signal;
(f) selecting a masked threshold for each sampled frame corresponding to the determined tonality of the sampled frame, wherein masked thresholds selected for the tone-like components in the input audio signal are different from masked thresholds selected for the noise-like components in the input audio signal; and
(g) performing perceptual audio encoding on the sampled frames based on the selected masked thresholds to compress the tone-like features in the input audio signal at a different level of compression from the noise-like features in the input audio signal.
Dependent claims: 2-20.
21. An apparatus for performing perceptual audio encoding on an input audio signal, the apparatus comprising:
a sampler adapted to sample the input audio signal to generate multiple sampled frames;
a psychoacoustic analyzer adapted to
(1) perform a first frequency transformation of each sampled frame into a frequency domain representation of the sampled frame,
(2) apply a magnitude compression operation to the frequency domain representation of each sampled frame to form a magnitude-compressed representation of the sampled frame,
(3) perform a second frequency transformation of the magnitude-compressed representation of each sampled frame to form a compressed spectral representation of the sampled frame,
(4) determine tonality of each sampled frame from a peak magnitude and an average magnitude of the compressed spectral representation of the sampled frame to distinguish tone-like components in the input audio signal from noise-like components in the input audio signal, and
(5) select a masked threshold for each sampled frame corresponding to the determined tonality of the sampled frame, wherein masked thresholds selected for the tone-like components in the input audio signal are different from masked thresholds selected for the noise-like components in the input audio signal; and
an encoder adapted to perform perceptual audio encoding on the sampled frames based on the selected masked thresholds to compress the tone-like features in the input audio signal at a different level of compression from the noise-like features in the input audio signal.
Dependent claims: 22-42.
43. Apparatus for performing perceptual audio encoding on an input audio signal, the apparatus comprising:
means for sampling the input audio signal to generate multiple sampled frames;
means for performing a first frequency transformation of each sampled frame into a frequency domain representation of the sample frame;
means for applying a magnitude compression operation to the frequency domain representation of each sampled frame to form a magnitude-compressed representation of the sample frame;
means for performing a second frequency transformation of the magnitude-compressed representation of each sampled frame to form a compressed spectral representation of the sample frame;
means for determining tonality of each sampled frame from a peak magnitude and an average magnitude of the compressed spectral representation of the sampled frame to distinguish tone-like components in the input audio signal from noise-like components in the input audio signal;
means for selecting a masked threshold for each sampled frame corresponding to the determined tonality of the sampled frame, wherein masked thresholds selected for the tone-like components in the input audio signal are different from masked thresholds selected for the noise-like components in the input audio signal; and
means for performing perceptual audio encoding on the sampled frames based on the selected masked thresholds to compress the tone-like features in the input audio signal at a different level of compression from the noise-like features in the input audio signal.
Specification