Tonal analysis for perceptual audio coding using a compressed spectral representation

US 7,333,930 B2
Filed: 03/14/2003
Issued: 02/19/2008
Est. Priority Date: 03/14/2003
Status: Active Grant

Abstract

The present invention provides an apparatus, method and tangible medium storing instructions for determining tonality of an input audio signal, for selection of corresponding masked thresholds for use in perceptual audio coding. In the various embodiments, the input audio signal is sampled and transformed using a compressed spectral operation to form a compressed spectral representation, such as a cepstral representation. A peak magnitude and an average magnitude of the compressed spectral representation are determined. Depending upon the ratio of peak-to-average magnitudes, a masked threshold is selected having a corresponding degree of tonality, and is used to determine a plurality of quantization levels and a plurality of bit allocations to perceptually encode the input audio signal with a distortion spectrum beneath a level of just noticeable distortion (JND). The invention also includes other methods and variations for selecting substantially tone-like or substantially noise-like masked thresholds for perceptual encoding of the input audio signal.
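
The processing chain in the abstract (transform, compress the magnitudes, transform again, then compare peak and average magnitudes) can be illustrated with a minimal Python sketch. It assumes a logarithmic compression and an inverse DFT as the second transformation, which yields a real cepstrum. The naive `dft` helper, the `floor` guard against log(0), the `skip` parameter, the restriction to the first half of the cepstrum, and the two synthetic test frames are all assumptions made for illustration; none of them comes from the patent.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform using only the standard library."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            # Reduce k*t modulo n so the phase argument stays small and accurate.
            acc += x[t] * cmath.exp(sign * 2j * math.pi * ((k * t) % n) / n)
        out.append(acc / n if inverse else acc)
    return out


def compressed_spectrum(frame, floor=1e-9):
    """Forward DFT, log-magnitude compression, then inverse DFT (a real cepstrum).

    `floor` is an assumed guard against log(0); it is not taken from the source.
    """
    log_mag = [math.log(abs(c) + floor) for c in dft(frame)]
    return [c.real for c in dft(log_mag, inverse=True)]


def peak_to_average(cepstrum, skip=2):
    """Ratio of peak magnitude to average magnitude over part of the cepstrum.

    Only indices skip .. N/2 - 1 are examined; that range is an assumption.
    """
    mags = [abs(c) for c in cepstrum[skip:len(cepstrum) // 2]]
    return max(mags) / (sum(mags) / len(mags))


N = 128
# Harmonic test frame: seven harmonics of a fundamental placed at DFT bin 8.
tone = [sum(math.cos(2 * math.pi * 8 * m * t / N) for m in range(1, 8))
        for t in range(N)]
# Noise test frame: white Gaussian noise with a fixed seed for repeatability.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]

tone_ratio = peak_to_average(compressed_spectrum(tone))
noise_ratio = peak_to_average(compressed_spectrum(noise))
print(f"tone-like frame:  peak/average = {tone_ratio:.2f}")
print(f"noise-like frame: peak/average = {noise_ratio:.2f}")
```

The harmonic frame concentrates its cepstral energy at multiples of the pitch period, so it should produce the larger ratio; the noise frame spreads its energy more evenly. A production encoder would use an FFT and windowed frames rather than this naive transform.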

43 Claims

1. A method for performing perceptual audio encoding on an input audio signal, the method comprising:
- (a) sampling the input audio signal to generate multiple sampled frames;
  
  (b) performing a first frequency transformation of each sampled frame into a frequency domain representation of the sample frame;
  
  (c) applying a magnitude compression operation to the frequency domain representation of each sampled frame to form a magnitude-compressed representation of the sample frame;
  
  (d) performing a second frequency transformation of the magnitude-compressed representation of each sampled frame to form a compressed spectral representation of the sample frame;
  
  (e) determining tonality of each sampled frame from a peak magnitude and an average magnitude of the compressed spectral representation of the sampled frame to distinguish tone-like components in the input audio signal from noise-like components in the input audio signal;
  
  (f) selecting a masked threshold for each sampled frame corresponding to the determined tonality of the sampled frame, wherein masked thresholds selected for the tone-like components in the input audio signal are different from masked thresholds selected for the noise-like components in the input audio signal; and
  
  (g) performing perceptual audio encoding on the sampled frames based on the selected masked thresholds to compress the tone-like features in the input audio signal at a different level of compression from the noise-like features in the input audio signal.
- - 2. The invention of claim 1, wherein:
    - the first frequency transformation is a forward frequency transformation; and
      
      the second frequency transformation is an inverse frequency transformation.
  - 3. The invention of claim 2, wherein:
    - the forward frequency transformation is a Fourier transformation, a fast Fourier transformation (FFT), a discrete cosine transformation (DCT), or a z-transformation; and
      
      the inverse frequency transformation is an inverse Fourier transformation, an inverse FFT, an inverse DCT, or an inverse z-transformation.
  - 4. The invention of claim 1, wherein:
    - the first frequency transformation is a first forward frequency transformation; and
      
      the second frequency transformation is a second forward frequency transformation.
  - 5. The invention of claim 4, wherein:
    - the first forward frequency transformation is a Fourier transformation, an FFT, a DCT, or a z-transformation; and
      
      the second forward frequency transformation is a Fourier transformation, an FFT, a DCT, or a z-transformation.
  - 6. The invention of claim 1, wherein the magnitude compression operation is a logarithmic compression operation.
  - 7. The invention of claim 1, wherein the magnitude compression operation is an exponential compression operation.
  - 8. The invention of claim 1, wherein, for each sampled frame, step (e) comprises:
    - (e1) determining a ratio based on the peak magnitude and the average magnitude of the compressed spectral representation of the sampled frame; and
      
      (e2) determining the tonality of the sampled frame based on the ratio.
  - 9. The invention of claim 8, wherein, for each sampled frame:
    - step (e2) comprises comparing the ratio to a specified threshold level to determine whether to identify the tonality of the sampled frame as substantially tone-like or substantially noise-like; and
      
      step (f) comprises;
      
      (f1) selecting a tone-masked threshold for the masked threshold if the tonality of the sampled frame is identified as primarily tone-like; and
      
      (f2) selecting a noise-masked threshold for the masked threshold if the tonality of the sampled frame is identified as primarily noise-like.
  - 10. The invention of claim 8, wherein, for each sampled frame:
    - step (e2) comprises using the ratio to determine a degree to which the sampled frame is tone-like or noise-like; and
      
      step (f) comprises selecting the masked threshold as a function of the degree of the tonality of the sampled frame.
  - 11. The invention of claim 1, wherein, for each sampled frame, step (e) comprises:
    - (e1) determining a difference between the peak magnitude and the average magnitude of the compressed spectral representation of the sampled frame; and
      
      (e2) determining the tonality of the sampled frame based on the difference.
  - 12. The invention of claim 11, wherein, for each sampled frame:
    - step (e2) comprises comparing the difference to a specified threshold level to determine whether to identify the tonality of the sampled frame as primarily tone-like or primarily noise-like; and
      
      step (f) comprises;
      
      (f1) selecting a tone-masked threshold for the masked threshold if the tonality of the sampled frame is identified as primarily tone-like; and
      
      (f2) selecting a noise-masked threshold for the masked threshold if the tonality of the sampled frame is identified as primarily noise-like.
  - 13. The invention of claim 11, wherein, for each sampled frame:
    - step (e2) comprises using the difference to determine a degree to which the tonality of the sampled frame is tone-like or noise-like; and
      
      step (f) comprises selecting the masked threshold as a function of the degree of the tonality of the sampled frame.
  - 14. The invention of claim 1, wherein step (g) comprises using the selected masked thresholds to encode the sampled frames with a distortion spectrum beneath a level of just noticeable distortion (JND).
  - 15. The invention of claim 1, wherein step (g) comprises using the selected masked thresholds to determine quantization levels and bit allocations for quantizing and encoding the sampled frames.
  - 16. The invention of claim 1, wherein steps (e) and (f) are implemented independently for different frequency bands in the compressed spectral representation of each sampled frame to select a masked threshold for each different frequency band in the sampled frame.
  - 17. The invention of claim 1, wherein step (b) comprises performing an autocorrelation function on each sampled frame prior to performing the first frequency transformation.
  - 18. The invention of claim 1, wherein the determined tonality of each sampled frame is a measure of harmonicity of the sampled frame.
  - 19. The invention of claim 1, wherein step (e) comprises determining the tonality of each sampled frame from only a portion of the spectral components of the compressed spectral representation of the sampled frame.
  - 20. The invention of claim 1, wherein the compressed spectral representation of each sampled frame comprises at least one cepstral sequence.
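
Claims 8 to 13 describe two ways of using the peak/average measure (a ratio or a difference): a hard comparison against a specified threshold level, or a continuous degree of tonality from which the masked threshold is derived. The following Python sketch shows one possible reading of both. Representing the masked threshold as a dB offset, the placeholder values 18 dB and 6 dB, the clamped linear mapping, and every function name are assumptions for illustration only; the claims do not specify them.

```python
# Placeholder masking offsets in dB (illustrative assumptions, not patent values).
TONE_OFFSET_DB = 18.0
NOISE_OFFSET_DB = 6.0


def classify_frame(measure, threshold):
    """Hard decision: compare the ratio or difference to a threshold level."""
    return "tone" if measure > threshold else "noise"


def select_offset_hard(label):
    """Pick the tone-masked or the noise-masked offset for a hard decision."""
    return TONE_OFFSET_DB if label == "tone" else NOISE_OFFSET_DB


def tonality_degree(measure, low, high):
    """Soft decision: map the measure to a degree in [0, 1] by clamped linear scaling.

    `low` and `high` are hypothetical calibration points for fully noise-like
    and fully tone-like frames.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (measure - low) / (high - low)))


def masking_offset_db(degree):
    """Interpolate between the noise and tone offsets as a function of the degree."""
    return degree * TONE_OFFSET_DB + (1.0 - degree) * NOISE_OFFSET_DB


if __name__ == "__main__":
    measure = 4.0  # example peak/average ratio for one frame
    print(select_offset_hard(classify_frame(measure, threshold=5.0)))
    print(masking_offset_db(tonality_degree(measure, low=2.0, high=6.0)))
```

The hard variant yields one of two thresholds per frame, while the soft variant lets the threshold vary smoothly with the measure, which avoids abrupt switching between frames whose measures sit close to the decision level.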

21. An apparatus for performing perceptual audio encoding on an input audio signal, the apparatus comprising:
- a sampler adapted to sample the input audio signal to generate multiple sampled frames;
  
  a psychoacoustic analyzer adapted to (1) perform a first frequency transformation of each sampled frame into a frequency domain representation of the sampled frame, (2) apply a magnitude compression operation to the frequency domain representation of each sampled frame to form a magnitude-compressed representation of the sampled frame, (3) perform a second frequency transformation of the magnitude-compressed representation of each sampled frame to form a compressed spectral representation of the sampled frame, (4) determine tonality of each sampled frame from a peak magnitude and an average magnitude of the compressed spectral representation of the sampled frame to distinguish tone-like components in the input audio signal from noise-like components in the input audio signal, and (5) select a masked threshold for each sampled frame corresponding to the determined tonality of the sampled frame, wherein masked thresholds selected for the tone-like components in the input audio signal are different from masked thresholds selected for the noise-like components in the input audio signal; and
  
  an encoder adapted to perform perceptual audio encoding on the sampled frames based on the selected masked thresholds to compress the tone-like features in the input audio signal at a different level of compression from the noise-like features in the input audio signal.
- - 22. The invention of claim 21, wherein:
    - the first frequency transformation is a forward frequency transformation; and
      
      the second frequency transformation is an inverse frequency transformation.
  - 23. The invention of claim 22, wherein:
    - the forward frequency transformation is a Fourier transformation, a fast Fourier transformation (FFT), a discrete cosine transformation (DCT), or a z-transformation; and
      
      the inverse frequency transformation is an inverse Fourier transformation, an inverse FFT, an inverse DCT, or an inverse z-transformation.
  - 24. The invention of claim 21, wherein:
    - the first frequency transformation is a first forward frequency transformation; and
      
      the second frequency transformation is a second forward frequency transformation.
  - 25. The invention of claim 24, wherein:
    - the first forward frequency transformation is a Fourier transformation, an FFT, a DCT, or a z-transformation; and
      
      the second forward frequency transformation is a Fourier transformation, an FFT, a DCT, or a z-transformation.
  - 26. The invention of claim 21, wherein the magnitude compression operation is a logarithmic compression operation.
  - 27. The invention of claim 21, wherein the magnitude compression operation is an exponential compression operation.
  - 28. The invention of claim 21, wherein, for each sampled frame, the psychoacoustic analyzer is adapted to:
    - determine a ratio based on the peak magnitude and the average magnitude of the compressed spectral representation of the sampled frame; and
      
      determine the tonality of the sampled frame based on the ratio.
  - 29. The invention of claim 28, wherein, for each sampled frame, the psychoacoustic analyzer is adapted to:
    - compare the ratio to a specified threshold level to determine whether to identify the tonality of the sampled frame as substantially tone-like or substantially noise-like;
      
      select a tone-masked threshold for the masked threshold if the tonality of the sampled frame is identified as primarily tone-like; and
      
      select a noise-masked threshold for the masked threshold if the tonality of the sampled frame is identified as primarily noise-like.
  - 30. The invention of claim 28, wherein, for each sampled frame, the psychoacoustic analyzer is adapted to:
    - use the ratio to determine a degree to which the sampled frame is tone-like or noise-like; and
      
      select the masked threshold as a function of the degree of the tonality of the sampled frame.
  - 31. The invention of claim 21, wherein, for each sampled frame, the psychoacoustic analyzer is adapted to:
    - determine a difference between the peak magnitude and the average magnitude of the compressed spectral representation of the sampled frame; and
      
      determine the tonality of the sampled frame based on the difference.
  - 32. The invention of claim 31, wherein, for each sampled frame, the psychoacoustic analyzer is adapted to:
    - compare the difference to a specified threshold level to determine whether to identify the tonality of the sampled frame as primarily tone-like or primarily noise-like;
      
      select a tone-masked threshold for the masked threshold if the tonality of the sampled frame is identified as primarily tone-like; and
      
      select a noise-masked threshold for the masked threshold if the tonality of the sampled frame is identified as primarily noise-like.
  - 33. The invention of claim 31, wherein, for each sampled frame, the psychoacoustic analyzer is adapted to:
    - use the difference to determine a degree to which the tonality of the sampled frame is tone-like or noise-like; and
      
      select the masked threshold as a function of the degree of the tonality of the sampled frame.
  - 34. The invention of claim 21, wherein the encoder is adapted to use the selected masked thresholds to encode the sampled frames with a distortion spectrum beneath a level of just noticeable distortion (JND).
  - 35. The invention of claim 21, wherein the encoder is adapted to use the selected masked thresholds to determine quantization levels and bit allocations for quantizing and encoding the sampled frames.
  - 36. The invention of claim 21, wherein, for each sampled frame, the psychoacoustic analyzer is adapted to determine the tonality of the sampled frame independently for different frequency bands in the compressed spectral representation of the sampled frame to select a masked threshold for each different frequency band in the sampled frame.
  - 37. The invention of claim 21, wherein, for each sampled frame, the psychoacoustic analyzer is adapted to perform an autocorrelation function on the sampled frame prior to performing the first frequency transformation.
  - 38. The invention of claim 21, wherein the determined tonality of each sampled frame is a measure of harmonicity of the sampled frame.
  - 39. The invention of claim 21, wherein, for each sampled frame, the psychoacoustic analyzer is adapted to determine the tonality of the sampled frame from only a portion of the spectral components of the compressed spectral representation of the sampled frame.
  - 40. The invention of claim 21, wherein the compressed spectral representation of each sampled frame comprises at least one cepstral sequence.
  - 41. The invention of claim 21, wherein the apparatus is an encoder.
  - 42. The invention of claim 21, wherein the apparatus is a transmitter.
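
Claims 2 to 5 and 22 to 25 allow the second frequency transformation to be either an inverse or a forward transformation. For a DFT applied to a real log-magnitude sequence of length N, the forward result equals N times the complex conjugate of the inverse result, so the magnitudes differ only by a constant factor and a peak/average ratio is the same either way. The Python sketch below checks this numerically. The naive `dft` helper, the `floor` guard, the examined index range, and the test frame are assumptions for illustration, and the equivalence is shown only for the DFT, not for the other transformations the claims list.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform using only the standard library."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * ((k * t) % n) / n)
        out.append(acc / n if inverse else acc)
    return out


def second_transform_magnitudes(frame, inverse, floor=1e-9):
    """Log-compress the magnitude spectrum, then apply a forward or inverse DFT."""
    log_mag = [math.log(abs(c) + floor) for c in dft(frame)]
    return [abs(c) for c in dft(log_mag, inverse=inverse)]


def peak_to_average(mags, skip=2):
    """Peak over average magnitude for indices skip .. N/2 - 1 (assumed range)."""
    part = mags[skip:len(mags) // 2]
    return max(part) / (sum(part) / len(part))


N = 64
# Arbitrary real test frame (hypothetical data).
frame = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(N)]

forward = second_transform_magnitudes(frame, inverse=False)
inverse = second_transform_magnitudes(frame, inverse=True)

# Largest deviation between forward magnitudes and N times inverse magnitudes.
scale_error = max(abs(f - N * i) for f, i in zip(forward, inverse))
ratio_gap = abs(peak_to_average(forward) - peak_to_average(inverse))
print(scale_error, ratio_gap)
```

Both printed values should be at the level of floating-point rounding error.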

43. Apparatus for performing perceptual audio encoding on an input audio signal, the apparatus comprising:
- means for sampling the input audio signal to generate multiple sampled frames;
  
  means for performing a first frequency transformation of each sampled frame into a frequency domain representation of the sample frame;
  
  means for applying a magnitude compression operation to the frequency domain representation of each sampled frame to form a magnitude-compressed representation of the sample frame;
  
  means for performing a second frequency transformation of the magnitude-compressed representation of each sampled frame to form a compressed spectral representation of the sample frame;
  
  means for determining tonality of each sampled frame from a peak magnitude and an average magnitude of the compressed spectral representation of the sampled frame to distinguish tone-like components in the input audio signal from noise-like components in the input audio signal;
  
  means for selecting a masked threshold for each sampled frame corresponding to the determined tonality of the sampled frame, wherein masked thresholds selected for the tone-like components in the input audio signal are different from masked thresholds selected for the noise-like components in the input audio signal; and
  
  means for performing perceptual audio encoding on the sampled frames based on the selected masked thresholds to compress the tone-like features in the input audio signal at a different level of compression from the noise-like features in the input audio signal.

Current Assignee: Avago Technologies International Sales Pte Limited (Broadcom, Inc.), Much Shelist Denenberg Ament & Rubenstein PC
Original Assignee: Agere Systems Incorporated (Broadcom, Inc.)
Inventors: Baumgarte, Frank
Primary Examiner(s): Hudspeth; David
Assistant Examiner(s): Rider; Justin W.
Application Number: US10/389,000
Publication Number: US 20040181393A1
Time in Patent Office: 1,803 Days
Field of Search: 704/200.1, 704/500
US Class Current: 704/200.1
CPC Class Codes: G10L 19/032 Quantisation or dequantisat...
