Quantization matrices for digital audio

US 20030115051A1
Filed: 12/14/2001
Published: 06/19/2003
Est. Priority Date: 12/14/2001
Status: Active Grant

Abstract

Quantization matrices facilitate digital audio encoding and decoding. An audio encoder generates and compresses quantization matrices; an audio decoder decompresses and applies the quantization matrices. The invention includes several techniques and tools, which can be used in combination or separately. For example, the audio encoder can generate quantization matrices from critical band patterns for blocks of audio data. The encoder can compute the quantization matrices directly from the critical band patterns, which can be computed from the same audio data that is being compressed. The audio encoder/decoder can use different modes for generating/applying quantization matrices depending on the coding channel mode of multi-channel audio data. The audio encoder/decoder can use different compression/decompression modes for the quantization matrices, including a parametric compression/decompression mode.

185 Citations

66 Claims

1. In an audio encoder, a method comprising:
- processing a group of frequency coefficients as critical bands according to an auditory model to generate an excitation pattern; and
  
  computing a quantization matrix directly from and in proportion to the excitation pattern, the quantization matrix including weights for quantization bands that partition the group, wherein the quantization bands differ from the critical bands.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
- - 2. The method of claim 1 wherein the quantization bands and the critical bands differ in one or more of number and frequency cut-off positions.
  - 3. The method of claim 1 wherein the group is a block in an audio channel.
  - 4. The method of claim 1 wherein the group comprises a first block in a first audio channel and a second block in a second audio channel.
  - 5. The method of claim 1 wherein the computing comprises determining a first weight by weighting the excitation pattern based upon which of the critical bands at least in part spectrally overlap a first quantization band.
  - 6. The method of claim 1 further comprising:
    - compensating for an outer/middle ear transfer function before the computing.
  - 7. The method of claim 1 wherein the weighting is proportional to extent of spectral overlap with the first quantization band.
  - 8. A computer-readable medium encoded with computer-executable instructions for causing a computer programmed thereby to perform the method of claim 1.
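
The overlap weighting recited in claims 1, 5, and 7 can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the patented implementation: the function name `quant_matrix_from_excitation`, the representation of bands as lists of coefficient-index edges, and the choice of an overlap-weighted average are all hypothetical.

```python
def quant_matrix_from_excitation(excitation, critical_edges, quant_edges):
    """Map a per-critical-band excitation pattern onto quantization bands.

    excitation     -- one excitation value per critical band
    critical_edges -- ascending coefficient indices bounding the critical bands
    quant_edges    -- ascending coefficient indices bounding the quantization bands
    Both edge lists are assumed to span the same group of coefficients.
    """
    if len(excitation) != len(critical_edges) - 1:
        raise ValueError("need exactly one excitation value per critical band")
    weights = []
    for q_lo, q_hi in zip(quant_edges, quant_edges[1:]):
        total = 0.0
        for value, c_lo, c_hi in zip(excitation, critical_edges, critical_edges[1:]):
            # Number of coefficients shared by this critical band and this
            # quantization band (zero when the bands do not overlap).
            overlap = max(0, min(q_hi, c_hi) - max(q_lo, c_lo))
            total += value * overlap
        # Assumed normalization: average over the quantization band width, so
        # each weight stays in proportion to the excitation it overlaps.
        weights.append(total / (q_hi - q_lo))
    return weights


# Two critical bands (0-4, 4-8) mapped onto three quantization bands.
print(quant_matrix_from_excitation([4.0, 2.0], [0, 4, 8], [0, 2, 6, 8]))
```

The middle quantization band straddles both critical bands equally, so its weight falls halfway between the two excitation values.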

9. A computer-readable medium encoded with computer-executable instructions for causing a computer programmed thereby to perform a method comprising:
- receiving a group of frequency coefficients;
  
  processing the group of frequency coefficients as plural critical bands according to a model of human auditory perception to generate pattern information for the group of frequency coefficients;
  
  generating a quantization matrix for the group of frequency coefficients based at least in part upon the pattern information for the group of frequency coefficients, the quantization matrix including plural quantization bands partitioning the group of frequency coefficients, each of the plural quantization bands having a weight in the quantization matrix, wherein the plural quantization bands are different than the plural critical bands; and
  
  applying the quantization matrix to the group of frequency coefficients.
- View Dependent Claims (10, 11, 12, 13, 14, 15, 16, 17)
- - 10. The computer-readable medium of claim 9 wherein the plural quantization bands and the plural critical bands differ in one or more of number and positions.
  - 11. The computer-readable medium of claim 9 wherein the pattern information is based at least in part upon an excitation pattern for the group of frequency coefficients.
  - 12. The computer-readable medium of claim 9 wherein the group of frequency coefficients is a block of frequency coefficients in an audio channel.
  - 13. The computer-readable medium of claim 9 wherein the group of frequency coefficients comprises a first block of frequency coefficients in a first audio channel and a second block of frequency coefficients in a second audio channel.
  - 14. The computer-readable medium of claim 9 wherein the generating the quantization matrix comprises determining a first weight by weighting the pattern information based upon which of the plural critical bands at least in part spectrally overlaps a first quantization band.
  - 15. The computer-readable medium of claim 14 wherein the weighting is proportional to extent of spectral overlap with the first quantization band.
  - 16. The computer-readable medium of claim 9 wherein frequency cut-off positions for the plural quantization bands and the plural critical bands are proportional to sampling rate.
  - 17. The computer-readable medium of claim 9 further comprising:
    - before the processing, transforming a group of audio samples into the group of frequency coefficients with a frequency transform.

18. An audio encoder comprising:
- a modeler for processing audio data according to a model of human auditory perception and for generating pattern information for the audio data, wherein each of plural critical bands spectrally partitions the audio data in the model of human auditory perception; and
  
  a program module for computing a set of plural weighting factors from and in proportion to the pattern information for the audio data, wherein each of the set of plural weighting factors comprises a weight for a different one of plural quantization bands that spectrally partition the audio data, wherein the quantization bands are different than the critical bands.
- View Dependent Claims (19, 20, 21, 22, 23)
- - 19. The encoder of claim 18 wherein the plural quantization bands and the plural critical bands differ in one or more of number and frequency cut-off positions.
  - 20. The encoder of claim 18 wherein the pattern information is based at least in part upon an excitation pattern.
  - 21. The encoder of claim 18 wherein the set of weighting factors comprises a first weighting factor based upon weighting of the pattern information according to which of the plural critical bands at least in part spectrally overlaps a first quantization band of the plural quantization bands.
  - 22. The encoder of claim 21 wherein the weighting is proportional to extent of spectral overlap with the first quantization band.
  - 23. The encoder of claim 21 further comprising:
    - a frequency transformer for transforming the audio data from audio samples into frequency coefficients and for outputting the frequency coefficients to the modeler for processing and to the program module for weighting according to the set of plural weighting factors.

24. A computer-readable medium having encoded therein computer-executable instructions for causing a computer programmed thereby to perform a method of generating quantization matrices for plural blocks, wherein each of the plural blocks has one of plural available block sizes, the method comprising:
- for each of the plural blocks, normalizing the block;
  
  computing pattern information for the normalized block in a block size-independent manner; and
  
  generating a quantization matrix based upon the pattern information.
- View Dependent Claims (25, 26, 27)
- - 25. The computer-readable medium of claim 24 wherein the plural blocks are frequency coefficient blocks, and wherein the computing includes processing the normalized frequency coefficient block according to an auditory model that includes temporal smearing between the normalized frequency coefficient block and an adjacent normalized frequency coefficient block.
  - 26. The computer-readable medium of claim 24 wherein the normalizing comprises normalizing block size of the block.
  - 27. The computer-readable medium of claim 24 wherein the normalizing comprises normalizing amplitude scale of the block.

28. An apparatus comprising:
- a multi-channel transformer operable to output multi-channel audio data in jointly coded channels; and
  
  a program module for generating a single quantization matrix for weighting all of the jointly coded channels.
- View Dependent Claims (29, 30, 31)
- - 29. The apparatus of claim 28 wherein the program module computes the single quantization matrix from an aggregation of pattern information for all of the jointly coded channels.
  - 30. The apparatus of claim 29 wherein the aggregation of pattern information is an aggregate excitation pattern.
  - 31. The apparatus of claim 28 wherein the multi-channel transformer is further operable to output multi-channel audio data in independently coded channels.

32. A computer-readable medium encoded with computer-executable instructions for causing a computer programmed thereby to perform a method comprising:
- receiving first audio data in a first coding channel;
  
  receiving second audio data in a second coding channel;
  
  generating one or more quantization matrices for the first and second coding channels, wherein the generating comprises switching between different quantization matrix generation techniques based upon whether the first and second coding channels are joint coding channels; and
  
  outputting the one or more quantization matrices.
- View Dependent Claims (33, 34, 35, 36, 37, 38)
- - 33. The computer-readable medium of claim 32 wherein if the first and second coding channels are joint coding channels, the generating comprises computing a single quantization matrix for both of the first and second coding channels.
  - 34. The computer-readable medium of claim 32 wherein if the first and second coding channels are independent coding channels, the generating comprises computing a first quantization matrix for the first coding channel and a second quantization matrix for the second coding channel.
  - 35. The computer-readable medium of claim 32 wherein if the first and second coding channels are joint coding channels, the generating comprises aggregating pattern information for the first and second coding channels, wherein the generated one or more quantization matrices are based at least in part upon the aggregated pattern information.
  - 36. The computer-readable medium of claim 35 wherein the aggregated pattern information is a minimum of first pattern information for the first coding channel and second pattern information for the second coding channel.
  - 37. The computer-readable medium of claim 35 wherein the aggregated pattern information is an average of first pattern information for the first coding channel and second pattern information for the second coding channel.
  - 38. The computer-readable medium of claim 32 wherein if the first and second coding channels are independent coding channels, the generating comprises computing a first quantization matrix based upon first pattern information for the first coding channel and a second quantization matrix based upon second pattern information for the second coding channel.
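
The mode switch in claims 32 through 38 can be sketched as below. The function name `generate_matrices` and the `aggregate` parameter are hypothetical, and for brevity the sketch uses the pattern information itself as the matrix (a proportionality constant of 1), which the claims do not specify.

```python
def generate_matrices(pattern_a, pattern_b, joint, aggregate="min"):
    """Return one matrix for joint coding channels, two for independent ones.

    pattern_a, pattern_b -- per-band pattern information for each coding channel
    joint                -- True when the two channels are jointly coded
    aggregate            -- "min" (claim 36) or "average" (claim 37)
    """
    if not joint:
        # Independent coding channels: one matrix per channel (claims 34, 38).
        return [list(pattern_a), list(pattern_b)]
    if aggregate == "min":
        combined = [min(a, b) for a, b in zip(pattern_a, pattern_b)]
    elif aggregate == "average":
        combined = [(a + b) / 2 for a, b in zip(pattern_a, pattern_b)]
    else:
        raise ValueError("aggregate must be 'min' or 'average'")
    # Joint coding channels: a single matrix shared by both (claims 33, 35).
    return [combined]


print(generate_matrices([4.0, 2.0], [2.0, 6.0], joint=True))
print(generate_matrices([4.0, 2.0], [2.0, 6.0], joint=True, aggregate="average"))
print(generate_matrices([4.0, 2.0], [2.0, 6.0], joint=False))
```

Taking the per-band minimum is the more conservative aggregate, since it follows whichever channel has the lower pattern value in each band.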

39. A computer-readable medium encoded with computer-executable instructions for causing a computer programmed thereby to perform a method comprising:
- receiving one or more identical quantization matrices for first and second jointly coded channels of audio data, wherein each of the one or more identical quantization matrices is based at least in part upon an aggregated pattern for multiple channels of audio information; and
  
  applying the one or more identical quantization matrices to the first and second jointly coded channels of audio data.
- View Dependent Claims (40, 41)
- - 40. The computer-readable medium of claim 39 wherein the applying comprises weighting each of the first and second jointly coded channels with the one or more identical quantization matrices.
  - 41. The computer-readable medium of claim 39 further comprising:
    - inverse quantizing the first and second jointly coded channels by a quantization step size; and
      
      inverse multi-channel transforming the first and second jointly coded channels into left and right coded channels.
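
The decoder-side steps of claims 39 through 41 can be sketched as follows. Everything specific here is an assumption for illustration: the function name `decode_joint`, reconstruction as level × step size × band weight, and a sum/difference transform in which the encoder is assumed to have formed (L + R) / 2 and (L − R) / 2.

```python
def decode_joint(q_sum, q_diff, matrix, band_edges, step):
    """Apply one shared matrix to both jointly coded channels, then rebuild L/R.

    q_sum, q_diff -- quantized levels for the two jointly coded channels
    matrix        -- one weight per band, identical for both channels
    band_edges    -- ascending coefficient indices bounding the bands
    step          -- quantization step size
    """
    def dequantize(levels):
        out = []
        for weight, lo, hi in zip(matrix, band_edges, band_edges[1:]):
            # Inverse quantize by the step size, then apply the band weight.
            out.extend(level * step * weight for level in levels[lo:hi])
        return out

    s = dequantize(q_sum)
    d = dequantize(q_diff)
    # Assumed inverse multi-channel transform: left = sum + diff, right = sum - diff.
    left = [a + b for a, b in zip(s, d)]
    right = [a - b for a, b in zip(s, d)]
    return left, right


left, right = decode_joint([1, 2, 3, 4], [1, 0, -1, 0], [2.0, 0.5], [0, 2, 4], 0.5)
print(left, right)
```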

42. An apparatus comprising:
- a program module for applying one or more quantization matrices to multi-channel audio data in first and second coding channels in a coding channel mode-dependent manner, wherein the program module switches between plural available matrix application techniques based upon whether the first and second coding channels are joint coding channels; and
  
  an inverse multi-channel transformer operable to switch between plural coding channel modes, a first coding channel mode of the plural coding channel modes for receiving the first and second coding channels as joint coding channels, a second channel mode of the plural coding channel modes for receiving the first and second coding channels as independent coding channels.
- View Dependent Claims (43, 44)
- - 43. The apparatus of claim 42 wherein the program module applies an identical quantization matrix to the multi-channel audio data if the first and second coding channels are joint coding channels.
  - 44. The apparatus of claim 42 wherein the program module applies a different quantization matrix to each channel of the multi-channel audio data if the first and second coding channels are independent coding channels.

45. A computer-readable medium encoded with computer-executable instructions for causing a computer programmed thereby to perform a method comprising:
- processing at least one set of weighting factors according to a parametric model to switch between a direct representation and a parametric representation of the at least one set of weighting factors, wherein the parametric representation of the at least one set of weighting factors accounts for audibility of distortion according to a model of human auditory perception; and
  
  outputting a result of the processing.
- View Dependent Claims (46, 47, 48, 49, 50)
- - 46. The computer-readable medium of claim 45 wherein the processing comprises compression, and wherein the result is the parametric representation.
  - 47. The computer-readable medium of claim 45 wherein the processing comprises decompression, and wherein the result is the direct representation.
  - 48. The computer-readable medium of claim 45 wherein the parametric model uses linear predictive coding for the at least one set of weighting factors.
  - 49. The computer-readable medium of claim 48 wherein the at least one set of weighting factors is for a block of audio data, and wherein the pseudo-autocorrelation values differ from autocorrelation values for the block due at least in part to processing of the block according to an auditory model.
  - 50. The computer-readable medium of claim 48 wherein the pseudo-autocorrelation values differ from autocorrelation values for blocks of audio data due at least in part to joint channel coding of the blocks.

51. In an audio encoder, a method comprising:
- receiving a band weight representation of a quantization matrix; and
  
  compressing the band weight representation of the quantization matrix using linear predictive coding, wherein the compressing includes computing pseudo-autocorrelation values for the quantization matrix.
- View Dependent Claims (52, 53, 54, 55, 56)
- - 52. The method of claim 51 wherein the computing pseudo-autocorrelation values includes converting the band weight representation into an intermediate representation, and wherein the converting comprises:
    - for each of plural bands in the band weight representation, repeating a weight by an expansion factor in the intermediate representation, wherein the expansion factor relates to size of the band.
  - 53. The method of claim 52 wherein the converting further comprises:
    - mirroring the intermediate representation.
  - 54. The method of claim 53 wherein the converting further comprises:
    - inverse frequency transforming the mirrored intermediate representation, thereby producing the pseudo-autocorrelation values for the quantization matrix.
  - 55. The method of claim 51 wherein the computing pseudo-autocorrelation values comprises:
    - inverse frequency transforming an intermediate representation based upon the band weight representation.
  - 56. The method of claim 51 wherein the compressing further comprises:
    - computing linear predictive coding parameters based upon the pseudo-autocorrelation values.
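
The sequence in claims 52 through 56 (expand, mirror, inverse frequency transform, then derive linear predictive coding parameters) can be sketched in plain Python. Several choices below are assumptions rather than details from the claims: the weights are treated directly as power-spectrum samples, the mirroring omits the end points so the sequence is even, a naive inverse DFT stands in for the inverse frequency transform, and the Levinson-Durbin recursion is used as one standard way to obtain LPC parameters from autocorrelation-like values. The function names are hypothetical.

```python
import cmath


def pseudo_autocorrelation(band_weights, band_sizes):
    """Expand, mirror, and inverse-transform a band weight representation."""
    # Repeat each weight by an expansion factor equal to its band size.
    expanded = []
    for weight, size in zip(band_weights, band_sizes):
        expanded.extend([weight] * size)
    # Mirror without duplicating the end points so that x[k] == x[n - k];
    # an even sequence has a real-valued inverse DFT.
    mirrored = expanded + expanded[-2:0:-1]
    n = len(mirrored)
    # Naive O(n^2) inverse DFT; a practical encoder would use an inverse FFT.
    return [
        sum(x * cmath.exp(2j * cmath.pi * k * t / n)
            for k, x in enumerate(mirrored)).real / n
        for t in range(n)
    ]


def levinson_durbin(r, order):
    """Solve for prediction-error filter coefficients a[0..order], a[0] == 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("non-positive prediction error; cannot continue")
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, err


# Two bands of two coefficients each, compressed to two LPC parameters.
r = pseudo_autocorrelation([4.0, 1.0], [2, 2])
coeffs, residual = levinson_durbin(r, order=2)
print(coeffs, residual)
```

The values are called pseudo-autocorrelation values because they are derived from the weights rather than from the audio samples themselves, which is consistent with claims 49 and 50.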

57. A computer-readable medium encoded with computer-executable instructions for causing a computer programmed thereby to perform a method comprising:
- receiving a parametric representation of a quantization matrix, the quantization matrix including weights for bands of a group of frequency coefficients, wherein the parametric representation accounts for audibility of distortion according to a model of human auditory perception; and
  
  decompressing the parametric representation of the quantization matrix, thereby producing a direct representation of the quantization matrix.
- View Dependent Claims (58)
- - 58. The computer-readable medium of claim 57 wherein the parametric representation is based at least in part upon linear predictive coding of pseudo-autocorrelation values for the quantization matrix.

59. An audio encoder comprising:
- a weighter for generating one or more sets of weighting factors, each of the one or more sets of weighting factors including weights for bands of spectral audio data; and
  
  a program module for compressing the one or more sets of weighting factors according to a parametric model of compression, wherein the parametric model includes computing pseudo-autocorrelation values.
- View Dependent Claims (60, 61)
- - 60. The audio encoder of claim 59 further comprising:
    - a perception modeler for processing the spectral audio data according to an auditory model.
  - 61. The audio encoder of claim 59 further comprising:
    - a multi-channel transformer for converting multi-channel audio data into jointly coded channels.

62. A method of compressing a quantization matrix in an audio encoder comprising:
- compressing a quantization matrix using a compression mode selected from among plural available compression modes, the plural available compression modes including a direct compression mode and a parametric compression mode, wherein the parametric compression mode accounts for audibility of distortion according to an auditory model; and
  
  outputting the compressed quantization matrix.
- View Dependent Claims (63, 64)
- - 63. The method of claim 62 wherein selection of the compression mode is based upon bitrate criteria.
  - 64. The method of claim 62 wherein the parametric compression mode includes linear predictive coding using pseudo-autocorrelation values derived from the quantization matrix.

65. A computer-readable medium encoded with computer-executable instructions for causing a computer programmed thereby to perform a method of decompressing a quantization matrix in an audio decoder, the method comprising:
- receiving a compressed quantization matrix; and
  
  decompressing the compressed quantization matrix using a decompression mode selected from among plural available decompression modes, the plural available decompression modes including a direct decompression mode and a parametric decompression mode, the parametric decompression mode for decompressing a quantization matrix compressed according to a parametric compression mode that accounts for audibility of distortion according to an auditory model.
- View Dependent Claims (66)
- - 66. The computer-readable medium of claim 65 further comprising:
    - receiving a decompression mode indicator, wherein selection of the decompression mode is based upon the decompression mode indicator.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Lee, Ming-Chieh; Thumpudi, Naveen; Chen, Wei-Ge

Granted Patent: US 6,934,677 B2
US Class Current: 704/230
CPC Class Codes:
G10L 19/008   Multichannel audio signal c...
G10L 19/02   using spectral analysis, e....
G10L 19/0204   using subband decomposition
