PERFORMING SPATIAL MASKING WITH RESPECT TO SPHERICAL HARMONIC COEFFICIENTS

US 20140355768A1
Filed: 05/27/2014
Published: 12/04/2014
Est. Priority Date: 05/28/2013
Status: Active Grant

1 Assignment

0 Petitions

Abstract

In general, techniques are described by which to perform spatial masking with respect to spherical harmonic coefficients. As one example, an audio encoding device comprising a processor may perform various aspects of the techniques. The processor may be configured to perform spatial analysis based on the spherical harmonic coefficients describing a three-dimensional sound field to identify a spatial masking threshold. The processor may further be configured to render the multi-channel audio data from the plurality of spherical harmonic coefficients, and compress the multi-channel audio data based on the identified spatial masking threshold to generate a bitstream.
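
The abstract does not say how the spatial masking threshold is computed or how it drives compression. The Python sketch below is a rough illustration under a hypothetical model. Each loudspeaker channel is masked by the other channels, with an attenuation that grows linearly in dB with angular distance; the `offset_db` and `spread_db_per_rad` values are arbitrary assumptions. The frame's bit budget is then split in proportion to each channel's positive signal-to-mask ratio. The function names and the allocation rule are assumptions for illustration, not the patented method.

```python
import numpy as np

def spatial_masking_threshold(energies, directions,
                              offset_db=10.0, spread_db_per_rad=20.0):
    """Hypothetical per-channel spatial masking threshold for one frame or band.

    energies:   (L,) linear channel powers
    directions: (L, 3) unit vectors of the loudspeaker positions
    """
    energies = np.asarray(energies, dtype=float)
    dirs = np.asarray(directions, dtype=float)
    # Pairwise angular distance between loudspeaker directions (radians).
    angles = np.arccos(np.clip(dirs @ dirs.T, -1.0, 1.0))
    # Assumed spreading model: attenuation in dB grows linearly with angle.
    spread = 10.0 ** (-(offset_db + spread_db_per_rad * angles) / 10.0)
    np.fill_diagonal(spread, 0.0)  # a channel does not mask itself in this model
    return spread @ energies  # summed masker power reaching each channel

def allocate_bits(energies, threshold, total_bits):
    """Hypothetical rule: split total_bits in proportion to positive SMR (dB)."""
    eps = 1e-12
    energies = np.asarray(energies, dtype=float)
    threshold = np.asarray(threshold, dtype=float)
    smr_db = 10.0 * np.log10(np.maximum(energies, eps) / np.maximum(threshold, eps))
    need = np.maximum(smr_db, 0.0)  # fully masked channels get nothing
    if need.sum() == 0.0:
        return np.zeros(len(need), dtype=int)
    return np.floor(total_bits * (need / need.sum())).astype(int)
```

With two loudspeakers about 0.1 rad apart, one 40 dB quieter than the other, the quieter channel falls below its assumed threshold and receives no bits.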

255 Citations

48 Claims

1. A method of compressing multi-channel audio data comprising:
- performing spatial analysis based on a plurality of spherical harmonic coefficients that describe a three-dimensional sound field to identify a spatial masking threshold;
  
  rendering the multi-channel audio data from the plurality of spherical harmonic coefficients; and
  
  compressing the multi-channel audio data based on the identified spatial masking threshold to generate a bitstream.
  - 2. The method of claim 1, further comprising determining a target bitrate for the bitstream, wherein compressing the multi-channel audio data comprises performing, based on the target bitrate, either i) parametric inter-channel audio encoding and spatial masking using the spatial masking threshold or ii) the spatial masking using the spatial masking threshold without performing the parametric inter-channel audio encoding to generate a bitstream representative of the compressed audio data.
  - 3. The method of claim 2, wherein performing either i) the parametric inter-channel audio encoding and the spatial masking using the spatial masking threshold or ii) the spatial masking using the spatial masking threshold without performing the parametric inter-channel audio encoding comprises:
    - determining that the target bitrate is below a threshold bitrate; and
      
      in response to determining that the target bitrate is below the threshold bitrate, performing the parametric inter-channel audio encoding and the spatial masking using the spatial masking threshold to generate the bitstream.
  - 4. The method of claim 2, wherein performing either i) the parametric inter-channel audio encoding and the spatial masking using the spatial masking threshold or ii) the spatial masking using the spatial masking threshold without performing the parametric inter-channel audio encoding comprises:
    - determining that the target bitrate is below a threshold bitrate; and
      
      in response to determining that the target bitrate is below the threshold bitrate, performing the spatial masking using the spatial masking threshold with respect to one or more base channels of the multi-channel audio data and performing the parametric inter-channel audio encoding with respect to the multi-channel audio data to generate the bitstream.
  - 5. The method of claim 1, wherein rendering the multi-channel audio data from the spherical harmonic coefficients comprises rendering 32 channels of the multi-channel audio data for 32 speakers from the spherical harmonic coefficients.
  - 6. The method of claim 1, wherein rendering the multi-channel audio data from the spherical harmonic coefficients comprises rendering 32 channels of the multi-channel audio data corresponding to 32 speakers arranged in a dense T-design from the spherical harmonic coefficients.
  - 7. The method of claim 1, wherein compressing the multi-channel audio data comprises allocating bits in the bitstream for either a time-based representation of the multi-channel audio data or a frequency-based representation of the multi-channel audio data based on the spatial masking threshold.
  - 8. The method of claim 1, wherein compressing the multi-channel audio data comprises allocating bits in the bitstream for either a time-based representation of the multi-channel audio data or a frequency-based representation of the multi-channel audio data based on the spatial masking threshold and a temporal masking threshold.
  - 9. The method of claim 1, wherein compressing the multi-channel audio data comprises performing entropy encoding based on the identified spatial masking threshold.
  - 10. The method of claim 1, further comprising transforming the plurality of spherical harmonic coefficients from the time domain to the frequency domain so as to generate a transformed plurality of spherical harmonic coefficients, wherein rendering the multi-channel audio data comprises rendering the multi-channel audio data from the transformed plurality of spherical harmonic coefficients.
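
Claims 5, 6 and 10 render the spherical harmonic coefficients to 32 loudspeaker channels, optionally after a time-to-frequency transform. The following is a minimal sketch under several assumptions: first-order coefficients in ACN channel order with SN3D normalization, and a mode-matching (pseudo-inverse) renderer. Because the coordinates of the 32-point dense T-design are not given here, a 32-point Fibonacci sphere stands in for them. All helper names are hypothetical.

```python
import numpy as np

def fibonacci_sphere(n):
    """n roughly uniform unit vectors (hypothetical stand-in for a T-design)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    r = np.sqrt(1.0 - z * z)
    phi = i * np.pi * (3.0 - np.sqrt(5.0))  # golden-angle increments
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def sh_first_order(dirs):
    """First-order real SH in ACN order (W, Y, Z, X), SN3D; shape (4, L)."""
    x, y, z = np.asarray(dirs, dtype=float).T
    return np.stack([np.ones_like(x), y, z, x])

def rendering_matrix(speaker_dirs):
    """Assumed mode-matching renderer: pseudo-inverse of the SH matrix, (L, 4)."""
    return np.linalg.pinv(sh_first_order(speaker_dirs))

rng = np.random.default_rng(0)
shc = rng.standard_normal((4, 256))          # 4 SHC signals, 256 samples
D = rendering_matrix(fibonacci_sphere(32))   # (32, 4) rendering matrix
feeds = D @ shc                              # time-domain rendering
feeds_f = D @ np.fft.rfft(shc, axis=1)       # render transformed SHC per bin
```

Because the renderer is a fixed linear matrix, applying it to each frequency bin gives the same result as transforming the rendered feeds, so the transform in claim 10 can precede rendering without changing the output.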

11. An audio encoding device comprising:
- one or more processors configured to perform spatial analysis based on a plurality of spherical harmonic coefficients that describe a three-dimensional sound field to identify spatial masking thresholds, render the multi-channel audio data from the plurality of spherical harmonic coefficients, and compress the multi-channel audio data based on the identified spatial masking thresholds to generate a bitstream.
  - 12. The audio encoding device of claim 11, wherein the one or more processors are further configured to determine a target bitrate for the bitstream, and wherein the one or more processors are configured to perform, based on the target bitrate, either i) parametric inter-channel audio encoding and spatial masking using the spatial masking threshold or ii) the spatial masking using the spatial masking threshold without performing the parametric inter-channel audio encoding to generate a bitstream representative of the compressed audio data.
  - 13. The audio encoding device of claim 12, wherein the one or more processors are configured to determine that the target bitrate is below a threshold bitrate, and in response to determining that the target bitrate is below the threshold bitrate, perform the parametric inter-channel audio encoding and the spatial masking using the spatial masking threshold to generate the bitstream.
  - 14. The audio encoding device of claim 12, wherein the one or more processors are configured to determine that the target bitrate is below a threshold bitrate, and in response to determining that the target bitrate is below the threshold bitrate, perform the spatial masking using the spatial masking threshold with respect to one or more base channels of the multi-channel audio data and perform the parametric inter-channel audio encoding with respect to the multi-channel audio data to generate the bitstream.
  - 15. The audio encoding device of claim 11, wherein the one or more processors are further configured to render 32 channels of the multi-channel audio data for 32 speakers from the spherical harmonic coefficients.
  - 16. The audio encoding device of claim 11, wherein the one or more processors are further configured to render 32 channels of the multi-channel audio data corresponding to 32 speakers arranged in a dense T-design from the spherical harmonic coefficients.
  - 17. The audio encoding device of claim 11, wherein the one or more processors are further configured to allocate bits in the bitstream for either a time-based representation of the multi-channel audio data or a frequency-based representation of the multi-channel audio data based on the spatial masking threshold.
  - 18. The audio encoding device of claim 11, wherein the one or more processors are further configured to allocate bits in the bitstream for either a time-based representation of the multi-channel audio data or a frequency-based representation of the multi-channel audio data based on the spatial masking threshold and a temporal masking threshold.
  - 19. The audio encoding device of claim 11, wherein the one or more processors are further configured to perform entropy encoding based on the identified spatial masking thresholds.
  - 20. The audio encoding device of claim 11, wherein the one or more processors are further configured to transform the plurality of spherical harmonic coefficients from the time domain to the frequency domain so as to generate a transformed plurality of spherical harmonic coefficients, and, when rendering the multi-channel audio data, render the multi-channel audio data from the transformed plurality of spherical harmonic coefficients.
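
Claims 2 to 4 and 12 to 14 choose the coding tools from the target bitrate. A minimal sketch of that decision follows. The threshold bitrate is left as a parameter. The claims state only the below-threshold case, so treating every other bitrate as case ii) (spatial masking alone) is an assumption. The `EncodingPlan` type and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EncodingPlan:
    """Hypothetical summary of which tools the encoder applies to a frame."""
    parametric_inter_channel: bool
    spatial_masking: bool
    masking_base_channels_only: bool

def choose_plan(target_kbps: float, threshold_kbps: float,
                mask_base_channels_only: bool = False) -> EncodingPlan:
    """Pick coding tools from the target bitrate (sketch of claims 2-4, 12-14)."""
    if target_kbps < threshold_kbps:
        # Case i): parametric inter-channel coding plus spatial masking.
        # The claim 4 / claim 14 variant masks only the base channels.
        return EncodingPlan(True, True, mask_base_channels_only)
    # Case ii) (assumed for all other bitrates): spatial masking alone.
    return EncodingPlan(False, True, False)
```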

21. An audio encoding device comprising:
- means for performing spatial analysis based on a plurality of spherical harmonic coefficients that describe a three-dimensional sound field to identify a spatial masking threshold;
  
  means for rendering the multi-channel audio data from the plurality of spherical harmonic coefficients; and
  
  means for compressing the multi-channel audio data based on the identified spatial masking threshold to generate a bitstream.

22. A non-transitory computer-readable storage medium having stored thereon instructions that, when executed, cause one or more processors of an audio encoding device to:
- perform spatial analysis based on a plurality of spherical harmonic coefficients that describe a three-dimensional sound field to identify a spatial masking threshold;
  
  render the multi-channel audio data from the plurality of spherical harmonic coefficients; and
  
  compress the multi-channel audio data based on the identified spatial masking thresholds to generate a bitstream.

23. A method comprising:
- decoding a bitstream to generate first multi-channel audio data having a plurality of channels corresponding to speakers arranged in a defined speaker geometry;
  
  performing an inverse rendering process with respect to the generated multi-channel audio data to generate a plurality of spherical harmonic coefficients; and
  
  rendering second multi-channel audio data having a plurality of channels corresponding to speakers arranged in a local speaker geometry based on the plurality of spherical harmonic coefficients.
  - 24. The method of claim 23, further comprising determining a target bitrate for the bitstream, wherein decoding the bitstream comprises performing, based on the target bitrate, parametric inter-channel audio decoding with respect to the bitstream to generate the first multi-channel audio data.
  - 25. The method of claim 24, wherein performing the parametric inter-channel audio decoding comprises:
    - determining that the target bitrate is below a threshold bitrate; and
      
      in response to determining that the target bitrate is below the threshold bitrate, performing the parametric inter-channel audio decoding with respect to the bitstream to generate the first multi-channel audio data.
  - 26. The method of claim 25, wherein the threshold bitrate is equal to 24-2 Kilobits per second (Kbps).
  - 27. The method of claim 23, wherein performing the inverse rendering process comprises performing the inverse rendering process with respect to 32 channels of the first multi-channel audio data that correspond to 32 speakers to generate the plurality of spherical harmonic coefficients.
  - 28. The method of claim 23, wherein performing the inverse rendering process comprises performing the inverse rendering process with respect to 32 channels of the first multi-channel audio data that correspond to 32 speakers arranged in a dense T-design to generate the plurality of spherical harmonic coefficients.
  - 29. The method of claim 23, further comprising transforming the plurality of spherical harmonic coefficients from the frequency domain to the time domain so as to generate a transformed plurality of spherical harmonic coefficients,wherein rendering the second multi-channel audio data comprises rendering the second multi-channel audio data having the plurality of channels corresponding to the speakers arranged in the local speaker geometry based on the transformed plurality of spherical harmonic coefficients.
  - 30. The method of claim 23, wherein rendering the second multi-channel audio data comprises performing a transform on the plurality of spherical harmonic coefficients to generate the second multi-channel audio data having the plurality of channels corresponding to the speakers arranged in the local speaker geometry based on the plurality of spherical harmonic coefficients.
  - 31. The method of claim 30, wherein the plurality of channels of the second multi-channel audio data comprise a plurality of virtual channels corresponding to virtual speakers arranged in a geometry different from the local speaker geometry, and wherein rendering the second multi-channel audio data further comprises performing panning on the plurality of virtual loudspeaker channels to produce the plurality of channels of the second multi-channel audio data corresponding to the speakers arranged in the local speaker geometry.
  - 32. The method of claim 31, wherein performing panning comprises performing vector base amplitude panning on the plurality of virtual channels to produce the plurality of channels of the second multi-channel audio data.
  - 33. The method of claim 32, wherein each of the plurality of virtual channels is associated with a corresponding different defined region of space.
  - 34. The method of claim 33, wherein the different defined regions of space are defined in one or more of an audio format specification and an audio format standard.
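
Claims 23 to 28 recover spherical harmonic coefficients from the decoded channels and re-render them for the local loudspeakers. The following is a minimal sketch under several assumptions: first-order coefficients in ACN order with SN3D normalization, a mode-matching renderer, a 32-point Fibonacci sphere standing in for the defined T-design geometry, and an eight-loudspeaker cube as the local geometry. Inverse rendering here uses a least-squares pseudo-inverse of the known rendering matrix; the claims do not specify that choice, and all helper names are hypothetical.

```python
import numpy as np

def fibonacci_sphere(n):
    """n roughly uniform unit vectors (hypothetical stand-in for a T-design)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    r = np.sqrt(1.0 - z * z)
    phi = i * np.pi * (3.0 - np.sqrt(5.0))  # golden-angle increments
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def sh_first_order(dirs):
    """First-order real SH in ACN order (W, Y, Z, X), SN3D; shape (4, L)."""
    x, y, z = np.asarray(dirs, dtype=float).T
    return np.stack([np.ones_like(x), y, z, x])

def rendering_matrix(speaker_dirs):
    """Assumed mode-matching renderer D = pinv(Y); shape (L, 4)."""
    return np.linalg.pinv(sh_first_order(speaker_dirs))

def inverse_render(feeds, render_matrix):
    """Least-squares estimate of the SHC that produced the loudspeaker feeds."""
    return np.linalg.pinv(render_matrix) @ feeds

# Simulated decoded 32-channel audio for the defined geometry.
rng = np.random.default_rng(1)
shc_sent = rng.standard_normal((4, 64))
defined_D = rendering_matrix(fibonacci_sphere(32))
decoded = defined_D @ shc_sent

# Inverse rendering, then rendering for a hypothetical local cube layout.
shc = inverse_render(decoded, defined_D)
cube = np.array([[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1)
                 for sz in (-1, 1)]) / np.sqrt(3.0)
local_feeds = rendering_matrix(cube) @ shc  # (8, 64)
```

Recovery is exact here only because the defined renderer has full column rank and the decoded channels carry no coding noise; with lossy decoding the pseudo-inverse gives a least-squares estimate instead.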

35. An audio decoding device comprising:
- one or more processors configured to decode a bitstream to generate first multi-channel audio data having a plurality of channels corresponding to speakers arranged in a first speaker geometry, perform an inverse rendering process with respect to the generated multi-channel audio data to generate a plurality of spherical harmonic coefficients, and render second multi-channel audio data having a plurality of channels corresponding to speakers arranged in a local speaker geometry based on the plurality of spherical harmonic coefficients.
  - 36. The audio decoding device of claim 35, wherein the one or more processors are further configured to determine a target bitrate for the bitstream, wherein the one or more processors are configured to perform, based on the target bitrate, parametric inter-channel audio decoding with respect to the bitstream to generate the first multi-channel audio data.
  - 37. The audio decoding device of claim 36, wherein the one or more processors are configured to determine that the target bitrate is below a threshold bitrate, and in response to determining that the target bitrate is below the threshold bitrate, perform the parametric inter-channel audio decoding with respect to the bitstream to generate the first multi-channel audio data.
  - 38. The audio decoding device of claim 37, wherein the threshold bitrate is equal to 24-2 Kilobits per second (Kbps).
  - 39. The audio decoding device of claim 35, wherein the one or more processors are configured to, when performing the inverse rendering process, perform the inverse rendering process with respect to 32 channels of the first multi-channel audio data that correspond to 32 speakers to generate the plurality of spherical harmonic coefficients.
  - 40. The audio decoding device of claim 35, wherein the one or more processors are configured to, when performing the inverse rendering process, perform the inverse rendering process with respect to 32 channels of the first multi-channel audio data that correspond to 32 speakers arranged in a dense T-design to generate the plurality of spherical harmonic coefficients.
  - 41. The audio decoding device of claim 35, wherein the one or more processors are configured to transform the plurality of spherical harmonic coefficients from the frequency domain to the time domain so as to generate a transformed plurality of spherical harmonic coefficients, wherein the one or more processors are configured to, when rendering the second multi-channel audio data, render the second multi-channel audio data having the plurality of channels corresponding to the speakers arranged in the local speaker geometry based on the transformed plurality of spherical harmonic coefficients.
  - 42. The audio decoding device of claim 35, wherein the one or more processors are configured to, when rendering the second multi-channel audio data, perform a transform on the plurality of spherical harmonic coefficients to generate the second multi-channel audio data having the plurality of channels corresponding to the speakers arranged in the local speaker geometry based on the plurality of spherical harmonic coefficients.
  - 43. The audio decoding device of claim 42, wherein the plurality of channels of the second multi-channel audio data comprise a plurality of virtual channels corresponding to virtual speakers arranged in a geometry different from the local speaker geometry, and wherein the one or more processors are configured to, when rendering the second multi-channel audio data, perform panning on the plurality of virtual loudspeaker channels to produce the plurality of channels of the second multi-channel audio data corresponding to the speakers arranged in the local speaker geometry.
  - 44. The audio decoding device of claim 43, wherein the one or more processors are configured to, when performing panning, perform vector base amplitude panning on the plurality of virtual channels to produce the plurality of channels of the second multi-channel audio data.
  - 45. The audio decoding device of claim 44, wherein each of the plurality of virtual channels is associated with a corresponding different defined region of space.
  - 46. The audio decoding device of claim 45, wherein the different defined regions of space are defined in one or more of an audio format specification and an audio format standard.
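
Claims 31, 32, 43 and 44 pan virtual loudspeaker channels onto the local layout with vector base amplitude panning (VBAP). The following is a minimal sketch of three-dimensional VBAP gains for one loudspeaker triplet, using the standard formulation g = p^T L^-1 with constant-power normalization, plus a loop that assigns each virtual channel to the first triplet that encloses its direction. The function names and the octahedral example layout are assumptions for illustration; a practical renderer would derive the triplets from a triangulation of the actual local loudspeakers.

```python
import itertools
import numpy as np

def vbap_gains(source_dir, triplet_dirs):
    """VBAP gains g with g @ L = p, where the rows of L are speaker unit vectors."""
    p = np.asarray(source_dir, dtype=float)
    L = np.asarray(triplet_dirs, dtype=float)
    g = p @ np.linalg.inv(L)
    if np.any(g < -1e-9):
        raise ValueError("source direction lies outside this triplet")
    g = np.clip(g, 0.0, None)
    return g / np.linalg.norm(g)  # constant-power normalization

def pan_to_local(virtual_feeds, virtual_dirs, local_dirs, triplets):
    """Hypothetical helper: mix each virtual channel into its enclosing triplet."""
    local_dirs = np.asarray(local_dirs, dtype=float)
    out = np.zeros((len(local_dirs), virtual_feeds.shape[1]))
    for feed, d in zip(virtual_feeds, virtual_dirs):
        for tri in triplets:
            idx = list(tri)
            try:
                g = vbap_gains(d, local_dirs[idx])
            except (ValueError, np.linalg.LinAlgError):
                continue  # outside this triplet, or degenerate triplet
            out[idx] += np.outer(g, feed)
            break
        else:
            raise ValueError("no triplet encloses a virtual channel")
    return out

# Hypothetical octahedral local layout: +x, -x, +y, -y, +z, -z.
local = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                  [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
triplets = list(itertools.product((0, 1), (2, 3), (4, 5)))  # one per octant
virtual_dirs = np.array([[1.0, 1.0, 1.0]]) / np.sqrt(3.0)
virtual_feeds = np.ones((1, 4))
mixed = pan_to_local(virtual_feeds, virtual_dirs, local, triplets)
```

A virtual channel pointing into the (+x, +y, +z) octant is shared equally by those three loudspeakers, each with gain 1/sqrt(3), and the opposite loudspeakers stay silent.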

47. An audio decoding device comprising:
- means for decoding a bitstream to generate first multi-channel audio data having a plurality of channels corresponding to speakers arranged in a first speaker geometry;
  
  means for performing an inverse rendering process with respect to the generated multi-channel audio data to generate a plurality of spherical harmonic coefficients; and
  
  means for rendering second multi-channel audio data having a plurality of channels corresponding to speakers arranged in a local speaker geometry based on the plurality of spherical harmonic coefficients.

48. A non-transitory computer-readable storage medium having stored thereon instructions that, when executed, cause one or more processors of an audio decoding device to:
- decode a bitstream to generate first multi-channel audio data having a plurality of channels corresponding to speakers arranged in a first speaker geometry;
  
  perform an inverse rendering process with respect to the generated multi-channel audio data to generate a plurality of spherical harmonic coefficients; and
  
  render second multi-channel audio data having a plurality of channels corresponding to speakers arranged in a local speaker geometry based on the plurality of spherical harmonic coefficients.

Current Assignee
Qualcomm, Inc.
Original Assignee
Qualcomm, Inc.
Inventors
Sen, Dipanjan; Morrell, Martin James

Granted Patent

US 9,412,385 B2
US Class Current

381/23
CPC Class Codes

G10L 19/008 Multichannel audio signal c...

G10L 19/0212 using orthogonal transforma...

PERFORMING SPATIAL MASKING WITH RESPECT TO SPHERICAL HARMONIC COEFFICIENTS

First Claim

1 Assignment

0 Petitions

Accused Products

Abstract

255 Citations

48 Claims

Specification

Solutions

Use Cases

Quick Links

PERFORMING SPATIAL MASKING WITH RESPECT TO SPHERICAL HARMONIC COEFFICIENTS

First Claim

1 Assignment

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

255 Citations

48 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links