SCALABLE DOWNMIX DESIGN WITH FEEDBACK FOR OBJECT-BASED SURROUND CODEC

US 20140023196A1
Filed: 07/18/2013
Published: 01/23/2014
Est. Priority Date: 07/20/2012
Status: Active Grant

Abstract

In general, techniques are described for grouping audio objects into clusters. In some examples, a device for audio signal processing comprises a cluster analysis module configured to group, based on spatial information for each of N audio objects, a plurality of audio objects that includes the N audio objects into L clusters, where L is less than N, wherein the cluster analysis module is configured to receive information from at least one of a transmission channel, a decoder, and a renderer, and wherein a maximum value for L is based on the information received. The device also comprises a downmix module configured to mix the plurality of audio objects into L audio streams, and a metadata downmix module configured to produce, based on the spatial information and the grouping, metadata that indicates spatial information for each of the L audio streams.
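The three stages named in the abstract (cluster analysis, downmix, metadata downmix) can be illustrated with a small sketch. The claims do not name a clustering algorithm, so plain k-means over Cartesian positions is swapped in here as an assumption; the `AudioObject` type, the function names, and the use of the cluster centroid as per-stream metadata are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class AudioObject:
    samples: List[float]  # one frame of mono PCM (hypothetical representation)
    position: Vec3        # spatial information, here Cartesian (x, y, z)


def _dist2(a: Vec3, b: Vec3) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def _centroid(points: List[Vec3]) -> Vec3:
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def group_objects(objects: List[AudioObject], max_l: int, iterations: int = 10):
    """Group N objects into L clusters with L < N and L <= max_l.

    Assumption: plain k-means seeded with the first L object positions.
    Returns (assignment, centroids).
    """
    if len(objects) < 2:
        raise ValueError("need at least two objects so that L < N")
    num_clusters = max(1, min(max_l, len(objects) - 1))
    centroids = [obj.position for obj in objects[:num_clusters]]
    assignment = [0] * len(objects)
    for _ in range(iterations):
        # Assign every object to its nearest centroid.
        assignment = [
            min(range(num_clusters), key=lambda k: _dist2(obj.position, centroids[k]))
            for obj in objects
        ]
        # Move each centroid to the mean position of its members.
        for k in range(num_clusters):
            members = [o.position for o, a in zip(objects, assignment) if a == k]
            if members:  # keep the old centroid if a cluster went empty
                centroids[k] = _centroid(members)
    return assignment, centroids


def downmix(objects: List[AudioObject], assignment: List[int], num_clusters: int):
    """Sum the samples of all objects assigned to each cluster."""
    frame_len = len(objects[0].samples)
    streams = [[0.0] * frame_len for _ in range(num_clusters)]
    for obj, k in zip(objects, assignment):
        for i, s in enumerate(obj.samples):
            streams[k][i] += s
    return streams


objects = [
    AudioObject([1.0, 0.0], (1.0, 0.0, 0.0)),
    AudioObject([0.0, 1.0], (-1.0, 0.0, 0.0)),
    AudioObject([0.5, 0.5], (0.9, 0.1, 0.0)),
    AudioObject([0.25, 0.25], (-0.9, -0.1, 0.0)),
]
# max_l stands in for the limit derived from channel, decoder, or renderer feedback.
assignment, centroids = group_objects(objects, max_l=2)
streams = downmix(objects, assignment, len(centroids))
# Metadata downmix: one spatial descriptor per output stream (here the centroid).
metadata = [{"stream": k, "position": c} for k, c in enumerate(centroids)]
```

Seeding k-means with the first L positions keeps the sketch deterministic; a real encoder would likely need a more robust initialization and a perceptually motivated distance measure.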

43 Claims

1. A method of audio signal processing, the method comprising:
- based on spatial information for each of N audio objects, grouping a plurality of audio objects that includes the N audio objects into L clusters, where L is less than N;
  
  mixing the plurality of audio objects into L audio streams; and
  
  based on the spatial information and the grouping, producing metadata that indicates spatial information for each of the L audio streams, wherein a maximum value for L is based on information received from at least one of a transmission channel, a decoder, and a renderer.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
  - 2. The method of claim 1, wherein the information received includes information describing a state of the transmission channel and the maximum value of L is based at least on the state of the transmission channel.
  - 3. The method of claim 1, wherein the information received includes information describing a capacity of the transmission channel and the maximum value of L is based at least on the capacity of the transmission channel.
  - 4. The method of claim 1, wherein the information received is information received from a decoder.
  - 5. The method of claim 1, wherein the information received is information received from a renderer.
  - 6. The method of claim 1, wherein the information received comprises a bit rate indication that indicates a bit rate and the maximum value of L is based at least on the bit rate.
  - 7. The method of claim 1, wherein the N audio objects comprises N sets of coefficients, and wherein mixing the plurality of audio objects into L audio streams comprises mixing the plurality of sets of coefficients into L sets of coefficients.
  - 8. The method of claim 7, wherein each of N sets of coefficients is a hierarchical set of basis function coefficients.
  - 9. The method of claim 7, wherein each of the N sets of coefficients is a set of spherical harmonic coefficients.
  - 10. The method of claim 7, wherein each of the L sets of coefficients is a set of spherical harmonic coefficients.
  - 11. The method of claim 7, wherein mixing the plurality of audio objects into L audio streams comprises, for each of at least one among the L clusters, calculating a sum of the sets of coefficients of the N sets of coefficients grouped into the cluster.
  - 12. The method of claim 7, wherein mixing the plurality of audio objects into L audio streams comprises calculating each among the L sets of coefficients as a sum of the corresponding ones among the N sets of coefficients.
  - 13. The method of claim 7, wherein the information received comprises a bit rate indication that indicates a bit rate, and wherein, for at least one among the L sets of coefficients, a total number of coefficients in the set is based on a bit rate indication.
  - 14. The method of claim 7, wherein, for at least one among the L sets of coefficients, a total number of coefficients in the set is based on the information received.
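Claims 7 to 12 move the mix into the coefficient domain: each object is a set of basis function coefficients (for example spherical harmonic coefficients), and a cluster's stream is the sum of the sets grouped into it. Because a spherical harmonic expansion is linear in the source signal, adding coefficient sets index by index corresponds to superimposing the underlying sound fields. The following is a minimal sketch under stated assumptions: all sets have the same length and ordering, the cluster assignment is already known, and the function name `mix_coefficient_sets` is hypothetical.

```python
from typing import List, Sequence


def mix_coefficient_sets(
    coeff_sets: Sequence[Sequence[float]],
    assignment: Sequence[int],
    num_clusters: int,
) -> List[List[float]]:
    """Sum N coefficient sets into L sets, one per cluster.

    coeff_sets[n] holds the coefficients of object n; assignment[n] is the
    index of the cluster that object n was grouped into.
    """
    if len(coeff_sets) != len(assignment):
        raise ValueError("one cluster index is required per coefficient set")
    length = len(coeff_sets[0])
    mixed = [[0.0] * length for _ in range(num_clusters)]
    for coeffs, cluster in zip(coeff_sets, assignment):
        if len(coeffs) != length:
            raise ValueError("all coefficient sets must have the same length")
        for i, c in enumerate(coeffs):
            mixed[cluster][i] += c
    return mixed


# Three objects, each with four coefficients (a first-order set), two clusters.
coeff_sets = [
    [1.0, 0.5, 0.0, -0.5],
    [2.0, 0.0, 1.0, 0.0],
    [0.5, 0.25, 0.25, 0.5],
]
mixed = mix_coefficient_sets(coeff_sets, assignment=[0, 1, 0], num_clusters=2)
```

Objects 0 and 2 share cluster 0, so their coefficients are added element by element, while object 1 passes through unchanged as cluster 1.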

15. An apparatus for audio signal processing, the apparatus comprising:
- means for receiving information from at least one of a transmission channel, a decoder, and a renderer;
  
  means for grouping, based on spatial information for each of N audio objects, a plurality of audio objects that includes the N audio objects into L clusters, where L is less than N and wherein a maximum value for L is based on the information received;
  
  means for mixing the plurality of audio objects into L audio streams; and
  
  means for producing, based on the spatial information and the grouping, metadata that indicates spatial information for each of the L audio streams.
- Dependent claims (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28)
  - 16. The apparatus of claim 15, wherein the information received includes information describing a state of the transmission channel and the maximum value of L is based at least on the state of the transmission channel.
  - 17. The apparatus of claim 15, wherein the information received includes information describing a capacity of the transmission channel and the maximum value of L is based at least on the capacity of the transmission channel.
  - 18. The apparatus of claim 15, wherein the information received is information received from a decoder.
  - 19. The apparatus of claim 15, wherein the information received is information received from a renderer.
  - 20. The apparatus of claim 15, wherein the information received comprises a bit rate indication that indicates a bit rate and the maximum value of L is based at least on the bit rate.
  - 21. The apparatus of claim 15, wherein the N audio objects comprises N sets of coefficients, and wherein the means for mixing the plurality of audio objects into L audio streams comprises means for mixing the plurality of sets of coefficients into L sets of coefficients.
  - 22. The apparatus of claim 21, wherein each of N sets of coefficients is a hierarchical set of basis function coefficients.
  - 23. The apparatus of claim 21, wherein each of the N sets of coefficients is a set of spherical harmonic coefficients.
  - 24. The apparatus of claim 21, wherein each of the L sets of coefficients is a set of spherical harmonic coefficients.
  - 25. The apparatus of claim 21, wherein the means for mixing the plurality of audio objects into L audio streams comprises, for each of at least one among the L clusters, means for calculating a sum of the sets of coefficients of the N sets of coefficients grouped into the cluster.
  - 26. The apparatus of claim 21, wherein the means for mixing the plurality of audio objects into L audio streams comprises means for calculating each among the L sets of coefficients as a sum of the corresponding ones among the N sets of coefficients.
  - 27. The apparatus of claim 21, wherein the information received comprises a bit rate indication that indicates a bit rate, and wherein, for at least one among the L sets of coefficients, a total number of coefficients in the set is based on a bit rate indication.
  - 28. The apparatus of claim 21, wherein, for at least one among the L sets of coefficients, a total number of coefficients in the set is based on the information received.
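The feedback path in these claims (channel state or capacity, decoder, renderer, bit rate indication) bounds two quantities: the maximum number of clusters L and, for coefficient-based streams, the number of coefficients per set. The claims do not say how the bound is computed, so the sketch below is only one possible policy. The `Feedback` fields, the per-stream and per-coefficient bit budgets, and the order cap are all hypothetical values chosen for illustration. The only borrowed fact is that a full spherical harmonic set of order n has (n + 1)^2 coefficients.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    # All fields are hypothetical; any subset may be reported.
    channel_bit_rate: Optional[int] = None      # bits per second on the channel
    decoder_max_streams: Optional[int] = None   # streams the decoder can handle
    renderer_max_streams: Optional[int] = None  # streams the renderer can handle


def max_clusters(feedback: Feedback, n_objects: int, bps_per_stream: int = 64_000) -> int:
    """Upper bound for L: the tightest reported limit, and always less than N."""
    if n_objects < 2:
        raise ValueError("need at least two objects so that L < N")
    limits = [n_objects - 1]  # L must be less than N
    if feedback.channel_bit_rate is not None:
        limits.append(feedback.channel_bit_rate // bps_per_stream)
    if feedback.decoder_max_streams is not None:
        limits.append(feedback.decoder_max_streams)
    if feedback.renderer_max_streams is not None:
        limits.append(feedback.renderer_max_streams)
    return max(1, min(limits))  # always keep at least one stream


def coefficients_per_set(bit_rate: int, num_sets: int,
                         bps_per_coefficient: int = 16_000, max_order: int = 4) -> int:
    """Largest full-order coefficient count that fits the per-set bit budget.

    Falls back to a single coefficient (order 0) when the budget is too small.
    """
    budget = bit_rate // (num_sets * bps_per_coefficient)
    order = 0
    while order < max_order and (order + 2) ** 2 <= budget:
        order += 1
    return (order + 1) ** 2


fb = Feedback(channel_bit_rate=256_000, decoder_max_streams=8)
l_max = max_clusters(fb, n_objects=10)            # limited by the channel here
n_coeffs = coefficients_per_set(256_000, l_max)   # coefficients kept per stream
```

Taking the minimum over whichever limits are reported means that a missing feedback source simply does not constrain L; other policies, such as weighting the sources, would satisfy the claim language equally well.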

29. A device for audio signal processing, the device comprising:
- a cluster analysis module configured to group, based on spatial information for each of N audio objects, a plurality of audio objects that includes the N audio objects into L clusters, where L is less than N, wherein the cluster analysis module is configured to receive information from at least one of a transmission channel, a decoder, and a renderer, and wherein a maximum value for L is based on the information received;
  
  a downmix module configured to mix the plurality of audio objects into L audio streams, and
  
  a metadata downmix module configured to produce, based on the spatial information and the grouping, metadata that indicates spatial information for each of the L audio streams.
- Dependent claims (30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42)
  - 30. The device of claim 29, wherein the information received includes information describing a state of the transmission channel and the maximum value of L is based at least on the state of the transmission channel.
  - 31. The device of claim 29, wherein the information received includes information describing a capacity of the transmission channel and the maximum value of L is based at least on the capacity of the transmission channel.
  - 32. The device of claim 29, wherein the information received is information received from a decoder.
  - 33. The device of claim 29, wherein the information received is information received from a renderer.
  - 34. The device of claim 29, wherein the information received comprises a bit rate indication that indicates a bit rate and the maximum value of L is based at least on the bit rate.
  - 35. The device of claim 29, wherein the N audio objects comprises N sets of coefficients, and wherein the downmix module is configured to mix the plurality of audio objects into L audio streams by mixing the plurality of sets of coefficients into L sets of coefficients.
  - 36. The device of claim 35, wherein each of N sets of coefficients is a hierarchical set of basis function coefficients.
  - 37. The device of claim 35, wherein each of the N sets of coefficients is a set of spherical harmonic coefficients.
  - 38. The device of claim 35, wherein each of the L sets of coefficients is a set of spherical harmonic coefficients.
  - 39. The device of claim 35, wherein the downmix module is configured to mix the plurality of audio objects into L audio streams by, for each of at least one among the L clusters, calculating a sum of the sets of coefficients of the N sets of coefficients grouped into the cluster.
  - 40. The device of claim 35, wherein the downmix module is configured to mix the plurality of audio objects into L audio streams by calculating each among the L sets of coefficients as a sum of the corresponding ones among the N sets of coefficients.
  - 41. The device of claim 35, wherein the information received comprises a bit rate indication that indicates a bit rate, and wherein, for at least one among the L sets of coefficients, a total number of coefficients in the set is based on a bit rate indication.
  - 42. The device of claim 35, wherein, for at least one among the L sets of coefficients, a total number of coefficients in the set is based on the information received.

43. A non-transitory computer-readable storage medium having stored thereon instructions that, when executed, cause one or more processors to:
- based on spatial information for each of N audio objects, group a plurality of audio objects that includes the N audio objects into L clusters, where L is less than N;
  
  mix the plurality of audio objects into L audio streams; and
  
  based on the spatial information and the grouping, produce metadata that indicates spatial information for each of the L audio streams, wherein a maximum value for L is based on information received from at least one of a transmission channel, a decoder, and a renderer.

Current Assignee: Qualcomm, Inc.
Original Assignee: Qualcomm, Inc.
Inventors: Xiang, Pei; Sen, Dipanjan
Granted Patent: US 9,479,886 B2
US Class Current: 381/17
CPC Class Codes

G10L 19/008   Multichannel audio signal c...

G10L 19/22   Mode decision, i.e. based o...

G10L 19/24   Variable rate codecs, e.g. ...

H04S 1/007   in which the audio signals ...

H04S 2400/03   Aspects of down-mixing mult...

H04S 2400/11   Positioning of individual s...

H04S 2400/15   Aspects of sound capture an...

H04S 2420/03   Application of parametric c...

H04S 2420/11   Application of ambisonics i...

H04S 3/008   in which the audio signals ...

H04S 7/30   Control circuits for electr...
