System and tools for enhanced 3D audio authoring and rendering

US 9,549,275 B2
Filed: 10/09/2015
Issued: 01/17/2017
Est. Priority Date: 07/01/2011
Status: Active Grant

First Claim

Patent Images

1. A method, comprising:

receiving audio reproduction data comprising one or more audio objects and metadata associated with each of the one or more audio objects;

receiving reproduction environment data comprising an indication of a number of reproduction speakers in the reproduction environment and an indication of the location of each reproduction speaker within the reproduction environment; and

rendering the audio objects into one or more speaker feed signals by applying an amplitude panning process to each audio object, wherein the amplitude panning process is based, at least in part, on the metadata associated with each audio object and the location of each reproduction speaker within the reproduction environment, and wherein each speaker feed signal corresponds to at least one of the reproduction speakers within the reproduction environment;

wherein the metadata associated with each audio object includes audio object coordinates indicating the intended reproduction position of the audio object within the reproduction environment and a snap flag indicating whether the amplitude panning process should render the audio object into a single speaker feed signal or apply panning rules to render the audio object into a plurality of speaker feed signals.

View all claims

1 Assignment

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

Improved tools for authoring and rendering audio reproduction data are provided. Some such authoring tools allow audio reproduction data to be generalized for a wide variety of reproduction environments. Audio reproduction data may be authored by creating metadata for audio objects. The metadata may be created with reference to speaker zones. During the rendering process, the audio reproduction data may be reproduced according to the reproduction speaker layout of a particular reproduction environment.

Citations

20 Claims

1. A method, comprising:
- receiving audio reproduction data comprising one or more audio objects and metadata associated with each of the one or more audio objects;
  
  receiving reproduction environment data comprising an indication of a number of reproduction speakers in the reproduction environment and an indication of the location of each reproduction speaker within the reproduction environment; and
  
  rendering the audio objects into one or more speaker feed signals by applying an amplitude panning process to each audio object, wherein the amplitude panning process is based, at least in part, on the metadata associated with each audio object and the location of each reproduction speaker within the reproduction environment, and wherein each speaker feed signal corresponds to at least one of the reproduction speakers within the reproduction environment;
  
  wherein the metadata associated with each audio object includes audio object coordinates indicating the intended reproduction position of the audio object within the reproduction environment and a snap flag indicating whether the amplitude panning process should render the audio object into a single speaker feed signal or apply panning rules to render the audio object into a plurality of speaker feed signals.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
- - 2. The method of claim 1, wherein:
    - the snap flag indicates the amplitude panning process should render the audio object into a single speaker feed signal; and
      
      the amplitude panning process renders the audio object into a speaker feed signal corresponding to the reproduction speaker closest to the intended reproduction position of the audio object.
  - 3. The method of claim 1, wherein:
    - the snap flag indicates the amplitude panning process should render the audio object into a single speaker feed signal;
      
      a distance between the intended reproduction position of the audio object and the reproduction speaker closest to the intended reproduction position of the audio object exceeds a threshold; and
      
      the amplitude panning process overrides the snap flag and applies panning rules to render the audio object into a plurality of speaker feed signals.
  - 4. The method of claim 2, wherein:
    - the metadata is time-varying;
      
      the audio object coordinates indicating the intended reproduction position of the audio object within the reproduction environment differ at a first time instant and at a second time instant;
      
      at the first time instant, the reproduction speaker closest to the intended reproduction position of the audio object corresponds to a first reproduction speaker;
      
      at the second time instant the reproduction speaker closest to the intended reproduction position of the audio object corresponds to a second reproduction speaker; and
      
      the amplitude panning process smoothly transitions between rendering the audio object into a first speaker feed signal corresponding to the first reproduction speaker and rendering the audio object into a second speaker feed signal corresponding to the second reproduction speaker.
  - 5. The method of claim 1, wherein:
    - the metadata is time-varying;
      
      at a first time instant the snap flag indicates the amplitude panning process should render the audio object into a single speaker feed signal;
      
      at a second time instant the snap flag indicates the amplitude panning process should apply panning rules to render the audio object into a plurality of speaker feed signals; and
      
      the amplitude panning process smoothly transitions between rendering the audio object into a speaker feed signal corresponding to the reproduction speaker closest to the intended reproduction position of the audio object and applying panning rules to render the audio object into a plurality of speaker feed signals.
  - 6. The method of claim 1, wherein the audio panning process detects that a speaker feed signal may cause a corresponding reproduction speaker to overload, and in response, spreads one or more audio objects rendered into the speaker feed signal into one or more additional speaker feed signals corresponding to neighboring reproduction speakers.
  - 7. The method of claim 6, wherein the audio panning process determines the number of additional speaker feed signals into which an object is spread and/or selects the one or more audio objects to spread into the one or more additional speaker feed signals based, at least in part, on a signal amplitude of the one or more audio objects.
  - 8. The method of claim 6, wherein the metadata further comprises an indication of a content type of the audio object, and wherein the audio panning process selects the one or more audio objects to spread into the one or more additional speaker feed signals based, at least in part, on the content type of the audio object.
  - 9. The method of claim 6, wherein the metadata further comprises an indication of the importance of the audio object, and wherein the audio panning process selects the one or more audio objects to spread into the one or more additional speaker feed signals based, at least in part, on the importance of the audio object.

10. An apparatus, comprising:
- an interface system; and
  
  a logic system configured for;
  
  receiving, via the interface system, audio reproduction data comprising one or more audio objects and metadata associated with each of the one or more audio objects;
  
  receiving, via the interface system, reproduction environment data comprising an indication of a number of reproduction speakers in the reproduction environment and an indication of the location of each reproduction speaker within the reproduction environment; and
  
  rendering the audio objects into one or more speaker feed signals by applying an amplitude panning process to each audio object, wherein the amplitude panning process is based, at least in part, on the metadata associated with each audio object and the location of each reproduction speaker within the reproduction environment, and wherein each speaker feed signal corresponds to at least one of the reproduction speakers within the reproduction environment;
  
  wherein the metadata associated with each audio object includes audio object coordinates indicating the intended reproduction position of the audio object within the reproduction environment and a snap flag indicating whether the amplitude panning process should render the audio object into a single speaker feed signal or apply panning rules to render the audio object into a plurality of speaker feed signals.
- View Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18, 19)
- - 11. The apparatus of claim 10, wherein:
    - the snap flag indicates the amplitude panning process should render the audio object into a single speaker feed signal; and
      
      the amplitude panning process renders the audio object into a speaker feed signal corresponding to the reproduction speaker closest to the intended reproduction position of the audio object.
  - 12. The apparatus of claim 10, wherein:
    - the snap flag indicates the amplitude panning process should render the audio object into a single speaker feed signal;
      
      a distance between the intended reproduction position of the audio object and the reproduction speaker closest to the intended reproduction position of the audio object exceeds a threshold; and
      
      the amplitude panning process overrides the snap flag and applies panning rules to render the audio object into a plurality of speaker feed signals.
  - 13. The apparatus of claim 11, wherein:
    - the metadata is time-varying;
      
      the audio object coordinates indicating the intended reproduction position of the audio object within the reproduction environment differ at a first time instant and at a second time instant;
      
      at the first time instant, the reproduction speaker closest to the intended reproduction position of the audio object corresponds to a first reproduction speaker;
      
      at the second time instant the reproduction speaker closest to the intended reproduction position of the audio object corresponds to a second reproduction speaker; and
      
      the amplitude panning process smoothly transitions between rendering the audio object into a first speaker feed signal corresponding to the first reproduction speaker and rendering the audio object into a second speaker feed signal corresponding to the second reproduction speaker.
  - 14. The apparatus of claim 10, wherein:
    - the metadata is time-varying;
      
      at a first time instant the snap flag indicates the amplitude panning process should render the audio object into a single speaker feed signal;
      
      at a second time instant the snap flag indicates the amplitude panning process should apply panning rules to render the audio object into a plurality of speaker feed signals; and
      
      the amplitude panning process smoothly transitions between rendering the audio object into a speaker feed signal corresponding to the reproduction speaker closest to the intended reproduction position of the audio object and applying panning rules to render the audio object into a plurality of speaker feed signals.
  - 15. The apparatus of claim 10, wherein the audio panning process detects that a speaker feed signal may cause a corresponding reproduction speaker to overload, and in response, spreads one or more audio objects rendered into the speaker feed signal into one or more additional speaker feed signals corresponding to neighboring reproduction speakers.
  - 16. The apparatus of claim 15, wherein the audio panning process selects the one or more audio objects to spread into the one or more additional speaker feed signals based, at least in part, on a signal amplitude of the one or more audio objects.
  - 17. The apparatus of claim 15, wherein the audio panning process determines the number of additional speaker feed signals into which an audio object is spread based, at least in part, on a signal amplitude of the audio object.
  - 18. The apparatus of claim 15, wherein the metadata further comprises an indication of a content type of the audio object, and wherein the audio panning process selects the one or more audio objects to spread into the one or more additional speaker feed signals based, at least in part, on the content type of the audio object.
  - 19. The apparatus of claim 15, wherein the metadata further comprises an indication of the importance of the audio object, and wherein the audio panning process selects the one or more audio objects to spread into the one or more additional speaker feed signals based, at least in part, on the importance of the audio object.

20. A non-transitory medium having software stored thereon, the software including instructions for performing the following operations:
- receiving audio reproduction data comprising one or more audio objects and metadata associated with each of the one or more audio objects;
  
  receiving reproduction environment data comprising an indication of a number of reproduction speakers in the reproduction environment and an indication of the location of each reproduction speaker within the reproduction environment; and
  
  rendering the audio objects into one or more speaker feed signals by applying an amplitude panning process to each audio object, wherein the amplitude panning process is based, at least in part, on the metadata associated with each audio object and the location of each reproduction speaker within the reproduction environment, and wherein each speaker feed signal corresponds to at least one of the reproduction speakers within the reproduction environment;
  
  wherein the metadata associated with each audio object includes audio object coordinates indicating the intended reproduction position of the audio object within the reproduction environment and a snap flag indicating whether the amplitude panning process should render the audio object into a single speaker feed signal or apply panning rules to render the audio object into a plurality of speaker feed signals.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Original Assignee
Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Inventors
Tsingos, Nicolas R, Robinson, Charles Q., Scharpf, Jurgen W.
Primary Examiner(s)
King, Simon

Application Number

US14/879,621
Publication Number

US 20160037280A1
Time in Patent Office

466 Days
Field of Search
US Class Current

1/1
CPC Class Codes

H04R 5/02   Spatial or constructional a...

H04S 2400/01   Multi-channel, i.e. more th...

H04S 2400/11   Positioning of individual s...

H04S 3/00   Systems employing more than...

H04S 3/008   in which the audio signals ...

H04S 5/00   Pseudo-stereo systems, e.g....

H04S 7/307   Frequency adjustment, e.g. ...

H04S 7/308   Electronic adaptation depen...

H04S 7/40   Visual indication of stereo...

System and tools for enhanced 3D audio authoring and rendering

First Claim

1 Assignment

0 Petitions

Accused Products

Abstract

Citations

20 Claims

Specification

Solutions

Use Cases

Quick Links

System and tools for enhanced 3D audio authoring and rendering

First Claim

1 Assignment

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

20 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links