System and Method for Adaptive Audio Signal Generation, Coding and Rendering

US 20160021476A1
Filed: 09/25/2015
Published: 01/21/2016
Est. Priority Date: 07/01/2011
Status: Active Grant

Abstract

Embodiments are described for an adaptive audio system that processes audio data comprising a number of independent monophonic audio streams. One or more of the streams has associated with it metadata that specifies whether the stream is a channel-based or object-based stream. Channel-based streams have rendering information encoded by means of channel name; and the object-based streams have location information encoded through location expressions encoded in the associated metadata. A codec packages the independent audio streams into a single serial bitstream that contains all of the audio data. This configuration allows for the sound to be rendered according to an allocentric frame of reference, in which the rendering location of a sound is based on the characteristics of the playback environment (e.g., room size, shape, etc.) to correspond to the mixer's intent. The object position metadata contains the appropriate allocentric frame of reference information required to play the sound correctly using the available speaker positions in a room that is set up to play the adaptive audio content.
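
As a rough illustration of the format the abstract describes, the Python sketch below models per-stream metadata as either channel-based (identified by a channel name) or object-based (identified by a location). The names `StreamMetadata` and `allocentric_to_room`, and the convention of normalized room-relative coordinates in the range 0 to 1, are assumptions made for this example; the abstract states only that object locations are expressed relative to characteristics of the playback environment.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class StreamMetadata:
    """Hypothetical metadata record for one monophonic stream."""
    kind: str                           # "channel" or "object"
    channel_name: Optional[str] = None  # e.g. "L", "C", "R" (channel-based only)
    position: Optional[Vec3] = None     # normalized (x, y, z), object-based only

    def __post_init__(self) -> None:
        # Each stream carries exactly the field that matches its kind.
        if self.kind == "channel":
            if self.channel_name is None:
                raise ValueError("channel-based stream needs a channel name")
        elif self.kind == "object":
            if self.position is None:
                raise ValueError("object-based stream needs a position")
        else:
            raise ValueError(f"unknown stream kind: {self.kind!r}")


def allocentric_to_room(position: Vec3, room_dims: Vec3) -> Vec3:
    """Scale a normalized, room-relative position to the dimensions of a room.

    Assumes (0, 0, 0) is one corner of the room and (1, 1, 1) the opposite one.
    """
    x, y, z = position
    width, depth, height = room_dims
    return (x * width, y * depth, z * height)
```

Under this assumed convention, the same object position resolves to different physical coordinates in rooms of different sizes, which is the property the allocentric frame of reference is meant to provide.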

20 Claims

1. A system for processing audio signals, comprising an authoring component configured to:
- receive a plurality of audio signals;
  
  generate an adaptive audio mix comprising a plurality of monophonic audio streams and metadata associated with each of the audio streams and indicating a playback location of a respective monophonic audio stream, wherein at least some of the plurality of monophonic audio streams are identified as channel-based audio and the others of the plurality of monophonic audio streams are identified as object-based audio, and wherein the playback location of a channel-based monophonic audio stream comprises a designation of a speaker in a speaker array, and the playback location of an object-based monophonic audio stream comprises a location in three-dimensional space, and wherein each object-based monophonic audio stream is rendered in at least one specific speaker of the speaker array; and
  
  encapsulate the plurality of monophonic audio streams and the metadata in a bitstream for transmission to a rendering system configured to render the plurality of monophonic audio streams to a plurality of speaker feeds corresponding to speakers in a playback environment, wherein the speakers of the speaker array are placed at specific positions within the playback environment, and wherein metadata elements associated with each respective object-based monophonic audio stream indicate whether one or more sound components are rendered to a speaker feed for playback through a speaker nearest an intended playback location of the sound component, such that a respective object-based monophonic audio stream is effectively rendered by the speaker nearest to the intended playback location.
  - 2. The system of claim 1 wherein the authoring component includes a mixing console having controls operable by a user to indicate playback levels of the plurality of monophonic audio streams, and wherein the metadata elements associated with each respective object-based stream are automatically generated upon input to the mixing console controls by the user.
  - 3. The system of claim 1 further comprising an encoder coupled to the authoring component and configured to receive the plurality of monophonic audio streams and metadata and to generate a single digital bitstream containing the plurality of monophonic audio streams in an ordered fashion.
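
The encapsulation step of claims 1 and 3 (streams plus metadata packed, in order, into a single bitstream) can be pictured with the minimal Python sketch below. The container layout used here (a stream count, then a length-prefixed JSON metadata header and raw audio payload per stream) is purely hypothetical and is not the codec described in the patent; `encapsulate` and `decapsulate` are illustrative names.

```python
import json
import struct
from typing import Dict, List, Tuple

Stream = Tuple[Dict, bytes]  # (metadata, raw audio payload)


def encapsulate(streams: List[Stream]) -> bytes:
    """Pack streams, in the order given, into one serial byte string."""
    out = bytearray(struct.pack(">H", len(streams)))  # stream count
    for meta, payload in streams:
        header = json.dumps(meta, sort_keys=True).encode("utf-8")
        # Two big-endian length fields: metadata size, then payload size.
        out += struct.pack(">HI", len(header), len(payload))
        out += header
        out += payload
    return bytes(out)


def decapsulate(blob: bytes) -> List[Stream]:
    """Recover the ordered (metadata, payload) pairs from the byte string."""
    (count,) = struct.unpack_from(">H", blob, 0)
    offset = 2
    streams: List[Stream] = []
    for _ in range(count):
        header_len, payload_len = struct.unpack_from(">HI", blob, offset)
        offset += struct.calcsize(">HI")
        meta = json.loads(blob[offset:offset + header_len].decode("utf-8"))
        offset += header_len
        payload = blob[offset:offset + payload_len]
        offset += payload_len
        streams.append((meta, payload))
    return streams
```

A renderer receiving such a bitstream could call `decapsulate` and then dispatch each stream according to whether its metadata marks it as channel-based or object-based.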

4. A system for processing audio signals, comprising a rendering system configured to:
- receive a bitstream encapsulating an adaptive audio mix comprising a plurality of monophonic audio streams and metadata associated with each of the audio streams and indicating a playback location of a respective monophonic audio stream, wherein at least some of the plurality of monophonic audio streams are identified as channel-based audio and the others of the plurality of monophonic audio streams are identified as object-based audio, and wherein the playback location of a channel-based monophonic audio stream comprises a designation of a speaker in a speaker array, and the playback location of an object-based monophonic audio stream comprises a location in three-dimensional space, and wherein each object-based monophonic audio stream is rendered in at least one specific speaker of the speaker array; and
  
  render the plurality of monophonic audio streams to a plurality of speaker feeds corresponding to speakers in a playback environment, wherein the speakers of the speaker array are placed at specific positions within the playback environment, and wherein metadata elements associated with each respective object-based monophonic audio stream indicate whether one or more sound components are rendered to a speaker feed for playback through a speaker nearest an intended playback location of the sound component, such that a respective object-based monophonic audio stream is effectively rendered by the speaker nearest to the intended playback location.
  - 5. The system of claim 4, wherein the metadata elements associated with each respective object-based monophonic audio stream further indicate a spatial distortion threshold, and wherein the metadata element indicating whether a respective sound component is rendered by the speaker nearest to the intended playback location is ignored if the spatial distortion resulting from rendering the respective sound component by the speaker nearest to the intended playback location exceeds the spatial distortion threshold.
  - 6. The system of claim 5, wherein the spatial distortion threshold comprises at least one of an azimuth tolerance threshold and an elevation tolerance threshold.
  - 7. The system of claim 4, wherein the metadata elements associated with each respective object-based monophonic audio stream further indicate a crossfade rate parameter, and wherein when the speaker nearest to the intended playback location for the sound component changes from a first speaker to a second speaker, the rate at which the sound component changes from the first speaker to the second speaker is controlled in response to the crossfade rate parameter.
  - 8. The system of claim 4 wherein the metadata elements associated with each object-based monophonic audio stream further indicate spatial parameters controlling the playback of a corresponding sound component comprising one or more of:
    - sound position, sound width, and sound velocity.
  - 9. The system of claim 4 wherein the playback location for each of the plurality of object-based monophonic audio streams comprises a spatial position relative to a screen within the playback environment, or a surface that encloses the playback environment, and wherein the surface comprises a front plane, a back plane, a left plane, right plane, an upper plane, and a lower plane.
  - 10. The system of claim 4 wherein the rendering system selects a rendering algorithm utilized by the rendering system, the rendering algorithm selected from the group consisting of:
    - binaural, stereo dipole, Ambisonics, Wave Field Synthesis (WFS), multi-channel panning, raw stems with position metadata, dual balance, and vector-based amplitude panning.
  - 11. The system of claim 4 wherein the playback location for each of the plurality of object-based monophonic audio streams is independently specified with respect to either an egocentric frame of reference or an allocentric frame of reference, wherein the egocentric frame of reference is taken in relation to a listener in the playback environment, and wherein the allocentric frame of reference is taken with respect to a characteristic of the playback environment.
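
Claims 4 to 6 describe a per-object flag that routes a sound to the speaker nearest its intended location, and a spatial distortion threshold (azimuth and/or elevation tolerance) beyond which that flag is ignored. The Python sketch below is one possible reading of that behaviour, not the patented renderer: the function names, the choice of Euclidean distance for "nearest", and the measurement of angles from a single reference listening position are all assumptions.

```python
import math
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]


def angles(pos: Vec3, ref: Vec3) -> Tuple[float, float]:
    """Azimuth and elevation of pos in degrees, as seen from ref.

    Assumed convention: +y is straight ahead, +x is to the right, +z is up.
    """
    dx, dy, dz = (p - r for p, r in zip(pos, ref))
    azimuth = math.degrees(math.atan2(dx, dy))
    elevation = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return azimuth, elevation


def snap_target(obj_pos: Vec3, speakers: Dict[str, Vec3], snap: bool,
                az_tol_deg: float, el_tol_deg: float,
                listener: Vec3 = (0.0, 0.0, 0.0)) -> Optional[str]:
    """Return the speaker to snap to, or None if snapping should not apply.

    None means the renderer would fall back to its normal panning method.
    """
    if not snap:
        return None
    # Nearest speaker by straight-line distance to the intended location.
    name, spk_pos = min(speakers.items(),
                        key=lambda item: math.dist(obj_pos, item[1]))
    obj_az, obj_el = angles(obj_pos, listener)
    spk_az, spk_el = angles(spk_pos, listener)
    # Wrap the azimuth difference into [-180, 180) before comparing.
    az_err = abs((spk_az - obj_az + 180.0) % 360.0 - 180.0)
    el_err = abs(spk_el - obj_el)
    if az_err > az_tol_deg or el_err > el_tol_deg:
        return None  # distortion exceeds the threshold: ignore the snap flag
    return name
```

With speakers at (-1, 1, 0) and (1, 1, 0), an object at (0.8, 1, 0) lies roughly 6 degrees in azimuth from the right-hand speaker, so it snaps when the azimuth tolerance is 10 degrees but not when it is 5 degrees.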

12. A method for authoring audio content for rendering, comprising:
- receiving a plurality of audio signals;
  
  generating an adaptive audio mix comprising a plurality of monophonic audio streams and metadata associated with each of the audio streams and indicating a playback location of a respective monophonic audio stream, wherein at least some of the plurality of monophonic audio streams are identified as channel-based audio and the others of the plurality of monophonic audio streams are identified as object-based audio, and wherein the playback location of the channel-based audio comprises speaker designations of speakers in a speaker array, and the playback location of the object-based audio comprises a location in three-dimensional space, and wherein each object-based monophonic audio stream is rendered in at least one specific speaker of the speaker array; and
  
  encapsulating the plurality of monophonic audio streams and the metadata in a bitstream for transmission to a rendering system configured to render the plurality of monophonic audio streams to a plurality of speaker feeds corresponding to speakers in a playback environment, wherein the speakers of the speaker array are placed at specific positions within the playback environment, and wherein metadata elements associated with each respective object-based monophonic audio stream indicate whether one or more sound components are rendered to a speaker feed for playback through a speaker nearest an intended playback location of the sound component, such that an object-based monophonic audio stream is effectively rendered by the speaker nearest to the intended playback location.
  - 13. The method of claim 12 further comprising:
    - receiving, from a mixing console having controls operated by a user to indicate playback levels of the plurality of monophonic audio streams comprising the audio content; and
      
      automatically generating the metadata elements associated with each respective object-based stream upon receipt of the user input.

14. A method for rendering audio signals, comprising:
- receiving a bitstream encapsulating an adaptive audio mix comprising a plurality of monophonic audio streams and metadata associated with each of the audio streams and indicating a playback location of a respective monophonic audio stream, wherein at least some of the plurality of monophonic audio streams are identified as channel-based audio and the others of the plurality of monophonic audio streams are identified as object-based audio, and wherein the playback location of a channel-based monophonic audio stream comprises a designation of a speaker in a speaker array, and the playback location of an object-based monophonic audio stream comprises a location in three-dimensional space, and wherein each object-based monophonic audio stream is rendered in at least one specific speaker of the speaker array; and
  
  rendering the plurality of monophonic audio streams to a plurality of speaker feeds corresponding to speakers in a playback environment, wherein the speakers of the speaker array are placed at specific positions within the playback environment, and wherein metadata elements associated with each respective object-based monophonic audio stream indicate whether one or more sound components are rendered to a speaker feed for playback through a speaker nearest an intended playback location of the sound component, such that an object-based monophonic audio stream is effectively rendered by the speaker nearest to the intended playback location.
  - 15. The method of claim 14, wherein the metadata elements associated with each respective object-based monophonic audio stream further indicate a spatial distortion threshold, and wherein the metadata element indicating whether a respective sound component is rendered by the speaker nearest to the intended playback location is ignored if the spatial distortion resulting from rendering the respective sound component by the speaker nearest to the intended playback location exceeds the spatial distortion threshold.
  - 16. The system of claim 15, wherein the spatial distortion threshold comprises at least one of an azimuth tolerance threshold and an elevation tolerance threshold.
  - 17. The system of claim 14, wherein the metadata elements associated with each respective object-based monophonic audio stream further indicate a crossfade rate parameter, and wherein when the speaker nearest to the intended playback location for the sound component changes from a first speaker to a second speaker, the rate at which the respective object transitions from the first speaker to the second speaker is controlled in response to the crossfade rate parameter.
  - 18. The system of claim 14 wherein the metadata elements associated with each object-based monophonic audio stream further indicate spatial parameters controlling the playback of a corresponding sound component comprising one or more of:
    - sound position, sound width, and sound velocity.
  - 19. The system of claim 14 wherein the playback location for each of the plurality of object-based monophonic audio streams comprises a spatial position relative to a screen within the playback environment, or a surface that encloses the playback environment, and wherein the surface comprises a front plane, a back plane, a left plane, right plane, an upper plane, and a lower plane, and/or is independently specified with respect to either an egocentric frame of reference or an allocentric frame of reference, wherein the egocentric frame of reference is taken in relation to a listener in the playback environment, and wherein the allocentric frame of reference is taken with respect to a characteristic of the playback environment.
  - 20. The system of claim 14 wherein the rendering system selects a rendering algorithm utilized by the rendering system, the rendering algorithm selected from the group consisting of:
    - binaural, stereo dipole, Ambisonics, Wave Field Synthesis (WFS), multi-channel panning, raw stems with position metadata, dual balance, and vector-based amplitude panning.
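
The crossfade rate parameter of claims 7 and 17 controls how quickly a sound moves from one nearest speaker to the next. The claims do not specify units or a fade curve, so the Python sketch below assumes a rate expressed as the fraction of the transition completed per second and an equal-power (sine/cosine) curve; `crossfade_gains` is an illustrative name.

```python
import math
from typing import Tuple


def crossfade_gains(elapsed_s: float, rate_per_s: float) -> Tuple[float, float]:
    """Return (gain_first_speaker, gain_second_speaker) during a transition.

    elapsed_s is the time since the nearest speaker changed; rate_per_s is
    the assumed crossfade rate (1.0 means the transition takes one second).
    """
    # Clamp progress to [0, 1] so gains hold steady before and after the fade.
    progress = min(max(elapsed_s * rate_per_s, 0.0), 1.0)
    theta = progress * math.pi / 2.0
    # Equal-power curve: the squared gains always sum to 1.
    return math.cos(theta), math.sin(theta)
```

A higher rate shortens the transition; a rate of 2.0 under this assumption completes the move between speakers in half a second.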

Current Assignee: Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Original Assignee: Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Inventors: TSINGOS, Nicolas R.; CHABANNE, Christophe; ROBINSON, Charles Q.

Granted Patent: US 9,467,791 B2
US Class Current: 1/1
CPC Class Codes:
- G10L 19/008   Multichannel audio signal c...
- G10L 19/20   using sound class specific ...
- H04R 27/00   Public address systems circ...
- H04R 5/02   Spatial or constructional a...
- H04R 5/04   Circuit arrangements, e.g. ...
- H04S 2400/01   Multi-channel, i.e. more th...
- H04S 2400/03   Aspects of down-mixing mult...
- H04S 2400/11   Positioning of individual s...
- H04S 2420/01   Enhancing the perception of...
- H04S 2420/03   Application of parametric c...
- H04S 2420/11   Application of ambisonics i...
- H04S 2420/13   Application of wave-field s...
- H04S 3/008   in which the audio signals ...
- H04S 5/00   Pseudo-stereo systems, e.g....
- H04S 5/005   of the pseudo five- or more...
- H04S 7/30   Control circuits for electr...
- H04S 7/302   Electronic adaptation of st...
- H04S 7/305   Electronic adaptation of st...
- H04S 7/308   Electronic adaptation depen...
