System and method for adaptive audio signal generation, coding and rendering

US 9,179,236 B2
Filed: 06/27/2012
Issued: 11/03/2015
Est. Priority Date: 07/01/2011
Status: Active Grant

1 Assignment

0 Petitions

Abstract

Embodiments are described for an adaptive audio system that processes audio data comprising a number of independent monophonic audio streams. One or more of the streams has associated with it metadata that specifies whether the stream is a channel-based or object-based stream. Channel-based streams have rendering information encoded by means of channel name, and object-based streams have location information encoded through location expressions in the associated metadata. A codec packages the independent audio streams into a single serial bitstream that contains all of the audio data. This configuration allows the sound to be rendered according to an allocentric frame of reference, in which the rendering location of a sound is based on the characteristics of the playback environment (e.g., room size, shape, etc.) to correspond to the mixer's intent. The object position metadata contains the appropriate allocentric frame-of-reference information required to play the sound correctly using the available speaker positions in a room that is set up to play the adaptive audio content.
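The abstract describes independent mono streams, each flagged as channel-based (addressed by channel name) or object-based (addressed by a position), packaged into a single serial bitstream. The patent text reproduced here does not define a bitstream syntax, so the following Python sketch invents one purely for illustration: the `MonoStream` type, the one-byte kind flag, and the field layout are all hypothetical, and no audio coding (compression) is performed.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical stream record: exactly one of channel_name / position is set.
@dataclass
class MonoStream:
    samples: List[float] = field(default_factory=list)       # mono PCM samples
    channel_name: Optional[str] = None                        # channel-based: speaker designation, e.g. "L"
    position: Optional[Tuple[float, float, float]] = None     # object-based: (x, y, z) location

    @property
    def is_object(self) -> bool:
        return self.position is not None

# Hypothetical one-byte flag distinguishing the two stream types.
KIND_CHANNEL, KIND_OBJECT = 0, 1

def pack(streams: List[MonoStream]) -> bytes:
    """Serialize streams, in order, into one serial little-endian byte string."""
    out = bytearray(struct.pack("<H", len(streams)))          # stream count
    for s in streams:
        if s.is_object:
            out += struct.pack("<B3f", KIND_OBJECT, *s.position)
        else:
            name = s.channel_name.encode("ascii")
            out += struct.pack("<BB", KIND_CHANNEL, len(name)) + name
        out += struct.pack("<I", len(s.samples))               # sample count
        out += struct.pack(f"<{len(s.samples)}f", *s.samples)  # float32 PCM
    return bytes(out)

def unpack(data: bytes) -> List[MonoStream]:
    """Inverse of pack(): rebuild the stream list from the byte string."""
    (count,) = struct.unpack_from("<H", data, 0)
    offset = 2
    streams = []
    for _ in range(count):
        (kind,) = struct.unpack_from("<B", data, offset)
        offset += 1
        if kind == KIND_OBJECT:
            stream = MonoStream(position=struct.unpack_from("<3f", data, offset))
            offset += 12
        else:
            (n,) = struct.unpack_from("<B", data, offset)
            offset += 1
            stream = MonoStream(channel_name=data[offset:offset + n].decode("ascii"))
            offset += n
        (ns,) = struct.unpack_from("<I", data, offset)
        offset += 4
        stream.samples = list(struct.unpack_from(f"<{ns}f", data, offset))
        offset += 4 * ns
        streams.append(stream)
    return streams
```

Because the kind flag travels with each stream, a decoder can tell channel-based from object-based audio without any side information; positions and samples round-trip only to float32 precision in this sketch.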

17 Claims

1. A system for processing audio signals, comprising an authoring component configured to:
   - receive a plurality of audio signals;
   - generate an adaptive audio mix comprising a plurality of monophonic audio streams and one or more metadata sets associated with each of the plurality of monophonic audio streams and specifying a playback location of a respective monophonic audio stream, wherein at least some of the plurality of monophonic audio streams are identified as channel-based audio and wherein the others of the plurality of monophonic audio streams are identified as object-based audio, and wherein the playback location of the channel-based audio comprises speaker designations of speakers in a speaker array, and the playback location of the object-based audio comprises a location in three-dimensional space relative to a playback environment containing the speaker array; and
   - further wherein a first metadata set is applied to one or more of the plurality of monophonic audio streams for a first condition of the playback environment, and a second metadata set is applied to the one or more of the plurality of monophonic audio streams for a second condition of the playback environment; and
   - encapsulate the plurality of monophonic audio streams and the at least two metadata sets in a bitstream for transmission to a rendering system configured to render the plurality of monophonic audio streams to a plurality of speaker feeds corresponding to speakers in the playback environment in accordance with the at least two metadata sets based on a condition of the playback environment.

2. The system of claim 1 wherein the authoring component includes a mixing console having controls operable by the user to specify playback levels of the plurality of monophonic audio streams comprising the original audio content, and wherein the metadata elements associated with each respective object-based stream are automatically generated upon input to the mixing console controls by the user.

3. The system of claim 1 further comprising an encoder coupled to the authoring component and configured to receive the plurality of monophonic audio streams and metadata and to generate a single digital bitstream containing the plurality of monophonic audio streams in an ordered fashion.

4. A system for processing audio signals, comprising a rendering system configured to:
   - receive a bitstream encapsulating a plurality of monophonic audio streams and at least two metadata sets in a bitstream from an authoring component configured to receive a plurality of audio signals, and generate a plurality of monophonic audio streams and one or more metadata sets associated with each of the plurality of monophonic audio streams and specifying a playback location of a respective monophonic audio stream, wherein at least some of the plurality of monophonic audio streams are identified as channel-based audio and wherein the others of the plurality of monophonic audio streams are identified as object-based audio, and wherein the playback location of the channel-based audio comprises speaker designations of speakers in a speaker array, and the playback location of the object-based audio comprises a location in three-dimensional space relative to a playback environment containing the speaker array; and
   - further wherein a first metadata set is applied to one or more of the plurality of monophonic audio streams for a first condition of the playback environment, and a second metadata set is applied to the one or more of the plurality of monophonic audio streams for a second condition of the playback environment; and
   - render the plurality of monophonic audio streams to a plurality of speaker feeds corresponding to speakers in the playback environment in accordance with the at least two metadata sets based on a condition of the playback environment.

5. The system of claim 4 wherein each metadata set includes metadata elements associated with each object-based stream, the metadata elements for each object-based stream specifying spatial parameters controlling the playback of a corresponding object-based sound, and comprising one or more of: sound position, sound width, and sound velocity; and further wherein each metadata set includes metadata elements associated with each channel-based stream, and the speaker array comprises speakers arranged in a defined surround sound configuration, and wherein the metadata elements associated with each channel-based stream comprises designations of surround-sound channels of the speakers in the speaker array in accordance with a defined surround-sound standard.

6. The system of claim 4 wherein the speaker array includes additional speakers for playback of object-based streams that are positioned in the playback environment in accordance with set up instructions from a user based on the condition of the playback environment, and wherein the playback condition depends on variables comprising: size and shape of a room of the playback environment, occupancy, material composition, and ambient noise; and further wherein the system receives a set-up file from the user that includes at least a list of speaker designations and a mapping of channels to individual speakers of the speaker array, information regarding grouping of speakers, and a mapping based on a relative position of speakers to the playback environment.

7. The system of claim 4 wherein the metadata sets include metadata to enable upmixing or downmixing of at least one of the channel-based monophonic audio streams and the object-based monophonic audio streams in accordance with a change from a first configuration of the speaker array to a second configuration of the speaker array.

8. The system of claim 6 wherein the metadata sets include metadata indicative of a content type of a monophonic audio stream; wherein the content type is selected from the group consisting of: dialog, music, and effects, and each content type is embodied in a respective set of channel-based streams or object-based streams, and further wherein sound components of each content type are transmitted to defined speaker groups of one or more speaker groups designated within the speaker array.

9. The system of claim 8 wherein the speakers of the speaker array are placed at specific positions within the playback environment, and wherein metadata elements associated with each respective object-based stream specify that one or more sound components are rendered to a speaker feed for playback through a speaker nearest an intended playback location of the sound component, as indicated by the position metadata.

10. The system of claim 4 wherein the playback location comprises a spatial position relative to a screen within the playback environment, or a surface that encloses the playback environment, and wherein the surface comprises a front plane, a back plane, a left plane, right plane, an upper plane, and a lower plane.

11. The system of claim 4 wherein the rendering system further comprises means for selecting a rendering algorithm utilized by the rendering system, the rendering algorithm selected from the group consisting of: binaural, stereo dipole, Ambisonics, Wave Field Synthesis (WFS), multi-channel panning, raw stems with position metadata, dual balance, and vector-based amplitude panning.

12. The system of claim 4 wherein the playback location for each of the plurality of monophonic audio streams is independently specified with respect to either an egocentric frame of reference or an allocentric frame of reference, wherein an egocentric frame of reference is taken in relation to a listener in the playback environment, and wherein the allocentric frame of reference is taken with respect to a characteristic of the playback environment.

13. A method of authoring audio signals for rendering, comprising:
   - receiving a plurality of audio signals;
   - generating an adaptive audio mix comprising a plurality of monophonic audio streams and one or more metadata sets associated with each of the plurality of monophonic audio streams and specifying a playback location of a respective monophonic audio stream, wherein at least some of the plurality of monophonic audio streams are identified as channel-based audio and wherein the others of the plurality of monophonic audio streams are identified as object-based audio, and wherein the playback location of the channel-based audio comprises speaker designations of speakers in a speaker array, and the playback location of the object-based audio comprises a location in three-dimensional space relative to a playback environment containing the speaker array; and
   - further wherein a first metadata set is applied to one or more of the plurality of monophonic audio streams for a first condition of the playback environment, and a second metadata set is applied to the one or more of the plurality of monophonic audio streams for a second condition of the playback environment; and
   - encapsulating the plurality of monophonic audio streams and the one or more metadata sets in a bitstream for transmission to a rendering system configured to render the plurality of monophonic audio streams to a plurality of speaker feeds corresponding to speakers in the playback environment in accordance with the at least two metadata sets based on a condition of the playback environment.

14. The method of claim 13 further comprising:
   - receiving user input from a mixing console having controls operated by a user to specify playback levels of the plurality of monophonic audio streams comprising the original audio content; and
   - automatically generating the metadata elements associated with each respective object-based stream upon receipt of the user input.

15. A method of rendering audio signals, comprising:
   - receiving a bitstream encapsulating a plurality of monophonic audio streams and at least two metadata sets in a bitstream from an authoring component configured to receive a plurality of audio signals, and generate a plurality of monophonic audio streams and one or more metadata sets associated with each of the plurality of monophonic audio streams and specifying a playback location of a respective monophonic audio stream, wherein at least some of the plurality of monophonic audio streams are identified as channel-based audio and wherein the others of the plurality of monophonic audio streams are identified as object-based audio, and wherein the playback location of the channel-based audio comprises speaker designations of speakers in a speaker array, and the playback location of the object-based audio comprises a location in three-dimensional space relative to a playback environment containing the speaker array; and
   - further wherein a first metadata set is applied to one or more of the plurality of monophonic audio streams for a first condition of the playback environment, and a second metadata set is applied to the one or more of the plurality of monophonic audio streams for a second condition of the playback environment; and
   - rendering the plurality of monophonic audio streams to a plurality of speaker feeds corresponding to speakers in the playback environment in accordance with the at least two metadata sets based on a condition of the playback environment.

16. The method of claim 15 wherein each metadata set includes metadata elements associated with each object-based stream, the metadata elements for each object-based stream specifying spatial parameters controlling the playback of a corresponding object-based sound, and comprising one or more of: sound position, sound width, and sound velocity; and further wherein each metadata set includes metadata elements associated with each channel-based stream, and the speaker array comprises speakers arranged in a defined surround sound configuration, and wherein the metadata elements associated with each channel-based stream comprises designations of surround-sound channels of the speakers in the speaker array in accordance with a defined surround-sound standard.

17. The method of claim 15 wherein the speaker array includes additional speakers for playback of object-based streams that are positioned in the playback environment, the method further comprising receiving set up instructions from a user based on the condition of the playback environment, and wherein the playback condition depends on variables comprising: size and shape of a room of the playback environment, occupancy, material composition, and ambient noise; the setup instructions further including at least a list of speaker designations and a mapping of channels to individual speakers of the speaker array, information regarding grouping of speakers, and a mapping based on a relative position of speakers to the playback environment.
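The claims describe a renderer that chooses between authored metadata sets according to a condition of the playback environment, routes channel-based streams by speaker designation, and (claim 9) may send an object to the speaker nearest its intended location, with positions optionally given in an allocentric frame (claim 12). The Python sketch below illustrates that flow under stated assumptions: the condition names, the metadata dictionary layout, room-normalized coordinates in [0, 1], and every helper name (`select_metadata`, `to_room`, `nearest_speaker`, `render`) are hypothetical. Nearest-speaker snapping stands in for the panning algorithms listed in claim 11 and is not presented as the patented renderer.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Vec3 = Tuple[float, float, float]

@dataclass
class Speaker:
    name: str        # speaker designation, e.g. "L", "C", "Rs"
    position: Vec3   # metres, in this room's coordinates

# Hypothetical metadata set: stream id -> ("channel", speaker name)
# or ("object", allocentric (x, y, z) normalized to the room, each in [0, 1]).
MetadataSet = Dict[str, Tuple[str, Union[str, Vec3]]]

def select_metadata(sets: Dict[str, MetadataSet], condition: str,
                    default: str = "default") -> MetadataSet:
    """Use the set authored for the current playback condition, else the default set."""
    return sets.get(condition, sets[default])

def to_room(pos: Vec3, room_dims: Vec3) -> Vec3:
    """Allocentric mapping: scale normalized coordinates by this room's dimensions."""
    return tuple(p * d for p, d in zip(pos, room_dims))

def nearest_speaker(point: Vec3, speakers: List[Speaker]) -> Speaker:
    """Snap to the closest speaker by Euclidean distance."""
    return min(speakers, key=lambda s: math.dist(point, s.position))

def render(streams: Dict[str, List[float]], metadata: MetadataSet,
           speakers: List[Speaker], room_dims: Vec3) -> Dict[str, List[float]]:
    """Mix each mono stream into exactly one speaker feed."""
    n = max((len(x) for x in streams.values()), default=0)
    feeds = {s.name: [0.0] * n for s in speakers}
    for sid, samples in streams.items():
        kind, value = metadata[sid]
        if kind == "channel":
            target = value  # channel-based: use the speaker designation as-is
        else:
            target = nearest_speaker(to_room(value, room_dims), speakers).name
        for i, x in enumerate(samples):
            feeds[target][i] += x
    return feeds
```

In a hypothetical 10 x 8 x 4 m room with L, C, R at the front and Ls, Rs at the back, an object authored at (0.9, 0.9, 0.25) lands on whichever speaker is nearest the rear-right of that particular room, because its position is expressed relative to the room rather than to a fixed layout. A channel designation that does not match any speaker name raises `KeyError` here; a production renderer would instead apply a downmix rule.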

Current Assignee
Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Original Assignee
Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Inventors
Robinson, Charles Q., Tsingos, Nicolas R., Chabanne, Christophe
Primary Examiner(s)
BLOUIN, MARK S

Application Number
US14/130,386
Publication Number
US 20140133683A1
Time in Patent Office
1,224 Days
Field of Search
381/303
US Class Current
1/1
CPC Class Codes

G10L 19/008   Multichannel audio signal c...

G10L 19/20   using sound class specific ...

H04R 27/00   Public address systems circ...

H04R 5/02   Spatial or constructional a...

H04R 5/04   Circuit arrangements, e.g. ...

H04S 2400/01   Multi-channel, i.e. more th...

H04S 2400/03   Aspects of down-mixing mult...

H04S 2400/11   Positioning of individual s...

H04S 2420/01   Enhancing the perception of...

H04S 2420/03   Application of parametric c...

H04S 2420/11   Application of ambisonics i...

H04S 2420/13   Application of wave-field s...

H04S 3/008   in which the audio signals ...

H04S 5/00   Pseudo-stereo systems, e.g....

H04S 5/005   of the pseudo five- or more...

H04S 7/30   Control circuits for electr...

H04S 7/302   Electronic adaptation of st...

H04S 7/305   Electronic adaptation of st...

H04S 7/308   Electronic adaptation depen...
