Schemes for emphasizing talkers in a 2D or 3D conference scene

US 9,961,208 B2
Filed: 03/21/2013
Issued: 05/01/2018
Est. Priority Date: 03/23/2012
Status: Active Grant

First Claim

1. A conference controller configured to place a plurality of upstream audio signals associated with a plurality of conference participants within a 2D or 3D conference scene to be rendered to a listener, wherein the conference controller is configured to

set up an X-point conference scene with X different spatial talker locations within the conference scene, X being an integer greater than one;

assign each audio signal of the plurality of upstream audio signals to a different one of the X different spatial talker locations;

provide downstream audio signals and metadata to terminals corresponding to the conference participants, the metadata indicating where a terminal is to render audio signals for each of the X different spatial talker locations;

determine a degree of activity of the plurality of upstream audio signals at a time instant;

determine a dominant one of the plurality of upstream audio signals at the time instant based on the degrees of activity of the plurality of upstream audio signals at the time instant;

assign the dominant upstream audio signal to a first of the X talker locations; and

emphasize the dominant upstream audio signal at the time instant by changing the metadata indicating where a terminal is to render audio signals for a relative position of the talker location for the dominant upstream audio signal relative to other talker locations such that the updated talker location at which the dominant upstream audio signal will be rendered is the updated talker location closest to a midline in front of a head of the listener, wherein the conference controller is implemented via at least one of firmware or hardware.
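The emphasis step of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `SceneMetadata` structure, its field names, and the choice to swap the dominant signal with whichever signal currently occupies the location nearest the midline are assumptions made for illustration only. Azimuth 0 degrees is assumed to be the midline in front of the listener's head.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SceneMetadata:
    # Hypothetical metadata layout: one azimuth (degrees) per talker
    # location, with 0.0 assumed to be the midline in front of the listener.
    location_azimuths: list[float]
    # Maps a signal identifier to the index of its talker location.
    assignment: dict[str, int]


def emphasize_dominant(meta: SceneMetadata, dominant: str) -> SceneMetadata:
    """Return updated metadata in which `dominant` is rendered at the
    talker location closest to the midline (assumed swap strategy)."""
    # Index of the location with the smallest absolute azimuth.
    center = min(
        range(len(meta.location_azimuths)),
        key=lambda i: abs(meta.location_azimuths[i]),
    )
    updated = dict(meta.assignment)
    previous = updated[dominant]
    # Move any signal currently at the center to the dominant's old location.
    for signal_id, location in meta.assignment.items():
        if location == center and signal_id != dominant:
            updated[signal_id] = previous
    updated[dominant] = center
    return SceneMetadata(list(meta.location_azimuths), updated)


meta = SceneMetadata([-20.0, 2.0, 25.0], {"a": 0, "b": 1, "c": 2})
result = emphasize_dominant(meta, "c")
print(result.assignment)
```

Only the metadata changes in this sketch; the audio signals themselves are untouched, which matches the claim's wording that emphasis is achieved by changing where a terminal is told to render each signal.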

Abstract

The present document relates to methods and systems for setting up and managing two-dimensional or three-dimensional scenes for audio conferences. A conference controller (111, 175) configured to place a plurality of upstream audio signals (123, 173) associated with a plurality of conference participants within a 2D or 3D conference scene to be rendered to a listener (211) is described. The conference controller (111, 175) is configured to set up an X-point conference scene with X different spatial talker locations (212) within the conference scene; assign the plurality of upstream audio signals (123, 173) to respective ones of the talker locations (212); determine a degree of activity of the plurality of upstream audio signals (123, 173); determine a dominant one of the plurality of upstream audio signals (123, 173); and emphasize the dominant upstream audio signal (123, 173).
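How a degree of activity and a dominant signal might be determined can be sketched as follows. The sketch assumes, in line with dependent claims 3 and 4, that activity is measured as signal energy and that the dominant signal is the one with the highest activity at the time instant. The function names, the use of mean squared amplitude over one frame, and the sample values are illustrative assumptions, not details from the patent.

```python
def frame_energy(samples):
    """Mean squared amplitude of one frame (assumed activity measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def dominant_signal(frames):
    """Return (dominant_id, energies) for the frames at one time instant.

    `frames` maps a hypothetical signal identifier to the list of samples
    received from that participant for the current frame.
    """
    energies = {signal_id: frame_energy(f) for signal_id, f in frames.items()}
    # The signal with the highest energy is treated as dominant.
    return max(energies, key=energies.get), energies


frames = {
    "alice": [0.1, -0.1, 0.1],
    "bob": [0.5, -0.4, 0.6],
    "carol": [0.0, 0.0, 0.0],
}
dominant, energies = dominant_signal(frames)
print(dominant)
```

A practical system would likely smooth the energy over time to avoid switching the dominant talker on every short burst; that refinement is outside what the text above states and is omitted here.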

21 Claims

1. A conference controller configured to place a plurality of upstream audio signals associated with a plurality of conference participants within a 2D or 3D conference scene to be rendered to a listener, wherein the conference controller is configured to
- set up an X-point conference scene with X different spatial talker locations within the conference scene, X being an integer greater than one;
- assign each audio signal of the plurality of upstream audio signals to a different one of the X different spatial talker locations;
- provide downstream audio signals and metadata to terminals corresponding to the conference participants, the metadata indicating where a terminal is to render audio signals for each of the X different spatial talker locations;
- determine a degree of activity of the plurality of upstream audio signals at a time instant;
- determine a dominant one of the plurality of upstream audio signals at the time instant based on the degrees of activity of the plurality of upstream audio signals at the time instant;
- assign the dominant upstream audio signal to a first of the X talker locations; and
- emphasize the dominant upstream audio signal at the time instant by changing the metadata indicating where a terminal is to render audio signals for a relative position of the talker location for the dominant upstream audio signal relative to other talker locations such that the updated talker location at which the dominant upstream audio signal will be rendered is the updated talker location closest to a midline in front of a head of the listener, wherein the conference controller is implemented via at least one of firmware or hardware.
2. The conference controller of claim 1, wherein the metadata enables an audio processing unit of a terminal to generate a spatialized audio signal based on a set of downstream audio signals;
- wherein the set of downstream audio signals comprises the dominant upstream audio signal;
- wherein when rendering the spatialized audio signal to the listener, the listener perceives the dominant upstream audio signal in an emphasized manner.
3. The conference controller of claim 1, wherein the conference controller is configured to determine the degree of activity of an upstream audio signal at the time instant by determining an energy of the upstream audio signal at the time instant.
4. The conference controller of claim 1, wherein the conference controller is configured to determine a dominant one of the plurality of upstream audio signals by determining an upstream audio signal having the highest degree of activity at the time instant.
5. The conference controller of claim 1, wherein the conference controller is configured to re-assign an upstream audio signal already assigned to the talker location closest to the midline to another talker location.
6. The conference controller of claim 1, wherein the conference controller is configured to emphasize the dominant upstream audio signal at the time instant by increasing a rendering volume of the dominant upstream audio signal at the time instant.
7. The conference controller of claim 1, wherein the conference controller is configured to emphasize the dominant upstream audio signal at the time instant by moving the first talker location closer to the listener.
8. The conference controller of claim 1, wherein the X talker locations are positioned within a cone around the midline in front of the head of the listener;
9. The conference controller of claim 1, wherein the conference controller is configured to reduce an angular distance between adjacent talker locations, in order to determine the updated talker locations.
10. The conference controller of claim 1, wherein the conference controller is configured to
- determine a different new dominant one of the plurality of upstream audio signals at a second time instant after the time instant;
- de-emphasize the former dominant upstream audio signal at the second time instant; and
- emphasize the new dominant upstream audio signal at the second time instant.
11. The conference controller of claim 1, wherein the conference controller is configured to classify the X spatial talker locations into a plurality of clusters;
- wherein a first of the plurality of clusters comprises at least two spatial talker locations;
- wherein the spatial talker locations comprised within the first cluster are directly adjacent.
12. The conference controller of claim 11, wherein the conference controller is configured to classify the X spatial talker locations into the plurality of clusters dependent upon classification metadata.
13. The conference controller of claim 12, wherein the classification metadata comprises at least one of:
- an identifier associated with an electronic means of communication of a conference participant; and
- an identifier associated with a physical location of a conference participant.
14. The conference controller of claim 12, wherein the conference controller is configured to extract the classification metadata from one or more of the plurality of upstream audio signals.
15. The conference controller of claim 12, wherein the conference controller is configured to facilitate input of the classification metadata by a conference participant.
16. The conference controller of claim 8, wherein the conference controller is configured to calculate the X-point conference scene with X different spatial talker locations such that the X talker locations are positioned within the cone around the midline in front of the head of the listener.
17. The conference controller of claim 1, wherein the conference controller is configured to select the X-point conference scene with X different spatial talker locations from a set of pre-determined X-point conference scenes with X different pre-determined spatial talker locations.
18. The conference controller of claim 1, wherein the conference controller is configured to emphasize the dominant upstream audio signal at the time instant by modifying a height of the first talker location relative to the others of the X spatial talker locations.
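Claims 8, 9 and 16 refer to talker locations positioned within a cone around the midline and to reducing the angular distance between adjacent locations. The following Python sketch shows one way such positions could be computed. Even angular spacing, the `max_cone_angle_deg` parameter with its default of 30 degrees, and the scaling factor are all assumptions for illustration; the claims as listed here do not specify how the locations are calculated.

```python
def cone_azimuths(x, max_cone_angle_deg=30.0):
    """Place x talker locations at evenly spaced azimuths (degrees) inside
    a cone of +/- max_cone_angle_deg around the midline (azimuth 0)."""
    if x < 2:
        raise ValueError("an X-point scene needs X greater than one")
    step = 2.0 * max_cone_angle_deg / (x - 1)
    return [-max_cone_angle_deg + i * step for i in range(x)]


def reduce_angular_distance(azimuths, factor):
    """Scale all azimuths toward the midline, shrinking the angular
    distance between adjacent talker locations (0 < factor <= 1)."""
    if not 0.0 < factor <= 1.0:
        raise ValueError("factor must be in (0, 1]")
    return [a * factor for a in azimuths]


scene = cone_azimuths(3, 30.0)
tighter = reduce_angular_distance(scene, 0.5)
print(scene, tighter)
```

Scaling toward the midline keeps the relative order of the talkers unchanged, so a listener's sense of who sits left or right of whom is preserved while the scene becomes narrower.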

19. An audio conferencing system comprising:
- a plurality of talker terminals configured to generate a plurality of upstream audio signals associated with a plurality of conference participants, respectively;
- a conference controller configured to:
  - receive the plurality of upstream audio signals;
  - assign each audio signal of the plurality of upstream audio signals to a different one of X different spatial talker locations within a 2D or 3D conference scene;
  - provide downstream audio signals and metadata to terminals corresponding to the conference participants, the metadata indicating where a terminal is to render audio signals for each of the X different spatial talker locations;
  - determine a dominant one of the plurality of upstream audio signals;
  - assign the dominant upstream audio signal to a first of the X talker locations; and
  - emphasize a dominant one of the plurality of upstream audio signals by changing the metadata indicating where a terminal is to render audio signals for a relative position of the talker location for the dominant upstream audio signal relative to other talker locations by re-assigning the dominant upstream audio signal to a center location within the 2D or 3D conference scene;
  - wherein the center location corresponds to the talker location closest to a midline in front of a head of the listener; and
- a listener terminal configured to render the dominant upstream audio signal to a listener according to the metadata.

20. A method for placing a plurality of upstream audio signals associated with a plurality of conference participants within a 2D or 3D conference scene to be rendered to a listener, wherein the method comprises:
- setting up an X-point conference scene with X different spatial talker locations within the conference scene, X being an integer greater than one;
- assigning each audio signal of the plurality of upstream audio signals to a different one of the X different spatial talker locations;
- providing downstream audio signals and metadata to terminals corresponding to the conference participants, the metadata indicating where a terminal is to render audio signals for each of the X different spatial talker locations;
- determining a degree of activity of the plurality of upstream audio signals, at a time instant;
- determining a dominant one of the plurality of upstream audio signals at the time instant based on the degrees of activity of the plurality of upstream audio signals at the time instant;
- assigning the dominant upstream audio signal to a first of the X talker locations; and
- emphasizing the dominant upstream audio signal at the time instant by changing the metadata indicating where a terminal is to render audio signals for a relative position of the talker location for the dominant upstream audio signal relative to other talker locations such that the updated talker location at which the dominant upstream audio signal will be rendered is the updated talker location closest to a midline in front of a head of the listener.
21. A non-transitory storage medium comprising a software program adapted for execution on a processor and for performing the method of claim 20 when carried out on a computing device.

Current Assignee: Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Original Assignee: Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Inventors: Boustead, Paul; Spittle, Gary
Primary Examiner(s): LONG, ANDREA NATAE
Application Number: US14/387,301
Publication Number: US 20150052455A1
Time in Patent Office: 1,867 Days
CPC Class Codes:
G06F 3/0484 for the control of specific...
H04M 3/568 audio processing specific t...
