Optimized virtual scene layout for spatial meeting playback

US 10,057,707 B2
Filed: 02/03/2016
Issued: 08/21/2018
Est. Priority Date: 02/03/2015
Status: Active Grant

Abstract

Various disclosed implementations involve processing and/or playback of a recording of a conference involving a plurality of conference participants. Some implementations involve receiving or determining conversational dynamics data. One or more variables of a cost function may be based, at least in part, on the conversational dynamics data. The cost function may be a spatial optimization cost function of a vector describing a virtual conference participant position for each of the conference participants in a virtual acoustic space. The virtual acoustic space may be determined relative to a listener's head. The virtual conference participant positions may be assigned according to a solution of the cost function.
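The idea of a cost function over a position vector can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: positions are reduced to one azimuth per participant, the cost terms (`talk_share`, `overlap`, the `SPREAD` constant) are hypothetical, and plain gradient descent with a numerical gradient stands in for whichever optimization technique an implementation might use. The result is only a locally optimal layout.

```python
import math

SPREAD = 0.1  # assumed width (rad^2) of the "too close together" penalty


def layout_cost(angles, talk_share, overlap):
    """Hypothetical cost of a layout; lower is better.

    angles:     azimuth per participant in radians (0 = straight ahead)
    talk_share: fraction of the meeting each participant speaks
    overlap:    overlap[i][j] = doubletalk weight between i and j
    """
    cost = 0.0
    n = len(angles)
    for i in range(n):
        # Frequent talkers are penalized for sitting far from the front.
        cost += talk_share[i] * angles[i] ** 2
        for j in range(i + 1, n):
            # Pairs that often talk over each other are penalized
            # for being placed close together.
            gap = angles[i] - angles[j]
            cost += overlap[i][j] * math.exp(-gap * gap / SPREAD)
    return cost


def numerical_gradient(f, x, eps=1e-5):
    """Central-difference gradient of f at the list x."""
    grad = []
    for k in range(len(x)):
        hi = x[:k] + [x[k] + eps] + x[k + 1:]
        lo = x[:k] + [x[k] - eps] + x[k + 1:]
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def descend(f, x0, step=0.1, iters=200):
    """Gradient descent that only accepts steps which lower the cost."""
    x, fx = list(x0), f(list(x0))
    for _ in range(iters):
        g = numerical_gradient(f, x)
        trial = [xi - step * gi for xi, gi in zip(x, g)]
        ft = f(trial)
        if ft < fx:
            x, fx = trial, ft
        else:
            step *= 0.5  # overshoot: shrink the step and try again
    return x, fx


# Illustrative data for three participants (made up for this sketch).
talk_share = [0.5, 0.3, 0.2]
overlap = [[0, 4, 1], [4, 0, 0], [1, 0, 0]]
start = [-0.2, 0.0, 0.2]


def cost(angles):
    return layout_cost(angles, talk_share, overlap)


best, best_cost = descend(cost, start)
```

Because each step is accepted only when it lowers the cost, `best_cost` never exceeds `cost(start)`. A different starting vector may converge to a different local optimum.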

20 Claims

1. A method for processing audio data, the method comprising:
- receiving audio data corresponding to a recording of a conference involving a plurality of conference participants, the audio data including at least one of: (a) audio data from multiple endpoints, the audio data for each of the multiple endpoints having been recorded separately or (b) audio data from a single endpoint corresponding to multiple conference participants and including spatial information for each conference participant of the multiple conference participants;
- analyzing the audio data to determine conversational dynamics data that includes at least one data type selected from a list of data types consisting of:
  - data indicating the frequency and duration of conference participant speech;
  - data indicating instances of conference participant doubletalk during which at least two conference participants are speaking simultaneously; and
  - data indicating instances of conference participant conversations;
- applying the conversational dynamics data as one or more variables of a spatial optimization cost function of a vector describing a virtual conference participant position for each of the conference participants in a virtual acoustic space;
- applying an optimization technique to the spatial optimization cost function to determine a locally optimal solution; and
- assigning the virtual conference participant positions in the virtual acoustic space based, at least in part, on the locally optimal solution.

2. The method of claim 1, wherein the conference is a teleconference.

3. The method of claim 1, wherein the virtual acoustic space is determined relative to a position of a virtual listener's head in the virtual acoustic space and wherein the spatial optimization cost function applies a penalty for placing conference participants who are involved in conference participant doubletalk at virtual conference participant positions that are on, or within a predetermined angular distance from, a cone of confusion defined relative to the position of the virtual listener's head, circular conical slices through the cone of confusion having identical inter-aural time differences.

4. The method of claim 1, wherein the virtual acoustic space is determined relative to a position of a virtual listener's head in the virtual acoustic space and wherein the spatial optimization cost function applies a penalty for placing conference participants who are involved in a conference participant conversation with one another at virtual conference participant positions that are on, or within a predetermined angular distance from, a cone of confusion defined relative to the position of the virtual listener's head, circular conical slices through the cone of confusion having identical inter-aural time differences.

5. The method of claim 1, wherein analyzing the audio data also involves determining which conference participants, if any, have perceptually similar voices.

6. The method of claim 5, wherein the virtual acoustic space is determined relative to a position of a virtual listener's head in the virtual acoustic space and wherein the spatial optimization cost function applies a penalty for placing conference participants with perceptually similar voices at virtual conference participant positions that are on, or within a predetermined angular distance from, a cone of confusion defined relative to the position of the virtual listener's head, circular conical slices through the cone of confusion having identical inter-aural time differences.

7. The method of claim 1, wherein the virtual acoustic space is determined relative to a position of a virtual listener's head in the virtual acoustic space and wherein the spatial optimization cost function applies a penalty for placing conference participants who speak frequently at virtual conference participant positions that are beside, behind, above, or below the position of the virtual listener's head.

8. The method of claim 1, wherein the virtual acoustic space is determined relative to a position of a virtual listener's head in the virtual acoustic space and wherein the spatial optimization cost function applies a penalty for placing conference participants who speak frequently at virtual conference participant positions that are farther from the position of the virtual listener's head than the virtual conference participant positions of conference participants who speak less frequently.

9. The method of claim 1, wherein the virtual acoustic space is determined relative to a position of a virtual listener's head in the virtual acoustic space and wherein the spatial optimization cost function applies a penalty for placing conference participants who speak infrequently at virtual conference participant positions that are not beside, behind, above or below the position of the virtual listener's head.

10. The method of claim 1, wherein the optimization technique involves at least one technique selected from a group of optimization techniques consisting of a gradient descent technique, conjugate gradient technique, Newton's method, the Broyden-Fletcher-Goldfarb-Shanno algorithm, a genetic algorithm, an algorithm for simulated annealing, an ant colony optimization method or a Monte Carlo method.

11. The method of claim 1, wherein assigning a virtual conference participant position involves selecting a virtual conference participant position from a set of predetermined virtual conference participant positions.

12. The method of claim 1, wherein the audio data includes output of a voice activity detection process.

13. The method of claim 1, wherein analyzing the audio data involves identifying speech corresponding to individual conference participants.

14. The method of claim 1, wherein the audio data corresponds to a recording of a complete or substantially complete conference.
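
The conversational dynamics data recited in claim 1 (frequency and duration of speech, instances of doubletalk) could be derived from per-participant voice activity detection output of the kind mentioned in claim 12. The Python sketch below is one possible illustration, not the patented method. The input format is an assumption: a hypothetical dictionary mapping each participant to a list of `(start, end)` talkspurt times in seconds, with no overlap within a single participant's own list.

```python
from itertools import combinations


def speech_stats(segments):
    """Count talkspurts and total speaking time for each participant."""
    return {
        name: {
            "talkspurts": len(spans),
            "seconds": sum(end - start for start, end in spans),
        }
        for name, spans in segments.items()
    }


def overlap_seconds(spans_a, spans_b):
    """Total time during which both participants are speaking."""
    total = 0.0
    for a_start, a_end in spans_a:
        for b_start, b_end in spans_b:
            # Length of the intersection of the two intervals, if any.
            total += max(0.0, min(a_end, b_end) - max(a_start, b_start))
    return total


def doubletalk(segments):
    """Doubletalk time for every pair of participants."""
    return {
        (a, b): overlap_seconds(segments[a], segments[b])
        for a, b in combinations(sorted(segments), 2)
    }


# Made-up voice activity data for three participants.
segments = {
    "alice": [(0.0, 5.0), (10.0, 12.0)],
    "bob": [(4.0, 8.0)],
    "carol": [(11.0, 15.0)],
}

stats = speech_stats(segments)
overlaps = doubletalk(segments)
print(stats["alice"])              # {'talkspurts': 2, 'seconds': 7.0}
print(overlaps[("alice", "bob")])  # 1.0
```

Values such as these could then serve as the weights of a spatial optimization cost function, for example by penalizing close placement of pairs with a large doubletalk total.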

15. A non-transitory medium having software stored thereon, the software including instructions for processing audio data by controlling at least one device for:
- receiving audio data corresponding to a recording of a conference involving a plurality of conference participants, the audio data including at least one of: (a) audio data from multiple endpoints, the audio data for each of the multiple endpoints having been recorded separately or (b) audio data from a single endpoint corresponding to multiple conference participants and including spatial information for each conference participant of the multiple conference participants;
- analyzing the audio data to determine conversational dynamics data that includes at least one data type selected from a list of data types consisting of:
  - data indicating the frequency and duration of conference participant speech;
  - data indicating instances of conference participant doubletalk during which at least two conference participants are speaking simultaneously; and
  - data indicating instances of conference participant conversations;
- applying the conversational dynamics data as one or more variables of a spatial optimization cost function of a vector describing a virtual conference participant position for each of the conference participants in a virtual acoustic space;
- applying an optimization technique to the spatial optimization cost function to determine a locally optimal solution; and
- assigning the virtual conference participant positions in the virtual acoustic space based, at least in part, on the locally optimal solution.

16. An apparatus, comprising:
- an interface system; and
- a control system capable of:
  - receiving, via the interface system, audio data corresponding to a recording of a conference involving a plurality of conference participants, the audio data including at least one of: (a) audio data from multiple endpoints, the audio data for each of the multiple endpoints having been recorded separately or (b) audio data from a single endpoint corresponding to multiple conference participants and including spatial information for each conference participant of the multiple conference participants;
  - analyzing the audio data to determine conversational dynamics data that includes at least one data type selected from a list of data types consisting of:
    - data indicating the frequency and duration of conference participant speech;
    - data indicating instances of conference participant doubletalk during which at least two conference participants are speaking simultaneously; and
    - data indicating instances of conference participant conversations;
  - applying the conversational dynamics data as one or more variables of a spatial optimization cost function of a vector describing a virtual conference participant position for each of the conference participants in a virtual acoustic space;
  - applying an optimization technique to the spatial optimization cost function to determine a locally optimal solution; and
  - assigning the virtual conference participant positions in the virtual acoustic space based, at least in part, on the locally optimal solution.

17. The apparatus of claim 16, wherein the virtual acoustic space is determined relative to a position of a virtual listener's head in the virtual acoustic space and wherein the spatial optimization cost function applies a penalty for placing conference participants who are involved in conference participant doubletalk at virtual conference participant positions that are on, or within a predetermined angular distance from, a cone of confusion defined relative to the position of the virtual listener's head, circular conical slices through the cone of confusion having identical inter-aural time differences.

18. The apparatus of claim 16, wherein the virtual acoustic space is determined relative to a position of a virtual listener's head in the virtual acoustic space and wherein the spatial optimization cost function applies a penalty for placing conference participants who are involved in a conference participant conversation with one another at virtual conference participant positions that are on, or within a predetermined angular distance from, a cone of confusion defined relative to the position of the virtual listener's head, circular conical slices through the cone of confusion having identical inter-aural time differences.

19. The apparatus of claim 16, wherein the virtual acoustic space is determined relative to a position of a virtual listener's head in the virtual acoustic space and wherein the spatial optimization cost function applies a penalty for placing conference participants who speak frequently at virtual conference participant positions that are beside, behind, above, or below the position of the virtual listener's head.

20. The apparatus of claim 16, wherein the virtual acoustic space is determined relative to a position of a virtual listener's head in the virtual acoustic space and wherein the spatial optimization cost function applies a penalty for placing conference participants who speak frequently at virtual conference participant positions that are farther from the position of the virtual listener's head than the virtual conference participant positions of conference participants who speak less frequently.
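
The cone-of-confusion penalty described in the claims can be sketched in Python under some stated assumptions. Positions on the same cone of confusion share the same inter-aural time difference, which in a simple geometric model corresponds to the same lateral angle, that is, the angle between the source direction and the listener's median plane. The coordinate convention (azimuth measured from straight ahead, elevation from the horizontal plane, both in degrees), the linear penalty shape, and the `threshold_deg` default are hypothetical choices for illustration and do not come from the patent.

```python
import math


def lateral_angle(azimuth_deg, elevation_deg):
    """Angle (degrees) between a source direction and the median plane.

    Sources with equal lateral angles lie on the same cone of confusion
    in this simplified model.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return math.degrees(math.asin(math.sin(az) * math.cos(el)))


def confusion_penalty(positions, doubletalk, threshold_deg=10.0):
    """Penalize doubletalk pairs whose lateral angles nearly coincide.

    positions:  name -> (azimuth_deg, elevation_deg)
    doubletalk: (name_a, name_b) -> weight, e.g. seconds of overlap
    """
    penalty = 0.0
    for (a, b), weight in doubletalk.items():
        gap = abs(lateral_angle(*positions[a]) - lateral_angle(*positions[b]))
        if gap < threshold_deg:
            # Full weight on the same cone, fading to zero at the threshold.
            penalty += weight * (1.0 - gap / threshold_deg)
    return penalty


weights = {("ann", "ben"): 2.0}

# 30 degrees front-left and 150 degrees rear-left share a cone of confusion.
front_back = {"ann": (30.0, 0.0), "ben": (150.0, 0.0)}
# Mirror-image positions left and right of the listener do not.
left_right = {"ann": (30.0, 0.0), "ben": (-30.0, 0.0)}

same_cone = confusion_penalty(front_back, weights)
separated = confusion_penalty(left_right, weights)
```

Here `same_cone` is close to the full weight of 2.0, while `separated` is zero, so an optimizer minimizing this term would move frequently overlapping talkers onto different cones.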

Current Assignee: Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Original Assignee: Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Inventors: Cartwright, Richard J.; Muesch, Hannes
Primary Examiner(s): HUBER, PAUL W
Application Number: US15/546,576
Publication Number: US 20180027351A1
Time in Patent Office: 930 Days
Field of Search: None

CPC Class Codes
- G10L 25/78: Detection of presence or ab...
- H04M 3/42221: Conversation recording syst...
- H04M 3/56: Arrangements for connecting...
- H04M 3/568: audio processing specific t...
- H04R 1/1016: Earpieces of the intra-aura...
- H04R 2420/07: Applications of wireless lo...
- H04S 2400/01: Multi-channel, i.e. more th...
- H04S 2400/11: Positioning of individual s...
- H04S 3/008: in which the audio signals ...
- H04S 7/302: Electronic adaptation of st...
- H04S 7/303: Tracking of listener positi...
