Coordinating and mixing audiovisual content captured from geographically distributed performers

US 10,587,780 B2
Filed: 01/08/2018
Issued: 03/10/2020
Est. Priority Date: 04/12/2011
Status: Active Grant

Abstract

Audiovisual performances, including vocal music, are captured and coordinated with those of other users in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured (together with performance synchronized video) on mobile devices, television-type display and/or set-top box equipment in the context of karaoke-style presentations of lyrics in correspondence with audible renderings of a backing track. Contributions of multiple vocalists are coordinated and mixed in a manner that selects for visually prominent presentation performance synchronized video of one or more of the contributors. Prominence of particular performance synchronized video may be based, at least in part, on computationally-defined audio features extracted from (or computed over) captured vocal audio. Over the course of a coordinated audiovisual performance timeline, these computationally-defined audio features are selective for performance synchronized video of one or more of the contributing vocalists.
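
The selection idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it assumes each vocalist already has a per-frame feature value (the function name `select_prominent` and the dictionary layout are hypothetical) and simply features whichever vocalist scores highest in each frame.

```python
from typing import Dict, List


def select_prominent(features: Dict[str, List[float]]) -> List[str]:
    """Return, for each frame, the name of the vocalist with the highest feature value.

    `features` maps a (hypothetical) vocalist name to a per-frame feature track.
    Tracks are truncated to the shortest one so every frame has a value for everyone.
    """
    if not features:
        return []
    names = sorted(features)  # fixed order, so ties go to the alphabetically first name
    n_frames = min(len(features[name]) for name in names)
    timeline = []
    for t in range(n_frames):
        # max() returns the first maximal element, which makes tie-breaking deterministic
        timeline.append(max(names, key=lambda name: features[name][t]))
    return timeline


if __name__ == "__main__":
    tracks = {"alice": [0.9, 0.2, 0.5], "bob": [0.1, 0.8, 0.5]}
    print(select_prominent(tracks))
```

A raw per-frame maximum like this can flip between vocalists very quickly; the dependent claims on duration filtering and hysteresis address that.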

35 Claims

1. A method of preparing coordinated audiovisual performances from geographically distributed performer contributions, the method comprising:
- receiving via a communication network, a first audiovisual encoding of a first performer, including first performer vocals captured at a first remote device and first performer video;
- determining, from the first performer vocals, at least one time-varying, computationally-defined audio feature, wherein the computationally-defined audio feature determined from the first performer vocals includes one or more of a measure of tempo correspondence with a melody track, a measure of tempo correspondence with a harmony track, a measure of pitch correspondence with a melody track, a measure of pitch correspondence with a harmony track, a measure of tempo correspondence with a score, and a measure of pitch correspondence with a score;
- determining, from second performer vocals of a second audiovisual encoding of a second performer including the second performer vocals captured at a second device and second performer video, at least one time-varying, computationally-defined audio feature for comparison to the computationally-defined audio feature determined from the first performer vocals; and
- based on comparison of the computationally-defined audio feature determined from the first and second performer vocals, dynamically varying relative visual prominence of first and second performer video throughout a combined audiovisual performance mix of the captured first and second performer vocals with a backing track and the first and second performer video.
- - 2. The method of claim 1, wherein the second device is a second remote device, and further comprising receiving via the communication network, the second audiovisual encoding of a second performer, including second performer vocals captured at the second remote device and second performer video.
  - 3. The method of claim 2, further comprising supplying the first and second remote devices with corresponding, but differing, versions of the combined audiovisual performance mix.
  - 4. The method of claim 1, wherein the computationally-defined audio feature determined from the first performer vocals further includes a spectral centroid.
  - 5. The method of claim 1, wherein the computationally-defined audio feature determined from the first performer vocals is a measure of at least a tempo correspondence with a melody or harmony track.
  - 6. The method of claim 1, wherein the computationally-defined audio feature determined from the first performer vocals is a measure of at least a pitch correspondence with a melody or harmony track.
  - 7. The method of claim 1, wherein the computationally-defined audio feature determined from the first performer vocals is a measure of at least a tempo or pitch correspondence with a score.
  - 8. The method of claim 1, further comprising pitch-shifting a more prominently featured of the first and second performer vocals to a vocal melody position in at least one of the corresponding, but differing, combined audiovisual performance mix versions supplied, and pitch-shifting a less prominently featured of the first and second performer vocals to a harmony position.
  - 9. The method of claim 1, further comprising adjusting amplitudes of respective spatially differentiated audio channels of the first and second performer vocals to provide apparent spatial separation therebetween in the supplied audiovisual performance mix versions.
  - 10. The method of claim 1, wherein the first and second first audiovisual encodings include, in addition to captured vocals, performance synchronized video.
  - 11. The method of claim 1, wherein the dynamic varying of relative visual prominence includes transitioning between prominent visual presentation of first performer video captured at the first remote device and prominent visual presentation of second performer video.
  - 12. The method of claim 11, wherein the transitioning is amongst video corresponding to three or more performers and their respective vocal performances.
  - 13. The method of claim 11, wherein the transitioning includes switching, wiping or crossfading of respective performer video.
  - 14. The method of claim 11, wherein the transitioning is performed, at least in some cases, prior to a triggering change in relative values of the computationally-defined audio feature.
  - 15. The method of claim 14, wherein the transitioning prominently presents performer video beginning just prior to onset of corresponding prominent vocals.
  - 16. The method of claim 14, wherein transitioning is subject to duration filtering or a hysteresis function.
  - 17. The method of claim 16, wherein duration filtering or hysteresis function parameters are selected to limit excessive visual transitions between performers.
  - 18. The method of claim 1, wherein the dynamically varied relative visual prominence includes, for at least some values of the computationally-defined audio feature determined from the first and second performer vocals, visual presentation of both first and second performer video, though with differing visual prominence.
  - 19. The method of claim 1, wherein the dynamically varied relative visual prominence includes, for at least some values of the computationally-defined audio feature, visual presentation of both first and second performer video, with equal levels of visual prominence.
  - 20. The method of claim 1, wherein the dynamically varied relative visual prominence includes, for at least some values of the computationally-defined audio feature, visual presentation of first or second performer video, but not both.
  - 21. The method of claim 1, wherein the computationally-defined audio feature is computed over pre-processed audio signals.
  - 22. The method of claim 21, wherein the pre-processing of the audio signals includes one or more of:
    - application of a bark-band auditory model;
    - vocal detection; and
    - noise cancellation.
  - 23. The method of claim 21, wherein the preprocessing is performed, at least in part, at the respective first or second remote device.
  - 24. The method of claim 1, further comprising:
    - inviting via electronic message or social network posting at least the second performer to join the combined audiovisual performance.
  - 25. The method of claim 24, wherein the inviting includes the supplying of the second remote device with the resulting combined audiovisual performance mix.
  - 26. The method of claim 1, further comprising:
    - supplying the first and second remote devices with a vocal score that encodes (i) a sequence of notes for a vocal melody and (ii) at least a first set of harmony notes for at least some portions of the vocal melody, wherein at least one of the received first and second performer vocals is pitch corrected at the respective first or second remote device in accord with the supplied vocal score.
  - 27. The method of claim 1, further comprising:
    - pitch correcting at least one of the received first and second performer vocals in accord with a vocal score that encodes (i) a sequence of notes for a vocal melody and (ii) at least a first set of harmony notes for at least some portions of the vocal melody.
  - 28. The method of claim 1, further comprising:
    - mixing either or both of the first and second performer vocals with the backing track, wherein the mixing results in a second mixed audiovisual performance, and supplying a third remote device with the second mixed audiovisual performance; and
    - receiving via the communication network, a third audiovisual encoding of a third performer, including third performer vocals captured at the third remote device against a local audio rendering of the second mixed performance.
  - 29. The method of claim 28, further comprising:
    - including the captured third performer vocals in the combined audiovisual performance mix.
  - 30. The method of claim 1, wherein the first remote device and second device are selected from the group of:
    - a mobile phone;
    - a personal digital assistant;
    - a laptop computer, notebook computer, a pad-type computer or netbook.
  - 35. A computer program product encoded in non-transitory media and including instructions executable on a computational system to perform the method of preparing coordinated audiovisual performances from geographically distributed performer contributions of claim 1.
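
Claims 16 and 17 recite duration filtering or a hysteresis function to limit excessive visual transitions. The sketch below shows one way such filtering could work for two performers. It is an assumption-laden illustration rather than the claimed method: the function name `switch_with_hysteresis` and the parameters `margin` (how far the other performer's feature must lead) and `min_hold` (minimum frames between switches) are hypothetical.

```python
from typing import List, Sequence


def switch_with_hysteresis(
    a: Sequence[float],
    b: Sequence[float],
    margin: float = 0.1,
    min_hold: int = 3,
) -> List[int]:
    """Return, per frame, 0 if performer A is shown prominently and 1 if performer B is.

    A switch happens only when the other performer's feature exceeds the current
    performer's by more than `margin` (hysteresis) and the current performer has
    been shown for at least `min_hold` frames (duration filtering).
    """
    if not a or not b:
        return []
    current = 0 if a[0] >= b[0] else 1
    held = 0  # frames since the last switch
    out = []
    for fa, fb in zip(a, b):
        # How far the performer who is NOT currently featured is ahead.
        lead = (fb - fa) if current == 0 else (fa - fb)
        if lead > margin and held >= min_hold:
            current = 1 - current
            held = 0
        out.append(current)
        held += 1
    return out


if __name__ == "__main__":
    a = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
    b = [0.4, 0.7, 0.4, 0.7, 0.7, 0.7]
    print(switch_with_hysteresis(a, b))                         # filtered
    print(switch_with_hysteresis(a, b, margin=0.0, min_hold=0))  # unfiltered
```

With the sample tracks, the filtered timeline switches once, while the unfiltered one flips on every crossing of the two feature tracks.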

31. An apparatus comprising:
- a mobile computing device; and
- machine readable code embodied in a non-transitory medium and executable on the mobile computing device to receive via a communication network, a first audiovisual encoding of a first performer, including first performer vocals captured at a first remote device and first performer video;
- the machine readable code further executable to determine, from the first performer vocals, at least one time-varying, computationally-defined audio feature, wherein the computationally-defined audio feature determined from the first performer vocals includes one or more of a measure of tempo correspondence with a melody track, a measure of tempo correspondence with a harmony track, a measure of pitch correspondence with a melody track, a measure of pitch correspondence with a harmony track, a measure of tempo correspondence with a score, and a measure of pitch correspondence with a score;
- the machine readable code further executable to determine, from second performer vocals of a second audiovisual encoding of a second performer including the second performer vocals captured at a second device and second performer video, at least one time-varying, computationally-defined audio feature for comparison to the computationally-defined audio feature determined from the first performer vocals; and
- the machine readable code further executable to, based on comparison of the computationally-defined audio feature determined from the first and second performer vocals, dynamically vary relative visual prominence of first and second performer video throughout a combined audiovisual performance mix of the captured first and second performer vocals with a backing track and the first and second performer video.
- - 32. The apparatus of claim 31, wherein the dynamic varying of relative visual prominence includes transitioning between prominent visual presentation of first performer video captured at the first remote device and prominent visual presentation of second performer video.
  - 33. The apparatus of claim 31, wherein the dynamically varied relative visual prominence includes, for at least some values of the computationally-defined audio feature determined from the first and second performer vocals, visual presentation of both first and second performer video, though with differing visual prominence.
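
Claim 33 covers showing both performers at once with differing visual prominence. One possible reading, sketched below under stated assumptions, maps the two feature values to screen-share fractions. The function name `prominence_weights` and the `floor` parameter (a minimum share so the less prominent performer stays visible) are hypothetical, and feature values are assumed to be non-negative.

```python
from typing import Tuple


def prominence_weights(
    feature_a: float, feature_b: float, floor: float = 0.2
) -> Tuple[float, float]:
    """Map two feature values to relative visual prominence fractions that sum to 1.

    The fractions could drive, for example, tile size or opacity in a video mix.
    Equal (or all-zero) features give equal prominence.
    """
    if not 0.0 <= floor <= 0.5:
        raise ValueError("floor must be between 0 and 0.5")
    total = feature_a + feature_b
    if total <= 0.0:
        return (0.5, 0.5)  # nothing to compare, so show both equally
    share_a = feature_a / total
    # Clamp so the less prominent performer never falls below the floor.
    share_a = min(max(share_a, floor), 1.0 - floor)
    return (share_a, 1.0 - share_a)


if __name__ == "__main__":
    print(prominence_weights(3.0, 1.0))
    print(prominence_weights(10.0, 0.0))
    print(prominence_weights(0.0, 0.0))
```

Setting `floor` to 0 allows one performer to fill the frame entirely, which corresponds to showing one video but not both.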

34. A service platform, comprising:
- one or more computing devices; and
- machine readable code embodied in a non-transitory medium and executable on at least one of the one or more computing devices to receive via a communication network, a first audiovisual encoding of a first performer, including first performer vocals captured at a first remote device and first performer video;
- the machine readable code further executable to determine, from the first performer vocals, at least one time-varying, computationally-defined audio feature, wherein the computationally-defined audio feature determined from the first performer vocals includes one or more of a measure of tempo correspondence with a melody track, a measure of tempo correspondence with a harmony track, a measure of pitch correspondence with a melody track, a measure of pitch correspondence with a harmony track, a measure of tempo correspondence with a score, and a measure of pitch correspondence with a score;
- the machine readable code further executable to determine, from second performer vocals of a second audiovisual encoding of a second performer including the second performer vocals captured at a second device and second performer video, at least one time-varying, computationally-defined audio feature for comparison to the computationally-defined audio feature determined from the first performer vocals; and
- the machine readable code further executable to, based on comparison of the computationally-defined audio feature determined from the first and second performer vocals, dynamically vary relative visual prominence of first and second performer video throughout a combined audiovisual performance mix of the captured first and second performer vocals with a backing track and the first and second performer video.
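
The independent claims name "a measure of pitch correspondence with a score" as one candidate audio feature without fixing a formula. The sketch below is one plausible measure, offered as an assumption only: it takes a per-frame fundamental-frequency estimate in Hz (`None` for unvoiced frames) and a per-frame score note as a MIDI number, and reports the fraction of voiced frames within a tolerance in cents. The names `cents_off`, `pitch_correspondence`, and `tolerance_cents` are hypothetical.

```python
import math
from typing import Optional, Sequence


def cents_off(f0_hz: float, midi_note: int) -> float:
    """Signed distance in cents between a frequency and a MIDI note (A4 = 69 = 440 Hz)."""
    target_hz = 440.0 * 2.0 ** ((midi_note - 69) / 12.0)
    return 1200.0 * math.log2(f0_hz / target_hz)


def pitch_correspondence(
    f0_track: Sequence[Optional[float]],
    score_notes: Sequence[Optional[int]],
    tolerance_cents: float = 50.0,
) -> float:
    """Fraction of voiced, scored frames whose pitch is within the tolerance of the score."""
    hits = 0
    counted = 0
    for f0, note in zip(f0_track, score_notes):
        if f0 is None or note is None or f0 <= 0.0:
            continue  # skip unvoiced frames and rests in the score
        counted += 1
        if abs(cents_off(f0, note)) <= tolerance_cents:
            hits += 1
    return hits / counted if counted else 0.0


if __name__ == "__main__":
    f0 = [440.0, 466.16, None, 220.0]  # Hz per frame; None marks an unvoiced frame
    score = [69, 69, 69, 57]           # MIDI notes per frame
    print(pitch_correspondence(f0, score))
```

Applying the function over a sliding window of frames, rather than the whole take, would yield a time-varying value of the kind the claims compare between performers.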

Current Assignee: Smule, Inc.
Original Assignee: Smule, Inc.
Inventors: Godfrey, Mark T.; Cook, Perry R.
Primary Examiner(s): He, Jialong
Application Number: US15/864,819
Publication Number: US 20180262654A1
Time in Patent Office: 792 Days
CPC Class Codes

G10H 1/366   with means for modifying or...

G10H 2210/066   for pitch analysis as part ...

G10H 2210/331   Note pitch correction, i.e....

G10H 2240/251   Mobile telephone transmissi...

G10L 13/0335   Pitch control

G10L 21/013   Adapting to target pitch

H04N 5/04   Synchronising for televisio...

Y10S 84/04   Chorus; ensemble; celeste
