
Associating audio with three-dimensional objects in videos

  • US 10,045,120 B2
  • Filed: 06/20/2016
  • Issued: 08/07/2018
  • Est. Priority Date: 06/20/2016
  • Status: Active Grant
First Claim

1. A method for locating and tracking one or more audio sources recorded by a set of microphones, the set of microphones including a first microphone and a second microphone, the method comprising:

    receiving position information for a camera;

    receiving position information for individual ones of the microphones in the set of microphones;

    receiving a video recorded by the camera, the video including a first visual object and a second visual object, the first visual object being a first audio source and the second visual object being a second audio source;

    receiving audio signals recorded by the microphones, the audio signals including a first audio signal recorded by the first microphone and a second audio signal recorded by the second microphone, individual ones of the audio signals including sounds generated by the first audio source and the second audio source, wherein the first audio signal includes a first audio component corresponding to the sounds generated by the first audio source and a second audio component corresponding to the sounds generated by the second audio source, and the second audio signal includes a third audio component corresponding to the sounds generated by the first audio source and a fourth audio component corresponding to the sounds generated by the second audio source;

    applying source separation to the audio signals to generate individual audio source signals for the sounds generated by individual audio sources, the audio source signals including a first audio source signal and a second audio source signal, the first audio source signal including the sounds generated by the first audio source and the second audio source signal including the sounds generated by the second audio source, wherein the first audio source signal is generated by combining the first audio component and the third audio component; and the second audio source signal is generated by combining the second audio component and the fourth audio component;

    estimating positions of the first audio source and the second audio source based on the position information for the microphones;

    estimating positions of the first visual object and the second visual object based on a visual analysis of the video and the position information for the camera;

    matching the individual audio sources to corresponding visual objects based on the estimated positions of the audio sources and the estimated positions of the visual objects such that the first audio source is matched to the first visual object and the second audio source is matched to the second visual object;

    tracking movement of the visual objects to generate visual object position data associated with movement of the visual objects; and

    storing audio source position data for the individual audio source signals based on the visual object position data associated with the visual object to which the respective audio source was matched.
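
The matching and storing steps of the claim can be illustrated with a small sketch. The claim does not specify a matching algorithm, so the minimum-total-distance assignment used here is an assumption, as are the function names `match_sources_to_objects` and `audio_position_data`, the identifiers, and all coordinates, which are made up for illustration.

```python
from itertools import permutations
from math import dist


def match_sources_to_objects(audio_positions, visual_positions):
    """Pair each audio source with a distinct visual object.

    Assumption: the pairing with the smallest summed Euclidean distance
    between estimated positions is chosen. Returns None when there are
    fewer visual objects than audio sources.
    """
    audio_ids = list(audio_positions)
    visual_ids = list(visual_positions)
    best, best_cost = None, float("inf")
    # Brute force over assignments; adequate for a handful of sources.
    for perm in permutations(visual_ids, len(audio_ids)):
        cost = sum(
            dist(audio_positions[a], visual_positions[v])
            for a, v in zip(audio_ids, perm)
        )
        if cost < best_cost:
            best, best_cost = dict(zip(audio_ids, perm)), cost
    return best


def audio_position_data(matching, visual_tracks):
    """Give each audio source the tracked positions of its matched object."""
    return {a: list(visual_tracks[v]) for a, v in matching.items()}


# Hypothetical position estimates (x, y, z) in metres.
audio_est = {"source_1": (1.1, 0.2, 2.9), "source_2": (-2.0, 0.1, 4.2)}
visual_est = {"object_1": (1.0, 0.0, 3.0), "object_2": (-2.1, 0.0, 4.0)}

# Hypothetical per-frame positions from visual tracking.
visual_tracks = {
    "object_1": [(1.0, 0.0, 3.0), (1.2, 0.0, 3.1)],
    "object_2": [(-2.1, 0.0, 4.0), (-2.0, 0.0, 3.8)],
}

matching = match_sources_to_objects(audio_est, visual_est)
positions = audio_position_data(matching, visual_tracks)
```

In this sketch the audio-derived estimate is used only to decide which visual object a source belongs to; the stored position data then comes from the visual track, mirroring the final step of the claim.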
