Automatic generation of video from spherical content using audio/visual analysis

US 9,652,667 B2
Filed: 03/03/2015
Issued: 05/16/2017
Est. Priority Date: 03/04/2014
Status: Active Grant

Abstract

A spherical content capture system captures spherical video content. A spherical video sharing platform enables users to share the captured spherical content and enables users to access spherical content shared by other users. In one embodiment, captured metadata or video/audio processing is used to identify content relevant to a particular user based on time and location information. The platform can then generate an output video from one or more shared spherical content files relevant to the user. The output video may include a non-spherical reduced field of view such as those commonly associated with conventional camera systems. Particularly, relevant sub-frames having a reduced field of view may be extracted from each frame of spherical video to generate an output video that tracks a particular individual or object of interest.
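
The sub-frame extraction described in the abstract can be illustrated with a minimal sketch. It assumes each spherical frame is stored as an equirectangular pixel grid, which the source does not state, and it uses a plain crop rather than a rectilinear reprojection. The function name `extract_subframe` and its parameters are hypothetical.

```python
from typing import List, Sequence

Frame = List[List[int]]  # rows of pixel values; a stand-in for real image data


def extract_subframe(frame: Sequence[Sequence[int]], center_x: int, center_y: int,
                     width: int, height: int) -> Frame:
    """Crop a width x height window centered on (center_x, center_y).

    Columns wrap around horizontally because longitude is periodic in an
    equirectangular frame (an assumption of this sketch); rows are clamped
    so the window never runs past the top or bottom edge.
    """
    frame_h = len(frame)
    frame_w = len(frame[0])
    if width > frame_w or height > frame_h:
        raise ValueError("sub-frame must not exceed the frame size")
    # Clamp the top edge so the window stays inside the frame vertically.
    top = min(max(center_y - height // 2, 0), frame_h - height)
    left = center_x - width // 2
    return [
        [frame[row][(left + dx) % frame_w] for dx in range(width)]
        for row in range(top, top + height)
    ]
```

Calling `extract_subframe` once per frame, with the center set to the tracked location of the individual or object of interest, would yield a reduced field of view sequence that follows the target.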

43 Citations

19 Claims

1. A method for generating an output video from spherical video content, the method comprising:
- storing, by a video server, a first spherical video obtained from a first camera system comprising a first sequence of spherical video frames, each having a first spherical field of view;
  
  storing, by a video server, a second spherical video obtained from a second camera system comprising a second sequence of spherical video frames, each having a second spherical field of view;
  
  processing, by the video server, the first spherical video to identify a target audio or visual feature of interest meeting one or more audio or visual criteria;
  
  determining, by the video server, a first range of frames of the first spherical video having the target feature of interest;
  
  determining, by the video server, a second range of frames of the second spherical video having the target feature of interest;
  
  determining, by the video server, a first sequence of sub-frames from each of the first range of frames, each of the first sequence of sub-frames having a non-spherical field of view, and each of the first sequence of sub-frames including a spatial region around the target feature of interest;
  
  determining, by the video server, a second sequence of sub-frames from each of the second range of frames, each of the second sequence of sub-frames having a non-spherical field of view, and each of the second sequence of sub-frames including a spatial region around the target feature of interest;
  
  generating, by the video server, a first combined sequence of sub-frames including the target feature of interest, the combined sequence of sub-frames comprising the first sequence of sub-frames and the second sequence of sub-frames;
  
  generating, by the video server, a first portion of an output video including the first combined sequence of sub-frames; and
  
  outputting the output video.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein determining the first and the second sequence of sub-frames including the target feature comprises:
    - performing a facial recognition algorithm on the first spherical video to identify first spatial locations of one or more faces depicted in the first spherical video;
      
      performing the facial recognition algorithm on the second spherical video to identify second spatial locations of the one or more faces depicted in the second spherical video.
  - 3. The method of claim 1, wherein processing the first spherical video to identify the target feature comprises:
    - performing an object recognition algorithm on the first spherical video to identify first spatial locations of one or more objects depicted in the first spherical video; and
      
      performing the object recognition algorithm on the second spherical video to identify second spatial locations of one or more objects depicted in the first spherical video.
  - 4. The method of claim 1, wherein determining the first and the second sequence of sub-frames including the target feature comprises:
    - performing a motion analysis algorithm on the first spherical video to identify first spatial locations of an object depicted in the first spherical video meeting predefined motion parameters;
      
      performing the motion analysis algorithm on the second spherical video to identify second spatial locations of the object depicted in the second spherical video meeting the predefined motion parameters.
  - 5. The method of claim 1, wherein determining the first and the second sequence of sub-frames including the target feature comprises:
    - performing a gesture recognition algorithm on the first spherical video to identify first spatial locations of an individual performing a particular gesture depicted in the first spherical video; and
      
      performing the gesture recognition algorithm on the second spherical video to identify second spatial locations of the individual performing the particular gesture depicted in the second spherical video.
  - 6. The method of claim 1, wherein determining the first and the second sequence of sub-frames including the target feature comprises:
    - performing an audio analysis on an audio track of first spherical video to identify a first direction between an audio source and the first camera capturing the first spherical audio;
      
      performing the audio analysis on an audio track of the second spherical video to identify a second direction between the audio source and the second camera capturing the second spherical audio;
      
      identifying a first spatial location of the audio source in the first spherical video based on the first direction between the audio source and the first camera; and
      
      identifying a second spatial location of the audio source in the second spherical video based on the second direction between the audio source and the second camera.
  - 7. The method of claim 1, further comprising:
    - storing, by the video server, a plurality of spherical videos comprising a sequence of spherical video frames each having a spherical field of view and one or more audio tracks obtained from a plurality of devices;
      
      processing, by the video server, the plurality of spherical videos to identify the target feature of interest meeting the one or more audio or visual criteria;
      
      determining, by the video server, a manifold range of frames of the plurality of spherical videos having the target feature of interest;
      
      determining, by the video server, a manifold sequence of sub-frames from each of the manifold range of frames, each of the manifold sequence of sub-frames having a non-spherical field of view, and each of the manifold sequence of sub-frames including a spatial region around the target feature of interest;
      
      determining, by the video server, a second combined sequence of sub-frames including the target feature of interest comprised of the first combination of sub-frames and the manifold sequence of sub-frames;
      
      generating, by the video server, a second portion of the output video including the second combination of sub-frames; and
      
      outputting the output video.
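
The frame-range, sub-frame, and combining steps recited in claims 1 and 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: target detection is taken as already done, each video is represented by a per-frame list of target locations (`None` where the target is absent), the "range of frames" is taken to span the first through last detection, and the combined sequence is a simple concatenation. The claims do not specify an ordering, so that choice and all names here are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple

Location = Tuple[int, int]  # (x, y) pixel position of the target in a frame


class SubFrame(NamedTuple):
    video_id: str
    frame_index: int
    center: Location  # spatial region is centered on the target


def frame_range(detections: List[Optional[Location]]) -> Optional[range]:
    """Span from the first to the last frame in which the target is detected."""
    hits = [i for i, loc in enumerate(detections) if loc is not None]
    if not hits:
        return None
    return range(hits[0], hits[-1] + 1)


def subframes_for(video_id: str,
                  detections: List[Optional[Location]]) -> List[SubFrame]:
    """One sub-frame descriptor per frame in the range of frames."""
    span = frame_range(detections)
    if span is None:
        return []
    result = []
    last_known = None
    for i in span:
        # Assumption: during a brief detection gap, hold the last known location.
        if detections[i] is not None:
            last_known = detections[i]
        result.append(SubFrame(video_id, i, last_known))
    return result


def combine_sequences(first: List[SubFrame],
                      second: List[SubFrame]) -> List[SubFrame]:
    """Combined sequence comprising both sequences (here: first, then second)."""
    return first + second
```

For the additional videos of claim 7, the same `subframes_for` call could be repeated per video and the results appended to the first combined sequence.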

8. A non-transitory computer-readable storage medium storing instructions for generating an output video from spherical video content, the instructions when executed by one or more processors causing the one or more processors to perform steps including:
- storing a first spherical video obtained from a first camera system comprising a first sequence of spherical video frames each having a first spherical field of view;
  
  storing a second spherical video obtained from a second camera system comprising a second sequence of spherical video frames, each having a second spherical field of view;
  
  processing the first spherical video to identify a target audio or visual feature of interest meeting one or more audio or visual criteria;
  
  determining a first range of frames of the first spherical video having the target feature of interest;
  
  determining a second range of frames of the second spherical video having the target feature of interest;
  
  determining a first sequence of sub-frames from each of the first range of frames, each of the first sequence of sub-frames having a non-spherical field of view, and each of the first sequence of sub-frames including a spatial region around the target feature of interest;
  
  determining a second sequence of sub-frames from each of the second range of frames, each of the second sequence of sub-frames having a non-spherical field of view, and each of the second sequence of sub-frames including a spatial region around the target feature of interest;
  
  generating a first combined sequence of sub-frames including the target feature of interest comprising of the first sequence of sub-frames and the second sequence of sub-frames;
  
  generating a first portion of an output video including the first combined sequence of sub-frames; and
  
  outputting the output video.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The non-transitory computer-readable storage medium of claim 8, wherein determining the first and the second sequence of sub-frames including the target feature comprises:
    - performing a facial recognition algorithm on the first spherical video to identify first spatial locations of one or more faces depicted in the first spherical video;
      
      performing the facial recognition algorithm on the second spherical video to identify second spatial locations of the one or more faces depicted in the second spherical video.
  - 10. The non-transitory computer-readable storage medium of claim 8, wherein determining the first and the second sequence of sub-frames including the target feature comprises:
    - performing an object recognition algorithm on the first spherical video to identify spatial locations of one or more objects depicted in the first spherical video; and
      
      performing the object recognition algorithm on the second spherical video to identify second spatial locations of the one or more objects depicted in the second spherical video.
  - 11. The non-transitory computer-readable storage medium of claim 8, wherein determining the first and the second sequence of sub-frames including the target feature comprises:
    - performing a motion analysis algorithm on the first spherical video to identify spatial locations of an object depicted in the first spherical video meeting predefined motion parameters; and
      
      performing the motion analysis algorithm on the second spherical video to identify second spatial locations of the object depicted in the second spherical video meeting predefined motion parameters.
  - 12. The non-transitory computer-readable storage medium of claim 8, wherein determining the first and the second sequence of sub-frames including the target feature comprises:
    - performing a gesture recognition algorithm on the first spherical video to identify spatial locations of an individual performing a particular gesture depicted in the first spherical video;
      
      performing the gesture recognition algorithm on the second spherical video to identify second spatial locations of the individual performing a particular gesture depicted in the second spherical video.
  - 13. The non-transitory computer-readable storage medium of claim 8, wherein determining the first and the second sequence of sub-frames including the target feature comprises:
    - performing an audio analysis on an audio track of first spherical video to identify a first direction between an audio source and the first camera capturing the first spherical audio;
      
      performing the audio analysis on an audio track of the second spherical video to identify a second direction between the audio source and the second camera capturing the second spherical audio;
      
      identifying a first spatial location of the audio source in the first spherical video based on the first direction between the audio source and the first camera; and
      
      identifying a second spatial location of the audio source in the second spherical video based on the second direction between the audio source and the second camera.
  - 14. The non-transitory computer-readable storage medium of claim 8, wherein the instructions when executed by the one or more processors further cause the one or more processors to perform steps including:
    - storing a plurality of spherical videos comprising a sequence of spherical video frames each having a spherical field of view and one or more audio tracks obtained from a plurality of devices;
      
      processing the plurality of spherical videos to identify the target feature of interest meeting the one or more audio or visual criteria;
      
      determining a manifold range of frames of the plurality of spherical videos having the target feature of interest;
      
      determining a manifold sequence of sub-frames from each of the manifold range of frames, each of the manifold sequence of sub-frames having a non-spherical field of view, and each of the manifold sequence of sub-frames including a spatial region around the target feature of interest; and
      
      determining a second combination of sub-frames including the target feature of interest comprised of the first combination of sub-frames and the manifold sequence of sub-frames; and
      
      generating a second portion of the output video including the second sequence of sub-frames; and
      
      outputting the output video.
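
Claims 6 and 13 recite identifying a spatial location of an audio source from its direction relative to the camera. A minimal sketch of that mapping step follows. It assumes the direction has already been estimated as an azimuth and elevation in degrees, and that frames use an equirectangular layout with azimuth 0 at the horizontal center and elevation +90 at the top row. None of this is specified in the claims, and `direction_to_pixel` is a hypothetical name.

```python
from typing import Tuple


def direction_to_pixel(azimuth_deg: float, elevation_deg: float,
                       frame_width: int, frame_height: int) -> Tuple[int, int]:
    """Map a direction from the camera to an (x, y) pixel position.

    Assumed convention: azimuth -180..180 maps left to right across the
    frame and wraps around; elevation +90..-90 maps top to bottom.
    """
    if not -90.0 <= elevation_deg <= 90.0:
        raise ValueError("elevation must be within [-90, 90] degrees")
    # Azimuth is periodic, so wrap the column index around the frame width.
    x = int((azimuth_deg + 180.0) / 360.0 * frame_width) % frame_width
    # Clamp so that elevation -90 lands on the last row, not one past it.
    y = min(int((90.0 - elevation_deg) / 180.0 * frame_height), frame_height - 1)
    return x, y
```

Under these assumptions, a source straight ahead of the camera and level with it maps to the center of the frame, and the resulting pixel position could serve as the center of the sub-frame for that frame.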

15. A video server for generating an output video from spherical video content, the video server comprising:
- one or more processors; and
  
  a non-transitory computer-readable storage medium storing instructions that, when executed by the one or more processors, cause the one or more processors to perform steps including:
  
  storing a first spherical video from a first camera system comprising a first sequence of spherical video frames each having a first spherical field of view;
  
  storing a second spherical video from a second camera system comprising a second sequence of spherical video frames, each having a second spherical field of view;
  
  processing the first spherical video to identify a target audio or visual feature of interest meeting one or more audio or visual criteria;
  
  determining a first range of frames of the first spherical video having the target feature of interest;
  
  determining, by the video server, a second range of frames of the second spherical video having the target feature of interest;
  
  determining a first sequence of sub-frames from each of the first range of frames, each of the first sequence of sub-frames having a non-spherical field of view, and each of the first sequence of sub-frames including a spatial region around the target feature of interest;
  
  determining a second sequence of sub-frames from each of the second range of frames, each of the second sequence of sub-frames having a non-spherical field of view, and each of the second sequence of sub-frames including a spatial region around the target feature of interest;
  
  generating a first combined sequence of sub-frames including the target feature of interest comprising of the first sequence of sub-frames and the second sequence of sub-frames;
  
  generating a first portion of an output video including the first combined sequence of sub-frames; and
  
  outputting the output video.
- Dependent claims (16, 17, 18, 19):
  - 16. The video server of claim 15, wherein determining the first and the second sequence of sub-frames including the target feature comprises:
    - performing a facial recognition algorithm on the first spherical video to identify first spatial locations of one or more faces depicted in the first spherical video;
      
      performing the facial recognition algorithm on the second spherical video to identify second spatial locations of the one or more faces depicted in the first spherical video.
  - 17. The video server of claim 15, wherein determining the first and the second sequence of sub-frames including the target feature comprises:
    - performing an object recognition algorithm on the first spherical video to identify first spatial locations of one or more objects depicted in the first spherical video;
      
      performing the object recognition algorithm on the second spherical video to identify second spatial locations of the one or more objects depicted in the first spherical video.
  - 18. The video server of claim 15, wherein determining the first and the second sequence of sub-frames including the target feature comprises:
    - performing an audio analysis on an audio track of first spherical video to identify a first direction between an audio source and the first camera capturing the first spherical audio;
      
      performing the audio analysis on an audio track of the second spherical video to identify a second direction between the audio source and the second camera capturing the second spherical audio;
      
      identifying a first spatial location of the audio source in the first spherical video based on the first direction between the audio source and the first camera; and
      
      identifying a second spatial location of the audio source in the second spherical video based on the second direction between the audio source and the second camera.
  - 19. The video server of claim 15, wherein determining the first and the second sequence of sub-frames including the target feature comprises:
    - performing a motion analysis algorithm on the first spherical video to identify spatial locations of an object depicted in the first spherical video meeting predefined motion parameters; and
      
      performing the motion analysis algorithm on the second spherical video to identify second spatial locations of the object depicted in the second spherical video meeting predefined motion parameters.
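
The motion analysis of claims 4, 11, and 19 selects spatial locations of an object "meeting predefined motion parameters". One possible reading, sketched below, treats the parameter as a minimum per-frame displacement in pixels. The threshold, the pre-computed object track, the horizontal wrap-around (which assumes an equirectangular frame), and the name `fast_moving_locations` are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Location = Tuple[int, int]  # (x, y) pixel position of the tracked object


def fast_moving_locations(track: List[Location], frame_width: int,
                          min_speed: float) -> List[Tuple[int, Location]]:
    """Return (frame_index, location) pairs where the object moved at least
    min_speed pixels since the previous frame."""
    selected = []
    for i in range(1, len(track)):
        (x0, y0), (x1, y1) = track[i - 1], track[i]
        dx = abs(x1 - x0)
        # Take the shorter way around horizontally, since the frame wraps.
        dx = min(dx, frame_width - dx)
        speed = math.hypot(dx, y1 - y0)
        if speed >= min_speed:
            selected.append((i, track[i]))
    return selected
```

Running the same function over the track from each camera would give the first and second spatial locations from which the respective sub-frame sequences are taken.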

Current Assignee
GoPro, Inc.
Original Assignee
GoPro, Inc.
Inventors
MacMillan, Timothy; Newman, David A.; Adsumilli, Balineedu Chowdary
Primary Examiner(s)
Shibru, Helen

Application Number

US14/637,193
Publication Number

US 20150256746A1
Time in Patent Office

805 Days
Field of Search

386/285, 382/118, 348/36-37, 348/169
US Class Current
CPC Class Codes

G03B 37/04   with cameras or projectors ...

G06F 16/71   Indexing; Data structures t...

G06T 3/12   Panospheric to cylindrical ...

G06V 20/40   in video content extracting...

H04L 65/612   for unicast

H04L 65/762   at the source reformatting...

H04N 13/106   Processing image signals fo...

H04N 21/233   Processing of audio element...

H04N 21/23418   involving operations for an...

H04N 23/698   for achieving an enlarged f...
