System and method for automatic video editing using object recognition
Abstract
An “automated video editor” (AVE) automatically processes one or more input videos to create an edited video stream with little or no user interaction. The AVE produces cinematic effects such as cross-cuts, zooms, pans, insets, 3-D effects, etc., by applying a combination of cinematic rules, object recognition techniques, and digital editing of the input video. Consequently, the AVE is capable of using a simple video taken with a fixed camera to automatically simulate cinematic editing effects that would normally require multiple cameras and/or professional editing. The AVE first defines a list of scenes in the video and generates a rank-ordered list of candidate shots for each scene. Each frame of each scene is then analyzed or “parsed” using object detection techniques (“detectors”) for isolating unique objects (faces, moving/stationary objects, etc.) in the scene. Shots are then automatically selected for each scene and used to construct the edited video stream.
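The abstract's first step, defining a list of scenes, amounts to detecting change points in the input video. The sketch below is purely illustrative and not the patented method: frames are reduced to hypothetical numeric signatures (standing in for real visual features such as color-histogram summaries), and a scene boundary is declared wherever consecutive signatures differ by more than a threshold.

```python
def partition_into_scenes(signatures, threshold=0.4):
    """Split a sequence of per-frame signatures into scenes, starting a
    new scene wherever consecutive signatures differ by more than the
    threshold. The numeric signatures stand in for visual features."""
    scenes, current = [], [signatures[0]]
    for prev, cur in zip(signatures, signatures[1:]):
        if abs(cur - prev) > threshold:  # assumed visual-difference metric
            scenes.append(current)
            current = []
        current.append(cur)
    scenes.append(current)
    return scenes

partition_into_scenes([0.0, 0.1, 0.9, 1.0])
# -> [[0.0, 0.1], [0.9, 1.0]]
```

A production system would compute signatures from pixel data (e.g. frame histograms) rather than receive them directly, but the boundary logic is the same.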
74 Citations
20 Claims

1. An automated video editing system for creating an edited output video stream from one or more input video streams, comprising using a computing device for:
receiving one or more input video streams;
automatically partitioning each input video stream into one or more scenes;
identifying a list of possible candidate shots for each scene;
parsing each scene to derive information of interest relating to objects detected within each scene;
selecting a best shot from the list of possible candidate shots for each scene as a function of the information derived via parsing of each scene;
constructing the selected best shot from the corresponding scenes from the input video streams; and
outputting the constructed shot for each scene as the edited output video.

Dependent claims 2–8 are not reproduced here.
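Claim 1's selection step chooses a best shot "as a function of the information derived via parsing of each scene". One hedged reading of that step, with invented shot types and scoring rules (none of which come from the specification), is to attach a scoring function to each candidate shot and take the highest-scoring one:

```python
def select_best_shot(candidates, scene_info):
    """Rank candidate shots by applying each shot's scoring function to
    the parsed scene information; return the highest-scoring shot."""
    return max(candidates, key=lambda shot: shot["score_fn"](scene_info))

# Hypothetical parsed-scene output from the object detectors.
info = {"faces": 2, "moving_objects": 0}

# Hypothetical candidates and scoring heuristics, invented for illustration.
candidates = [
    {"kind": "wide",     "score_fn": lambda i: 1.0},                       # safe default
    {"kind": "zoom",     "score_fn": lambda i: 2.0 if i["faces"] == 1 else 0.0},
    {"kind": "crosscut", "score_fn": lambda i: 1.5 * (i["faces"] >= 2)},
]

best = select_best_shot(candidates, info)
# -> the "crosscut" shot, since two faces were detected
```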
12. A method for automatically generating edited output video streams from one or more input video streams, comprising using a computing device to perform the following steps:
a receiving step for receiving one or more input video streams;
a scene detection step for analyzing each video stream to identify individual scenes in each video stream;
a scene analysis step for analyzing each scene to identify one or more possible candidate shots that can be constructed from the detected scenes;
a scene parsing step for examining each scene to identify available information within each scene;
a shot selection step for selecting a best shot from the candidate shots as a function of the information identified via parsing of each scene;
a video construction step for constructing the selected best shot from one or more corresponding scenes; and
a video output step for outputting the constructed shot for inclusion in the edited output video.

Dependent claims 13–15 are not reproduced here.
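The scene parsing step in claim 12 examines each scene with object detectors to identify available information. A minimal illustration follows; the detectors and frame representation are invented for the example (real detectors would run face or motion detection over pixel data), and per-frame results are merged into a per-scene summary by keeping the maximum count seen for each label:

```python
def parse_scene(scene_frames, detectors):
    """Run every detector over every frame of a scene and merge the
    per-frame results into one per-scene summary (max count per label)."""
    summary = {}
    for frame in scene_frames:
        for detect in detectors:
            for label, count in detect(frame).items():
                summary[label] = max(summary.get(label, 0), count)
    return summary

# Hypothetical detectors: frames here are dicts of pre-annotated counts.
face_detector = lambda frame: {"faces": frame.get("faces", 0)}
motion_detector = lambda frame: {"moving": frame.get("moving", 0)}

scene = [{"faces": 1}, {"faces": 2, "moving": 1}]
parse_scene(scene, [face_detector, motion_detector])
# -> {"faces": 2, "moving": 1}
```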
16. A computer-readable medium having computer executable instructions for automatically generating an edited output video stream, said computer executable instructions comprising:
examining a plurality of input video streams to identify each of a plurality of individual scenes in each input video stream;
identifying a set of possible candidate shots for each scene as a function of a user selectable template which defines allowable candidate shots for the user selected template;
examining content of each scene using a set of one or more object detectors to derive information pertaining to one or more objects detected within one or more frames of each scene;
selecting a best shot from the set of possible candidate shots for each scene as a function of the information derived from the one or more detected objects of each scene, said best shot selection being further constrained by a set of one or more cinematic rules;
constructing the selected best shot for each scene from the corresponding scenes of the plurality of input video streams; and
automatically including each constructed shot in the edited output video stream.

Dependent claims 17–20 are not reproduced here.
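Claim 16 adds two constraints to shot selection: a user-selectable template that defines the allowable candidate shots, and a set of cinematic rules that further constrain the choice. A hedged sketch under assumed names (the template set, the example rule, and the scores are all invented) is to filter by the template, prune with a rule such as "never repeat the previous shot type", and pick the highest-scoring survivor:

```python
def select_shot(candidates, template_allowed, prev_kind):
    """Template defines the allowable candidate set; a sample cinematic
    rule (do not repeat the previous shot type) prunes it further; the
    highest-scoring survivor wins. Falls back to the template set if the
    rule would eliminate every candidate."""
    allowed = [c for c in candidates if c["kind"] in template_allowed]
    ruled = [c for c in allowed if c["kind"] != prev_kind] or allowed
    return max(ruled, key=lambda c: c["score"])

candidates = [
    {"kind": "wide", "score": 1.0},
    {"kind": "zoom", "score": 2.0},
    {"kind": "pan",  "score": 1.5},
]

# "pan" is outside this template; "zoom" is vetoed by the repeat rule.
select_shot(candidates, template_allowed={"wide", "zoom"}, prev_kind="zoom")
# -> {"kind": "wide", "score": 1.0}
```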
Specification