System and method for automatic video editing using object recognition
Abstract
An “automated video editor” (AVE) automatically processes one or more input videos to create an edited video stream with little or no user interaction. The AVE produces cinematic effects such as cross-cuts, zooms, pans, insets, 3-D effects, etc., by applying a combination of cinematic rules, object recognition techniques, and digital editing of the input video. Consequently, the AVE is capable of using a simple video taken with a fixed camera to automatically simulate cinematic editing effects that would normally require multiple cameras and/or professional editing. The AVE first defines a list of scenes in the video and generates a rank-ordered list of candidate shots for each scene. Each frame of each scene is then analyzed or “parsed” using object detection techniques (“detectors”) for isolating unique objects (faces, moving/stationary objects, etc.) in the scene. Shots are then automatically selected for each scene and used to construct the edited video stream.
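The abstract's first step, defining a list of scenes, amounts to detecting change points in the input video. The sketch below is purely illustrative and not the patented method: frames are reduced to hypothetical numeric signatures (standing in for real visual features such as color-histogram summaries), and a scene boundary is declared wherever consecutive signatures differ by more than a threshold.

```python
def partition_into_scenes(signatures, threshold=0.4):
    """Split a sequence of per-frame signatures into scenes, starting a
    new scene wherever consecutive signatures differ by more than the
    threshold. The numeric signatures stand in for visual features."""
    scenes, current = [], [signatures[0]]
    for prev, cur in zip(signatures, signatures[1:]):
        if abs(cur - prev) > threshold:  # assumed visual-difference metric
            scenes.append(current)
            current = []
        current.append(cur)
    scenes.append(current)
    return scenes

partition_into_scenes([0.0, 0.1, 0.9, 1.0])
# -> [[0.0, 0.1], [0.9, 1.0]]
```

A production system would compute signatures from pixel data (e.g. frame histograms) rather than receive them directly, but the boundary logic is the same.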
74 Citations
20 Claims

1. An automated video editing system for creating an edited output video stream from one or more input video streams, comprising using a computing device for:
receiving one or more input video streams;
automatically partitioning each input video stream into one or more scenes;
identifying a list of possible candidate shots for each scene;
parsing each scene to derive information of interest relating to objects detected within each scene;
selecting a best shot from the list of possible candidate shots for each scene as a function of the information derived via parsing of each scene;
constructing the selected best shot from the corresponding scenes from the input video streams; and
outputting the constructed shot for each scene as the edited output video.

Dependent claims 2–8 are not reproduced here.
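Claim 1's selection step chooses a best shot "as a function of the information derived via parsing of each scene". One hedged reading of that step, with invented shot types and scoring rules (none of which come from the specification), is to attach a scoring function to each candidate shot and take the highest-scoring one:

```python
def select_best_shot(candidates, scene_info):
    """Rank candidate shots by applying each shot's scoring function to
    the parsed scene information; return the highest-scoring shot."""
    return max(candidates, key=lambda shot: shot["score_fn"](scene_info))

# Hypothetical parsed-scene output from the object detectors.
info = {"faces": 2, "moving_objects": 0}

# Hypothetical candidates and scoring heuristics, invented for illustration.
candidates = [
    {"kind": "wide",     "score_fn": lambda i: 1.0},                       # safe default
    {"kind": "zoom",     "score_fn": lambda i: 2.0 if i["faces"] == 1 else 0.0},
    {"kind": "crosscut", "score_fn": lambda i: 1.5 * (i["faces"] >= 2)},
]

best = select_best_shot(candidates, info)
# -> the "crosscut" shot, since two faces were detected
```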
12. A method for automatically generating edited output video streams from one or more input video streams, comprising using a computing device to perform the following steps:
a receiving step for receiving one or more input video streams;
a scene detection step for analyzing each video stream to identify individual scenes in each video stream;
a scene analysis step for analyzing each scene to identify one or more possible candidate shots that can be constructed from the detected scenes;
a scene parsing step for examining each scene to identify available information within each scene;
a shot selection step for selecting a best shot from the candidate shots as a function of the information identified via parsing of each scene;
a video construction step for constructing the selected best shot from one or more corresponding scenes; and
a video output step for outputting the constructed shot for inclusion in the edited output video.

Dependent claims 13–15 are not reproduced here.
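The scene parsing step in claim 12 examines each scene with object detectors to identify available information. A minimal illustration follows; the detectors and frame representation are invented for the example (real detectors would run face or motion detection over pixel data), and per-frame results are merged into a per-scene summary by keeping the maximum count seen for each label:

```python
def parse_scene(scene_frames, detectors):
    """Run every detector over every frame of a scene and merge the
    per-frame results into one per-scene summary (max count per label)."""
    summary = {}
    for frame in scene_frames:
        for detect in detectors:
            for label, count in detect(frame).items():
                summary[label] = max(summary.get(label, 0), count)
    return summary

# Hypothetical detectors: frames here are dicts of pre-annotated counts.
face_detector = lambda frame: {"faces": frame.get("faces", 0)}
motion_detector = lambda frame: {"moving": frame.get("moving", 0)}

scene = [{"faces": 1}, {"faces": 2, "moving": 1}]
parse_scene(scene, [face_detector, motion_detector])
# -> {"faces": 2, "moving": 1}
```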
16. A computer-readable medium having computer executable instructions for automatically generating an edited output video stream, said computer executable instructions comprising:
examining a plurality of input video streams to identify each of a plurality of individual scenes in each input video stream;
identifying a set of possible candidate shots for each scene as a function of a user selectable template which defines allowable candidate shots for the user selected template;
examining content of each scene using a set of one or more object detectors to derive information pertaining to one or more objects detected within one or more frames of each scene;
selecting a best shot from the set of possible candidate shots for each scene as a function of the information derived from the one or more detected objects of each scene, said best shot selection being further constrained by a set of one or more cinematic rules;
constructing the selected best shot for each scene from the corresponding scenes of the plurality of input video streams; and
automatically including each constructed shot in the edited output video stream.

Dependent claims 17–20 are not reproduced here.
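Claim 16 adds two constraints to shot selection: a user-selectable template that defines the allowable candidate shots, and a set of cinematic rules that further constrain the choice. A hedged sketch under assumed names (the template set, the example rule, and the scores are all invented) is to filter by the template, prune with a rule such as "never repeat the previous shot type", and pick the highest-scoring survivor:

```python
def select_shot(candidates, template_allowed, prev_kind):
    """Template defines the allowable candidate set; a sample cinematic
    rule (do not repeat the previous shot type) prunes it further; the
    highest-scoring survivor wins. Falls back to the template set if the
    rule would eliminate every candidate."""
    allowed = [c for c in candidates if c["kind"] in template_allowed]
    ruled = [c for c in allowed if c["kind"] != prev_kind] or allowed
    return max(ruled, key=lambda c: c["score"])

candidates = [
    {"kind": "wide", "score": 1.0},
    {"kind": "zoom", "score": 2.0},
    {"kind": "pan",  "score": 1.5},
]

# "pan" is outside this template; "zoom" is vetoed by the repeat rule.
select_shot(candidates, template_allowed={"wide", "zoom"}, prev_kind="zoom")
# -> {"kind": "wide", "score": 1.0}
```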
Specification