System for automatic video segmentation and key frame extraction for video sequences having both sharp and gradual transitions
First Claim
1. In a system for parsing a plurality of images in motion without modifying a media in which the images are recorded originally, said images being further divided into plurality of sequences of frames, a method for selecting at least one key frame representative of a sequence of said images comprising the steps of:
(a) determining a difference metric or a set of difference metrics between consecutive image frames, said difference metrics having corresponding thresholds for selected image features;
(b) deriving a content difference (Di), said Di being a difference between two current image frames based on said selected image features and said difference metrics, an interval between said two current image frames being adjustable with a skip factor S which defines a resolution at which said image frames are being analyzed;
(c) accumulating Di between every two said consecutive frames until a sum thereof exceeds a predetermined potential key frame threshold Tk;
(d) calculating a difference Da, said Da being a difference between the current frame and the previous key frame based on said difference metrics, or between the current frame and the first frame of said sequence based also on said difference metric if there is no previous key frame, the current frame becoming the key frame if Da exceeds a predetermined key frame threshold Td; and
(e) repeating the steps (a) to (d) until the end frame is reached, whereby key frames for indexing sequences of images are identified and captured automatically.
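Steps (a) and (b) can be illustrated with a minimal Python sketch. The claim does not name a particular difference metric, so the gray-level histogram difference used here is an assumption chosen for illustration, as are the `Frame` representation (a flat sequence of 8-bit gray levels) and the function names. The `skip` parameter plays the role of the skip factor S.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a frame is a flat sequence of 8-bit gray levels.
Frame = Sequence[int]


def histogram(frame: Frame, bins: int = 16) -> List[int]:
    """Count pixels per gray-level bin (assumes pixel values in 0..255)."""
    counts = [0] * bins
    for value in frame:
        counts[value * bins // 256] += 1
    return counts


def histogram_difference(a: Frame, b: Frame, bins: int = 16) -> float:
    """Bin-wise absolute histogram difference, normalized to the range [0, 1].

    This metric is an assumption; the claim only requires "a difference metric".
    """
    ha, hb = histogram(a, bins), histogram(b, bins)
    total = len(a) + len(b)
    return sum(abs(x - y) for x, y in zip(ha, hb)) / total if total else 0.0


def content_differences(
    frames: Sequence[Frame],
    skip: int = 1,
    metric: Callable[[Frame, Frame], float] = histogram_difference,
) -> List[Tuple[int, float]]:
    """Return (i, Di) pairs, where Di compares frame i with frame i + skip."""
    return [
        (i, metric(frames[i], frames[i + skip]))
        for i in range(0, len(frames) - skip, skip)
    ]
```

A larger `skip` lowers the temporal resolution: fewer frame pairs are compared, at the cost of locating a change only to within `skip` frames.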
Abstract
An automatic video content parser for parsing video shots such that they are represented in their native media and are retrievable based on their visual contents. The system provides methods for two tasks. The first is temporal segmentation of video sequences into individual camera shots using a novel twin-comparison method, which detects both sharp breaks between camera shots and gradual transitions implemented by special editing techniques, including dissolve, wipe, fade-in and fade-out. The second is content-based key frame selection for individual shots, which analyzes the temporal variation of video content and selects a key frame once the difference in content between the current frame and the preceding selected key frame exceeds a set of preselected thresholds.
29 Claims
1. In a system for parsing a plurality of images in motion without modifying a media in which the images are recorded originally, said images being further divided into plurality of sequences of frames, a method for selecting at least one key frame representative of a sequence of said images comprising the steps of:
(a) determining a difference metric or a set of difference metrics between consecutive image frames, said difference metrics having corresponding thresholds for selected image features;
(b) deriving a content difference (Di), said Di being a difference between two current image frames based on said selected image features and said difference metrics, an interval between said two current image frames being adjustable with a skip factor S which defines a resolution at which said image frames are being analyzed;
(c) accumulating Di between every two said consecutive frames until a sum thereof exceeds a predetermined potential key frame threshold Tk;
(d) calculating a difference Da, said Da being a difference between the current frame and the previous key frame based on said difference metrics, or between the current frame and the first frame of said sequence based also on said difference metric if there is no previous key frame, the current frame becoming the key frame if Da exceeds a predetermined key frame threshold Td; and
(e) repeating the steps (a) to (d) until the end frame is reached, whereby key frames for indexing sequences of images are identified and captured automatically.
Dependent claims: 2, 3
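The accumulate-then-verify logic of steps (c) and (d) can be sketched as follows. This is an illustrative reading of the claim, not the patented implementation. The function name, the index-based return value, and the choice to reset the accumulated sum after every check against Td are assumptions, since the claim does not say what happens to the sum after a candidate frame is tested. The `metric` argument stands for any difference metric from step (a).

```python
from typing import Callable, List, Sequence, TypeVar

F = TypeVar("F")  # frame type; anything the supplied metric can compare


def select_key_frames(
    frames: Sequence[F],
    metric: Callable[[F, F], float],
    t_k: float,
    t_d: float,
    skip: int = 1,
) -> List[int]:
    """Return indices of frames selected as key frames.

    t_k is the potential key frame threshold Tk, t_d the key frame
    threshold Td, and skip the skip factor S.
    """
    key_frames: List[int] = []
    reference = 0        # first frame stands in until a key frame exists
    accumulated = 0.0    # running sum of Di, step (c)
    for current in range(skip, len(frames), skip):
        accumulated += metric(frames[current - skip], frames[current])
        if accumulated <= t_k:
            continue
        # Step (d): compare the candidate with the previous key frame,
        # or with the first frame if no key frame has been chosen yet.
        d_a = metric(frames[reference], frames[current])
        if d_a > t_d:
            key_frames.append(current)
            reference = current
        accumulated = 0.0  # assumption: restart accumulation after each check
    return key_frames


# Toy usage: each "frame" is a single number and the metric is |a - b|.
toy_frames = [0, 1, 2, 3, 10, 10, 11, 20]
toy_keys = select_key_frames(toy_frames, lambda a, b: abs(a - b), t_k=2, t_d=5)
print(toy_keys)  # [4, 7]
```

In the toy run, frame 3 crosses Tk through slow drift but differs from the first frame by only 3, so it is rejected. Frames 4 and 7 cross both thresholds.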
4. In a system for parsing a plurality of images in motion without modifying a media in which the images are recorded originally, said images being further divided into plurality of sequences of frames, a method for segmenting at least one sequence of said images into individual camera shots, said method comprising the steps of:
(a) determining a difference metric or a set of difference metrics between consecutive image frames, said difference metrics having corresponding shot break thresholds Tb for selected image features;
(b) deriving a content difference Di, said Di being a difference between two current image frames based on said selected image features and said difference metrics, an interval between said two current image frames being adjustable with a skip factor S which defines a resolution at which said image frames are being analyzed;
(c) declaring a sharp cut if Di exceeds said threshold Tb;
(d) detecting a starting frame of a potential transition if said Di exceeds a transition threshold Tt but is less than said shot break threshold Tb;
(e) detecting an end frame of a potential transition by verifying that Da > Tb or Σta / ΣtF > γTt is true; and
(f) continuing steps (a) through (e) until the end frame is reached, whereby sequences of images having individual camera shots are identified and segmented automatically in at least one pass.
Dependent claims: 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17
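The two-threshold logic of steps (c) through (e) can be sketched in Python as below. Several points are assumptions made for illustration: Da is taken to be the sum of the Di values accumulated since the start of the potential transition, the potential transition is taken to end when Di falls back to Tt or below, and only the first condition of step (e), Da > Tb, is checked. The second condition involving γ is not implemented because the claim text alone does not define its terms. The function name and return format are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def twin_comparison(
    diffs: Sequence[float], t_b: float, t_t: float
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Classify frame-to-frame differences into sharp cuts and gradual transitions.

    diffs[i] is Di between frame i and frame i + 1.
    Returns (cuts, transitions): cuts holds the first frame of each new shot,
    transitions holds (start_frame, end_frame) pairs.
    """
    cuts: List[int] = []
    transitions: List[Tuple[int, int]] = []
    start: Optional[int] = None   # start frame of the potential transition
    accumulated = 0.0             # assumed meaning of Da in this claim
    for i, d in enumerate(diffs):
        if d > t_b:
            cuts.append(i + 1)            # step (c): sharp cut
            start, accumulated = None, 0.0
        elif d > t_t:
            if start is None:
                start = i                 # step (d): potential transition begins
            accumulated += d
        elif start is not None:
            if accumulated > t_b:         # step (e), first condition only
                transitions.append((start, i))
            start, accumulated = None, 0.0
    if start is not None and accumulated > t_b:
        transitions.append((start, len(diffs)))
    return cuts, transitions


# Toy usage: one sharp cut, one confirmed gradual transition, one false start.
toy_diffs = [0.1, 0.9, 0.1, 0.3, 0.3, 0.3, 0.1, 0.3, 0.1]
toy_result = twin_comparison(toy_diffs, t_b=0.8, t_t=0.2)
print(toy_result)  # ([2], [(3, 6)])
```

The single elevated difference at index 7 starts a potential transition, but its accumulated difference never exceeds Tb, so it is discarded.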
16. A speed-enhanced multi-pass method for segmenting at least one sequence of images into individual camera shots in a system for parsing a plurality of said images in motion without modifying a media in which said images are recorded originally, said images being further divided into plurality of sequences of frames, said method comprising the steps of:
(a) determining a difference metric or a set of difference metrics between consecutive image frames, said difference metrics having corresponding shot break thresholds Tb for selected image features;
(b) deriving a content difference Di, said Di being a difference between two current image frames based on said selected image features and said difference metrics, an interval between said two current image frames being adjustable with a skip factor S which defines a resolution at which said image frames are being analyzed;
(c) declaring a sharp cut if Di exceeds said threshold Tb; and
(d) continuing steps (a) through (c) until the end frame is reached, whereby in a first pass, resolution is temporarily decreased by choosing a skip factor S larger than 2, so as to identify rapidly a location of potential segment boundaries without allowing any real boundaries to pass through without being detected, and in subsequent passes, resolution is increased and all computation is restricted to a vicinity of said potential segment boundaries whereby both camera breaks and gradual transitions are further identified.
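A minimal sketch of the coarse-then-fine idea follows, limited to sharp cuts. The function name, the window layout, and the optional separate coarse threshold `coarse_t_b` are assumptions for illustration. The claim also covers gradual transitions in the later passes, which this sketch does not attempt.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

F = TypeVar("F")  # frame type; anything the supplied metric can compare


def multi_pass_cuts(
    frames: Sequence[F],
    metric: Callable[[F, F], float],
    t_b: float,
    skip: int = 4,
    coarse_t_b: Optional[float] = None,
) -> List[int]:
    """Locate sharp cuts with a coarse first pass and a local second pass.

    skip is the skip factor S for the first pass. coarse_t_b is a
    hypothetical first-pass threshold; it defaults to t_b.
    Returns the index of the first frame of each new shot.
    """
    if coarse_t_b is None:
        coarse_t_b = t_b
    last = len(frames) - 1
    cuts: List[int] = []
    for lo in range(0, last, skip):
        hi = min(lo + skip, last)
        # First pass: compare frames S apart to flag a candidate window.
        if metric(frames[lo], frames[hi]) <= coarse_t_b:
            continue
        # Second pass: full resolution, restricted to the flagged window.
        for i in range(lo, hi):
            if metric(frames[i], frames[i + 1]) > t_b:
                cuts.append(i + 1)
    return cuts


# Toy usage: ten dark "frames" followed by ten bright ones.
toy_frames = [0] * 10 + [100] * 10
toy_cuts = multi_pass_cuts(toy_frames, lambda a, b: abs(a - b), t_b=50)
print(toy_cuts)  # [10]
```

In the toy run the first pass compares five frame pairs and flags one window, and only the four consecutive pairs inside that window are examined at full resolution. A change that appears and reverts inside a single window would escape this first pass, so a lower `coarse_t_b` or a smaller `skip` may be needed in practice.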
18. In a system for parsing a plurality of images in motion without modifying the media in which the images are recorded originally, said images being further divided into plurality of sequences of frames, a method for segmenting at least one sequence of said images into individual camera shots and selecting at least one key frame representative of a sequence of said images, said method comprising the steps of:
(a) determining a difference metric or a set of difference metrics between consecutive image frames, said difference metrics having corresponding shot break thresholds Tb for selected image features;
(b) deriving a content difference Di, said Di being the difference between two current image frames based on said selected image features and said difference metrics, the interval between said two current image frames being adjustable with a skip factor S which defines the resolution at which said image frames are being analyzed;
(c) declaring a sharp cut if Di exceeds said threshold Tb;
(d) detecting the starting frame of a potential transition if said Di exceeds a transition threshold Tt but is less than said shot break threshold Tb;
(e) detecting the ending frame of a potential transition by verifying an accumulated difference, said accumulated difference being based on said selected difference metrics;
(f) accumulating Di between every two said consecutive frames until a sum thereof exceeds a predetermined potential key frame threshold Tk;
(g) calculating a difference Da, said Da being a difference between the current frame and the previous key frame based on said difference metric, or between the current frame and the first frame of said sequence based also on said difference metric if there is no previous key frame, the current frame becoming the key frame if Da exceeds a predetermined key frame threshold Td; and
(h) continuing the steps (a) through (g) until the end frame is reached, whereby sequences of images having individual camera shots are identified and segmented automatically and key frames for indexing sequences of images are identified and captured in at least one pass.
Dependent claims: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29
Specification