System for automatic video segmentation and key frame extraction for video sequences having both sharp and gradual transitions

US 5,635,982 A
Filed: 06/27/1994
Issued: 06/03/1997
Est. Priority Date: 06/27/1994
Status: Expired due to Fees

Abstract

An automatic video content parser for parsing video shots such that they are represented in their native media and retrievable based on their visual contents. This system provides methods for temporal segmentation of video sequences into individual camera shots using a novel twin-comparison method. The method is capable of detecting both camera shots implemented by sharp break and gradual transitions implemented by special editing techniques, including dissolve, wipe, fade-in and fade-out; and content-based key frame selection of individual shots by analyzing the temporal variation of video content and selecting a key frame once the difference of content between the current frame and a preceding selected key frame exceeds a set of preselected thresholds.

29 Claims

1. In a system for parsing a plurality of images in motion without modifying a media in which the images are recorded originally, said images being further divided into plurality sequences of frames, a method for selecting at least one key frame representative of a sequence of said images comprising the steps of:
- (a) determining a difference metric or a set of difference metrics between consecutive image frames, said difference metrics having corresponding thresholds for selected image features;
  
  (b) deriving a content difference (D_i), said D_i being a difference between two current image frames based on said selected image features and said difference metrics, an interval between said two current image frames being adjustable with a skip factor S which defines a resolution at which said image frames are being analyzed;
  
  (c) accumulating D_i between every two said consecutive frames until a sum thereof exceeds a predetermined potential key frame threshold T_k ;
  
  (d) calculating a difference D_a, said D_a being a difference between the current frame and the previous key frame based on said difference metrics, or between the current frame and the first frame of said sequence based also on said difference metric if there is no previous key frame, the current frame becoming the key frame if D_a exceeds a predetermined key frame threshold T_d ; and
  
  (e) repeating the steps (a) to (d) until the end frame is reached, whereby key frames for indexing sequences of image are identified and captured automatically.
- View Dependent Claims (2, 3)
- - 2. The key frame selection method in claim 1 wherein said potential key threshold T_k is selected to be equal to or larger than a shot break threshold T_b.
  - 3. The key frame selection method in claim 1 wherein said key frame threshold T_d is selected to be proportional to but smaller than a shot break threshold T_b.
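
Steps (a) through (e) of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `select_key_frames` and `hist_diff`, the representation of a frame as a histogram, the toy difference metric, and the choice to restart the accumulated sum after each test against T_d are all hypothetical and do not come from the claim text.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]  # assumption: each frame is summarized by a histogram


def hist_diff(a: Frame, b: Frame) -> float:
    """Toy difference metric (an assumption): sum of absolute bin differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_key_frames(
    frames: Sequence[Frame],
    t_k: float,
    t_d: float,
    skip: int = 1,
    diff: Callable[[Frame, Frame], float] = hist_diff,
) -> List[int]:
    """Return indices of frames chosen as key frames within one sequence."""
    key_frames: List[int] = []
    reference = 0        # previous key frame, or the first frame if there is none
    accumulated = 0.0    # running sum of D_i, step (c)
    prev = 0
    for cur in range(skip, len(frames), skip):
        accumulated += diff(frames[prev], frames[cur])  # D_i, step (b)
        prev = cur
        if accumulated > t_k:                           # potential key frame
            d_a = diff(frames[reference], frames[cur])  # D_a, step (d)
            if d_a > t_d:
                key_frames.append(cur)
                reference = cur
            accumulated = 0.0  # assumption: restart the sum after each test
    return key_frames


# Six two-bin histograms: content drifts, holds, then jumps back.
frames = [[10, 0], [10, 0], [6, 4], [2, 8], [2, 8], [10, 0]]
print(select_key_frames(frames, t_k=10, t_d=12))
```

The two thresholds play different roles: T_k decides when enough change has accumulated to consider a frame at all, while T_d decides whether that frame differs enough from the last key frame to be kept.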

4. In a system for parsing a plurality of images in motion without modifying a media in which the images are recorded originally, said images being further divided into plurality of sequences of frames, a method for segmenting at least one sequence of said images into individual camera shots, said method comprising the steps of:
- (a) determining a difference metric or a set of difference metrics between consecutive image frames, said difference metrics having corresponding shot break thresholds T_b for selected image features;
  
  (b) deriving a content difference D_i, said D_i being a difference between two current image frames based on said selected image features and said difference metrics, an interval between said two current image frames being adjustable with a skip factor S which defines a resolution at which said image frames are being analyzed;
  
  (c) declaring a sharp cut if D_i exceeds said threshold T_b ;
  
  (d) detecting a starting frame of a potential transition if said D_i exceeds a transition threshold T_t but is less than said shot break threshold T_b ;
  
  (e) detecting an end frame of a potential transition by verifying that D_a > T_b or Σ_ta / Σ_tF > γ T_t is true; and
  
  (f) continuing steps (a) through (e) until the end frame is reached, whereby sequence of images having individual camera shots are identified and segmented automatically in at least one pass.
- View Dependent Claims (5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17)
- - 5. The method for video segmentation as in claim 4 wherein said shot break threshold T_b is a sum of a mean of the frame-to-frame difference μ and a multiple a of a standard deviation of a frame-to-frame difference σ.
  - 6. The method for video segmentation as in claim 5 wherein said a has a value between 5 and 6 when the difference metric is a histogram comparison.
  - 7. The method for video segmentation as in claim 4 wherein said transition threshold T_t is a multiple b of said shot break threshold T_b.
  - 8. The method for video segmentation as in claim 7 wherein said b has a value between 0.1 and 0.5.
  - 9. The method for video segmentation as in claim 4 wherein the ending of transition is confirmed if:
    - (a) D_i is less than the transition threshold T_t ; and
      
      (b) the accumulated D_i between the starting frame of a potential transition and the current frame exceeds the threshold T_b or the average D_i between two consecutive frames in the transition is higher than a multiple β of the threshold T_t.
  - 10. The method for video segmentation as in claim 9 wherein said β is larger than 1.0.
  - 11. The method for video segmentation as in claim 9 wherein the ending of transition is confirmed notwithstanding a failure to comply with steps (a) and (b) if a number of consecutive frames between every two of D_i is lower than the threshold T_t consistently, said number of consecutive frames being allowed in a transition up to a user-tunable tolerance before the potential transition is determined as false and is discarded.
  - 12. The method for video segmentation as in claim 11 wherein the user-tunable tolerance is larger than three frames if the frame sequence is sampled with skip factor S=1, N_tmmax being greater than or equal to 3.
  - 13. The method for video segmentation as in claim 4 wherein the starting of frame of a potential transition is set only if a number of frames of the current shot is larger than a default number N_smin; said default number N_smin typically being greater than or equal to 5.
  - 14. The method of video segmentation as in claim 4 wherein a processing speed of steps (a) through (f) is enhanced, said method comprising at least two steps of:
    - (a) choosing in a first pass a skip factor S larger than 2 to temporarily decrease resolution, so as to identify rapidly a location of potential segment boundaries without allowing any real boundaries to pass through without being detected; and
      
      (b) increasing resolution and restricting all computation to a vicinity of said potential segment boundaries whereby both camera breaks and gradual transitions are further identified, said step (b) being applied to subsequent passes.
  - 15. The method for video segmentation as in claim 14 wherein said method employs different difference metrics in different passes.
  - 17. The method for video segmentation as in claim 14 wherein said increasing step further includes:
    - (a) determining a difference metric or a set of difference metrics between consecutive image frames, said difference metrics having corresponding shot break thresholds T_b for selected image features;
      
      (b) deriving a content difference D_i, said D_i being a difference between two current image frames based on said selected image features and said difference metrics, an interval between said two current image frames being adjustable with a skip factor S which defines a resolution at which said image frames are being analyzed;
      
      (c) declaring a sharp cut if D_i exceeds said threshold T_b ;
      
      (d) detecting a starting frame of a potential transition if said D_i exceeds a transition threshold T_t but less than said shot break threshold T_b ;
      
      (e) detecting an ending frame of a potential transition by verifying an accumulated difference, said accumulated difference being based on said selected difference metrics; and
      
      (f) continuing steps (a) through (e) until the end frame is reached, whereby said steps (a) through (f) are applied for said subsequent passes.
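
The two-threshold logic of claims 4 and 9 through 12 can be sketched as below. This is a simplified illustration, not the patented implementation: the function name `twin_comparison`, the use of a precomputed list of differences D_i, the default values of `beta` and `max_low`, and the decision to drop an open candidate when a sharp cut occurs are assumptions made for the example. The minimum shot length N_smin of claim 13 is not modeled.

```python
from typing import List, Sequence, Tuple


def twin_comparison(
    diffs: Sequence[float],
    t_b: float,
    t_t: float,
    beta: float = 1.5,   # assumption: any value larger than 1.0
    max_low: int = 3,    # assumption: tolerance on consecutive low differences
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """diffs[i] is D_i between frame i and frame i + 1 (skip factor S = 1).

    Returns (sharp cuts, gradual transitions). A cut is the index of the first
    frame of the new shot; a transition is a (start, end) pair of diff indices.
    """
    cuts: List[int] = []
    transitions: List[Tuple[int, int]] = []
    start = None          # index where the open candidate transition began
    acc, count, low_run = 0.0, 0, 0
    for i, d in enumerate(diffs):
        if d > t_b:                       # sharp cut
            cuts.append(i + 1)
            start = None                  # assumption: drop any open candidate
            continue
        if start is None:
            if d > t_t:                   # T_t < D_i <= T_b: candidate start
                start, acc, count, low_run = i, d, 1, 0
            continue
        if d >= t_t:                      # still inside the candidate
            acc += d
            count += 1
            low_run = 0
            continue
        # D_i < T_t: test the end-of-transition condition
        if acc > t_b or acc / count > beta * t_t:
            transitions.append((start, i))
            start = None
        else:
            low_run += 1
            if low_run > max_low:         # too many low frames: false alarm
                start = None
    return cuts, transitions              # an unfinished candidate is dropped


diffs = [1, 1, 50, 1, 1, 6, 7, 8, 6, 1, 1]
print(twin_comparison(diffs, t_b=20, t_t=5))
```

In the sample data no single difference between indices 5 and 8 reaches T_b, but their accumulated sum does, which is how a dissolve or fade is separated from ordinary motion within a shot.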

16. A speed-enhanced multi-pass method for segmenting at least one sequence of images into individual camera shots in a system for parsing a plurality of said images in motion without modifying a media in which said images are recorded originally, said images being further divided into plurality of sequences of frames, said method comprising the steps of:
- (a) determining a difference metric or a set of difference metrics between consecutive image frames, said difference metrics having corresponding shot break thresholds T_b for selected image features;
  
  (b) deriving a content difference D_i, said D_i being a difference between two current image frames based on said selected image features and said difference metrics, an interval between said two current image frames being adjustable with a skip factor S which defines a resolution at which said image frames are being analyzed;
  
  (c) declaring a sharp cut if D_i exceeds said threshold T_b ; and
  
  (d) continuing steps (a) through (c) until the end frame is reached, whereby in a first pass, resolution is temporarily decreased by choosing a skip factor S larger than 2, so as to identify rapidly a location of potential segment boundaries without allowing any real boundaries to pass through without being detected, and in subsequent passes, resolution is increased and all computation is restricted to a vicinity of said potential segment boundaries whereby both camera breaks and gradual transitions are further identified.
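
The coarse-then-fine idea of claim 16 can be illustrated with a short sketch. It is not the patented implementation: the function names `coarse_candidates` and `refine`, the scalar frame representation, and the use of a single threshold in both passes are assumptions made for the example, and only sharp cuts are handled.

```python
from typing import Callable, List, Sequence, Tuple

Frame = float  # assumption: one scalar feature per frame keeps the example small


def coarse_candidates(
    frames: Sequence[Frame],
    threshold: float,
    skip: int,
    diff: Callable[[Frame, Frame], float],
) -> List[Tuple[int, int]]:
    """First pass: compare frames `skip` apart and return windows (lo, hi)
    whose end points differ by more than the threshold."""
    windows: List[Tuple[int, int]] = []
    last = len(frames) - 1
    for lo in range(0, last, skip):
        hi = min(lo + skip, last)
        if diff(frames[lo], frames[hi]) > threshold:
            windows.append((lo, hi))
    return windows


def refine(
    frames: Sequence[Frame],
    windows: Sequence[Tuple[int, int]],
    t_b: float,
    diff: Callable[[Frame, Frame], float],
) -> List[int]:
    """Second pass: frame-by-frame comparison restricted to the windows."""
    cuts: List[int] = []
    for lo, hi in windows:
        for i in range(lo, hi):
            if diff(frames[i], frames[i + 1]) > t_b:
                cuts.append(i + 1)  # first frame of the new shot
    return cuts


def abs_diff(a: Frame, b: Frame) -> float:
    return abs(a - b)


frames = [0, 0, 0, 0, 0, 9, 9, 9, 9, 9, 9, 9]
windows = coarse_candidates(frames, threshold=5, skip=4, diff=abs_diff)
print(windows, refine(frames, windows, t_b=5, diff=abs_diff))
```

With a skip factor of 4 the first pass makes three comparisons instead of eleven, and the second pass examines only the one window that was flagged. A fuller version would also run a gradual-transition detector inside each window and might use a lower threshold in the first pass so that no real boundary is missed.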

18. In a system for parsing a plurality of images in motion without modifying the media in which the images are recorded originally, said images being further divided into plurality of sequences of frames, a method for segmenting at least one sequence of said images into individual camera shots and selecting at least one key frame representative of a sequence of said images, said method comprising the steps of:
- (a) determining a difference metric or a set of difference metrics between consecutive image frames, said difference metrics having corresponding shot break thresholds T_b for selected image features;
  
  (b) deriving a content difference D_i, said D_i being the difference between two current image frames based on said selected image features and said difference metrics, the interval between said two current image frames being adjustable with a skip factor S which defines the resolution at which said image frames are being analyzed;
  
  (c) declaring a sharp cut if D_i exceeds said threshold T_b ;
  
  (d) detecting the starting frame of a potential transition if said D_i exceeds a transition threshold T_t but less than said shot break threshold T_b ;
  
  (e) detecting the ending frame of a potential transition by verifying an accumulated difference, said accumulated difference being based on said selected difference metrics;
  
  (f) accumulating D_i between every two said consecutive frames until a sum thereof exceeds a predetermined potential key frame threshold T_k;
  
  (g) calculating a difference D_a, said D_a being a difference between the current frame and the previous key frame based on said difference metric, or between the current frame and the first frame of said sequence based also on said difference metric if there is no previous key frame, the current frame becoming the key frame if D_a exceeds a predetermined key frame threshold T_d ; and
  
  (h) continuing the steps (a) through (g) until the end frame is reached, whereby sequence of images having individual camera shots are identified and segmented automatically and key frames for indexing sequences of image are identified and captured in at least one pass.
- View Dependent Claims (19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29)
- - 19. The method for video segmentation as in claim 18 wherein said shot break threshold T_b is a sum of a mean of a frame-to-frame difference μ and a multiple a of a standard deviation of the frame-to-frame difference σ.
  - 20. The method for video segmentation as in claim 19 wherein said a has a value between 5 and 6 when the difference metric is a histogram comparison.
  - 21. The method for video segmentation as in claim 18 wherein said transition threshold T_t is a multiple b of said shot break threshold T_b.
  - 22. The method for video segmentation as in claim 21 wherein said b has a value between 0.1 and 0.5.
  - 23. The method for video segmentation as in claim 18 wherein the ending of transition is confirmed if:
    - (a) D_i is less than the transition threshold T_t ; and
      
      (b) the accumulated D_i between the starting frame of a potential transition and the current frame exceeds the threshold T_b or the average D_i between two consecutive frames in the transition is higher than a multiple β of the threshold T_t.
  - 24. The method for video segmentation as in claim 23 wherein said β is larger than 1.0.
  - 25. The method for video segmentation as in claim 23 wherein the ending of transition is confirmed notwithstanding a failure to comply with steps (a) and (b) if a number of consecutive frames between every two of D_i is lower than the threshold T_t consistently, said number of consecutive frames being allowed in a transition up to a user-tunable tolerance before the potential transition is determined as false and is discarded.
  - 26. The method for video segmentation as in claim 25 wherein the user-tunable tolerance is larger than three frames if the frame sequence is sampled with skip factor S=1, N_tmmax being greater than or equal to 3.
  - 27. The method for video segmentation as in claim 18 wherein the starting of frame of a potential transition is set only if a number of frames of the current shot is larger than a default number; said default number typically being 5 frames, and N_smin being greater than or equal to 5.
  - 28. The key frame selection method in claim 18 wherein said potential key threshold T_k is selected to be equal to or larger than the shot break threshold T_b.
  - 29. The key frame selection method in claim 18 wherein said key frame threshold T_d is selected to be proportional to but smaller than the shot break threshold T_b.
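
The threshold relationships stated in claims 19 through 22, 28 and 29 can be collected in one small helper. This is a sketch under assumptions rather than the patented procedure: the function name `estimate_thresholds`, the use of the population standard deviation, and the default scale factors `k_scale` and `d_scale` are hypothetical. The claims state only that a lies between 5 and 6 for histogram comparison, that b lies between 0.1 and 0.5, that T_k is equal to or larger than T_b, and that T_d is proportional to but smaller than T_b.

```python
import statistics
from typing import Dict, Sequence


def estimate_thresholds(
    diffs: Sequence[float],
    a: float = 5.0,        # claims 20: between 5 and 6 for histogram comparison
    b: float = 0.3,        # claim 22: between 0.1 and 0.5
    k_scale: float = 1.0,  # assumption: T_k = k_scale * T_b with k_scale >= 1
    d_scale: float = 0.5,  # assumption: T_d = d_scale * T_b with d_scale < 1
) -> Dict[str, float]:
    """Derive T_b, T_t, T_k and T_d from observed frame-to-frame differences."""
    if not 0.0 < d_scale < 1.0 or k_scale < 1.0:
        raise ValueError("need 0 < d_scale < 1 and k_scale >= 1")
    mu = statistics.fmean(diffs)
    sigma = statistics.pstdev(diffs)  # assumption: population standard deviation
    t_b = mu + a * sigma              # claim 19: T_b = mu + a * sigma
    return {
        "T_b": t_b,
        "T_t": b * t_b,               # claim 21: T_t = b * T_b
        "T_k": k_scale * t_b,         # claim 28: T_k >= T_b
        "T_d": d_scale * t_b,         # claim 29: proportional to, smaller than T_b
    }


sample = [2, 4, 4, 4, 5, 5, 7, 9]     # mean 5, population standard deviation 2
print(estimate_thresholds(sample))
```

Deriving every threshold from T_b means that a single pass over the difference statistics of a video is enough to configure both the segmentation and the key frame selection.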

Current Assignee
Kent Ridge Digital Labs
Original Assignee
Institute of Systems Science, National University of Singapore
Inventors
Smoliar, Stephen W., Zhang, Hong J., Wu, Jian H.
Primary Examiner(s)
Kostak, Victor R.
Assistant Examiner(s)
Miller, John W.

Application Number

US08/266,216
Time in Patent Office

1,072 Days
Field of Search

348/422, 348/231, 364/419.08, 358/906, 358/909.1, 382/171, 382/173
US Class Current

348/231.99
CPC Class Codes

G06F 16/71   Indexing; Data structures t...

G06F 16/785   using colour or luminescence

G06V 20/40   in video content extracting...

G11B 27/28   by using information signal...

H04N 5/147   Scene change detection
