Method and system for segmentation, classification, and summarization of video images

US 7,016,540 B1
Filed: 04/24/2000
Issued: 03/21/2006
Est. Priority Date: 11/24/1999
Status: Expired due to Fees

Abstract

In a technique for video segmentation, classification and summarization based on the singular value decomposition, frames of the input video sequence are represented by vectors composed of concatenated histograms descriptive of the spatial distributions of colors within the video frames. The singular value decomposition maps these vectors into a refined feature space. In the refined feature space produced by the singular value decomposition, the invention uses a metric to measure the amount of information contained in each video shot of the input video sequence. The most static video shot is defined as an information unit, and the content value computed from this shot is used as a threshold to cluster the remaining frames. The clustered frames are displayed using a set of static keyframes or a summary video sequence. The video segmentation technique relies on the distance between the frames in the refined feature space to calculate the similarity between frames in the input video sequence. The input video sequence is segmented based on the values of the calculated similarities. Finally, average video attribute values in each segment are used in classifying the segments.
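
The frame representation described in the abstract (one color histogram per spatial block, concatenated into a single vector) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `block_histogram_vector`, the 3×3 block grid, the RGB color space, and the two quantization levels per channel are all assumptions chosen for the example, since the abstract does not fix these values.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255 (assumed format)


def block_histogram_vector(frame: Sequence[Sequence[Pixel]],
                           blocks_per_side: int = 3,
                           bins_per_channel: int = 2) -> List[int]:
    """Concatenate one color histogram per block into a single feature vector.

    The grid size and bin count are illustrative defaults, not values taken
    from the patent text.
    """
    height, width = len(frame), len(frame[0])
    bins = bins_per_channel ** 3  # histogram bins per block
    vector = [0] * (blocks_per_side * blocks_per_side * bins)
    for y, row in enumerate(frame):
        block_row = y * blocks_per_side // height
        for x, (r, g, b) in enumerate(row):
            block_col = x * blocks_per_side // width
            block = block_row * blocks_per_side + block_col
            # Quantize each 8-bit channel into bins_per_channel levels.
            qr = r * bins_per_channel // 256
            qg = g * bins_per_channel // 256
            qb = b * bins_per_channel // 256
            color_bin = (qr * bins_per_channel + qg) * bins_per_channel + qb
            vector[block * bins + color_bin] += 1
    return vector
```

Because each block keeps its own histogram, two frames with the same overall colors but a different spatial layout produce different vectors, which is the "spatial distribution of colors" property the abstract refers to.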

30 Claims

1. A method for summarizing a content of an input video sequence, said method comprising:
- (a) computing a feature vector for each frame in a set of frames from said input video sequence;
  
  (b) applying singular value decomposition to a matrix comprised of said feature vectors and projecting the matrix on a refined feature space representation, wherein positions of said projections on said refined feature space representation represent approximations of visual changes in said set of frames from said input video sequence;
  
  (c) clustering said frames of said input video sequence based upon positions of said projections on said refined feature space representation;
  
  (d) selecting a frame from each cluster to serve as a keyframe in a summarization of said input video sequence; and
  
  (e) using said clustered frames to output a motion video representative of a summary of said input video sequence, wherein said input video sequence summary is composed according to a time-length parameter T_len and a minimum display time parameter T_min by:
  
  locating the video shot Θ_i in each cluster S_i having the greatest length;
  
  determining how the video shots in each cluster will be arranged according to C ≦ N = T_len/T_min, wherein C represents a number of clusters; and
  
  wherein N represents the maximum number of video shots;
  
  if C ≦ N, then all the video shots in each cluster is included in said input video sequence summary; and
  
  if C > N, then sort each video shot Θ_i from each cluster S_i in descending order by length, select the first N video shots for inclusion in said input video sequence summary and assign time length T_min to each selected video shot.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
  - 2. The method of claim 1, wherein said singular value decomposition is performed using frames selected with a fixed interval from said input video sequence.
  - 3. The method of claim 1, wherein each column of said matrix represents a frame in said refined feature space representation.
  - 4. The method of claim 1, wherein said feature vectors are computed using a color histogram that outputs a histogram vector.
  - 5. The method of claim 4, wherein said histogram vector is indicative of a spatial distribution of colors in said each of said frames.
  - 6. The method of claim 5, wherein each of said frames is divided into a plurality of blocks, each of said plurality of blocks being represented by a histogram in a color space indicative of a distribution of colors within each of said blocks.
  - 7. The method of claim 5, wherein each of said frames is divided into a plurality of blocks and said histogram vector comprises a plurality of histograms in a color space, each of said plurality of histograms corresponding to one of said plurality of blocks.
  - 8. The method of claim 1, wherein said selecting a frame comprises locating a frame with a feature vector that projects into a singular value that is most representative of other singular values of the cluster.
  - 9. The method of claim 1, wherein the composition of said input video sequence summary further comprises sorting the selected video shots by their respective time codes.
  - 10. The method of claim 9, wherein the composition of said input video sequence summary further comprises extracting a portion of selected video shot equal in length to time length T_min and inserting each extracted portion in order to said input video sequence summary.
  - 11. The method of claim 1, wherein said clustering of said frames further comprises using a position of the most static shot of said input video sequence to compute a value as a threshold during the clustering of said frames.
  - 12. The method of claim 11, wherein said clustering of said frames further comprises computing a content value and using said computed content value to cluster the remaining frames by:
    - sorting said feature vectors in said refined feature space representation in ascending order according to a distance of each of said feature vectors to an origin of said refined feature space representation;
      
      selecting a vector among said sorted feature vectors which is closest to an origin of said refined feature space representation and including said selected feature vector into a first cluster;
      
      clustering said plurality of sorted feature vectors in said refined feature space representation into a plurality of clusters according to a distance between each of said plurality of sorted feature vectors and feature vectors in each of said plurality of clusters and an amount of information in each of said plurality of clusters.
  - 13. The method of claim 12, wherein, in said clustering of sorted feature vectors, said plurality of sorted feature vectors are clustered into said plurality of clusters such that said amount of information in each of said plurality of clusters does not exceed an amount of information in said first cluster.
  - 14. The method of claim 12, wherein said first cluster is composed of frames based on a distance variation between said frames and an average distance between frames in said first cluster.
  - 15. The method of claim 12, wherein each of said plurality of clusters is composed of frames based on a distance variation between said frames and an average distance between frames in said each of said plurality of clusters.
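
The summary composition of step (e), together with the time-code ordering of claim 9 and the fixed-length excerpts of claim 10, can be sketched as below. This is a hedged illustration rather than the claimed method itself: `Shot` and `compose_summary` are hypothetical names, N is taken as the integer part of T_len/T_min, and two behaviors are assumptions the claims do not state: each shot receives T_len/C seconds when C ≦ N, and an excerpt is never longer than its shot.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Shot:
    start: float   # time code of the first frame, in seconds
    length: float  # duration of the shot, in seconds


def compose_summary(clusters: List[List[Shot]],
                    t_len: float,
                    t_min: float) -> List[Tuple[Shot, float]]:
    """Return (shot, display_time) pairs in time-code order."""
    # Locate the longest shot Theta_i in each cluster S_i.
    longest = [max(shots, key=lambda s: s.length) for shots in clusters if shots]
    c = len(longest)            # C: number of clusters
    n = int(t_len // t_min)     # N: maximum number of shots in the summary
    if c == 0 or n == 0:
        return []
    if c <= n:
        selected = longest
        slot = t_len / c        # assumption: share T_len equally among shots
    else:
        # Keep only the N longest representative shots, T_min seconds each.
        selected = sorted(longest, key=lambda s: s.length, reverse=True)[:n]
        slot = t_min
    # Claim 9: order the selected shots by their time codes.
    selected = sorted(selected, key=lambda s: s.start)
    # Claim 10: take an excerpt from each shot (capped at the shot length,
    # which is an assumption of this sketch).
    return [(shot, min(slot, shot.length)) for shot in selected]
```

For example, with three clusters, T_len = 10 and T_min = 5, N is 2, so only the two longest representative shots are kept and each is shown for 5 seconds.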

16. A computer-readable medium containing a program for summarizing a content of an input video sequence, said program comprising:
- (a) computing a feature vector for each frame in a set of frames from said input video sequence;
  
  (b) applying singular value decomposition to a matrix comprised of said feature vectors and projecting the matrix on a refined feature space representation, wherein positions of said projections on said refined feature space representation represent approximations of visual changes in said set of frames from said input video sequence;
  
  (c) clustering said frames of said input video sequence based upon positions of said projections on said refined feature space representation;
  
  (d) selecting a frame from each cluster to serve as a keyframe in a summarization of said input video sequence; and
  
  (e) using said clustered frames to output a motion video representative of a summary of said input video sequence, wherein said input video sequence summary is composed according to a time-length parameter T_len and a minimum display time parameter T_min by:
  
  locating the video shot Θ_i in each cluster S_i having the greatest length;
  
  determining how the video shots in each cluster will be arranged according to C ≦ N = T_len/T_min, wherein C represents a number of clusters; and
  
  wherein N represents the maximum number of video shots;
  
  if C ≦ N, then all the video shots in each cluster is included in said input video sequence summary; and
  
  if C > N, then sort each video shot Θ_i from each cluster S_i in descending order by length, select the first N video shots for inclusion in said input video sequence summary and assign time length T_min to each selected video shot.
- Dependent Claims (17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)
  - 17. The computer-readable medium of claim 16, wherein said singular value decomposition is performed using frames selected with a fixed interval from said input video sequence.
  - 18. The computer-readable medium of claim 16, wherein each column of said matrix represents a frame in said refined feature space representation.
  - 19. The computer-readable medium of claim 16, wherein said feature vectors are computed using a color histogram that outputs a histogram vector.
  - 20. The computer-readable medium of claim 19, wherein said histogram vector is indicative of a spatial distribution of colors in said each of said frames.
  - 21. The computer-readable medium of claim 20, wherein each of said frames is divided into a plurality of blocks, each of said plurality of blocks being represented by a histogram in a color space indicative of a distribution of colors within each of said blocks.
  - 22. The computer-readable medium of claim 20, wherein each of said frames is divided into a plurality of blocks and said histogram vector comprises a plurality of histograms in a color space, each of said plurality of histograms corresponding to one of said plurality of blocks.
  - 23. The computer-readable medium of claim 16, wherein said selecting a frame comprises locating a frame with a feature vector that projects into a singular value that is most representative of other singular values of the cluster.
  - 24. The computer-readable medium of claim 16, wherein the composition of said input video sequence summary further comprises sorting the selected video shots by their respective time codes.
  - 25. The computer-readable medium of claim 24, wherein the composition of said input video sequence summary further comprises extracting a portion of selected video shot equal in length to time length T_min and inserting each extracted portion in order to said input video sequence summary.
  - 26. The computer-readable medium of claim 16, wherein said clustering of said frames further comprises using a position of the most static shot of said input video sequence to compute a value as a threshold during the clustering of said frames.
  - 27. The computer-readable medium of claim 25, wherein said clustering of said frames further comprises computing a content value and using said computed content value to cluster the remaining frames by:
    - sorting said feature vectors in said refined feature space representation in ascending order according to a distance of each of said feature vectors to an origin of said refined feature space representation;
      
      selecting a vector among said sorted feature vectors which is closest to an origin of said refined feature space representation and including said selected feature vector into a first cluster;
      
      clustering said plurality of sorted feature vectors in said refined feature space representation into a plurality of clusters according to a distance between each of said plurality of sorted feature vectors and feature vectors in each of said plurality of clusters and an amount of information in each of said plurality of clusters.
  - 28. The computer-readable medium of claim 27, wherein, in said clustering of sorted feature vectors, said plurality of sorted feature vectors are clustered into said plurality of clusters such that said amount of information in each of said plurality of clusters does not exceed an amount of information in said first cluster.
  - 29. The computer-readable medium of claim 27, wherein said first cluster is composed of frames based on a distance variation between said frames and an average distance between frames in said first cluster.
  - 30. The computer-readable medium of claim 27, wherein each of said plurality of clusters is composed of frames based on a distance variation between said frames and an average distance between frames in said each of said plurality of clusters.
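
Step (b), which claims 1 and 16 share, can be written out as a short worked formulation. The symbols below (A, U, Σ, V, κ, ψ_i, â_i) are notation assumed for this sketch and do not appear in the claim text. The truncation rank κ and the choice of Euclidean distance are likewise assumptions, since the claims refer only to "positions of said projections" without defining a metric.

```latex
% A is the m-by-n feature matrix whose i-th column a_i is the feature
% vector of frame i (cf. claims 3 and 18).
\[
  A = U \Sigma V^{\mathsf T}, \qquad
  \Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \qquad
  \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0 .
\]
% Keep the kappa largest singular values (kappa <= r) and the matching
% columns U_kappa of U and V_kappa of V. Frame i is then represented by
% the i-th row of V_kappa:
\[
  \psi_i = (v_{i1}, \dots, v_{i\kappa})^{\mathsf T} .
\]
% Projecting the original columns onto the leading left singular vectors
% gives the same coordinates scaled by the singular values:
\[
  \hat a_i = U_\kappa^{\mathsf T} a_i = \Sigma_\kappa \psi_i ,
\]
% so a Euclidean distance between two projected frames is
\[
  \lVert \hat a_i - \hat a_j \rVert_2
  = \sqrt{\sum_{l=1}^{\kappa} \sigma_l^{2}\,(v_{il} - v_{jl})^{2}} .
\]
```

Under this formulation, frames with similar block histograms land close together in the κ-dimensional space, which is what allows step (c) to cluster frames by the positions of their projections.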

Current Assignee: NEC Corporation
Original Assignee: NEC Corporation
Inventors: Gong, Yihong; Liu, Xin
Primary Examiner(s): Wu, Jingge
Assistant Examiner(s): Mackowey, Anthony M
Application Number: US09/556,349
Time in Patent Office: 2,157 Days
Field of Search: 382/168, 382/224, 382/225, 382/236, 382/305, 345/723, 348/700, 348/703, 707/3
US Class Current: 382/225
CPC Class Codes:
G06F 16/739   in form of a video summary,...
G06F 16/785   using colour or luminescence
G06V 20/40   in video content extracting...
G11B 27/031   Electronic editing of digit...
H04N 5/147   Scene change detection
Y10S 707/99933   Query processing, i.e. sear...