Method and system for segmentation, classification, and summarization of video images
First Claim
1. A method for summarizing a content of an input video sequence, said method comprising:
(a) computing a feature vector for each frame in a set of frames from said input video sequence;
(b) applying singular value decomposition to a matrix comprised of said feature vectors and projecting the matrix on a refined feature space representation, wherein positions of said projections on said refined feature space representation represent approximations of visual changes in said set of frames from said input video sequence;
(c) clustering said frames of said input video sequence based upon positions of said projections on said refined feature space representation;
(d) selecting a frame from each cluster to serve as a keyframe in a summarization of said input video sequence; and
(e) using said clustered frames to output a motion video representative of a summary of said input video sequence, wherein said input video sequence summary is composed according to a time-length parameter Tlen and a minimum display time parameter Tmin by:
locating the video shot Θi in each cluster Si having the greatest length;
determining how the video shots in each cluster will be arranged according to C ≦ N = Tlen/Tmin, wherein C represents a number of clusters; and wherein N represents the maximum number of video shots;
if C ≦ N, then all the video shots in each cluster are included in said input video sequence summary; and
if C > N, then sort each video shot Θi from each cluster Si in descending order by length, select the first N video shots for inclusion in said input video sequence summary, and assign time length Tmin to each selected video shot.
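The composition rule in step (e) can be illustrated with a minimal sketch. The names `Shot` and `compose_summary` are hypothetical, and two points are assumptions rather than claim language: N is taken as the floor of Tlen/Tmin, and because the claim states no display time for the C ≦ N branch, the sketch leaves that time unset (`None`).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Shot:
    start: int   # index of the first frame (hypothetical field)
    length: int  # number of frames in the shot


def compose_summary(
    clusters: List[List[Shot]], t_len: float, t_min: float
) -> List[Tuple[Shot, Optional[float]]]:
    """Return (shot, display_time) pairs following step (e) of claim 1.

    Every cluster is assumed to contain at least one shot.
    """
    n_max = int(t_len // t_min)  # N = Tlen / Tmin, floored (assumption)
    c = len(clusters)            # C = number of clusters

    if c <= n_max:
        # C <= N: every shot of every cluster is included; the claim
        # gives no display time here, so none is assigned.
        return [(shot, None) for cluster in clusters for shot in cluster]

    # C > N: take the longest shot of each cluster, sort those shots by
    # length in descending order, keep the first N, and give each Tmin.
    longest = [max(cluster, key=lambda s: s.length) for cluster in clusters]
    longest.sort(key=lambda s: s.length, reverse=True)
    return [(shot, t_min) for shot in longest[:n_max]]
```

For example, with three clusters, Tlen = 20 and Tmin = 10, N is 2, so only the two longest representative shots are kept and each is shown for 10 time units.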
Abstract
In a technique for video segmentation, classification and summarization based on the singular value decomposition, frames of the input video sequence are represented by vectors composed of concatenated histograms descriptive of the spatial distributions of colors within the video frames. The singular value decomposition maps these vectors into a refined feature space. In the refined feature space produced by the singular value decomposition, the invention uses a metric to measure the amount of information contained in each video shot of the input video sequence. The most static video shot is defined as an information unit, and the content value computed from this shot is used as a threshold to cluster the remaining frames. The clustered frames are displayed using a set of static keyframes or a summary video sequence. The video segmentation technique relies on the distance between the frames in the refined feature space to calculate the similarity between frames in the input video sequence. The input video sequence is segmented based on the values of the calculated similarities. Finally, average video attribute values in each segment are used in classifying the segments.
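As a rough illustration of the feature and projection steps described in the abstract, the sketch below builds a concatenated block-wise color histogram per frame and projects the frames with NumPy's SVD. The function names, the 3×3 grid, the 4 bins per channel, and the scaling of coordinates by the singular values are all assumptions made for this example; the text above does not specify them.

```python
import numpy as np


def frame_feature(frame: np.ndarray, grid: int = 3, bins: int = 4) -> np.ndarray:
    """Concatenate one 3-D color histogram per block of an H x W x 3 uint8 frame."""
    h, w, _ = frame.shape
    parts = []
    for rows in np.array_split(np.arange(h), grid):
        for cols in np.array_split(np.arange(w), grid):
            block = frame[np.ix_(rows, cols)].reshape(-1, 3)
            hist, _ = np.histogramdd(
                block, bins=(bins,) * 3, range=[(0, 256)] * 3
            )
            parts.append(hist.ravel())
    return np.concatenate(parts)


def refined_features(features: np.ndarray, k: int) -> np.ndarray:
    """Project frames (rows of `features`) onto the top-k singular directions."""
    a = features.T  # feature-by-frame matrix: one column per frame
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    # Column i of vt holds frame i's coordinates; scaling by the singular
    # values (an assumption) makes Euclidean distances here equal the
    # distances between the rank-k approximations of the original vectors.
    return vt[:k].T * s[:k]


def frame_distance(proj: np.ndarray, i: int, j: int) -> float:
    """Euclidean distance between two frames in the refined feature space."""
    return float(np.linalg.norm(proj[i] - proj[j]))
```

A small distance between consecutive frames suggests they belong to the same shot, while a large one suggests a shot boundary; the thresholds used for that decision are not given in this section.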
30 Claims
1. A method for summarizing a content of an input video sequence, said method comprising:
(a) computing a feature vector for each frame in a set of frames from said input video sequence;
(b) applying singular value decomposition to a matrix comprised of said feature vectors and projecting the matrix on a refined feature space representation, wherein positions of said projections on said refined feature space representation represent approximations of visual changes in said set of frames from said input video sequence;
(c) clustering said frames of said input video sequence based upon positions of said projections on said refined feature space representation;
(d) selecting a frame from each cluster to serve as a keyframe in a summarization of said input video sequence; and
(e) using said clustered frames to output a motion video representative of a summary of said input video sequence, wherein said input video sequence summary is composed according to a time-length parameter Tlen and a minimum display time parameter Tmin by:
locating the video shot Θi in each cluster Si having the greatest length;
determining how the video shots in each cluster will be arranged according to C ≦ N = Tlen/Tmin, wherein C represents a number of clusters; and wherein N represents the maximum number of video shots;
if C ≦ N, then all the video shots in each cluster are included in said input video sequence summary; and
if C > N, then sort each video shot Θi from each cluster Si in descending order by length, select the first N video shots for inclusion in said input video sequence summary, and assign time length Tmin to each selected video shot.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A computer-readable medium containing a program for summarizing a content of an input video sequence, said program comprising:
(a) computing a feature vector for each frame in a set of frames from said input video sequence;
(b) applying singular value decomposition to a matrix comprised of said feature vectors and projecting the matrix on a refined feature space representation, wherein positions of said projections on said refined feature space representation represent approximations of visual changes in said set of frames from said input video sequence;
(c) clustering said frames of said input video sequence based upon positions of said projections on said refined feature space representation;
(d) selecting a frame from each cluster to serve as a keyframe in a summarization of said input video sequence; and
(e) using said clustered frames to output a motion video representative of a summary of said input video sequence, wherein said input video sequence summary is composed according to a time-length parameter Tlen and a minimum display time parameter Tmin by:
locating the video shot Θi in each cluster Si having the greatest length;
determining how the video shots in each cluster will be arranged according to C ≦ N = Tlen/Tmin, wherein C represents a number of clusters; and wherein N represents the maximum number of video shots;
if C ≦ N, then all the video shots in each cluster are included in said input video sequence summary; and
if C > N, then sort each video shot Θi from each cluster Si in descending order by length, select the first N video shots for inclusion in said input video sequence summary, and assign time length Tmin to each selected video shot.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
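Step (d) of both independent claims selects one frame per cluster as a keyframe without saying how. One plausible reading, sketched below purely as an assumption, picks the frame whose projection lies closest to its cluster's centroid. The function name `select_keyframes` and the inputs `proj` (one row of refined-space coordinates per frame) and `labels` (one cluster label per frame) are hypothetical.

```python
import numpy as np


def select_keyframes(proj, labels) -> dict:
    """Map each cluster label to the index of its frame nearest the centroid."""
    proj = np.asarray(proj, dtype=float)
    labels = np.asarray(labels)
    keyframes = {}
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)  # frame indices in this cluster
        centroid = proj[members].mean(axis=0)
        dists = np.linalg.norm(proj[members] - centroid, axis=1)
        # Nearest-to-centroid selection is an illustrative choice, not claim text.
        keyframes[int(label)] = int(members[np.argmin(dists)])
    return keyframes
```

The returned frame indices could then be shown as the static keyframe set that the abstract mentions as an alternative to the summary video.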
Specification