Method and apparatus for automatically summarizing video
First Claim
1. A method for automatically producing a summary of a video, comprising:
in a computer system, performing the operations of:
receiving the video;
partitioning the video into scenes comprising a plurality of frames, wherein partitioning the video into scenes comprises:
extracting feature vectors for sampled frames in the video;
detecting shot boundaries based on distances between feature vectors for successive sampled frames;
producing a frame-similarity matrix, wherein each element in the frame-similarity matrix represents a distance between feature vectors for a corresponding pair of sampled frames;
using the frame-similarity matrix and the detected shot boundaries to compute a shot-similarity matrix, wherein each element in the shot-similarity matrix represents a similarity between a corresponding pair of shots; and
determining scene boundaries by selectively merging successive shots together based on the computed similarities between the successive shots;
generating a scene-similarity matrix based on the frame-similarity matrix and the determined scene boundaries, each element of the scene-similarity matrix representing a measure of similarity between different scenes of the video, the scene-similarity matrix comprising a plurality of elements;
determining an importance score for each scene based on the scene-similarity matrix and a distance from the scene to an average scene of the video, the importance score for a scene indicating a relative importance of the scene and wherein the importance score is increased responsive to the scene having a high similarity with other scenes in the video;
selecting representative scenes from the video based on the determined importance scores; and
combining selected scenes to produce the summary for the video.
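The first two partitioning steps of the claim (a frame-similarity matrix of pairwise distances, and shot boundaries from distances between successive sampled frames) might be sketched as follows. This is only an illustration under stated assumptions: the Euclidean metric, the fixed `threshold`, the function names, and the toy feature vectors are all hypothetical and are not taken from the patent.

```python
import math

def frame_distance_matrix(features):
    """Pairwise distances between per-frame feature vectors.

    Euclidean distance is an assumption; the claim only requires "a distance".
    """
    n = len(features)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(features[i], features[j])
            dist[i][j] = dist[j][i] = d  # the matrix is symmetric
    return dist

def detect_shot_boundaries(dist, threshold):
    """Indices i at which a new shot starts.

    A boundary is declared when the distance between frame i-1 and frame i
    exceeds a fixed threshold (a hypothetical decision rule).
    """
    return [i for i in range(1, len(dist)) if dist[i - 1][i] > threshold]

# Toy 2-D feature vectors for five sampled frames (made-up values).
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.2, 0.1)]
dist = frame_distance_matrix(features)
boundaries = detect_shot_boundaries(dist, threshold=1.0)
print(boundaries)  # [2, 4]
```

In this toy input the large jumps occur between frames 1 and 2 and between frames 3 and 4, so new shots start at frames 2 and 4.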
Abstract
One embodiment of the present invention provides a system that automatically produces a summary of a video. During operation, the system partitions the video into scenes and then determines similarities between the scenes. Next, the system selects representative scenes from the video based on the determined similarities, and combines the selected scenes to produce the summary for the video.
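The final two steps of the abstract (selecting representative scenes and combining them into a summary) could look roughly like the sketch below. It assumes that per-scene scores are already available and that the summary is limited by a frame budget `max_frames`; the budget, the greedy strategy, and all names are hypothetical choices for illustration, not details stated in the patent.

```python
def select_scenes(scenes, scores, max_frames):
    """Greedily keep the highest-scoring scenes that fit within max_frames.

    scenes: list of half-open (start, end) frame ranges.
    Returns the kept scenes in their original temporal order.
    """
    by_score = sorted(range(len(scenes)), key=lambda k: scores[k], reverse=True)
    chosen, used = [], 0
    for k in by_score:
        length = scenes[k][1] - scenes[k][0]
        if used + length <= max_frames:
            chosen.append(k)
            used += length
    return [scenes[k] for k in sorted(chosen)]

def combine(frames, selected):
    """Concatenate the frames of the selected scenes into one summary."""
    summary = []
    for start, end in selected:
        summary.extend(frames[start:end])
    return summary

frames = list(range(10))                      # stand-ins for decoded frames
scenes = [(0, 3), (3, 5), (5, 9), (9, 10)]    # made-up scene boundaries
scores = [0.9, 0.2, 0.8, 0.5]                 # made-up scores
selected = select_scenes(scenes, scores, max_frames=5)
summary = combine(frames, selected)
print(selected, summary)  # [(0, 3), (9, 10)] [0, 1, 2, 9]
```

Returning the selected scenes in temporal order keeps the summary chronologically consistent with the source video, which is one plausible design choice among several.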
14 Claims
1. A method for automatically producing a summary of a video (recited in full under First Claim above).
Dependent claims: 2, 3, 4, 5, 6, 14
7. A non-transitory computer-readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a method for automatically producing a summary of a video, the method comprising:
receiving the video at the computer;
partitioning the video into scenes comprising a plurality of frames, wherein partitioning the video into scenes comprises:
extracting feature vectors for sampled frames in the video;
detecting shot boundaries based on distances between feature vectors for successive sampled frames;
producing a frame-similarity matrix, wherein each element in the frame-similarity matrix represents a distance between feature vectors for a corresponding pair of sampled frames;
using the frame-similarity matrix and the detected shot boundaries to compute a shot-similarity matrix, wherein each element in the shot-similarity matrix represents a similarity between a corresponding pair of shots; and
determining scene boundaries by selectively merging successive shots together based on the computed similarities between the successive shots;
generating a scene-similarity matrix based on the frame-similarity matrix and the determined scene boundaries, each element of the scene-similarity matrix representing a measure of similarity between different scenes of the video, the scene-similarity matrix comprising a plurality of elements;
determining an importance score for each scene based on the scene-similarity matrix and a distance from the scene to an average scene of the video, the importance score for a scene indicating a relative importance of the scene and wherein the importance score is increased responsive to the scene having a high similarity with other scenes in the video;
selecting representative scenes from the video based on the determined importance scores; and
combining selected scenes to produce the summary for the video.
Dependent claims: 8, 9, 10, 11, 12
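The scene-similarity matrix and importance score recited above can be illustrated with a small sketch. The claim does not give formulas, so several things here are assumptions: similarity is taken as 1 / (1 + mean frame distance), the "average scene" is taken as the mean of the scene centroids, and the distance to it is added with a hypothetical `distance_weight` whose sign and size are not specified by the claim.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def scene_similarity_matrix(dist, scenes):
    """Scene-to-scene similarity from frame distances and scene boundaries.

    Assumed mapping: similarity = 1 / (1 + mean distance between the frames
    of the two scenes). scenes are half-open (start, end) frame ranges.
    """
    def mean_dist(a, b):
        total = sum(dist[i][j] for i in range(*a) for j in range(*b))
        return total / ((a[1] - a[0]) * (b[1] - b[0]))
    return [[1.0 / (1.0 + mean_dist(a, b)) for b in scenes] for a in scenes]

def importance_scores(sim, scene_centroids, distance_weight):
    """Mean similarity to the other scenes, plus a weighted distance term.

    How the distance to the average scene enters the score is an assumption.
    """
    average_scene = centroid(scene_centroids)
    n = len(sim)
    scores = []
    for s in range(n):
        others = [sim[s][t] for t in range(n) if t != s]
        mean_sim = sum(others) / len(others) if others else 0.0
        spread = math.dist(scene_centroids[s], average_scene)
        scores.append(mean_sim + distance_weight * spread)
    return scores

# Made-up 1-D features for eight sampled frames, grouped into four scenes.
features = [[0.0], [0.2], [1.0], [1.2], [0.1], [0.3], [9.0], [9.2]]
scenes = [(0, 2), (2, 4), (4, 6), (6, 8)]
dist = [[math.dist(a, b) for b in features] for a in features]
sim = scene_similarity_matrix(dist, scenes)
centroids = [centroid(features[a:b]) for a, b in scenes]
scores = importance_scores(sim, centroids, distance_weight=0.0)
ranked = sorted(range(len(scenes)), key=lambda k: scores[k], reverse=True)
print(ranked)  # [2, 0, 1, 3]
```

With the distance term switched off, the scenes that resemble many other scenes rank first and the outlying last scene ranks last, which matches the claim language that the score increases with high similarity to other scenes.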
13. An apparatus that automatically produces a summary of a video, comprising:
a non-transitory computer readable storage medium storing executable instructions that perform steps comprising:
partitioning the video into scenes comprising a plurality of frames, wherein partitioning the video into scenes comprises:
extracting feature vectors for sampled frames in the video;
detecting shot boundaries based on distances between feature vectors for successive sampled frames;
producing a frame-similarity matrix, wherein each element in the frame-similarity matrix represents a distance between feature vectors for a corresponding pair of sampled frames;
using the frame-similarity matrix and the detected shot boundaries to compute a shot-similarity matrix, wherein each element in the shot-similarity matrix represents a similarity between a corresponding pair of shots; and
determining scene boundaries by selectively merging successive shots together based on the computed similarities between the successive shots;
generating a scene-similarity matrix based on the frame-similarity matrix and the determined scene boundaries, each element of the scene-similarity matrix representing a measure of similarity between different scenes of the video, the scene-similarity matrix comprising a plurality of elements;
determining an importance score for each scene based on the scene-similarity matrix and a distance from the scene to an average scene of the video, the importance score for a scene indicating a relative importance of the scene and wherein the importance score is increased responsive to the scene having a high similarity with other scenes in the video;
selecting representative scenes from the video based on the determined importance scores; and
combining selected scenes to produce the summary for the video; and
a processor configured to execute the executable instructions.
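The shot-similarity matrix and the merging of successive shots into scenes, as recited in the partitioning steps, might be sketched like this. The similarity mapping 1 / (1 + mean frame distance), the `min_similarity` cutoff, and the rule of comparing each shot only with the shot immediately before it are assumptions made for illustration; the claim leaves these details open.

```python
def shot_ranges(boundaries, n_frames):
    """Turn shot start indices into half-open (start, end) frame ranges."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [n_frames]
    return list(zip(starts, ends))

def shot_similarity_matrix(dist, shots):
    """Shot-to-shot similarity from the frame distance matrix.

    Assumed mapping: 1 / (1 + mean distance between the frames of two shots).
    """
    def mean_dist(a, b):
        total = sum(dist[i][j] for i in range(*a) for j in range(*b))
        return total / ((a[1] - a[0]) * (b[1] - b[0]))
    return [[1.0 / (1.0 + mean_dist(a, b)) for b in shots] for a in shots]

def merge_shots_into_scenes(shots, sim, min_similarity):
    """Extend the current scene while successive shots are similar enough."""
    scenes = [shots[0]]
    for k in range(1, len(shots)):
        if sim[k - 1][k] >= min_similarity:
            scenes[-1] = (scenes[-1][0], shots[k][1])  # merge into current scene
        else:
            scenes.append(shots[k])                    # start a new scene
    return scenes

# Made-up 1-D features for six sampled frames; distance is absolute difference.
values = [0.0, 0.1, 1.0, 1.1, 9.0, 9.1]
dist = [[abs(a - b) for b in values] for a in values]
shots = shot_ranges([2, 4], n_frames=len(values))
sim = shot_similarity_matrix(dist, shots)
scenes = merge_shots_into_scenes(shots, sim, min_similarity=0.4)
print(scenes)  # [(0, 4), (4, 6)]
```

Here the first two shots are close enough to be merged into one scene, while the third shot is far from its predecessor and begins a new scene.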
Specification