System and method for semantic video segmentation based on joint audiovisual and text analysis
Abstract
System and method for partitioning a video into a series of semantic units where each semantic unit relates to a generally complete thematic topic. A computer implemented method for partitioning a video into a series of semantic units, wherein each semantic unit relates to a theme or a topic, comprises dividing a video into a plurality of homogeneous segments, analyzing audio and visual content of the video, extracting a plurality of keywords from the speech content of each of the plurality of homogeneous segments of the video, and detecting and merging a plurality of groups of semantically related and temporally adjacent homogeneous segments into a series of semantic units in accordance with the results of both the audio and visual analysis and the keyword extraction. The present invention can be applied to generate tables of contents and index tables for videos, facilitating efficient topic-based video searching and browsing.
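The merging step described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the `Segment` fields, the Jaccard keyword measure, the label-match audiovisual score, the weighting, and the threshold are all assumptions chosen for illustration. It only shows how temporally adjacent segments might be combined when both the keyword evidence and the audiovisual evidence suggest they belong to the same topic.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Segment:
    """A homogeneous segment; all fields are hypothetical."""
    start: float                                        # seconds
    end: float                                          # seconds
    keywords: Set[str] = field(default_factory=set)     # from speech content
    av_label: str = ""                                  # assumed audiovisual class


def keyword_similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two keyword sets (an assumed similarity measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def merge_segments(segments: List[Segment],
                   threshold: float = 0.4,
                   av_weight: float = 0.3) -> List[Segment]:
    """Greedily merge each segment into the preceding unit when the
    combined keyword and audiovisual score reaches the threshold."""
    units: List[Segment] = []
    for seg in segments:
        if units:
            last = units[-1]
            kw_sim = keyword_similarity(last.keywords, seg.keywords)
            av_sim = 1.0 if last.av_label == seg.av_label else 0.0
            score = (1.0 - av_weight) * kw_sim + av_weight * av_sim
            if score >= threshold:
                # Extend the current unit to cover the adjacent segment.
                units[-1] = Segment(last.start, seg.end,
                                    last.keywords | seg.keywords,
                                    last.av_label)
                continue
        units.append(Segment(seg.start, seg.end,
                             set(seg.keywords), seg.av_label))
    return units


segments = [
    Segment(0.0, 10.0, {"neural", "network"}, "narration"),
    Segment(10.0, 20.0, {"network", "training"}, "narration"),
    Segment(20.0, 30.0, {"budget", "schedule"}, "discussion"),
]
units = merge_segments(segments)
print([(u.start, u.end) for u in units])
```

Because only neighbouring segments are compared, the resulting units stay contiguous in time, which matches the "temporally adjacent" requirement. A real system would likely use richer audiovisual features than a single label.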
58 Citations
20 Claims
1. A computer implemented method for partitioning a video into a series of semantic units wherein each semantic unit relates to a thematic topic, the method comprising:
dividing a video into a plurality of homogeneous segments;
analyzing audio and visual content of the video;
extracting a plurality of keywords from speech content of each of the plurality of homogeneous segments of the video; and
detecting and merging a plurality of groups of semantically related and temporally adjacent homogeneous segments into a series of semantic units in accordance with results of both the audio and visual analysis and the keyword extraction.
14. A system for partitioning a video into a series of semantic units wherein each semantic unit relates to a thematic topic, the system comprising:
a video segmenting unit for dividing a video into a plurality of homogeneous segments;
an audio and visual analyzing unit for analyzing audio and visual content of the video;
a keyword extracting unit for extracting a plurality of keywords from speech content of each of the plurality of homogeneous segments of the video; and
a detecting and merging unit for detecting and merging a plurality of groups of semantically related and temporally adjacent homogeneous segments into a series of semantic units in accordance with results of both the audio and visual analysis and the keyword extraction.
18. A computer program product for partitioning a video into a series of semantic units wherein each semantic unit relates to a thematic topic, the computer program product comprising:
a computer usable medium having computer usable program code embodied therein;
computer usable program code configured to divide a video into a plurality of homogeneous segments;
computer usable program code configured to analyze audio and visual content of the video;
computer usable program code configured to extract a plurality of keywords from speech content of each of the plurality of homogeneous segments of the video; and
computer usable program code configured to detect and merge a plurality of groups of semantically related and temporally adjacent homogeneous segments into a series of semantic units in accordance with results of both the audio and visual analysis and the keyword extraction.
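The keyword extraction element recited in the claims above could be sketched as follows. This is an illustrative assumption rather than the technique the patent describes: it presumes a text transcript is already available for each homogeneous segment (for example from captions or speech recognition), and it ranks words by raw frequency after removing a small, hypothetical stopword list.

```python
import re
from collections import Counter
from typing import List

# A deliberately small stopword list, assumed for illustration only.
STOPWORDS = frozenset({
    "the", "and", "for", "that", "this", "with", "are", "was",
    "but", "not", "you", "have", "from", "they", "will", "can",
})


def extract_keywords(transcript: str, top_k: int = 5) -> List[str]:
    """Return the top_k most frequent non-stopword terms in a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(
        w for w in words if w not in STOPWORDS and len(w) > 2
    )
    # Sort by descending count, then alphabetically, so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


transcript = (
    "The gradient descent algorithm updates the weights. "
    "Gradient steps shrink as the weights converge."
)
keywords = extract_keywords(transcript, top_k=2)
print(keywords)
```

The resulting keyword lists, one per segment, are the kind of input a subsequent merging step would compare across temporally adjacent segments.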
Specification