Method and system for video segmentation
First Claim
1. A computer implemented method for segmenting a video, in which the video includes video content and audio content, and the video content and the audio content are synchronized, comprising the steps of:
classifying each frame of audio content of a video with a label to generate a sequence of consecutive labels;
assigning a dominant label to each successive time interval of consecutive labels, in which a length of the time interval is substantially longer than a length of the frame;
constructing a semantic description for sliding time windows of the successive time intervals, in which the sliding time windows overlap in time and a length of each time window is substantially longer than the length of the time interval, and the semantic description for each time window is a transition matrix determined from transitions between the successive dominant labels of the time intervals;
determining a marker from the transition matrices, in which a frequency of occurrence of the marker is between a low frequency threshold and a high frequency threshold; and
segmenting the video at the locations of the markers in the audio content.
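The first three steps of the claim (frame labels, dominant labels per interval, a transition matrix per overlapping window) can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the majority-vote rule for the dominant label, the row normalization, and the parameters `frames_per_interval`, `window`, and `step` are all assumptions made for illustration, and the frame classifier itself is taken as given.

```python
from collections import Counter

import numpy as np


def dominant_labels(frame_labels, frames_per_interval):
    """Majority label in each consecutive, non-overlapping interval of frames.

    Majority voting is an assumption; the claim only requires "a dominant label".
    """
    out = []
    for start in range(0, len(frame_labels), frames_per_interval):
        chunk = frame_labels[start:start + frames_per_interval]
        out.append(Counter(chunk).most_common(1)[0][0])
    return out


def transition_matrix(labels, classes):
    """Row-normalized counts of transitions between successive labels.

    Row normalization is an assumption; raw counts would also fit the claim.
    """
    index = {c: i for i, c in enumerate(classes)}
    counts = np.zeros((len(classes), len(classes)))
    for a, b in zip(labels, labels[1:]):
        counts[index[a], index[b]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no outgoing transitions stay all-zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


def windowed_matrices(interval_labels, classes, window, step):
    """One transition matrix per sliding window; step < window gives overlap."""
    return [
        transition_matrix(interval_labels[s:s + window], classes)
        for s in range(0, len(interval_labels) - window + 1, step)
    ]
```

Here `window` and `step` are counted in intervals, so a window of 3 intervals with a step of 1 yields windows that overlap by 2 intervals.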
Abstract
A method segments a video. Audio frames of the video are classified with labels. Dominant labels are assigned to successive time intervals of consecutive labels. A semantic description is constructed for sliding time windows of the successive time intervals, in which the sliding time windows overlap in time, and the semantic description for each time window is a transition matrix determined from the dominant labels of the time intervals. A marker is determined from the transition matrices, in which a frequency of occurrence of the marker is between a low frequency threshold and a high frequency threshold. Then, the video is segmented at the locations of the markers.
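The marker-selection and segmentation steps can be sketched in the same spirit. The abstract does not say how a marker is derived from the transition matrices, so this sketch assumes that each window has already been reduced to a discrete class identifier (for example by clustering the matrices), which is a hypothetical step. The names `window_classes`, `low`, `high`, and `step_seconds` are likewise illustrative, and the threshold values are supplied by the caller.

```python
from collections import Counter


def select_markers(window_classes, low, high):
    """Classes whose relative frequency lies strictly between low and high."""
    total = len(window_classes)
    freq = {c: n / total for c, n in Counter(window_classes).items()}
    return {c for c, f in freq.items() if low < f < high}


def segment_points(window_classes, markers, step_seconds):
    """Start times (in seconds) of windows where a run of a marker class begins."""
    points = []
    prev = None
    for i, c in enumerate(window_classes):
        # Report only the first window of each run of the same marker class.
        if c in markers and c != prev:
            points.append(i * step_seconds)
        prev = c
    return points
```

A class that occurs in nearly every window, or in almost none, is excluded by the frequency band, which matches the stated requirement that a marker be neither too common nor too rare.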
13 Claims
11. The method of claim 1, in which the low frequency threshold is about one in three, and the high frequency threshold is about one in a hundred.
Specification