VIDEO CONCEPT CLASSIFICATION USING AUDIO-VISUAL ATOMS
Abstract
A method for determining a classification for a video segment, comprising the steps of: breaking the video segment into a plurality of short-term video slices, each including a plurality of video frames and an audio signal; analyzing the video frames for each short-term video slice to form a plurality of region tracks; analyzing each region track to form a visual feature vector and a motion feature vector; analyzing the audio signal for each short-term video slice to determine an audio feature vector; forming a plurality of short-term audio-visual atoms for each short-term video slice by combining the visual feature vector and the motion feature vector for a particular region track with the corresponding audio feature vector; and using a classifier to determine a classification for the video segment responsive to the short-term audio-visual atoms.
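As an illustration of steps (a) and (e), the following minimal Python sketch partitions frame indices into short-term slices and forms one audio-visual atom per region track by concatenating feature vectors. The names `RegionTrack`, `VideoSlice`, `split_into_slices`, and `form_atoms` are hypothetical, the slice length is an arbitrary parameter, and concatenation is only one assumed way of "combining" the vectors. Region tracking and feature extraction are assumed to have been done elsewhere.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RegionTrack:
    """Hypothetical container for one tracked image region within a slice."""
    visual: List[float]  # visual feature vector, assumed precomputed
    motion: List[float]  # inter-frame motion feature vector, assumed precomputed


@dataclass
class VideoSlice:
    """Hypothetical container for one short-term video slice."""
    tracks: List[RegionTrack]
    audio: List[float]  # one audio feature vector for the whole slice


def split_into_slices(n_frames: int, frames_per_slice: int) -> List[range]:
    """Step (a): partition frame indices into consecutive slices.

    The last slice may be shorter than frames_per_slice.
    """
    if frames_per_slice <= 0:
        raise ValueError("frames_per_slice must be positive")
    return [
        range(start, min(start + frames_per_slice, n_frames))
        for start in range(0, n_frames, frames_per_slice)
    ]


def form_atoms(video_slice: VideoSlice) -> List[List[float]]:
    """Step (e): one atom per region track.

    Each atom is the track's visual and motion vectors followed by the
    slice-level audio vector (concatenation is an assumption of this sketch).
    """
    return [
        track.visual + track.motion + video_slice.audio
        for track in video_slice.tracks
    ]
```

Under these assumptions, a slice with two region tracks yields two atoms that share the same audio components.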
12 Claims
1. A method for determining a classification for a video segment, comprising using a processor to perform the steps of:
a) breaking the video segment into a plurality of short-term video slices, each including a plurality of video frames and an audio signal;
b) analyzing the video frames for each short-term video slice to form a plurality of region tracks, wherein the region tracks provide an indication of the position of identified image regions in a plurality of video frames;
c) analyzing each region track to form a corresponding visual feature vector providing an indication of visual features for the image region, and a motion feature vector providing an indication of inter-frame motion for the image region;
d) analyzing the audio signal for each short-term video slice to determine an audio feature vector providing a characterization of the audio signal;
e) forming a plurality of short-term audio-visual atoms for each short-term video slice by combining the visual feature vector and the motion feature vector for a particular region track with the corresponding audio feature vector; and
f) using a classifier to determine a classification for the video segment responsive to the short-term audio-visual atoms.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A method for determining a representation of a video segment, comprising using a processor to perform the steps of:
a) breaking the video segment into a plurality of short-term video slices, each including a plurality of video frames and an audio signal;
b) analyzing the video frames for each short-term video slice to form a plurality of region tracks, wherein the region tracks provide an indication of the inter-frame motion for image regions that occur in a plurality of video frames;
c) analyzing each region track to form a corresponding visual feature vector and a motion feature vector;
d) analyzing the audio signal for each short-term video slice to determine an audio feature vector;
e) forming a plurality of short-term audio-visual atoms for each short-term video slice by combining the visual feature vector and the motion feature vector for a particular region track with the corresponding audio feature vector; and
f) combining the short-term audio-visual atoms to form a representation of the video segment.
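Step (f) of claim 11 does not say how the atoms are combined. One possible reading, shown here purely as an assumed sketch, is a bag-of-words histogram: each atom is assigned to its nearest entry in a codebook, and the normalized counts form a fixed-length representation of the segment. The codebook, the squared Euclidean distance, and the function names are all hypothetical choices for illustration, not taken from the claim.

```python
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def represent_segment(
    atoms: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> List[float]:
    """Combine atoms into a normalized histogram over codebook entries.

    Assumption of this sketch: every atom votes for its single nearest
    codeword, and the votes are divided by the number of atoms.
    """
    if not atoms:
        raise ValueError("at least one atom is required")
    counts = [0] * len(codebook)
    for atom in atoms:
        nearest = min(
            range(len(codebook)),
            key=lambda k: sq_dist(atom, codebook[k]),
        )
        counts[nearest] += 1
    return [c / len(atoms) for c in counts]
```

The resulting vector has one entry per codeword regardless of how many atoms the segment produced, which is what makes segments of different lengths comparable under this assumed scheme.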
12. A system comprising:
a data processing system; and
a memory system communicatively connected to the data processing system and storing instructions configured to cause the data processing system to implement a method for determining a classification for a video segment, wherein the instructions comprise:
a) breaking the video segment into a plurality of short-term video slices, each including a plurality of video frames and an audio signal;
b) analyzing the video frames for each short-term video slice to form a plurality of region tracks, wherein the region tracks provide an indication of the position of identified image regions in a plurality of video frames;
c) analyzing each region track to form a corresponding visual feature vector providing an indication of visual features for the image region, and a motion feature vector providing an indication of inter-frame motion for the image region;
d) analyzing the audio signal for each short-term video slice to determine an audio feature vector providing a characterization of the audio signal;
e) forming a plurality of short-term audio-visual atoms for each short-term video slice by combining the visual feature vector and the motion feature vector for a particular region track with the corresponding audio feature vector; and
f) using a classifier to determine a classification for the video segment responsive to the short-term audio-visual atoms.
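The claims leave the classifier in step (f) unspecified. As a stand-in only, the sketch below uses a nearest-centroid rule: the segment receives the label of the closest per-class centroid. The function name `classify`, the centroid values, and the concept labels are hypothetical, and the input is assumed to be some fixed-length vector already derived from the segment's audio-visual atoms.

```python
from typing import Dict, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(
    segment_vector: Sequence[float],
    centroids: Dict[str, Sequence[float]],
) -> str:
    """Return the label whose centroid is closest to the segment vector.

    Nearest-centroid is a placeholder for whatever classifier is actually
    trained; it is not taken from the claims.
    """
    if not centroids:
        raise ValueError("at least one class centroid is required")
    return min(
        centroids,
        key=lambda label: sq_dist(segment_vector, centroids[label]),
    )


# Usage with made-up centroids for two hypothetical concepts.
example_centroids = {"beach": [0.9, 0.1], "birthday": [0.2, 0.8]}
example_label = classify([0.7, 0.3], example_centroids)
```

In practice any trained classifier that accepts a fixed-length vector could take the place of `classify` in this sketch.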
Specification