Video summarization using semantic information
Abstract
An apparatus for video summarization using semantic information is described herein. The apparatus includes a controller, a scoring mechanism, and a summarizer. The controller is to segment an incoming video stream into a plurality of activity segments, wherein each frame is associated with an activity. The scoring mechanism is to calculate a score for each frame of each activity, wherein the score is based on a plurality of objects in each frame. The summarizer is to summarize the activity segments based on the score for each frame.
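The patent does not disclose an implementation of the scoring mechanism, but the abstract and claims describe a score that combines each frame's activity-classification probability with the co-occurrence of detected objects and that activity. The following is a minimal sketch of one way such a score could be computed; the function name, the co-occurrence table, and the combining rule (probability weighted by mean object co-occurrence) are illustrative assumptions, not taken from the patent.

```python
# Hypothetical co-occurrence weights: how strongly each object is
# associated with each activity. Values here are invented for illustration.
CO_OCCURRENCE = {
    ("cake", "birthday_party"): 0.9,
    ("ball", "birthday_party"): 0.3,
    ("ball", "soccer_game"): 0.95,
}

def frame_score(activity, class_prob, objects):
    """Score a frame by weighting its activity-classification probability
    with the average co-occurrence of its detected objects and the
    frame's activity label (an assumed combining rule)."""
    if not objects:
        return class_prob
    co = sum(CO_OCCURRENCE.get((obj, activity), 0.0) for obj in objects)
    return class_prob * (co / len(objects))

score = frame_score("birthday_party", 0.8, ["cake", "ball"])
# mean co-occurrence (0.9 + 0.3) / 2 = 0.6, so score = 0.8 * 0.6 = 0.48
```

A frame whose objects strongly co-occur with its activity (e.g. a cake in a birthday-party segment) scores higher than one with weakly related objects, which matches the claims' intent of surfacing the most salient frames.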
22 Claims
1. An apparatus, comprising:

a processor to segment an incoming video stream into a plurality of activity segments, wherein each frame is associated with an activity;

the processor to execute a scoring mechanism to calculate a score for each frame of each activity, wherein the score is based, at least partially, on a classification probability of each frame and a co-occurrence of a plurality of objects with each activity; and

the processor to execute a summarizer to summarize the activity segments based on the score for each frame and select a high score region of a frame of the summarized activity segment, wherein the high score region represents a most important and salient moment within the summarized activity segment.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
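Claim 1's first step, segmenting the incoming stream into activity segments in which each frame is associated with an activity, can be sketched as grouping consecutive identically-labeled frames. The function and label names below are assumptions for illustration; the patent does not specify this representation.

```python
from itertools import groupby

def segment_by_activity(frame_labels):
    """Group consecutive frames sharing an activity label into
    (activity, start_index, end_index) segments."""
    segments, idx = [], 0
    for activity, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((activity, idx, idx + length - 1))
        idx += length
    return segments

segments = segment_by_activity(["walk", "walk", "cook", "cook", "cook", "walk"])
# three segments: walk at frames 0-1, cook at 2-4, walk at 5
```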
10. A method for video summarization, comprising:

labeling each frame of a plurality of frames according to an activity class;

determining an object-to-activity correlation for an object within each frame, wherein the object-to-activity correlation is based, at least partially, on a classification probability of each frame and a co-occurrence of a plurality of objects with each activity, wherein an object is extracted from a frame via frame analysis using a convolutional neural network with tiled neurons; and

rendering a video summary that comprises the frames with object-to-activity correlations above a predetermined threshold for each frame in a shot boundary.

Dependent claims: 11, 12, 13, 14, 15.
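Claim 10's rendering step keeps the frames whose object-to-activity correlation exceeds a predetermined threshold, evaluated within each shot boundary. A minimal sketch, assuming frames are represented as (index, correlation) pairs and shots as (start, end) index ranges; these representations and the threshold value are illustrative, not from the patent:

```python
def summarize(frames, shot_boundaries, threshold=0.5):
    """Return the indices of frames whose correlation exceeds the
    threshold, checked shot by shot (assumed data layout)."""
    summary = []
    for start, end in shot_boundaries:
        kept = [i for i, corr in frames if start <= i <= end and corr > threshold]
        summary.extend(kept)
    return summary

frames = [(0, 0.9), (1, 0.2), (2, 0.7), (3, 0.4)]
summary = summarize(frames, [(0, 1), (2, 3)])
# keeps frames 0 and 2, one high-correlation frame from each shot
```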
16. A system, comprising:

a display;

an image capture mechanism;

a memory that is to store instructions and that is communicatively coupled to the image capture mechanism and the display; and

a processor communicatively coupled to the image capture mechanism, the display, and the memory, wherein, when the processor is to execute the instructions, the processor is to:

label each frame of a plurality of frames according to an activity class;

determine a score corresponding to each frame, wherein the score is based, at least partially, on a classification probability of each frame and a co-occurrence of a plurality of objects with each activity, wherein an object is extracted from a frame via frame analysis using a convolutional neural network with tiled neurons; and

render a video summary that comprises the frames with scores above a predetermined threshold for each frame in a shot boundary.

Dependent claims: 17, 18, 19.
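Claims 10, 16, and 20 all extract objects via "a convolutional neural network with tiled neurons." In tiled convolutional networks, nearby neurons use different filters and weights are shared only between neurons a fixed number of steps apart. The patent gives no architecture details; the 1-D sketch below is only an assumed illustration of that tiled weight-sharing pattern, not the patent's network.

```python
import numpy as np

def tiled_conv1d(x, filters):
    """1-D 'tiled' convolution: output neuron i uses filter i mod tile,
    so weights repeat with period `tile` instead of being shared by
    every position as in a standard convolution (assumed toy model)."""
    tile, k = filters.shape  # (number of tiled filters, filter width)
    out = np.empty(len(x) - k + 1)
    for i in range(len(out)):
        out[i] = x[i:i + k] @ filters[i % tile]
    return out

out = tiled_conv1d(np.array([1.0, 2.0, 3.0, 4.0]),
                   np.array([[1.0, 0.0], [0.0, 1.0]]))
# alternating filters pick x[i] and x[i+1]: [1.0, 3.0, 3.0]
```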
20. A tangible, non-transitory, computer-readable medium comprising instructions that, when executed by a processor, direct the processor to:

label each frame of a plurality of frames according to an activity class;

determine an object-to-activity correlation for an object within each frame, wherein the object-to-activity correlation is based, at least partially, on a classification probability of each frame and a co-occurrence of a plurality of objects with each activity, wherein an object is extracted from a frame via frame analysis using a convolutional neural network with tiled neurons; and

render a video summary that comprises the frames with object-to-activity correlations above a predetermined threshold for each frame in a shot boundary.

Dependent claims: 21, 22.
Specification