Systems and methods for semantically classifying and normalizing shots in video
First Claim
1. A method comprising:
- within each frame of a sequence of video frames, for each spatial segment of a plurality of spatial segments within the frame, determining likelihoods of the spatial segment corresponding to specific types of contents;
- based on the likelihoods, generating arrangement data for each frame in the sequence, the arrangement data representing a spatial arrangement of the specific types of contents within the frame;
- identifying groups of consecutive video frames, within the sequence, that have similar arrangement data; and
- based on the identified groups of consecutive video frames, identifying start times and end times for scenes within a video, the video comprising the video frames.
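The claimed method can be illustrated with a minimal sketch. The choices below are assumptions, not the claim's specifics: arrangement data is modeled as the concatenation of per-segment likelihood vectors, "similar" is modeled as Euclidean distance under a fixed threshold, and scene boundaries fall where consecutive frames' arrangements diverge.

```python
import numpy as np

def arrangement_data(segment_likelihoods):
    # segment_likelihoods: (num_segments, num_content_types) array of
    # per-segment content-type likelihoods. Flattening preserves the
    # spatial order of segments, so the result encodes arrangement.
    return segment_likelihoods.flatten()

def find_scenes(frame_arrangements, fps, threshold=0.5):
    # Group consecutive frames whose arrangement vectors are similar,
    # then report each group's (start, end) time in seconds.
    # `threshold` is an illustrative assumption.
    scenes = []
    start = 0
    for i in range(1, len(frame_arrangements)):
        dist = np.linalg.norm(frame_arrangements[i] - frame_arrangements[i - 1])
        if dist > threshold:  # arrangement changed: close the current scene
            scenes.append((start / fps, i / fps))
            start = i
    scenes.append((start / fps, len(frame_arrangements) / fps))
    return scenes
```

For example, a sequence of ten frames dominated by one arrangement followed by ten frames of another, at 10 fps, yields two scenes with a boundary at the one-second mark.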
Abstract
The present disclosure relates to systems and methods for classifying videos based on video content. For a given video file comprising a plurality of frames, a subset of frames is extracted for processing. Frames that are too dark, blurry, or otherwise poor classification candidates are discarded from the subset. Material classification scores, which describe the types of material content likely present in each frame, are then calculated for the remaining frames in the subset. The material classification scores are used to generate material arrangement vectors that represent the spatial arrangement of material content in each frame. The material arrangement vectors are in turn classified to generate a scene classification score vector for each frame. The scene classification results are averaged (or otherwise aggregated) across all frames in the subset to associate the video file with one or more predefined scene categories describing the overall scene content of the video file.
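The pipeline described in the abstract can be sketched end to end. Here `material_model` and `scene_model` are hypothetical callables standing in for the trained classifiers, and the frame-sampling stride, darkness threshold, and gradient-variance sharpness measure are illustrative assumptions rather than the disclosure's specifics.

```python
import numpy as np

def sharpness(frame):
    # Crude focus measure (assumption): variance of horizontal and
    # vertical pixel differences; low values suggest a blurry frame.
    gx = np.diff(frame, axis=1)
    gy = np.diff(frame, axis=0)
    return gx.var() + gy.var()

def classify_video(frames, material_model, scene_model,
                   darkness_thresh=30.0, blur_thresh=100.0, stride=10):
    # 1. Extract a subset of frames for processing.
    subset = frames[::stride]
    # 2. Discard frames that are too dark or too blurry.
    usable = [f for f in subset
              if f.mean() > darkness_thresh and sharpness(f) > blur_thresh]
    scene_scores = []
    for frame in usable:
        # 3. Per-segment material classification scores for the frame.
        material_scores = material_model(frame)   # (segments, materials)
        # 4. Material arrangement vector: spatially ordered concatenation.
        arrangement = material_scores.flatten()
        # 5. Scene classification score vector for the frame.
        scene_scores.append(scene_model(arrangement))
    # 6. Average across frames to associate the video with scene categories.
    return np.mean(scene_scores, axis=0)
```

The averaged score vector can then be thresholded or arg-maxed against the predefined scene categories.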
21 Claims
1. A method comprising:
- within each frame of a sequence of video frames, for each spatial segment of a plurality of spatial segments within the frame, determining likelihoods of the spatial segment corresponding to specific types of contents;
- based on the likelihoods, generating arrangement data for each frame in the sequence, the arrangement data representing a spatial arrangement of the specific types of contents within the frame;
- identifying groups of consecutive video frames, within the sequence, that have similar arrangement data; and
- based on the identified groups of consecutive video frames, identifying start times and end times for scenes within a video, the video comprising the video frames.
View Dependent Claims (2, 3, 4, 5, 6, 7)
8. One or more non-transitory media storing instructions that, when executed by one or more computing devices, cause performance of:
- within each frame of a sequence of video frames, for each spatial segment of a plurality of spatial segments within the frame, determining likelihoods of the spatial segment corresponding to specific types of contents;
- based on the likelihoods, generating arrangement data for each frame in the sequence, the arrangement data representing a spatial arrangement of the specific types of contents within the frame;
- identifying groups of consecutive video frames, within the sequence, that have similar arrangement data; and
- based on the identified groups of consecutive video frames, identifying start times and end times for scenes within a video, the video comprising the video frames.
View Dependent Claims (9, 10, 11, 12, 13, 14)
15. A system comprising:
- a module, implemented at least partially by computing hardware, configured to, within each frame of a sequence of video frames, for each spatial segment of a plurality of spatial segments within the frame, determine likelihoods of the spatial segment corresponding to specific types of contents;
- a module, implemented at least partially by computing hardware, configured to, based on the likelihoods, generate arrangement data for each frame in the sequence, the arrangement data representing a spatial arrangement of the specific types of contents within the frame;
- a module, implemented at least partially by computing hardware, configured to identify groups of consecutive video frames, within the sequence, that have similar arrangement data; and
- a module, implemented at least partially by computing hardware, configured to, based on the identified groups of consecutive video frames, identify start times and end times for scenes within a video, the video comprising the video frames.
View Dependent Claims (16, 17, 18, 19, 20, 21)
Specification