MULTIMODAL AND REAL-TIME METHOD FOR FILTERING SENSITIVE MEDIA
First Claim
1. A multimodal and real-time method for filtering sensitive content, receiving as input a digital video stream, comprising:
segmenting digital video into video fragments along a video timeline;
extracting features containing significant information on sensitive media from the digital video input;
reducing a semantic difference between each of the low-level video features and a high-level sensitive concept;
classifying the video fragments, and generating a high-level label (positive or negative), with a confidence score for each fragment representation;
performing high-level fusion to properly match the possible high-level labels and confidence scores for each fragment; and
predicting sensitive moments by combining labels of the fragments along the video timeline, indicating the moments when the content becomes sensitive.
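By way of illustration, the Python sketch below traces the per-fragment steps recited in claim 1 (segmentation along the timeline, low-level feature extraction, semantic-gap reduction, and labeling with a confidence score). Every identifier (Fragment, segment_video, extract_low_level_features, project_to_concept_space, classify_fragment) and the toy statistics used in place of real descriptors and a trained classifier are hypothetical stand-ins, not the patented implementation.

```python
# Minimal, self-contained sketch of the per-fragment pipeline in claim 1.
# All names and the placeholder features/classifier are illustrative only.

from dataclasses import dataclass
from typing import List, Tuple
import random

@dataclass
class Fragment:
    start: float   # fragment start time (seconds) on the video timeline
    end: float     # fragment end time (seconds)
    frames: list   # decoded frames (placeholder; real frames/audio in practice)

def segment_video(duration: float, fragment_len: float = 1.0) -> List[Fragment]:
    """Split the video timeline into fixed-length fragments."""
    fragments, t = [], 0.0
    while t < duration:
        end = min(t + fragment_len, duration)
        fragments.append(Fragment(start=t, end=end, frames=[]))
        t = end
    return fragments

def extract_low_level_features(frag: Fragment) -> List[float]:
    """Placeholder low-level descriptor (e.g. color/motion/audio statistics)."""
    random.seed(hash((frag.start, frag.end)) & 0xFFFF)
    return [random.random() for _ in range(8)]

def project_to_concept_space(features: List[float]) -> List[float]:
    """Reduce the semantic gap: map low-level features to a higher-level
    representation (in practice, a learned codebook or neural embedding)."""
    return [sum(features) / len(features), max(features), min(features)]

def classify_fragment(representation: List[float],
                      threshold: float = 0.5) -> Tuple[str, float]:
    """Assign a high-level label (positive = sensitive) with a confidence score."""
    score = representation[0]            # stand-in for a trained classifier's output
    label = "positive" if score >= threshold else "negative"
    confidence = min(abs(score - threshold) + 0.5, 1.0)
    return label, confidence

if __name__ == "__main__":
    for frag in segment_video(duration=5.0):
        feats = extract_low_level_features(frag)
        rep = project_to_concept_space(feats)
        label, conf = classify_fragment(rep)
        print(f"[{frag.start:.1f}s-{frag.end:.1f}s] {label} ({conf:.2f})")
```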
Abstract
A multimodal and real-time method for filtering sensitive content, receiving as input a digital video stream, the method including: segmenting the digital video into video fragments along the video timeline; extracting features containing significant information on sensitive media from the digital video input; reducing the semantic difference between each of the low-level video features and the high-level sensitive concept; classifying the video fragments and generating a high-level label (positive or negative) with a confidence score for each fragment representation; performing high-level fusion to properly match the possible high-level labels and confidence scores for each fragment; and predicting sensitive moments by combining the labels of the fragments along the video timeline, indicating the moments when the content becomes sensitive.
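The last two steps of the abstract, high-level fusion and sensitive-moment prediction, can be pictured as a late-fusion pass over per-modality labels followed by interval merging along the timeline. The sketch below assumes per-fragment, per-modality labels and confidence scores are already available; the weighting scheme, modality names, and merging rule are illustrative assumptions, not the claimed method.

```python
# Minimal sketch of high-level (late) fusion and sensitive-moment prediction.
# Weights, modalities, and the interval-merging rule are assumed for illustration.

from typing import Dict, List, Tuple

# (start, end) -> {modality: (label, confidence)}
FragmentScores = Dict[Tuple[float, float], Dict[str, Tuple[str, float]]]

def fuse_fragment(modal_outputs: Dict[str, Tuple[str, float]],
                  weights: Dict[str, float]) -> Tuple[str, float]:
    """Weighted late fusion: combine the modalities' labels and confidences
    into one high-level label and score for the fragment."""
    score, total = 0.0, 0.0
    for modality, (label, conf) in modal_outputs.items():
        w = weights.get(modality, 1.0)
        score += w * (conf if label == "positive" else -conf)
        total += w
    fused = score / total if total else 0.0
    return ("positive" if fused > 0 else "negative", abs(fused))

def predict_sensitive_moments(per_fragment: FragmentScores,
                              weights: Dict[str, float]) -> List[Tuple[float, float]]:
    """Combine fused labels along the timeline and merge consecutive
    positive fragments into sensitive time intervals."""
    moments: List[Tuple[float, float]] = []
    for (start, end) in sorted(per_fragment):
        label, _ = fuse_fragment(per_fragment[(start, end)], weights)
        if label == "positive":
            if moments and abs(moments[-1][1] - start) < 1e-6:
                moments[-1] = (moments[-1][0], end)   # extend the open interval
            else:
                moments.append((start, end))
    return moments

if __name__ == "__main__":
    scores: FragmentScores = {
        (0.0, 1.0): {"visual": ("negative", 0.9), "audio": ("negative", 0.8)},
        (1.0, 2.0): {"visual": ("positive", 0.7), "audio": ("positive", 0.6)},
        (2.0, 3.0): {"visual": ("positive", 0.8), "audio": ("negative", 0.3)},
        (3.0, 4.0): {"visual": ("negative", 0.9), "audio": ("negative", 0.9)},
    }
    # Prints [(1.0, 3.0)]: two consecutive positive fragments merged into one moment.
    print(predict_sensitive_moments(scores, weights={"visual": 0.6, "audio": 0.4}))
```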
Specification