MULTIMODAL AND REAL-TIME METHOD FOR FILTERING SENSITIVE MEDIA
First Claim
1. A multimodal and real-time method for filtering sensitive content, receiving as input a digital video stream, comprising:
segmenting digital video into video fragments along a video timeline;
extracting features containing significant information on sensitive media from the digital video input;
reducing a semantic difference between each of the low-level video features and a high-level sensitive concept;
classifying the video fragments, and generating a high-level label (positive or negative), with a confidence score for each fragment representation;
performing high-level fusion to properly match the possible high-level labels and confidence scores for each fragment; and
predicting sensitive moments by combining labels of the fragments along the video timeline, indicating the moments when the content becomes sensitive.
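By way of illustration, the Python sketch below traces the per-fragment steps recited in claim 1 (segmentation along the timeline, low-level feature extraction, semantic-gap reduction, and labeling with a confidence score). Every identifier (Fragment, segment_video, extract_low_level_features, project_to_concept_space, classify_fragment) and the toy statistics used in place of real descriptors and a trained classifier are hypothetical stand-ins, not the patented implementation.

```python
# Minimal, self-contained sketch of the per-fragment pipeline in claim 1.
# All names and the placeholder features/classifier are illustrative only.

from dataclasses import dataclass
from typing import List, Tuple
import random

@dataclass
class Fragment:
    start: float   # fragment start time (seconds) on the video timeline
    end: float     # fragment end time (seconds)
    frames: list   # decoded frames (placeholder; real frames/audio in practice)

def segment_video(duration: float, fragment_len: float = 1.0) -> List[Fragment]:
    """Split the video timeline into fixed-length fragments."""
    fragments, t = [], 0.0
    while t < duration:
        end = min(t + fragment_len, duration)
        fragments.append(Fragment(start=t, end=end, frames=[]))
        t = end
    return fragments

def extract_low_level_features(frag: Fragment) -> List[float]:
    """Placeholder low-level descriptor (e.g. color/motion/audio statistics)."""
    random.seed(hash((frag.start, frag.end)) & 0xFFFF)
    return [random.random() for _ in range(8)]

def project_to_concept_space(features: List[float]) -> List[float]:
    """Reduce the semantic gap: map low-level features to a higher-level
    representation (in practice, a learned codebook or neural embedding)."""
    return [sum(features) / len(features), max(features), min(features)]

def classify_fragment(representation: List[float],
                      threshold: float = 0.5) -> Tuple[str, float]:
    """Assign a high-level label (positive = sensitive) with a confidence score."""
    score = representation[0]            # stand-in for a trained classifier's output
    label = "positive" if score >= threshold else "negative"
    confidence = min(abs(score - threshold) + 0.5, 1.0)
    return label, confidence

if __name__ == "__main__":
    for frag in segment_video(duration=5.0):
        feats = extract_low_level_features(frag)
        rep = project_to_concept_space(feats)
        label, conf = classify_fragment(rep)
        print(f"[{frag.start:.1f}s-{frag.end:.1f}s] {label} ({conf:.2f})")
```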
Abstract
A multimodal and real-time method for filtering sensitive content, receiving as input a digital video stream, the method including: segmenting the digital video into video fragments along the video timeline; extracting features containing significant information on sensitive media from the digital video input; reducing the semantic difference between each of the low-level video features and the high-level sensitive concept; classifying the video fragments and generating a high-level label (positive or negative) with a confidence score for each fragment representation; performing high-level fusion to properly match the possible high-level labels and confidence scores for each fragment; and predicting sensitive moments by combining the labels of the fragments along the video timeline, indicating the moments when the content becomes sensitive.
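The last two steps of the abstract, high-level fusion and sensitive-moment prediction, can be pictured as a late-fusion pass over per-modality labels followed by interval merging along the timeline. The sketch below assumes per-fragment, per-modality labels and confidence scores are already available; the weighting scheme, modality names, and merging rule are illustrative assumptions, not the claimed method.

```python
# Minimal sketch of high-level (late) fusion and sensitive-moment prediction.
# Weights, modalities, and the interval-merging rule are assumed for illustration.

from typing import Dict, List, Tuple

# (start, end) -> {modality: (label, confidence)}
FragmentScores = Dict[Tuple[float, float], Dict[str, Tuple[str, float]]]

def fuse_fragment(modal_outputs: Dict[str, Tuple[str, float]],
                  weights: Dict[str, float]) -> Tuple[str, float]:
    """Weighted late fusion: combine the modalities' labels and confidences
    into one high-level label and score for the fragment."""
    score, total = 0.0, 0.0
    for modality, (label, conf) in modal_outputs.items():
        w = weights.get(modality, 1.0)
        score += w * (conf if label == "positive" else -conf)
        total += w
    fused = score / total if total else 0.0
    return ("positive" if fused > 0 else "negative", abs(fused))

def predict_sensitive_moments(per_fragment: FragmentScores,
                              weights: Dict[str, float]) -> List[Tuple[float, float]]:
    """Combine fused labels along the timeline and merge consecutive
    positive fragments into sensitive time intervals."""
    moments: List[Tuple[float, float]] = []
    for (start, end) in sorted(per_fragment):
        label, _ = fuse_fragment(per_fragment[(start, end)], weights)
        if label == "positive":
            if moments and abs(moments[-1][1] - start) < 1e-6:
                moments[-1] = (moments[-1][0], end)   # extend the open interval
            else:
                moments.append((start, end))
    return moments

if __name__ == "__main__":
    scores: FragmentScores = {
        (0.0, 1.0): {"visual": ("negative", 0.9), "audio": ("negative", 0.8)},
        (1.0, 2.0): {"visual": ("positive", 0.7), "audio": ("positive", 0.6)},
        (2.0, 3.0): {"visual": ("positive", 0.8), "audio": ("negative", 0.3)},
        (3.0, 4.0): {"visual": ("negative", 0.9), "audio": ("negative", 0.9)},
    }
    # Prints [(1.0, 3.0)]: two consecutive positive fragments merged into one moment.
    print(predict_sensitive_moments(scores, weights={"visual": 0.6, "audio": 0.4}))
```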
Specification