Exploiting multi-modal affect and semantics to assess the persuasiveness of a video
Abstract
Technologies for detecting persuasive multimedia content are disclosed. Affective and semantic concepts are extracted from the audio-visual content, along with the sentiment of associated comments, and the resulting features are analyzed and compared with a persuasiveness model.
Claims (20)
1. A method for determining the persuasiveness of a multimedia item, the method comprising, with a computing system comprising one or more computing devices:

using a visual feature extraction module, extracting a plurality of visual concept features from at least a portion of the multimedia item using automated machine learning techniques;

using an affective feature extraction module of the visual feature extraction module, automatically analyzing the extracted visual concept features to identify sentiments of the extracted visual concept features using a first, trained neural network;

using a semantic feature extraction module of the visual feature extraction module, automatically analyzing the extracted visual concept features to identify semantic concepts of the extracted visual concept features using a second, different, trained neural network;

using an audio feature extraction module, extracting an audio concept feature from at least a portion of the multimedia item using automated machine learning techniques;

identifying a text item associated with the multimedia item and extracting text from at least a portion of the text item using a comment extraction module; and

using a video persuasiveness prediction module: receiving the semantic concepts, sentiments, and visual concept features from the visual feature extraction module, the audio concept feature from the audio feature extraction module, and the text from the comment extraction module; comparing the semantic concepts, sentiments, and visual concept features, the audio concept feature, and the extracted text to semantic concepts, sentiments, visual concept features, audio concept features, and text in a persuasiveness model having respective measures of audience impact; and generating a measure of audience impact for the multimedia item based on the comparison, wherein the measure of audience impact is used to determine the persuasiveness of the multimedia item.

(Dependent claims 2-6 omitted here.)
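The flow recited in claim 1 can be illustrated with a minimal, self-contained sketch. All function names here are hypothetical, and the two trained neural networks, the audio extractor, and the comment analyzer are reduced to trivial stubs; only the claimed sequence, extracting multimodal features and comparing them against a persuasiveness model whose entries carry known audience-impact measures, is shown.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the claimed modules. Each extractor returns a
# simple numeric feature so the end-to-end comparison step can run without
# any ML dependencies; a real system would use trained networks here.

def visual_concept_features(frames):
    # toy "visual concept": mean intensity of each frame
    return [sum(f) / len(f) for f in frames]

def affective_features(concepts):        # first trained network (stub)
    return [c / 255.0 for c in concepts]

def semantic_features(concepts):         # second, different network (stub)
    return [1.0 if c > 128 else 0.0 for c in concepts]

def audio_concept_feature(samples):      # mean signal energy
    return sum(abs(s) for s in samples) / len(samples)

def comment_sentiment(text):             # crude lexicon-based sentiment
    positive = {"great", "love", "inspiring"}
    words = text.lower().split()
    return sum(w in positive for w in words) / max(len(words), 1)

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den if den else 0.0

@dataclass
class ModelEntry:
    features: list   # reference feature vector in the persuasiveness model
    impact: float    # its known measure of audience impact

def predict_impact(features, model):
    # similarity-weighted average of the model entries' impact measures
    sims = [(cosine(features, e.features), e.impact) for e in model]
    total = sum(s for s, _ in sims)
    return sum(s * i for s, i in sims) / total if total else 0.0

frames = [[200, 50, 100], [30, 220, 90]]
concepts = visual_concept_features(frames)
features = (affective_features(concepts) + semantic_features(concepts)
            + [audio_concept_feature([0.2, -0.4, 0.1])]
            + [comment_sentiment("Great message, love it")])
model = [ModelEntry([1.0] * len(features), 0.9),
         ModelEntry([0.0] * len(features), 0.1)]
score = predict_impact(features, model)
```

The similarity-weighted comparison is only one plausible reading of "comparing ... to a persuasiveness model"; the claim itself does not fix a particular comparison function.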
7. A multimodal data analyzer to determine the persuasiveness of a multimedia item, comprising:

a processor to execute program instructions; and

a memory in communication with the processor, the memory having stored therein at least one of programs and instructions executable by the processor to configure the multimodal data analyzer to implement:

a visual feature extraction module to extract a plurality of visual concept features from at least a portion of the multimedia item using automated machine learning techniques, wherein the visual feature extraction module includes: an affective feature extraction module to receive the multimedia item from an input and to automatically analyze the extracted visual concept features to identify sentiments of the extracted visual concept features using a first, trained neural network; and a semantic feature extraction module to receive the multimedia item from an input and to automatically analyze the extracted visual concept features to identify semantic concepts of the extracted visual concept features using a second, different, trained neural network;

an audio feature extraction module to receive the multimedia item from an input and to extract an audio concept feature from at least a portion of the multimedia item using automated machine learning techniques;

a comment extraction module to identify a text item associated with the multimedia item and to extract text from at least a portion of the text item; and

a video persuasiveness prediction module to receive the semantic concepts, sentiments, and visual concept features from the visual feature extraction module, the audio concept feature from the audio feature extraction module, and the text from the comment extraction module; to compare the semantic concepts, sentiments, and visual concept features, the audio concept feature, and the extracted text to semantic concepts, sentiments, visual concept features, audio concept features, and text in a persuasiveness model having respective measures of audience impact; and to generate a measure of audience impact for the multimedia item based on the comparison, wherein the measure of audience impact is used to determine the persuasiveness of the multimedia item.

(Dependent claims 8-12 omitted here.)
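Claim 7 recites the same pipeline as an apparatus whose modules are program instructions held in memory. That composition can be sketched as a class that wires the modules together, with each module an injected callable; the class and parameter names are hypothetical, not the patented implementation.

```python
class MultimodalDataAnalyzer:
    """Toy composition of the modules recited in claim 7. Each module is an
    injected callable standing in for the claimed program instructions and
    trained networks."""

    def __init__(self, visual, affective, semantic, audio, comments, predict):
        self.visual = visual        # visual feature extraction module
        self.affective = affective  # affective sub-module (first network)
        self.semantic = semantic    # semantic sub-module (second network)
        self.audio = audio          # audio feature extraction module
        self.comments = comments    # comment extraction module
        self.predict = predict      # video persuasiveness prediction module

    def measure_impact(self, multimedia_item, text_item):
        concepts = self.visual(multimedia_item)
        inputs = {
            "sentiments": self.affective(concepts),
            "semantics": self.semantic(concepts),
            "visual": concepts,
            "audio": self.audio(multimedia_item),
            "text": self.comments(text_item),
        }
        return self.predict(inputs)

# Wiring with trivial stubs, just to show the data flow end to end.
analyzer = MultimodalDataAnalyzer(
    visual=lambda item: [len(item)],
    affective=lambda c: [x / 10 for x in c],
    semantic=lambda c: [x * 2 for x in c],
    audio=lambda item: 0.5,
    comments=lambda t: t.strip(),
    predict=lambda inputs: inputs["sentiments"][0],
)
impact = analyzer.measure_impact("abcde", " nice video ")
```

Injecting the modules as callables mirrors the claim's structure, in which the analyzer is configured by instructions stored in memory rather than by fixed internal logic.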
13. A method for building a model of audience impact of a video, with a computing system comprising one or more computing devices, the method comprising:

accessing a plurality of multimedia items and text items associated with the multimedia items;

using a visual feature extraction module, extracting visual features from the multimedia items using automated machine learning techniques;

using an affective feature extraction module of the visual feature extraction module, automatically analyzing the extracted visual features to identify sentiments of the extracted visual features using a first, trained neural network;

using a semantic feature extraction module of the visual feature extraction module, automatically analyzing the extracted visual features to identify semantic concepts of the extracted visual features using a second, different, trained neural network;

using an audio feature extraction module, extracting an audio concept feature from at least a portion of the multimedia items using automated machine learning techniques;

extracting text from the text items using a comment extraction module;

using a video persuasiveness prediction module, annotating the identified semantic concepts and sentiments, the extracted audio features, visual features, and text items with a measure of audience impact based on the semantic analysis or the affective analysis of the visual features, an affective analysis of the audio features, and a sentiment analysis of the extracted text;

classifying each of the multimedia items based on a combination of the annotations; and

storing the classifications in the audience impact model.

(Dependent claims 14-16 omitted here.)
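The model-building loop of claim 13 (annotate each item with an impact measure, classify it, store the result) can be sketched as follows. The `annotate` and `classify` callables are hypothetical stand-ins for the claimed semantic, affective, and sentiment analyses.

```python
def build_audience_impact_model(items, annotate, classify):
    """items: iterable of (features, comments) pairs. `annotate` supplies a
    measure of audience impact per item; `classify` buckets that measure.
    Returns the stored audience impact model as a list of entries."""
    model = []
    for features, comments in items:
        impact = annotate(features, comments)
        model.append({"features": features,
                      "impact": impact,
                      "label": classify(impact)})
    return model

# Toy annotation: impact is the fraction of positive comments (1 = positive).
annotate = lambda feats, comments: sum(comments) / len(comments)
classify = lambda impact: "persuasive" if impact >= 0.5 else "not persuasive"

model = build_audience_impact_model(
    [([0.1, 0.2], [1, 1, 0]), ([0.3, 0.4], [0, 0, 1])],
    annotate, classify)
```

The stored entries pair each item's features with its annotated impact and class label, which is the shape a comparison step like the one in claim 1 would consume.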
17. A video classifier device for building a model of audience impact of a video, comprising:

a visual feature extraction module to extract visual features from a plurality of accessed multimedia items using automated machine learning techniques, wherein the visual feature extraction module includes: an affective feature extraction module to receive the multimedia items from an input and to automatically analyze the extracted visual concept features to identify sentiments of the extracted visual concept features using a first, trained neural network; and a semantic feature extraction module to receive the multimedia items from an input and to automatically analyze the extracted visual concept features to identify semantic concepts of the extracted visual concept features using a second, different, trained neural network;

an audio feature extraction module to receive the multimedia items from an input and to extract an audio concept feature from at least a portion of the multimedia items using automated machine learning techniques;

a comment extraction module to extract text from accessed text items; and

a video persuasiveness prediction module to annotate the identified semantic concepts and sentiments, the extracted audio features, visual features, and text items with a measure of audience impact based on the semantic analysis or the affective analysis of the visual features, an affective analysis of the audio features, and a sentiment analysis of the extracted text; to classify each of the multimedia items based on a combination of the annotations; and to store the classifications in an audience impact model.

(Dependent claims 18-20 omitted here.)