Summarizing video content based on memorability of the video content
Abstract
Certain embodiments involve generating summarized versions of video content based on memorability of the video content. For example, a video summarization system accesses segments of an input video. The video summarization system identifies memorability scores for the respective segments. The video summarization system selects a subset of segments from the segments based on each computed memorability score in the subset having a threshold memorability score. The video summarization system generates visual summary content from the subset of the segments.
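The selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Segment` type, the `select_memorable` helper, the example scores, and the reading of "having a threshold memorability score" as "score greater than or equal to a threshold" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float         # start time in seconds
    end: float           # end time in seconds
    memorability: float  # hypothetical precomputed score in [0, 1]


def select_memorable(segments: List[Segment], threshold: float) -> List[Segment]:
    """Keep segments whose score meets the threshold, in original time order."""
    kept = [s for s in segments if s.memorability >= threshold]
    return sorted(kept, key=lambda s: s.start)


# Illustrative scores only; real scores would come from trained predictors.
segments = [
    Segment(0.0, 4.0, 0.82),
    Segment(4.0, 9.0, 0.35),
    Segment(9.0, 12.0, 0.71),
]
summary = select_memorable(segments, threshold=0.7)
summary_spans = [(s.start, s.end) for s in summary]
```

In this sketch the visual summary would be assembled from the time spans in `summary_spans`; how the segments are actually rendered into summary content is not specified in the abstract.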
17 Claims
1. A method for summarizing video content based on memorability of the video content, the method performed by one or more processing devices and comprising:
- accessing segments of an input video;
- computing memorability scores for the segments, respectively, wherein computing a memorability score for a segment comprises:
  - generating (i) a semantic feature computed from an auto-captioning operation applied to the segment and (ii) a visual feature computed from one or more of a saliency analysis operation applied to the segment, a color analysis operation applied to the segment, and a spatio-temporal analysis operation applied to the segment,
  - computing a first component score by applying a first predictor to the semantic feature, where the first predictor is trained to determine first component memorability scores by comparing user-generated memorability values with training semantic features generated by the auto-captioning operation,
  - computing a second component score by applying a second predictor to the semantic feature, where the second predictor is trained to determine second component memorability scores by comparing the user-generated memorability values with training visual features generated by the one or more of the saliency analysis operation, the color analysis operation, and the spatio-temporal analysis operation, and
  - computing the memorability score from an averaging operation applied to the first component score and the second component score;
- selecting a subset of segments from the segments based on each computed memorability score in the subset having a threshold memorability score; and
- generating visual summary content from the subset of the segments.
Dependent claims: 2, 3, 4, 5, 6, 7
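The score computation recited in claim 1 can be sketched as below. This is a hedged illustration only: `memorability_score` and `linear_predictor` are hypothetical names, the linear predictors and their weights are stand-ins for whatever trained models an implementation would use, and the feature vectors are made up. Note also that the claim text as reproduced here applies the second predictor to "the semantic feature" even though that predictor is trained on visual features; the sketch passes the visual feature to it, which is an interpretive assumption.

```python
from statistics import mean
from typing import Callable, Sequence

Feature = Sequence[float]
Predictor = Callable[[Feature], float]


def linear_predictor(weights: Sequence[float], bias: float = 0.0) -> Predictor:
    """Hypothetical stand-in for a trained predictor; output clipped to [0, 1]."""
    def predict(feature: Feature) -> float:
        raw = bias + sum(w * x for w, x in zip(weights, feature))
        return min(1.0, max(0.0, raw))
    return predict


def memorability_score(
    semantic_feature: Feature,
    visual_feature: Feature,
    first_predictor: Predictor,
    second_predictor: Predictor,
) -> float:
    """Average the two component scores into one memorability score."""
    first_component = first_predictor(semantic_feature)
    # Assumption: the second predictor receives the visual feature.
    second_component = second_predictor(visual_feature)
    return mean([first_component, second_component])


# Made-up weights and features, chosen only to keep the arithmetic simple.
first_predictor = linear_predictor([0.5, 0.25])
second_predictor = linear_predictor([0.25, 0.25, 0.5])
score = memorability_score(
    [1.0, 1.0], [1.0, 0.0, 0.5], first_predictor, second_predictor
)
```

Here the first component is 0.75, the second is 0.5, and the averaged score is 0.625. A plain arithmetic mean is used because the claim says only "an averaging operation"; a weighted average would be an equally plausible reading.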
8. A system comprising:
- a processing device; and
- a non-transitory computer-readable medium communicatively coupled to the processing device, wherein the processing device is configured to execute program code stored in the non-transitory computer-readable medium and thereby perform operations comprising:
  - identifying a summary length for a visual summary content to be generated using input video segments;
  - determining that a summary subset of the input video segments (i) has a combined length that is less than or equal to the summary length and (ii) maximizes a sum of criteria scores for respective segments in the summary subset, wherein at least one criteria score comprises a memorability score for an input video segment weighted by a memorability weight and an additional video metric weighted by an additional video metric weight, wherein the additional video metric comprises one or more of video uniformity and video representativeness,
  - selecting the summary subset of the input video segments based on determining that the summary subset maximizes the sum of criteria scores and is less than or equal to the summary length, and
  - generating the visual summary content from the summary subset of the input video segments.
Dependent claims: 9, 10, 11
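The subset selection in claim 8 has the shape of a 0/1 knapsack problem: maximize a sum of weighted scores subject to a total-length budget. The sketch below solves it with a standard dynamic program. It is one possible approach, not necessarily the one used in the patent; the `Candidate` type, the `select_summary` function, the whole-second lengths, the equal default weights, and the use of representativeness as the "additional video metric" are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    length: int                # segment length in whole seconds (assumption)
    memorability: float
    representativeness: float  # stands in for the additional video metric


def criteria_score(c: Candidate, w_mem: float, w_extra: float) -> float:
    """Weighted sum of memorability and the additional metric."""
    return w_mem * c.memorability + w_extra * c.representativeness


def select_summary(
    candidates: List[Candidate],
    summary_length: int,
    w_mem: float = 0.5,
    w_extra: float = 0.5,
) -> List[int]:
    """Return indices of the subset with maximal total score within the budget."""
    # best[b] = (best total score, chosen indices) using at most b seconds
    best: List[Tuple[float, List[int]]] = [
        (0.0, []) for _ in range(summary_length + 1)
    ]
    for i, c in enumerate(candidates):
        score = criteria_score(c, w_mem, w_extra)
        # Iterate budgets downward so each candidate is used at most once.
        for b in range(summary_length, c.length - 1, -1):
            total = best[b - c.length][0] + score
            if total > best[b][0]:
                best[b] = (total, best[b - c.length][1] + [i])
    return best[summary_length][1]


# Made-up candidates: lengths of 4 s, 5 s, and 3 s with an 8 s budget.
candidates = [
    Candidate(4, 0.8, 0.4),
    Candidate(5, 0.5, 0.5),
    Candidate(3, 0.75, 0.25),
]
chosen = select_summary(candidates, summary_length=8)
chosen_length = sum(candidates[i].length for i in chosen)
```

With these inputs the first and third candidates are chosen (7 seconds in total), since that pair scores higher than any other combination that fits in 8 seconds. The dynamic program is exact for integer lengths; for fractional lengths or very long videos, a greedy or relaxed formulation might be preferred.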
12. A non-transitory computer-readable medium having program code that is stored thereon, the program code executable by one or more processing devices for performing operations comprising:
- accessing segments of an input video;
- computing memorability scores for the segments, respectively, wherein computing a memorability score for a segment comprises:
  - generating (i) a semantic feature computed from an auto-captioning operation applied to the segment and (ii) a visual feature computed from one or more of a saliency analysis operation applied to the segment, a color analysis operation applied to the segment, and a spatio-temporal analysis operation applied to the segment,
  - computing a first component score by applying a first predictor to the semantic feature, where the first predictor is trained to determine first component memorability scores by comparing user-generated memorability values with training semantic features generated by the auto-captioning operation,
  - computing a second component score by applying a second predictor to the semantic feature, where the second predictor is trained to determine second component memorability scores by comparing the user-generated memorability values with training visual features generated by the one or more of the saliency analysis operation, the color analysis operation, and the spatio-temporal analysis operation, and
  - computing the memorability score from an averaging operation applied to the first component score and the second component score;
- a step for selecting a subset of segments from the segments based on each computed memorability score in the subset having a threshold memorability score; and
- generating visual summary content from the subset of the segments.
Dependent claims: 13, 14, 15, 16, 17
Specification