Forming a representation of a video item and use thereof
First Claim
1. A method for forming a representation of a video item, comprising:
receiving a video item;
dividing the video item into a plurality of segments;
extracting a key frame from each of the plurality of segments to form a plurality of key frames based on a user attention feature of each frame in the plurality of segments, each user attention feature being determined based on a quantity of face images present in a corresponding frame, each key frame serving as a representation of a video segment from the plurality of segments;
organizing the plurality of segments into one or more groups corresponding to one or more respective scenes based at least on the plurality of key frames;
identifying a final key frame from the plurality of key frames for each of the one or more groups; and
correlating a video vignette with each of the identified final key frames.
Abstract
Functionality is described for forming a summary representation of a video item to help a user decide whether to obtain a full version of the video item. The functionality operates by: (a) receiving a video item; (b) dividing the video item into a plurality of segments; (c) extracting at least one key frame from each of the plurality of segments to form a plurality of key frames; and (d) organizing the video segments into one or more groups corresponding to one or more respective scenes based on the plurality of key frames, to thereby form the representation of the video item. The functionality can be used to communicate search results to a user, to provide a sample of the video item in a message, etc.
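Steps (a) through (d) of the abstract can be sketched as a minimal pipeline. Everything below is illustrative rather than the patented implementation: the `Frame` record, the fixed segment length, and the face-count attention score are assumptions, since the document does not specify data structures or a detection method.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    index: int
    face_count: int  # number of face images detected in this frame (hypothetical input)

def attention_score(frame: Frame) -> float:
    """Claim 1 bases the user-attention feature on the quantity of faces."""
    return float(frame.face_count)

def divide_into_segments(frames: List[Frame], segment_len: int) -> List[List[Frame]]:
    """Step (b): split the frame sequence into fixed-length segments (an illustrative rule)."""
    return [frames[i:i + segment_len] for i in range(0, len(frames), segment_len)]

def extract_key_frames(segments: List[List[Frame]]) -> List[Frame]:
    """Step (c): one key frame per segment -- the frame with the highest attention score."""
    return [max(seg, key=attention_score) for seg in segments]
```

For example, six frames with face counts `[0, 2, 1, 3, 0, 1]` split into two segments of three yield the frames at indices 1 and 3 as key frames.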
20 Claims
1. A method for forming a representation of a video item, comprising:
receiving a video item;
dividing the video item into a plurality of segments;
extracting a key frame from each of the plurality of segments to form a plurality of key frames based on a user attention feature of each frame in the plurality of segments, each user attention feature being determined based on a quantity of face images present in a corresponding frame, each key frame serving as a representation of a video segment from the plurality of segments;
organizing the plurality of segments into one or more groups corresponding to one or more respective scenes based at least on the plurality of key frames;
identifying a final key frame from the plurality of key frames for each of the one or more groups; and
correlating a video vignette with each of the identified final key frames.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
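The "organizing the plurality of segments into one or more groups corresponding to one or more respective scenes based at least on the plurality of key frames" step could, for example, compare adjacent key frames and start a new scene where they differ enough. The feature vectors and the threshold below are assumptions; the claim does not fix a similarity measure.

```python
from typing import List

def group_into_scenes(key_features: List[List[float]], threshold: float = 0.25) -> List[List[int]]:
    """Group consecutive segments (by index) into scenes.

    key_features holds one feature vector per segment's key frame
    (e.g. a tiny color histogram); a new scene starts when the mean
    absolute difference to the previous key frame exceeds `threshold`.
    """
    if not key_features:
        return []
    groups: List[List[int]] = [[0]]
    for i in range(1, len(key_features)):
        prev, cur = key_features[i - 1], key_features[i]
        dist = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if dist > threshold:
            groups.append([i])       # scene change: start a new group
        else:
            groups[-1].append(i)     # same scene: extend the current group
    return groups
```

Within each resulting group, the final key frame for the scene could then be chosen as the member key frame with the highest attention score.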
16. A method comprising:
presenting a representation of a video item comprising one or more final key frames, wherein each of the one or more final key frames is identified by:
receiving the video item;
dividing the video item into a plurality of segments based on visual features of multiple frames in the video item and audio features associated with the multiple frames of the video item;
extracting a key frame from each of the plurality of segments to form one or more key frames based on a user attention feature of each frame in the plurality of segments, each user attention feature being determined based on a quantity of face images present in a corresponding frame, a brightness of the corresponding frame, and an amount of motion in the corresponding frame, each key frame serving as a representation of a video segment from the plurality of segments;
organizing the plurality of segments into one or more groups corresponding to one or more respective scenes based on the one or more key frames;
identifying a final key frame from the one or more key frames for each of the one or more groups; and
correlating a video vignette with each of the identified final key frames;
receiving a selection of a final key frame by a user; and
presenting the video vignette correlated with the selected final key frame in response to the selection of the final key frame by the user, wherein the video vignette is selected to give the user information to make a decision as to whether to receive all of the video item.
- View Dependent Claims (17, 18, 19)
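Claim 16's user attention feature combines three per-frame cues: face count, brightness, and amount of motion. A weighted sum is one plausible way to fuse them; the linear combination and the weights below are invented for illustration, since the claim only names the inputs.

```python
def attention_score(face_count: int, brightness: float, motion: float,
                    w_face: float = 0.5, w_bright: float = 0.2,
                    w_motion: float = 0.3) -> float:
    """Fuse the three per-frame cues named in claim 16 into one score.

    The weighted-sum form and the default weights are illustrative
    assumptions; the claim does not specify how the cues are combined.
    """
    return w_face * face_count + w_bright * brightness + w_motion * motion
```

With this scoring in place, the key frame for a segment would be the frame maximizing `attention_score` over that segment, exactly as in the single-cue case of claim 1.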
20. A system comprising:
an audio analysis module that classifies each frame of a video item, based at least on audio features of each frame of the video item, as corresponding to a vowel, a consonant, or a pause;
a video segmentation module that divides the video item into a plurality of segments at a frame that corresponds to a pause;
a key frame extraction module that extracts a key frame from each of the plurality of segments to form a plurality of key frames, the key frame serving as a representation of the video segment from which it is extracted;
a grouping module that organizes the video segments into one or more groups corresponding to one or more respective scenes based on the plurality of key frames, to thereby form the representation of the video item;
an output generating module that correlates a video vignette with each key frame; and
a video presentation module that presents each video vignette to a user in response to receipt of a user selection of one of the key frames that is correlated with the video vignette.
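Claim 20's segmentation rule — cut the video at frames whose audio corresponds to a pause — can be sketched as follows. The per-frame labels are assumed to come from an upstream audio classifier (the claim's audio analysis module); how that classifier works is not specified here.

```python
from typing import List

def segment_at_pauses(labels: List[str]) -> List[List[int]]:
    """Cut the frame index sequence into segments, closing a segment
    at each frame labeled "pause". `labels` holds one of "vowel",
    "consonant", or "pause" per frame (assumed given)."""
    segments: List[List[int]] = []
    current: List[int] = []
    for i, label in enumerate(labels):
        current.append(i)
        if label == "pause":   # claim 20: divide at a frame corresponding to a pause
            segments.append(current)
            current = []
    if current:                # trailing frames after the last pause
        segments.append(current)
    return segments
```

Cutting at speech pauses tends to keep whole spoken phrases inside one segment, which suits the stated goal of giving the user a coherent sample of the video item.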
Specification