Method and apparatus for generating a condensed version of a video sequence including desired affordances
Abstract
A method and apparatus analyze and annotate a technical talk, typically illustrated with overhead slides, wherein the slides are recorded in a video sequence. The video sequence is condensed and digested into key video frames adaptable for annotation to time and audio sequence. The system comprises a recorder for recording a technical talk as a sequential set of video image frames. A stabilizing processor segregates the video image frames into a plurality of associated subsets, each corresponding to a distinct slide displayed at the talk, and median filters the subsets to generate a key frame representative of each of the subsets. A comparator compares the key frame with the associated subset to identify differences between the key frame and the associated subset, which comprise nuisances and affordances. A gesture recognizer locates, tracks and recognizes gestures occurring in the subset as gesture affordances and identifies a gesture video frame representative of the gesture affordance. An integrator compiles the key frames and gesture video frames as a digest of the video image frames, which can also be annotated with the time and audio sequence.
17 Claims
1. A method for generating a condensed version of a video sequence suitable for publication as an annotated video comprising steps of:
storing the video sequence as a set of image frames;
stabilizing the image frames into a warped sequence of distinct and stationary scene changes wherein each scene change is comprised of an associated subset of the image frames;
generating a key frame for each scene change representative of the associated subset including generating a template image frame from the associated subset by median filtering of the associated subset and matching the template image to a closest one of the associated subset, wherein the closest one comprises the key frame;
comparing the key frame with the associated subset for identifying image frames including desired affordances; and
compiling the condensed version to comprise a set of key frames and desired affordance images.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
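The key-frame step of claim 1 (a median-filtered template matched to the closest frame of its subset) can be illustrated with a minimal Python sketch. It is not the patented implementation: frames are assumed to be small grayscale images stored as nested lists, and the names `median_template`, `frame_distance`, and `select_key_frame`, as well as the sum-of-absolute-differences distance, are hypothetical choices made only for illustration.

```python
from statistics import median
from typing import List

Frame = List[List[int]]  # assumed: grayscale image as rows of pixel intensities


def median_template(frames: List[Frame]) -> List[List[float]]:
    """Build a template by taking the per-pixel median over the subset."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [median(f[r][c] for f in frames) for c in range(cols)]
        for r in range(rows)
    ]


def frame_distance(a, b) -> float:
    """Sum of absolute pixel differences (an assumed distance measure)."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def select_key_frame(frames: List[Frame]) -> int:
    """Return the index of the frame closest to the median template."""
    template = median_template(frames)
    return min(
        range(len(frames)),
        key=lambda i: frame_distance(frames[i], template),
    )


# Three frames of one slide: a hand occludes a pixel, slight noise, clean view.
subset = [
    [[10, 200], [10, 10]],
    [[10, 10], [10, 12]],
    [[10, 10], [10, 10]],
]
key_index = select_key_frame(subset)
```

Because the median discards transient outliers such as a briefly occluding hand, the template approximates the unobstructed slide, and the selected key frame is an actual recorded frame rather than a synthesized one.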
11. A method for forming a digested compilation of a sequential set of data frames including deleting redundant adjacent frames and nuisance variations therein, comprising:
generating a warped sequence of the sequential set wherein significant changes in the sequential set are detected for segregating the sequential set into associated subsets;
detecting a key frame representative of each of the associated subsets;
comparing each key frame with each associated subset for detecting the nuisance variations and data frames having desired affordances, the detecting the nuisance variations including a word-wise comparison between the key frame and the associated subset; and
integrating the key frames with the data frames having the desired affordances and thereby deleting the redundant data frames and the data frames having the nuisance variations, to form the digested compilation.
Dependent claims: 12
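The segregation step of claim 11, in which significant changes split the sequential set into associated subsets, might be sketched as follows. This is only an illustration under stated assumptions: it skips the warping step entirely, treats a "significant change" as a mean absolute pixel difference between consecutive frames that exceeds a caller-supplied `threshold`, and uses the hypothetical names `mean_abs_diff` and `segment_by_change`.

```python
from typing import List, Tuple

Frame = List[List[int]]  # assumed: grayscale image as rows of pixel intensities


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute difference per pixel between two same-sized frames."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def segment_by_change(frames: List[Frame], threshold: float) -> List[Tuple[int, int]]:
    """Split frames into half-open index ranges [start, end) at large changes."""
    if not frames:
        return []
    subsets = []
    start = 0
    for i in range(1, len(frames)):
        # A jump above the threshold is treated as a new slide (assumption).
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            subsets.append((start, i))
            start = i
    subsets.append((start, len(frames)))
    return subsets


# Two frames of one slide (with slight noise) followed by two of another.
sequence = [[[0, 0]], [[0, 2]], [[100, 100]], [[100, 100]]]
subsets = segment_by_change(sequence, threshold=10.0)
```

Each returned range identifies one associated subset, from which a key frame can then be detected and compared against the remaining frames.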
13. A system for analyzing and annotating a technical talk recorded as a video and audio sequence for condensing the video sequence into a digest of key video frames annotated to time and the audio sequence, comprising:
a recorder for recording the technical talk as a sequential set of video image frames;
a stabilizing processor for segregating the video image frames into a plurality of associated subsets each corresponding to a distinct slide displayed at the talk and for median filtering of the subsets for generating a key frame representative of the subsets;
a comparator for comparing the key frame with the associated subset to identify differences between the key frame and associated subset comprising nuisances and affordances;
a gesture recognizer for locating, tracking and recognizing gestures occurring in the subset as a gesture affordance and for identifying a gesture video frame representative of the gesture affordance, wherein the gesture recognizer includes a vocabulary of desired gesture affordances and a comparator for comparing the subset with the key frame and matching the affordances located thereby with the vocabulary; and
an integrator for compiling the key frames and the gesture video frames as a digest of the video image frames and for annotating the digest with the time and the audio sequence.
Dependent claims: 14, 15, 16, 17
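The integrator of claim 13 could be pictured with the short sketch below, which merges key-frame and gesture-frame indices into a time-ordered digest. The `DigestEntry` record, the `compile_digest` function, and the fixed `fps` frame rate used to derive timestamps are all assumptions for illustration; the claim does not specify how time or audio annotations are stored.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DigestEntry:
    frame_index: int  # position in the original recording
    seconds: float    # timestamp, usable as an offset into the audio track
    kind: str         # "key" or "gesture"


def compile_digest(
    key_frames: Iterable[int],
    gesture_frames: Iterable[int],
    fps: float,
) -> List[DigestEntry]:
    """Merge frame indices into one time-ordered digest without duplicates."""
    kinds = {i: "gesture" for i in gesture_frames}
    # If a frame is both a key frame and a gesture frame, label it "key".
    kinds.update({i: "key" for i in key_frames})
    return [DigestEntry(i, i / fps, kinds[i]) for i in sorted(kinds)]


digest = compile_digest(key_frames=[0, 90], gesture_frames=[45, 90], fps=30.0)
```

Keeping a timestamp on every entry lets a viewer jump from a digest frame to the corresponding moment in the audio sequence.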
Specification