Keyframe selection to represent a video
Abstract
A key frame representative of a sequence of frames in a video file is selected by applying face detection to the video, so that the selected key frame is likely to include people. The technique has particular application to indexing video files located by a search engine web crawler. A key frame, a single frame representative of a video file, is extracted from the sequence of frames. The sequence of frames may include multiple scenes or shots, for example, continuous motions relative to a camera separated by transitions, cuts, fades, and dissolves. To extract a key frame, face detection is performed in each frame, and a key frame is selected from the sequence of frames based on the sum of detected faces in the frame.
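The abstract treats a video as a series of shots separated by cuts, fades, and dissolves. As a rough illustration only, the Python sketch below finds hard cuts by comparing intensity histograms of consecutive frames. The histogram-difference technique, the function names, the bin count, and the threshold are all assumptions made for this example; the text above does not say how shot boundaries are actually detected, and gradual transitions such as fades and dissolves would need a different test.

```python
from typing import List, Sequence, Tuple

# Assumption: a frame is a grid of grayscale pixel values in 0..255.
Frame = Sequence[Sequence[int]]


def histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Normalized intensity histogram of one grayscale frame."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for pixel in row:
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]


def detect_shot_boundaries(frames: Sequence[Frame],
                           threshold: float = 0.5) -> List[int]:
    """Return each index i at which a new shot is assumed to start."""
    if not frames:
        return []
    boundaries = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        # L1 distance between consecutive histograms, in the range [0, 2].
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:  # hypothetical threshold for a hard cut
            boundaries.append(i)
        previous = current
    return boundaries


def split_into_shots(num_frames: int,
                     boundaries: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn boundary indices into half-open (start, end) frame ranges."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [num_frames]
    return list(zip(starts, ends))
```

For example, two dark frames followed by three bright frames yield one boundary at index 2, and therefore two shots covering frames 0 to 1 and 2 to 4.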
29 Claims
-
1. A method of extracting a single representative key frame from a sequence of frames, the sequence of frames including a plurality of shots, comprising the steps of:
-
performing face detection in the sequence of frames comprising the steps of:
creating a set of images for each frame in the sequence of frames with each image in the set of images smaller than the previous image; and
searching for faces having at least a minimum size in a selected portion of the set of images;
detecting shot boundaries in the sequence of frames to identify shots within the detected shot boundaries;
selecting a most interesting shot from the identified shots based on a number of detected faces in the shot; and
selecting the single representative key frame representative of the sequence of frames from the selected shot based on a number of detected faces in the frame.
selecting the scale factor dependent on the size of the frame.
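Claim 1 builds, for each frame, a set of images in which each image is smaller than the previous one, and searches a selected portion of that set for faces of at least a minimum size. The Python sketch below is one possible reading of those steps, not the claimed implementation. It assumes a fixed-size detector window (20 pixels here), so that a detection at a smaller pyramid level corresponds to a larger face in the original frame. The names `pyramid_sizes`, `levels_for_min_face`, and `choose_scale`, and the specific rule tying the scale factor to frame size, are hypothetical.

```python
from typing import List, Tuple


def pyramid_sizes(width: int, height: int, scale: float = 1.2,
                  window: int = 20) -> List[Tuple[int, int]]:
    """Sizes of successively smaller images, stopping once the
    (assumed) fixed detector window no longer fits."""
    if scale <= 1.0:
        raise ValueError("scale must be greater than 1")
    sizes = []
    w, h = float(width), float(height)
    while int(w) >= window and int(h) >= window:
        sizes.append((int(w), int(h)))
        w /= scale
        h /= scale
    return sizes


def levels_for_min_face(sizes: List[Tuple[int, int]], original_width: int,
                        min_face: int, window: int = 20) -> List[int]:
    """Pyramid levels worth searching: those where a window-sized hit
    maps back to a face of at least min_face pixels in the original."""
    selected = []
    for level, (w, _h) in enumerate(sizes):
        face_in_original = window * original_width / w
        if face_in_original >= min_face:
            selected.append(level)
    return selected


def choose_scale(width: int, height: int) -> float:
    """Hypothetical rule: take coarser steps on larger frames."""
    return 1.5 if max(width, height) > 640 else 1.2
```

With an 80x60 frame, a scale factor of 2.0, and a minimum face size of 30 pixels, the pyramid has two levels and only the second (40x30) needs to be searched, because a 20-pixel hit at full resolution would be too small.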
-
-
5. The method as claimed in claim 1 further comprising the step of:
tracking overlap of a detected face in consecutive frames in order to filter detected faces which are not likely to be valid.
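Claim 5 filters out unlikely detections by tracking how a detected face overlaps across consecutive frames. A minimal sketch of that idea in Python follows, assuming faces are reported as `(x, y, width, height)` boxes and that a detection is kept only if it overlaps some detection in the previous or next frame. The intersection-over-union measure and the 0.3 threshold are assumptions for illustration; the claim does not specify how overlap is measured.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), an assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def filter_faces(detections: List[List[Box]],
                 min_iou: float = 0.3) -> List[List[Box]]:
    """Keep a face only if it overlaps a face in an adjacent frame."""
    kept = []
    for i, boxes in enumerate(detections):
        neighbours: List[Box] = []
        if i > 0:
            neighbours += detections[i - 1]
        if i + 1 < len(detections):
            neighbours += detections[i + 1]
        kept.append([b for b in boxes
                     if any(iou(b, n) >= min_iou for n in neighbours)])
    return kept
```

A face that drifts by a few pixels from frame to frame survives this filter, while a box that appears in a single frame with no counterpart nearby is discarded as a probable false detection.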
-
6. The method as claimed in claim 1 wherein the step of selecting a most interesting shot includes providing a shot score based on a set of measures selected from the group consisting of motion between frames, amount of skin color pixels, shot length and detected faces.
-
7. The method as claimed in claim 6 wherein each measure includes a respective weighting factor.
-
8. The method as claimed in claim 7 wherein the weighting factor is dependent on the level of confidence of the measure.
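Claims 6 to 8 describe a shot score built from several measures (motion between frames, amount of skin color pixels, shot length, and detected faces), each with a weighting factor that depends on the confidence of the measure. The Python sketch below shows one way such a score could be combined. It assumes each measure is normalized to the range 0 to 1 and that the effective weight is a base weight multiplied by a confidence value; that particular formula, and the `Measure` type, are assumptions and are not stated in the claims.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Measure:
    value: float        # measure normalized to [0, 1] (assumption)
    base_weight: float  # relative importance of the measure
    confidence: float   # confidence in the measure, in [0, 1]


def shot_score(measures: Dict[str, Measure]) -> float:
    """Weighted sum in which each weight is scaled by its confidence."""
    return sum(m.base_weight * m.confidence * m.value
               for m in measures.values())


def most_interesting_shot(shots: List[Dict[str, Measure]]) -> int:
    """Index of the highest-scoring shot (the first one wins a tie)."""
    scores = [shot_score(s) for s in shots]
    return max(range(len(scores)), key=scores.__getitem__)
```

Scaling each weight by confidence means an unreliable measure, such as a low-confidence face count, contributes less to the ranking than a measure the system trusts.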
-
9. The method as claimed in claim 1 wherein the step of performing face detection uses a neural network-based algorithm.
-
10. An apparatus for extracting a single representative key frame from a sequence of frames comprising:
-
means for performing face detection in the sequence of frames, the means for performing comprising:
means for creating a set of images for the frame with each image in the set of images smaller than the previous image; and
means for searching for faces having at least a minimum size in a selected portion of the set of images;
means for detecting shot boundaries in the sequence of frames to identify shots within shot boundaries;
means for selecting a most interesting shot from the identified shots based on a number of detected faces in the shot; and
means for selecting the single representative key frame representative of the sequence of frames from the selected shot based on a number of detected faces in the frame.
means for selecting the scale factor dependent on the size of the frame.
-
-
14. The apparatus as claimed in claim 10 further comprising:
means for tracking overlap of a detected face in consecutive frames to filter detected faces which are not likely to be valid.
-
15. The apparatus as claimed in claim 10 wherein the means for selecting a most interesting shot comprises:
means for providing a shot score based on a set of measures selected from the group consisting of motion between frames, amount of skin color pixels, shot length and detected faces.
-
16. The apparatus as claimed in claim 15 wherein each measure includes a respective weighting factor.
-
17. The apparatus as claimed in claim 16 wherein the weighting factor is dependent on the level of confidence of the measure.
-
18. The apparatus as claimed in claim 10 wherein the means for performing face detection uses a neural network-based algorithm.
-
19. An apparatus for extracting a single representative key frame from a sequence of frames comprising:
-
a face detector which performs face detection in the sequence of frames, the face detector including:
an image creator which creates a set of images for the frame with each image in the set of images smaller than the previous image; and
a face searcher which searches for faces having at least a minimum size in a selected portion of the set of images; and
a key frame selector which selects a key frame representative of the sequence of frames from the sequence of frames based on a number of detected faces in the frame.
a frame sampler which selects the scale factor dependent on the size of the frame.
-
-
23. The apparatus as claimed in claim 19 further comprising:
a face tracker which tracks a detected face through consecutive frames to filter detected faces which are not likely to be valid.
-
24. The apparatus as claimed in claim 19 wherein the key shot detector comprises:
a shot score generator which generates a shot score based on a set of measures selected from the group consisting of motion between frames, amount of skin color pixels, shot length and detected faces.
-
25. The apparatus as claimed in claim 24 wherein each measure includes a respective weighting factor.
-
26. The apparatus as claimed in claim 25 wherein the weighting factor is dependent on the level of confidence of the measure.
-
27. The apparatus as claimed in claim 19 wherein the face detector uses a neural network-based algorithm.
-
28. A computer system comprising:
-
a memory system storing a sequence of frames; and
a face detector which performs face detection in the sequence of frames, the face detector comprising:
an image creator which creates a set of images for the frame with each image in the set of images smaller than the previous image; and
a face searcher which searches for faces having at least a minimum size in a selected portion of the set of images;
a shot boundary detector which detects shot boundaries to identify shots within the detected shot boundaries; and
a key shot selector which selects a most interesting shot from the identified shots based on a number of detected faces in the shot; and
a key frame selector which selects the single representative key frame representative of the sequence of frames from the selected shot based on a number of detected faces in the frame.
-
-
29. An article of manufacture comprising:
-
a computer-readable medium for use in a computer having a memory;
a computer-implementable software program recorded on the medium for extracting a single representative key frame from a sequence of frames, the sequence of frames including a plurality of shots, the computer implemented software program comprising instructions for:
performing face detection in the sequence of frames comprising the steps of:
creating a set of images for each frame in the sequence of frames with each image in the set of images smaller than the previous image; and
searching for faces having at least a minimum size in a selected portion of the set of images;
detecting shot boundaries in the sequence of frames to identify shots within the detected shot boundaries;
selecting a most interesting shot from the identified shots based on a number of detected faces in the shot; and
selecting the single representative key frame representative of the sequence of frames from the selected shot based on a number of detected faces in the frame.
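The last two steps of the independent claims can be sketched in Python as a two-stage selection: first pick the shot with the most detected faces, then pick the frame within that shot with the most detected faces. This is an illustrative reading only. It assumes face detection and shot boundary detection have already produced a per-frame face count and a list of half-open `(start, end)` frame ranges; the function name `select_key_frame` is hypothetical.

```python
from typing import List, Sequence, Tuple


def select_key_frame(face_counts: Sequence[int],
                     shots: List[Tuple[int, int]]) -> int:
    """Return the index of the frame chosen as the key frame.

    face_counts[i] is the number of faces detected in frame i;
    each shot is a half-open (start, end) range of frame indices.
    """
    def faces_in_shot(shot: Tuple[int, int]) -> int:
        start, end = shot
        return sum(face_counts[start:end])

    # Stage 1: the "most interesting" shot, scored here by faces only.
    best_start, best_end = max(shots, key=faces_in_shot)
    # Stage 2: the frame in that shot with the most faces
    # (the earliest such frame wins a tie).
    return max(range(best_start, best_end), key=lambda i: face_counts[i])
```

For face counts `[0, 1, 0, 2, 3, 2, 0]` and shots `(0, 3)`, `(3, 6)`, `(6, 7)`, the middle shot has the most faces and frame 4 is selected.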
-
Specification