Face detection and tracking in a video sequence
First Claim
1. A method of detecting and tracking human faces across a sequence of video frames, said method comprising the steps of:
(a) forming a 3D pixel data block from said sequence of video frames;
(b) segmenting said 3D data block into a set of 3D segments using 3D spatiotemporal segmentation;
(c) forming 2D segments from an intersection of said 3D segments with a view plane, each 2D segment being associated with one 3D segment;
(d) in at least one of said 2D segments, extracting features and grouping said features into one or more groups of features;
(e) for each group of features, computing a probability that said group of features represents human facial features based on the similarity of the geometry of said group of features with the geometry of a human face model;
(f) matching at least one group of features with a group of features in a previous 2D segment and computing an accumulated probability that said group of features represents human facial features using probabilities of matched groups of features;
(g) classifying each 2D segment as a face segment or a non-face segment based on said accumulated probability of at least one group of features in each of said 2D segments; and
(h) tracking said human faces by finding an intersection of 3D segments associated with said face segments with at least subsequent view planes.
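Steps (a), (c) and (h) can be illustrated with a minimal sketch. It assumes the spatiotemporal segmentation of step (b) has already produced a 3D array of segment ids indexed `[t][y][x]`; the segmentation itself is not shown. The function names `form_block`, `slice_segments` and `track` are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def form_block(frames):
    """Step (a): stack equally sized 2D frames into a 3D block indexed [t][y][x]."""
    height, width = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must share the same dimensions")
    return [[list(row) for row in frame] for frame in frames]


def slice_segments(labels, t):
    """Step (c): intersect the labelled 3D block with the view plane at time t.

    Returns {segment_id: set of (y, x) pixels}. Each 2D segment keeps the id
    of the single 3D segment it was cut from.
    """
    segments = defaultdict(set)
    for y, row in enumerate(labels[t]):
        for x, seg_id in enumerate(row):
            segments[seg_id].add((y, x))
    return dict(segments)


def track(labels, face_ids, t_start):
    """Step (h): follow 3D segments classified as faces into later view planes."""
    tracks = {}
    for t in range(t_start + 1, len(labels)):
        plane = slice_segments(labels, t)
        tracks[t] = {sid: plane[sid] for sid in face_ids if sid in plane}
    return tracks


# Assumed output of step (b): three 2x2 frames containing two 3D segments (0 and 1).
labels = [
    [[0, 1], [0, 1]],
    [[0, 1], [1, 1]],
    [[0, 0], [0, 1]],
]
faces_later = track(labels, face_ids={1}, t_start=0)
```

Because every 2D segment carries the id of its parent 3D segment, tracking in this sketch needs no frame-to-frame matching: once segment 1 is classified as a face in one view plane, its pixels in later planes are found by slicing the same 3D segment.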
Abstract
A method (100) and apparatus (700) are disclosed for detecting and tracking human faces across a sequence of video frames. Spatiotemporal segmentation is used to segment (115) the sequence of video frames into 3D segments. 2D segments are then formed from the 3D segments, with each 2D segment being associated with one 3D segment. Features are extracted (140) from the 2D segments and grouped into groups of features. For each group of features, a probability that the group of features includes human facial features is calculated (145) based on the similarity of the geometry of the group of features with the geometry of a human face model. Each group of features is also matched with a group of features in a previous 2D segment and an accumulated probability that said group of features includes human facial features is calculated (150). Each 2D segment is classified (155) as a face segment or a non-face segment based on the accumulated probability. Human faces are then tracked by finding 2D segments in subsequent frames associated with 3D segments associated with face segments.
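The accumulation (150) and classification (155) stages might look like the sketch below. The abstract does not state how probabilities of matched groups are combined, so the weighted running average, the `weight` parameter and the `0.5` threshold are all assumptions made for illustration, and the function names are hypothetical.

```python
def accumulate(current_p, previous_acc=None, weight=0.5):
    """Combine the current group's probability with that of its matched group.

    Assumption: a weighted running average. The source only says the
    accumulated probability uses the probabilities of matched groups.
    """
    if not 0.0 <= current_p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if previous_acc is None:  # no matching group in the previous 2D segment
        return current_p
    return weight * current_p + (1.0 - weight) * previous_acc


def classify_segment(accumulated_probs, threshold=0.5):
    """Label a 2D segment a face if at least one group reaches the threshold."""
    return bool(accumulated_probs) and max(accumulated_probs) >= threshold


# A group scoring 0.8 now, matched to a group whose accumulated score was 0.4.
acc = accumulate(0.8, previous_acc=0.4)
is_face = classify_segment([acc, 0.1])
```

Under this assumed rule, a single noisy frame cannot flip the classification on its own, since each new observation only moves the accumulated value part of the way toward the current probability.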
21 Claims
1. A method of detecting and tracking human faces across a sequence of video frames, said method comprising the steps of:
(a) forming a 3D pixel data block from said sequence of video frames;
(b) segmenting said 3D data block into a set of 3D segments using 3D spatiotemporal segmentation;
(c) forming 2D segments from an intersection of said 3D segments with a view plane, each 2D segment being associated with one 3D segment;
(d) in at least one of said 2D segments, extracting features and grouping said features into one or more groups of features;
(e) for each group of features, computing a probability that said group of features represents human facial features based on the similarity of the geometry of said group of features with the geometry of a human face model;
(f) matching at least one group of features with a group of features in a previous 2D segment and computing an accumulated probability that said group of features represents human facial features using probabilities of matched groups of features;
(g) classifying each 2D segment as a face segment or a non-face segment based on said accumulated probability of at least one group of features in each of said 2D segments; and
(h) tracking said human faces by finding an intersection of 3D segments associated with said face segments with at least subsequent view planes.
Dependent claims: 2, 3, 4, 5, 6, 7
8. An apparatus for detecting and tracking human faces across a sequence of video frames, said apparatus comprising:
means for forming a 3D pixel data block from said sequence of video frames;
means for segmenting said 3D data block into a set of 3D segments using 3D spatiotemporal segmentation;
means for forming 2D segments from an intersection of said 3D segments with a view plane, each 2D segment being associated with one 3D segment;
in at least one of said 2D segments, means for extracting features and grouping said features into one or more groups of features;
for each group of features, means for computing a probability that said group of features represents human facial features based on the similarity of the geometry of said group of features with the geometry of a human face model;
means for matching at least one group of features with a group of features in a previous 2D segment and computing an accumulated probability that said group of features represents human facial features using probabilities of matched groups of features;
means for classifying each 2D segment as a face segment or a non-face segment based on said accumulated probability of at least one group of features in each of said 2D segments; and
means for tracking said human faces by finding an intersection of 3D segments associated with said face segments with at least subsequent view planes.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer-executable program stored on a computer readable storage medium, the program for detecting and tracking human faces across a sequence of video frames, said program comprising:
code for forming a 3D pixel data block from said sequence of video frames;
code for segmenting said 3D data block into a set of 3D segments using 3D spatiotemporal segmentation;
code for forming 2D segments from an intersection of said 3D segments with a view plane, each 2D segment being associated with one 3D segment;
in at least one of said 2D segments, code for extracting features and grouping said features into one or more groups of features;
for each group of features, code for computing a probability that said group of features represents human facial features based on the similarity of the geometry of said group of features with the geometry of a human face model;
code for matching at least one group of features with a group of features in a previous 2D segment and computing an accumulated probability that said group of features represents human facial features using probabilities of matched groups of features;
code for classifying each 2D segment as a face segment or a non-face segment based on said accumulated probability of at least one group of features in each of said 2D segments; and
code for tracking said human faces by finding an intersection of 3D segments associated with said face segments with at least subsequent view planes.
Dependent claims: 16, 17, 18, 19, 20, 21
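The "code for computing a probability" element compares the geometry of a feature group with a face model, but the claims do not define that model. The sketch below is therefore illustrative only: it assumes a group of three features (two eyes and a mouth), a model expressed as a single ratio of eye-to-mouth distance over inter-eye distance, and a Gaussian-shaped similarity score. The function name and the `model_ratio` and `sigma` values are hypothetical.

```python
import math


def geometry_probability(left_eye, right_eye, mouth, model_ratio=1.0, sigma=0.25):
    """Score how closely an eye-eye-mouth triangle matches an assumed face model.

    Points are (x, y) tuples. The score is 1.0 for a perfect match with the
    assumed model ratio and decays toward 0 as the geometry departs from it.
    It is a similarity score in [0, 1], not a calibrated probability.
    """
    eye_dist = math.dist(left_eye, right_eye)
    if eye_dist == 0:
        return 0.0  # degenerate group: both "eyes" at the same position
    eye_mid = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
    # Dividing by the inter-eye distance makes the measure scale invariant.
    ratio = math.dist(eye_mid, mouth) / eye_dist
    return math.exp(-((ratio - model_ratio) ** 2) / (2 * sigma ** 2))


# Eyes 2 units apart, mouth 2 units below their midpoint: matches the assumed model.
score = geometry_probability((0, 0), (2, 0), (1, 2))
```

A ratio-based measure is used here because it does not change when the face appears larger or smaller in the frame; a fuller model would also need to account for rotation and for additional features.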
Specification