Automatic detection and tracking of multiple individuals using multiple cues
First Claim
1. A method comprising:
receiving a frame of content;
automatically detecting a candidate area for a new face region in the frame, wherein detecting the candidate area comprises:
determining whether there is motion at a plurality of pixels on a plurality of lines across the frame;
generating a sum of frame differences for each possible segment of each of the plurality of lines;
selecting, for each of the plurality of lines, the segment having the largest sum;
identifying a smoothest region of the selected segments;
checking whether the smoothest region resembles a human upper body; and
extracting, as the candidate area, a portion of the smoothest region that resembles a human head;
using one or more hierarchical verification levels to verify whether a human face is in the candidate area;
indicating that the candidate area includes a face if the one or more hierarchical verification levels verify that a human face is in the candidate area; and
using a plurality of cues to track each verified face in the content from frame to frame.
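The per-line segment selection recited in the claim can be illustrated with a minimal sketch. The claim does not say how the frame differences are scored, so this example makes an assumption: each pixel scores +1 when its absolute frame difference exceeds a threshold and -1 otherwise, so that the segment with the largest sum is not trivially the whole line. The function names, the threshold value, and the use of Kadane's maximum-subarray scan are all illustrative choices, not taken from the patent.

```python
from typing import List, Sequence, Tuple


def motion_scores(prev_line: Sequence[int],
                  curr_line: Sequence[int],
                  threshold: int = 10) -> List[int]:
    """Score each pixel on one line: +1 if it moved, -1 if it did not.

    The signed scoring and the threshold are assumptions made for this sketch.
    """
    return [1 if abs(c - p) > threshold else -1
            for p, c in zip(prev_line, curr_line)]


def best_segment(scores: Sequence[int]) -> Tuple[int, int, int]:
    """Return (start, end, total) of the contiguous segment with the largest sum.

    Indices are inclusive. A single pass (Kadane's algorithm) gives the same
    result as summing every possible segment of the line.
    """
    if not scores:
        raise ValueError("line must contain at least one pixel")
    best_sum, best_start, best_end = scores[0], 0, 0
    run_sum, run_start = 0, 0
    for i, s in enumerate(scores):
        if run_sum <= 0:
            # A non-positive running total cannot help; restart the segment here.
            run_sum, run_start = s, i
        else:
            run_sum += s
        if run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i
    return best_start, best_end, best_sum


def select_segments(prev_frame: Sequence[Sequence[int]],
                    curr_frame: Sequence[Sequence[int]],
                    rows: Sequence[int],
                    threshold: int = 10) -> List[Tuple[int, int, int, int]]:
    """For each sampled row, return (row, start, end, total) of its best segment."""
    result = []
    for r in rows:
        scores = motion_scores(prev_frame[r], curr_frame[r], threshold)
        result.append((r, *best_segment(scores)))
    return result
```

For a ten-pixel line where pixels 2, 3, 5, 6, and 7 changed, `best_segment` bridges the single still pixel at index 4 and reports the span from 2 to 7. The later steps of the claim (smoothest region, upper-body check, head extraction) are not covered by this sketch.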
Abstract
Automatic detection and tracking of multiple individuals includes receiving a frame of video and/or audio content and identifying a candidate area for a new face region in the frame. One or more hierarchical verification levels are used to verify whether a human face is in the candidate area, and an indication made that the candidate area includes a face if the one or more hierarchical verification levels verify that a human face is in the candidate area. A plurality of audio and/or video cues are used to track each verified face in the video content from frame to frame.
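The abstract does not state how the hierarchical verification levels are combined. One possible arrangement, shown here purely as an assumption, is a cascade in which cheaper checks run first and a candidate area is reported as a face only if every level accepts it. The `Region` tuple, the `Verifier` type, and `verify_candidate` are hypothetical names introduced for this sketch.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical candidate area: (x, y, width, height) in pixels.
Region = Tuple[int, int, int, int]
# Hypothetical verification level: returns True if the region still looks like a face.
Verifier = Callable[[Region], bool]


def verify_candidate(region: Region,
                     levels: Sequence[Verifier]) -> Tuple[bool, int]:
    """Run the levels in order and stop at the first rejection.

    Returns (is_face, levels_run). Requiring every level to pass is an
    assumption of this sketch, not something the abstract specifies.
    """
    if not levels:
        raise ValueError("at least one verification level is required")
    for depth, level in enumerate(levels, start=1):
        if not level(region):
            return False, depth
    return True, len(levels)
```

Ordering the levels from cheapest to most expensive means most non-face candidates are discarded before the costly checks run.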
19 Claims
1. A method comprising:
receiving a frame of content;
automatically detecting a candidate area for a new face region in the frame, wherein detecting the candidate area comprises:
determining whether there is motion at a plurality of pixels on a plurality of lines across the frame;
generating a sum of frame differences for each possible segment of each of the plurality of lines;
selecting, for each of the plurality of lines, the segment having the largest sum;
identifying a smoothest region of the selected segments;
checking whether the smoothest region resembles a human upper body; and
extracting, as the candidate area, a portion of the smoothest region that resembles a human head;
using one or more hierarchical verification levels to verify whether a human face is in the candidate area;
indicating that the candidate area includes a face if the one or more hierarchical verification levels verify that a human face is in the candidate area; and
using a plurality of cues to track each verified face in the content from frame to frame.
Dependent claims: 2-17.
18. A computer-readable storage medium comprising computer-program instructions that when executed by a processor perform acts of:
receiving a frame of content;
automatically detecting a candidate area for a new face region in the frame, wherein detecting the candidate area comprises:
determining whether there is motion at a plurality of pixels on a plurality of lines across the frame;
generating a sum of frame differences for each possible segment of each of the plurality of lines;
selecting, for each of the plurality of lines, the segment having the largest sum;
identifying a smoothest region of the selected segments;
checking whether the smoothest region resembles a human upper body; and
extracting, as the candidate area, a portion of the smoothest region that resembles a human head;
using one or more hierarchical verification levels to verify whether a human face is in the candidate area;
indicating that the candidate area includes a face if the one or more hierarchical verification levels verify that a human face is in the candidate area; and
using a plurality of cues to track each verified face in the content from frame to frame.
19. A computing device comprising:
a processor; and
a memory coupled to the processor, the memory comprising computer-program instructions that when executed by the processor perform acts of:
receiving a frame of content;
automatically detecting a candidate area for a new face region in the frame, wherein detecting the candidate area comprises:
determining whether there is motion at a plurality of pixels on a plurality of lines across the frame;
generating a sum of frame differences for each possible segment of each of the plurality of lines;
selecting, for each of the plurality of lines, the segment having the largest sum;
identifying a smoothest region of the selected segments;
checking whether the smoothest region resembles a human upper body; and
extracting, as the candidate area, a portion of the smoothest region that resembles a human head;
using one or more hierarchical verification levels to verify whether a human face is in the candidate area;
indicating that the candidate area includes a face if the one or more hierarchical verification levels verify that a human face is in the candidate area; and
using a plurality of cues to track each verified face in the content from frame to frame.
Specification