System and method for eye-tracking and blink detection
First Claim
1. A process for tracking eyes and detecting eye blinks, comprising the process actions of:
using a computing device for, defining eye templates for a person depicted in a video frame;
inputting a video frame of the person's face;
using a face detector to find a face box which surrounds the face of the person;
searching the upper part of the face box for eyes using feature based matching to match image patches for each eye in the video frame to the eye templates to locate the eyes, wherein the feature based matching comprises the process actions of:
for each image patch, computing a grayscale image corresponding to the image patch;
creating horizontal and vertical edge maps of the image patch;
summing columns of pixels for the grayscale image and the vertical edge map to project the image patch to the horizontal axis to create two one dimensional (1D) signals;
summing rows of pixels for the horizontal edge map to project the image patch to the vertical axis to produce one 1D signal; and
computing the similarity between the eye template and the image patch as a weighted sum of the correlations between corresponding 1D signals, wherein the similarity is determined by a signal correlation function $S(A, B)$, where $L$ is the length of the signal and signals $A$ and $B$ are two arrays, $A = a_1, a_2, \ldots, a_L$ and $B = b_1, b_2, \ldots, b_L$, where $a_i$ and $b_i$ are elements in the two arrays.
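The matching steps recited in claim 1 can be illustrated with a minimal Python sketch. Several details are assumptions made only for illustration and are not taken from the claim text: the edge maps use a simple central difference, the signal correlation function is taken to be a normalized correlation (the expression for S(A, B) is not reproduced in the text above), the three weights are equal, and all function names are hypothetical.

```python
from math import sqrt

def column_sums(img):
    """Project a 2D image (list of rows) onto the horizontal axis."""
    return [sum(col) for col in zip(*img)]

def row_sums(img):
    """Project a 2D image (list of rows) onto the vertical axis."""
    return [sum(row) for row in img]

def vertical_edges(gray):
    # ASSUMPTION: central difference along x, which responds to vertical edges.
    return [[abs(row[x + 1] - row[x - 1]) if 0 < x < len(row) - 1 else 0
             for x in range(len(row))] for row in gray]

def horizontal_edges(gray):
    # ASSUMPTION: central difference along y, which responds to horizontal edges.
    h = len(gray)
    return [[abs(gray[y + 1][x] - gray[y - 1][x]) if 0 < y < h - 1 else 0
             for x in range(len(gray[0]))] for y in range(h)]

def correlation(a, b):
    # ASSUMPTION: normalized correlation stands in for S(A, B).
    num = sum(x * y for x, y in zip(a, b))
    den = sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0

def signals(gray):
    """Return the three 1D signals (G, H, V) for a grayscale patch."""
    g = column_sums(gray)                    # grayscale, projected to x axis
    h = row_sums(horizontal_edges(gray))     # horizontal edges, projected to y axis
    v = column_sums(vertical_edges(gray))    # vertical edges, projected to x axis
    return g, h, v

def similarity(template, patch, w_g=1 / 3, w_h=1 / 3, w_v=1 / 3):
    """Weighted sum of correlations between corresponding 1D signals."""
    gt, ht, vt = signals(template)
    gi, hi, vi = signals(patch)
    return (w_g * correlation(gt, gi)
            + w_h * correlation(ht, hi)
            + w_v * correlation(vt, vi))

# Toy 5x5 grayscale patches: a dark pupil versus a thin closed-lid line.
open_eye = [[9, 9, 9, 9, 9],
            [9, 2, 1, 2, 9],
            [9, 1, 0, 1, 9],
            [9, 2, 1, 2, 9],
            [9, 9, 9, 9, 9]]
closed_eye = [[9, 9, 9, 9, 9],
              [9, 9, 9, 9, 9],
              [9, 3, 3, 3, 9],
              [9, 9, 9, 9, 9],
              [9, 9, 9, 9, 9]]
same = similarity(open_eye, open_eye)
different = similarity(open_eye, closed_eye)
```

In this toy example a patch compared with itself scores higher than a patch compared with a different one. Reducing each patch to three short 1D signals keeps the comparison cheap relative to full 2D template matching.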
Abstract
A real-time low frame-rate video compression system and method that allows the user to perform face-to-face communication through an extremely low bandwidth network. The system and method employ novel eye tracking and blink detection techniques in order to select images for transmission. Experimental results show that the system is superior to more traditional video codecs for low bit-rate face-to-face communication.
12 Claims
where $w_G$, $w_H$ and $w_V$ are predefined weights.
-
-
3. The process of claim 1 wherein the three 1D signals for the right eye template $T_R$ are denoted as $G_i^{T_R}$, $i = 1, \ldots, X_R$ for the grayscale image, $H_i^{T_R}$, $i = 1, \ldots, Y_R$ for the horizontal edge map, and $V_i^{T_R}$, $i = 1, \ldots, X_R$ for the vertical edge map, where $X_R$ and $Y_R$ are the width and height of the template; for a candidate image patch $I$, the three corresponding signals are denoted as $G_i^I$, $H_i^I$ and $V_i^I$, and the correlation $S(T_R, I)$ between the two image patches is computed as
$$S(T_R, I) = w_G \cdot S(G^{T_R}, G^I) + w_H \cdot S(H^{T_R}, H^I) + w_V \cdot S(V^{T_R}, V^I)$$
where $w_G$, $w_H$ and $w_V$ are predefined weights.
-
4. The process of claim 1 wherein defining an eye template comprises the process actions of:
-
manually indicating the pupil positions on the first frame of the video sequence in which the person is depicted with wide open eyes; and extracting two image patches at the pupil positions as templates, one for each eye.
-
-
5. The process of claim 1 wherein said face detector only scans in the neighborhood of the face location in a previous frame to find the face box.
-
6. The process of claim 1 wherein blinking is detected when the correlation values drop significantly.
-
7. The process of claim 3 wherein eye closing is detected when the correlation values for a given frame drop below a given threshold.
-
8. The process of claim 7 wherein the threshold is set to be 0.6.
-
9. A computer-readable storage medium having computer-executable instructions stored thereon for performing the process recited in claim 1.
-
10. A system for detecting eye blinks, comprising:
-
a general purpose computing device; and
a computer program comprising program modules executable by the computing device, wherein the computing device is directed by the program modules of the computer program to:
define eye templates for a person depicted in a video frame;
input a sequence of video frames, at least some of which contain image frames of the person;
use a face detector to find a face;
if a face is found, search the face for eyes using said eye templates and feature based matching, wherein image patches for each frame in the video sequence are extracted and compared to the eye templates, wherein the feature based matching comprises the sub-modules to:
for each extracted image patch, compute a grayscale image corresponding to the image patch;
create horizontal and vertical edge maps of the image patch;
sum columns of pixels of the grayscale image and the vertical edge map to project the image patch to the horizontal axis to create two one dimensional (1D) signals;
sum rows of pixels for the horizontal edge map to project the image patch to the vertical axis to produce one 1D signal; and
compute the similarity between the eye template and the image patch as a weighted sum of the correlations between corresponding 1D signals, wherein the similarity is determined by a signal correlation function $S(A, B)$, where $L$ is the length of the signal and signals $A$ and $B$ are two arrays, $A = a_1, a_2, \ldots, a_L$ and $B = b_1, b_2, \ldots, b_L$, where $a_i$ and $b_i$ are elements in the two arrays.
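The threshold test of claims 7 and 8 can be sketched in Python as follows. The 0.6 threshold comes from claim 8; the per-frame correlation values, the function names, and the grouping of consecutive closed frames into intervals are assumptions added purely for illustration.

```python
def eye_closed_flags(correlations, threshold=0.6):
    """Return one boolean per frame: True where the correlation is below the threshold."""
    return [s < threshold for s in correlations]

def closed_intervals(correlations, threshold=0.6):
    """Group consecutive below-threshold frames into inclusive (start, end) index pairs.

    ASSUMPTION: treating a run of closed frames as a single blink event is an
    illustrative choice, not something recited in the claims.
    """
    intervals = []
    start = None
    for i, s in enumerate(correlations):
        if s < threshold and start is None:
            start = i                          # eye just closed
        elif s >= threshold and start is not None:
            intervals.append((start, i - 1))   # eye reopened on frame i
            start = None
    if start is not None:                      # sequence ended with the eye closed
        intervals.append((start, len(correlations) - 1))
    return intervals

# Hypothetical per-frame template correlations for one eye.
scores = [0.92, 0.90, 0.41, 0.35, 0.88, 0.55]
flags = eye_closed_flags(scores)
events = closed_intervals(scores)
```

With these made-up scores, frames 2, 3 and 5 fall below the threshold and would be treated as eye-closed frames.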
-
Specification