System and method for gesture recognition in three dimensions using stereo imaging and color vision
Abstract
A system and method for recognizing gestures. The method comprises obtaining image data and determining a hand pose estimation. A frontal view of a hand is then produced. The hand is then isolated from the background. The resulting image is then classified as a type of gesture. In one embodiment, determining a hand pose estimation comprises performing background subtraction and computing a hand pose estimation based on an arm orientation determination. In another embodiment, a frontal view of a hand is produced by performing perspective unwarping and scaling. The system that implements the method may be a personal computer with a stereo camera coupled thereto.
29 Claims
1. A method for recognizing gestures comprising:
obtaining an image data;
determining a hand pose estimation based on computing a center of the hand, computing an orientation of the hand in relation to a camera reference frame, performing background subtraction, determining an arm orientation, and computing the hand pose estimation based on the arm orientation;
producing a frontal view of a hand;
isolating the hand from the background; and
classifying a gesture of the hand;
wherein computing a center of the hand includes defining a hand region as a cylinder centered along the 3D line with dimensions large enough to include a typical hand, selecting pixels from within the hand region as hand pixels, and averaging the location of all of the hand pixels.

2. The method of claim 1 wherein computing an orientation of the hand comprises:
defining a hand reference frame with an x component, a y component and a z component such that the x component is aligned with the 3D line, the y component is perpendicular to the x component, the y component is parallel to the viewing plane, and the z component is perpendicular to the x component and the y component.
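The hand-center computation in claim 1 can be sketched as follows. The function name, the cylinder radius, and the length are illustrative assumptions (sized to contain a typical hand), not values from the patent:

```python
import numpy as np

def hand_center(points, line_point, line_dir, radius=0.07, length=0.20):
    """Average the 3D locations of points falling inside a cylinder
    centered on the arm line, per claim 1. radius/length (meters) are
    illustrative choices large enough to include a typical hand."""
    d = line_dir / np.linalg.norm(line_dir)
    rel = points - line_point
    along = rel @ d                               # distance along the axis
    radial = np.linalg.norm(rel - np.outer(along, d), axis=1)
    mask = (np.abs(along) <= length / 2) & (radial <= radius)
    return points[mask].mean(axis=0)              # center = mean of hand pixels
```

Points inside the cylinder count as hand pixels; everything else (forearm points farther along the line, background points off-axis) is excluded before averaging.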
7. The method of claim 1 wherein isolating the hand comprises:
initializing a hand color probability density function; and
refining the hand color probability density function.
8. The method of claim 7 wherein initializing comprises:
using the hand pixels to initialize and evaluate the hue-saturation histogram of the hand color.
9. The method of claim 8 wherein refining comprises:
choosing a part of a color space that contains a majority of the hand pixels to define a hand color;
selecting those pixels in the image surrounding the hand that are of a color corresponding to the hand color; and
discarding the hand pixels which are not of the color corresponding to the hand color.
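The refinement of claims 7-9 can be sketched as a hue-saturation histogram seeded by the hand pixels. The bin count and the fraction of seed pixels retained (`keep`) are illustrative assumptions:

```python
import numpy as np

def refine_hand_mask(hs, seed_mask, bins=32, keep=0.8):
    """Hue-saturation histogram segmentation in the spirit of claims 7-9.
    hs: (H, W, 2) hue/saturation in [0, 1); seed_mask: initial hand pixels.
    bins and keep are illustrative settings, not from the patent."""
    idx = np.minimum((hs * bins).astype(int), bins - 1)
    hist = np.zeros((bins, bins))
    np.add.at(hist, (idx[seed_mask][:, 0], idx[seed_mask][:, 1]), 1)
    # choose the smallest set of bins holding `keep` of the seed pixels:
    # this is the part of color space containing a majority of hand pixels
    order = np.argsort(hist.ravel())[::-1]
    cum = np.cumsum(hist.ravel()[order])
    hand_bins = np.zeros(bins * bins, bool)
    hand_bins[order[: np.searchsorted(cum, keep * cum[-1]) + 1]] = True
    # reclassify every pixel by its bin: this selects surrounding pixels of
    # hand color and discards seed pixels whose color is not the hand color
    return hand_bins.reshape(bins, bins)[idx[..., 0], idx[..., 1]]
```

The returned mask can both grow (nearby skin-colored pixels) and shrink (mislabeled seed pixels) relative to the seed, matching the select/discard steps of claim 9.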
3. A method for recognizing gestures comprising:
obtaining an image data;
determining a hand pose estimation;
producing a frontal view of a hand based on performing perspective unwarping to produce an unwarped frontal view of the hand and scaling the unwarped frontal view of the hand into a template image;
isolating the hand from the background; and
classifying a gesture of the hand.

4. The method of claim 3 wherein performing perspective unwarping comprises:
mathematically moving a virtual camera to a canonical location with respect to a hand reference frame.
5. The method of claim 4 wherein mathematically moving the virtual camera comprises:
rotating the virtual camera to align a reference frame of the virtual camera with a reference frame of the hand; and
translating the virtual camera to a fixed distance from the orientation of the hand.
6. The method of claim 3 wherein scaling comprises:
choosing a fixed correspondence between the dimensions of the template image and the dimensions of a typical hand.
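The camera rotation of claims 4-6 can be sketched with the standard result that a pure camera rotation induces a planar homography on the image. Here `K` is the camera intrinsic matrix and `R_hand` gives the hand frame axes in camera coordinates; both are assumed inputs, and the template size is an illustrative value:

```python
import numpy as np

def unwarp_homography(K, R_hand):
    """Rotating the virtual camera to face the hand (claims 4-5). For a
    pure rotation, image points map by H = K @ R_hand.T @ inv(K), using
    the convention that R_hand holds the hand axes in camera coordinates;
    the translation to a fixed distance would be a separate step."""
    return K @ R_hand.T @ np.linalg.inv(K)

def scale_to_template(points_xy, hand_size_px, template_px=64):
    """Claim 6's fixed correspondence between template dimensions and a
    typical hand: a single scale factor maps hand size to template size."""
    return points_xy * (template_px / hand_size_px)
```

When the hand frame already coincides with the camera frame, `R_hand` is the identity and the homography reduces to the identity, as expected.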
10. The method of claim 3 wherein classifying a gesture comprises:
matching the hand template against a plurality of gesture templates.
22. The method of claim 3 wherein the image data comprises:
a color image and a depth data.
23. The method of claim 22 wherein the color image comprises a red value, a green value and a blue value for each pixel of a captured image, and the depth data comprises an x value in a camera reference frame, a y value in the camera reference frame, and a z value in the camera reference frame for each pixel of the captured image.
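Claim 23's per-pixel record (RGB color plus a 3D position in the camera reference frame) can be written down directly; a NumPy structured dtype is one illustrative layout, and the 640x480 resolution is an assumption:

```python
import numpy as np

# One record per pixel of the captured image, per claim 23: a red, green
# and blue value plus x, y, z in the camera reference frame.
pixel_dtype = np.dtype([
    ("r", np.uint8), ("g", np.uint8), ("b", np.uint8),      # color image
    ("x", np.float32), ("y", np.float32), ("z", np.float32) # depth data
])

frame = np.zeros((480, 640), dtype=pixel_dtype)  # one captured image
```

This makes explicit that the color image and the depth data are registered: every pixel carries both.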
24. The method of claim 3 wherein determining a hand pose estimation comprises:
performing background subtraction;
determining an arm orientation; and
computing the hand pose estimation based on the arm orientation.
29. The method of claim 24 wherein computing the hand pose estimation comprises:
computing a center of the hand; and
computing an orientation of the hand in relation to a camera reference frame.
11. A method for recognizing gestures comprising:
obtaining an image data;
determining a hand pose estimation;
producing a frontal view of a hand;
isolating the hand from the background;
classifying a gesture of the hand; and
matching the hand template against a plurality of gesture templates based on computing geometric moments of a first order, a second order and a third order, and applying a Mahalanobis distance metric.
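The matching step of claim 11 can be sketched as computing the raw geometric moments of the template image and comparing moment vectors with a Mahalanobis distance. The function names are hypothetical, and the template mean and covariance are assumed to come from training data:

```python
import numpy as np

def geometric_moments(img, max_order=3):
    """Raw geometric moments m_pq = sum x^p y^q I(x, y) for
    1 <= p + q <= max_order (claim 11 uses first through third order)."""
    h, w = img.shape
    y, x = np.mgrid[:h, :w].astype(float)
    return np.array([(img * x**p * y**q).sum()
                     for p in range(max_order + 1)
                     for q in range(max_order + 1 - p)
                     if 1 <= p + q <= max_order])

def mahalanobis(v, mean, cov_inv):
    """Claim 11's distance between a moment vector and a gesture
    template's moment distribution (cov_inv = inverse covariance)."""
    d = v - mean
    return float(np.sqrt(d @ cov_inv @ d))
```

A gesture would then be classified by evaluating `mahalanobis` against each stored template distribution and taking the smallest distance.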
12. A method for recognizing gestures comprising:
obtaining an image data;
performing background subtraction;
computing a hand pose estimation based on an arm orientation determination;
performing perspective unwarping to produce an unwarped frontal view of a hand;
scaling the unwarped frontal view of the hand into a template image;
isolating the hand from the background using color segmentation; and
classifying a gesture of the hand by matching the hand with a plurality of template hand images.

13. The method of claim 12 wherein the image data comprises:
a color image comprised of a red value, a green value and a blue value for each pixel of a captured image, and a depth data comprised of an x value in a camera reference frame, a y value in the camera reference frame, and a z value in the camera reference frame for each pixel of the captured image.
14. The method of claim 13 wherein performing background subtraction comprises:
selecting as a foreground arm image those pixels of the depth data where the difference between a mean background depth and the current depth is larger than an empirically defined threshold.
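Claim 14's depth-based background subtraction is a one-line threshold test. The 0.15 m threshold below is an illustrative stand-in for the patent's empirically defined value:

```python
import numpy as np

def foreground_arm(depth, mean_bg_depth, threshold=0.15):
    """Claim 14: a pixel belongs to the foreground arm when the mean
    background depth exceeds the current depth by more than an
    empirically defined threshold (0.15 m is an illustrative value)."""
    return (mean_bg_depth - depth) > threshold
```

Because the test uses depth rather than color, the arm is segmented even when its color resembles the background.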
15. The method of claim 14 wherein determining an arm orientation comprises:
computing a three-dimensional (3D) line that defines the arm orientation by fitting a first two dimensional (2D) line to the image data in the image plane and fitting a second 2D line to the image data in the depth dimension in the plane containing the first 2D line such that the second 2D line is perpendicular to the viewing plane.
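Claim 15's two-stage fit can be sketched as follows: a 2D line through the arm pixels in the image plane (here via the principal axis of the pixel cloud), then a second fit of depth against position along that line; together the two give a 3D direction. The use of PCA and `polyfit` is an illustrative choice, not the patent's prescribed estimator:

```python
import numpy as np

def arm_line_3d(u, v, z):
    """Claim 15: fit a 2D line in the image plane, then a second 2D line
    in the depth dimension along it, and combine into a 3D arm line."""
    uv = np.column_stack([u, v])
    mean_uv = uv.mean(axis=0)
    # first 2D line: principal axis of the arm pixels in the image plane
    _, _, vt = np.linalg.svd(uv - mean_uv)
    dir2d = vt[0]
    t = (uv - mean_uv) @ dir2d            # position along the 2D line
    # second 2D line: depth varies linearly along the arm, z = a*t + b
    a, b = np.polyfit(t, z, 1)
    point = np.array([*mean_uv, b])
    direction = np.array([*dir2d, a])
    return point, direction / np.linalg.norm(direction)
```

The returned point and unit direction define the 3D line that claims 1 and 10 reuse as the axis of the hand cylinder.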
16. The method of claim 15 wherein computing the hand pose estimation comprises:
computing a center of the hand; and
computing an orientation of the hand in relation to the camera reference frame.
17. A system comprising:
a stereo camera coupled to a computer, the computer comprising a processor and a storage device to read from a machine readable medium, the machine readable medium containing instructions which, when executed by the processor, cause the computer to perform operations comprising:
obtaining an image data;
performing background subtraction;
computing a hand pose estimation based on an arm orientation determination;
performing perspective unwarping to produce an unwarped frontal view of a hand;
scaling the unwarped frontal view of the hand into a template image;
isolating the hand from the background using color segmentation; and
classifying a gesture of the hand by matching the hand with a plurality of template hand images.

18. The system of claim 17 wherein the image data comprises:
a color image comprised of a red value, a green value and a blue value for each pixel of a captured image, and a depth data comprised of an x value in a camera reference frame, a y value in the camera reference frame, and a z value in the camera reference frame for each pixel of the captured image.
19. The system of claim 17 wherein performing background subtraction comprises:
selecting as a foreground arm image those pixels of the depth data where the difference between a mean background depth and the current depth is larger than an empirically defined threshold.
20. The system of claim 17 wherein determining an arm orientation comprises:
computing a three-dimensional (3D) line that defines the arm orientation by fitting a first two dimensional (2D) line to the image data in the image plane and fitting a second 2D line to the image data in the depth dimension in the plane containing the first 2D line such that the second 2D line is perpendicular to the viewing plane.
21. The system of claim 17 wherein computing the hand pose estimation comprises:
computing a center of the hand; and
computing an orientation of the hand in relation to a camera reference frame.
25. A method for recognizing gestures comprising:
obtaining an image data;
determining a hand pose estimation based on performing background subtraction, determining an arm orientation, and computing the hand pose estimation based on the arm orientation, wherein performing background subtraction includes selecting as a foreground arm image those pixels where the difference between a mean background depth and the current depth is larger than an empirically defined threshold;
producing a frontal view of a hand;
isolating the hand from the background; and
classifying a gesture of the hand.
26. A method for recognizing gestures comprising:
obtaining an image data;
determining a hand pose estimation based on performing background subtraction, determining an arm orientation, and computing the hand pose estimation based on the arm orientation, wherein determining an arm orientation includes fitting a first two dimensional (2D) line to the image data in the image plane;
fitting a second 2D line to the image data in the depth dimension in the plane containing the first 2D line such that the second 2D line is perpendicular to the viewing plane, and combining the first 2D line and the second 2D line into a three-dimensional (3D) line such that the 3D line defines the arm orientation;
producing a frontal view of a hand;
isolating the hand from the background; and
classifying a gesture of the hand.

27. The method of claim 26 wherein fitting a first 2D line comprises:
employing an iterative reweighted least square method; and
wherein fitting a second 2D line comprises:
employing an iterative reweighted least square method.
28. The method of claim 27 wherein employing an iterative reweighted least square method comprises:
using a Welsch M-estimator.
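The iterative reweighted least squares of claims 27-28 can be sketched for a line fit y = a*x + b. The Welsch M-estimator reweights each point by w(r) = exp(-(r/c)^2), so large residuals (outliers) are driven toward zero weight; the scale `c` and iteration count are illustrative settings:

```python
import numpy as np

def irls_line(x, y, c=2.0, iters=20):
    """Iteratively reweighted least squares with Welsch M-estimator
    weights w(r) = exp(-(r/c)^2), per claims 27-28. Fits y = a*x + b
    while suppressing outliers; c and iters are illustrative choices."""
    w = np.ones_like(x)
    a = b = 0.0
    for _ in range(iters):
        A = np.column_stack([x, np.ones_like(x)])
        # weighted normal equations: (A^T W A) [a, b]^T = A^T W y
        a, b = np.linalg.solve(A.T @ (A * w[:, None]), A.T @ (w * y))
        r = y - (a * x + b)
        w = np.exp(-(r / c) ** 2)       # Welsch weights for next pass
    return a, b
```

On arm pixels, this keeps the fitted line on the limb even when stray foreground pixels (e.g. parts of the torso) survive background subtraction.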
Specification