Three-dimensional hand tracking using depth sequences
Abstract
In the field of human-computer interaction (HCI), the study of the interfaces between people (users) and computers, understanding how a user intends to interact with the computer is an important problem. The ability to understand human gestures, and hand gestures in particular, is central to inferring the user's intentions in a wide variety of applications. This disclosure describes a novel system and method for three-dimensional hand tracking using depth sequences. Some of the major contributions of the hand tracking system described herein include: 1) a robust hand detector that is invariant to scene background changes; 2) a bi-directional tracking algorithm that prevents detected hands from always drifting closer to the front of the scene (i.e., forward along the z-axis of the scene); and 3) various hand verification heuristics.
20 Claims
1. An apparatus, comprising:
a depth-sensing camera;
a memory having, stored therein, computer program code; and
one or more processing units operatively coupled to the memory and configured to execute instructions in the computer program code that cause the one or more processing units to:
receive a depth map of a scene containing one or more human hands from the depth-sensing camera, the depth map comprising a matrix of pixels, each pixel having a depth value;
extract, from the depth map, features based on the depth values of the pixels in a plurality of patches distributed in respective positions over the one or more human hands, wherein the depth values of the pixels are normalized, such that the features are background-invariant;
match the extracted features to previously-stored features;
estimate a position of at least one of the one or more human hands based, at least in part, on stored information associated with the matched features; and
track the position of the at least one of the one or more human hands, wherein the instructions to track comprise instructions to track bi-directionally along a z-axis of the scene.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A non-transitory program storage device, readable by a programmable control device and comprising instructions stored thereon to cause one or more processing units to:
receive a depth map of a scene containing one or more human hands from a depth-sensing camera, the depth map comprising a matrix of pixels, each pixel having a depth value;
extract, from the depth map, features based on the depth values of the pixels in a plurality of patches distributed in respective positions over the one or more human hands, wherein the depth values of the pixels are normalized, such that the features are background-invariant;
match the extracted features to previously-stored features;
estimate a position of at least one of the one or more human hands based, at least in part, on stored information associated with the matched features; and
track the position of the at least one of the one or more human hands, wherein the instructions to track comprise instructions to track bi-directionally along a z-axis of the scene.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer-implemented method, comprising:
receiving a depth map of a scene containing one or more human hands from a depth-sensing camera, the depth map comprising a matrix of pixels, each pixel having a depth value;
extracting, from the depth map, features based on the depth values of the pixels in a plurality of patches distributed in respective positions over the one or more human hands, wherein the depth values of the pixels are normalized, such that the descriptors are background-invariant;
matching the extracted features to previously-stored features;
estimating a position of at least one of the one or more human hands based, at least in part, on stored information associated with the matched features; and
tracking the position of the at least one of the one or more human hands, wherein the instructions to track comprise instructions to track bi-directionally along a z-axis of the scene.
Dependent claims: 16, 17, 18, 19, 20
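The extracting and matching steps recited in the independent claims can be illustrated with a minimal Python sketch. It is not the claimed implementation. The claims state only that depth values are normalized so that the features are background-invariant; the particular scheme below (depth taken relative to the patch center, then clamped) is an assumption chosen for illustration, as are the names `patch_feature`, `nearest_stored`, `half`, and `clamp`.

```python
from typing import List, Sequence

# Row-major matrix of per-pixel depth values (units are arbitrary, e.g. mm).
DepthMap = Sequence[Sequence[float]]


def patch_feature(depth: DepthMap, row: int, col: int,
                  half: int = 1, clamp: float = 100.0) -> List[float]:
    """Return a normalized depth feature for the patch centered at (row, col).

    Assumed normalization (hypothetical, not taken from the claims): each
    depth is expressed relative to the patch center and clamped to
    [-clamp, +clamp], then scaled to [-1, 1]. Any surface more than `clamp`
    units behind the center therefore produces the same value, whatever
    its actual distance.
    """
    center = depth[row][col]
    feature = []
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            diff = depth[r][c] - center
            feature.append(max(-clamp, min(clamp, diff)) / clamp)
    return feature


def nearest_stored(feature: Sequence[float],
                   stored: Sequence[Sequence[float]]) -> int:
    """Return the index of the stored feature closest to `feature`.

    Closeness is squared Euclidean distance, an illustrative choice.
    """
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(stored)), key=lambda i: sq_dist(feature, stored[i]))
```

Under this assumed scheme, a hand at depth 500 in front of a wall at 1500 yields the same feature as the same hand in front of a wall at 3000, because both walls lie beyond the clamp distance. Information stored alongside the matched entry (for example, an offset to the hand center) could then feed the position estimate.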
Specification