Three-dimensional hand tracking using depth sequences

US 9,811,721 B2
Filed: 05/07/2015
Issued: 11/07/2017
Est. Priority Date: 08/15/2014
Status: Active Grant

Abstract

In the field of human-computer interaction (HCI), i.e., the study of the interfaces between people (users) and computers, understanding how the user wishes to interact with the computer is a very important problem. The ability to understand human gestures, and hand gestures in particular, is an important part of understanding the user's intentions and desires in a wide variety of applications. This disclosure describes a novel system and method for three-dimensional hand tracking using depth sequences. The major contributions of the hand tracking system described herein include: 1) a robust hand detector that is invariant to scene background changes; 2) a bi-directional tracking algorithm that prevents detected hands from always drifting closer to the front of the scene (i.e., forward along the z-axis of the scene); and 3) various hand verification heuristics.

38 Citations

20 Claims

1. An apparatus, comprising:
- a depth-sensing camera;
  
  a memory having, stored therein, computer program code; and
  
  one or more processing units operatively coupled to the memory and configured to execute instructions in the computer program code that cause the one or more processing units to:
  
  receive a depth map of a scene containing one or more human hands from the depth-sensing camera, the depth map comprising a matrix of pixels, each pixel having a depth value;
  
  extract, from the depth map, features based on the depth values of the pixels in a plurality of patches distributed in respective positions over the one or more human hands, wherein the depth values of the pixels are normalized, such that the features are background-invariant;
  
  match the extracted features to previously-stored features;
  
  estimate a position of at least one of the one or more human hands based, at least in part, on stored information associated with the matched features; and
  
  track the position of the at least one of the one or more human hands, wherein the instructions to track comprise instructions to track bi-directionally along a z-axis of the scene.
  - 2. The apparatus of claim 1, wherein the features use a constant value for the depth values of background pixels.
  - 3. The apparatus of claim 1, wherein the instructions to match the extracted features to previously-stored features further comprise instructions to use a background-invariant decision forest.
  - 4. The apparatus of claim 1, wherein the instructions to estimate the position of the at least one of the one or more human hands further comprise instructions to disregard pixels that do not exhibit a threshold amount of motion.
  - 5. The apparatus of claim 1, wherein the instructions to estimate the position of the at least one of the one or more human hands further comprise instructions to disregard hands that do not exhibit single-directional connectivity to a human body.
  - 6. The apparatus of claim 1, wherein the instructions to track bi-directionally along a z-axis of the scene further comprise instructions to locate local extrema in the depth map of the scene.
  - 7. The apparatus of claim 1, wherein the instructions to track bi-directionally along a z-axis of the scene further comprise instructions to weight the x-coordinate value and y-coordinate value of the pixels in the plurality of patches.
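
As a rough illustration of the background-invariant features recited in claims 1 and 2, the sketch below normalizes each patch pixel against the depth of the patch centre and overwrites anything judged to be background with one constant. The names `normalized_patch`, `BACKGROUND_VALUE` and `MAX_HAND_EXTENT`, the 0.25 m hand extent and the centre-relative normalization are all assumptions made for illustration; the claims do not specify them.

```python
from typing import List

DepthMap = List[List[float]]

BACKGROUND_VALUE = 1.0  # assumed constant written in place of every background pixel
MAX_HAND_EXTENT = 0.25  # assumed depth range (metres) a hand can span around its centre


def normalized_patch(depth: DepthMap, cy: int, cx: int, half: int) -> List[List[float]]:
    """Return a (2*half+1) x (2*half+1) patch of depths relative to the centre pixel.

    Pixels outside the map, or more than MAX_HAND_EXTENT behind the centre,
    are treated as background and replaced by BACKGROUND_VALUE, so the real
    background depth never reaches the feature.
    """
    centre = depth[cy][cx]
    rows, cols = len(depth), len(depth[0])
    patch = []
    for y in range(cy - half, cy + half + 1):
        row = []
        for x in range(cx - half, cx + half + 1):
            inside = 0 <= y < rows and 0 <= x < cols
            offset = depth[y][x] - centre if inside else float("inf")
            if offset > MAX_HAND_EXTENT:
                row.append(BACKGROUND_VALUE)
            else:
                # Scale foreground offsets by the hand extent, clamped at -1.
                row.append(max(-1.0, offset / MAX_HAND_EXTENT))
        patch.append(row)
    return patch
```

Under these assumptions, the same hand in front of a wall at 2 m and in front of a wall at 5 m yields an identical patch, which is the property the claims call background invariance.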

8. A non-transitory program storage device, readable by a programmable control device and comprising instructions stored thereon to cause one or more processing units to:
- receive a depth map of a scene containing one or more human hands from a depth-sensing camera, the depth map comprising a matrix of pixels, each pixel having a depth value;
  
  extract, from the depth map, features based on the depth values of the pixels in a plurality of patches distributed in respective positions over the one or more human hands, wherein the depth values of the pixels are normalized, such that the features are background-invariant;
  
  match the extracted features to previously-stored features;
  
  estimate a position of at least one of the one or more human hands based, at least in part, on stored information associated with the matched features; and
  
  track the position of the at least one of the one or more human hands, wherein the instructions to track comprise instructions to track bi-directionally along a z-axis of the scene.
  - 9. The non-transitory program storage device of claim 8, wherein the features use a constant value for the depth values of background pixels.
  - 10. The non-transitory program storage device of claim 8, wherein the instructions to match the extracted features to previously-stored features further comprise instructions to use a background-invariant decision forest.
  - 11. The non-transitory program storage device of claim 8, wherein the instructions to estimate the position of the at least one of the one or more human hands further comprise instructions to disregard pixels that do not exhibit a threshold amount of motion.
  - 12. The non-transitory program storage device of claim 8, wherein the instructions to estimate the position of the at least one of the one or more human hands further comprise instructions to disregard hands that do not exhibit single-directional connectivity to a human body.
  - 13. The non-transitory program storage device of claim 8, wherein the instructions to track bi-directionally along a z-axis of the scene further comprise instructions to locate local extrema in the depth map of the scene.
  - 14. The non-transitory program storage device of claim 8, wherein the instructions to track bi-directionally along a z-axis of the scene further comprise instructions to weight the x-coordinate value and y-coordinate value of the pixels in the plurality of patches.
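
Claims 4 and 11 recite disregarding pixels that do not exhibit a threshold amount of motion. One simple way to picture this is frame differencing between two consecutive depth maps, sketched below. The functions `motion_mask` and `moving_pixels`, the use of absolute depth difference as the motion measure, and the threshold value in the usage note are assumptions; the claims do not say how motion is measured.

```python
from typing import List, Tuple

DepthMap = List[List[float]]


def motion_mask(prev: DepthMap, curr: DepthMap, threshold: float) -> List[List[bool]]:
    """Mark pixels whose depth changed by at least `threshold` between frames."""
    return [
        [abs(c - p) >= threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def moving_pixels(prev: DepthMap, curr: DepthMap, threshold: float) -> List[Tuple[int, int]]:
    """Return (row, column) coordinates of the pixels kept for position estimation."""
    mask = motion_mask(prev, curr, threshold)
    return [
        (y, x)
        for y, row in enumerate(mask)
        for x, moving in enumerate(row)
        if moving
    ]
```

With an assumed threshold of 0.05 m, a pixel that jitters by a centimetre of sensor noise is dropped, while a pixel newly covered by a hand is kept.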

15. A computer-implemented method, comprising:
- receiving a depth map of a scene containing one or more human hands from a depth-sensing camera, the depth map comprising a matrix of pixels, each pixel having a depth value;
  
  extracting, from the depth map, features based on the depth values of the pixels in a plurality of patches distributed in respective positions over the one or more human hands, wherein the depth values of the pixels are normalized, such that the descriptors are background-invariant;
  
  matching the extracted features to previously-stored features;
  
  estimating a position of at least one of the one or more human hands based, at least in part, on stored information associated with the matched features; and
  
  tracking the position of the at least one of the one or more human hands, wherein the instructions to track comprise instructions to track bi-directionally along a z-axis of the scene.
  - 16. The computer-implemented method of claim 15, wherein the features use a constant value for the depth values of background pixels.
  - 17. The computer-implemented method of claim 15, wherein estimating the position of the at least one of the one or more human hands further comprises disregarding background pixels and pixels that do not exhibit a threshold amount of motion.
  - 18. The computer-implemented method of claim 15, wherein estimating the position of the at least one of the one or more human hands further comprises disregarding hands that do not exhibit single-directional connectivity to a human body.
  - 19. The computer-implemented method of claim 15, wherein tracking bi-directionally along a z-axis of the scene further comprises locating local extrema in the depth map of the scene.
  - 20. The computer-implemented method of claim 15, wherein tracking bi-directionally along a z-axis of the scene further comprises weighting the x-coordinate value and y-coordinate value of the pixels in the plurality of patches.
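
The bi-directional tracking of claims 6, 7, 13, 14, 19 and 20 can be pictured with the hedged sketch below. It is not the patented algorithm: the cost function, the search window, and the names `is_local_minimum`, `track_step`, `radius` and `xy_weight` are assumptions chosen only to show how local depth extrema and weighted x/y coordinates could let a track move either toward or away from the camera.

```python
from typing import List, Optional, Tuple

DepthMap = List[List[float]]
Position = Tuple[int, int, float]  # (row, column, depth)


def is_local_minimum(depth: DepthMap, y: int, x: int) -> bool:
    """True if no 8-connected neighbour is strictly closer to the camera."""
    rows, cols = len(depth), len(depth[0])
    for ny in range(max(0, y - 1), min(rows, y + 2)):
        for nx in range(max(0, x - 1), min(cols, x + 2)):
            if depth[ny][nx] < depth[y][x]:
                return False
    return True


def track_step(depth: DepthMap, prev: Position, radius: int,
               xy_weight: float) -> Optional[Position]:
    """Pick the local depth minimum that best continues the previous position.

    The cost uses the absolute depth change, so a hand moving away from the
    camera is followed just as readily as one moving toward it, rather than
    the track snapping to whatever is nearest the front of the scene.
    """
    py, px, pz = prev
    rows, cols = len(depth), len(depth[0])
    best, best_cost = None, float("inf")
    for y in range(max(0, py - radius), min(rows, py + radius + 1)):
        for x in range(max(0, px - radius), min(cols, px + radius + 1)):
            if not is_local_minimum(depth, y, x):
                continue
            # Assumed cost: depth change plus weighted image-plane displacement.
            cost = abs(depth[y][x] - pz) + xy_weight * (abs(y - py) + abs(x - px))
            if cost < best_cost:
                best, best_cost = (y, x, depth[y][x]), cost
    return best
```

In this sketch, a hand that retreats from 1.0 m to 1.2 m is still preferred over a closer object elsewhere in the window, because the closer object costs more in both depth change and x/y displacement.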

Current Assignee: Apple Inc.
Original Assignee: Apple Inc.
Inventors: Tang, Feng; Li, Ang; Shi, Xiaojin
Primary Examiner(s): Akhavannik, Hadi
Application Number: US14/706,649
Publication Number: US 20160048726A1
Time in Patent Office: 915 Days
Field of Search: None
US Class Current
CPC Class Codes

G06F 3/017   Gesture based interaction, ...

G06F 3/0304   Detection arrangements usin...

G06F 3/0425   using a single imaging devi...

G06T 2200/04   involving 3D image data

G06T 2207/30196   Human being; Person

G06T 7/246   using feature-based methods...

G06T 7/254   involving subtraction of im...

G06V 40/28   Recognition of hand or arm ...

H04N 13/207   using a single 2D image sensor

H04N 13/271   wherein the generated image...

H04N 2013/0085   Motion estimation from ster...
