Systems and methods for initializing motion tracking of human hands using template matching within bounded regions determined using a depth map
First Claim
1. A real-time gesture based interactive system, comprising:
a processor;
a reference camera configured to capture sequences of frames of video data, where each frame of video data comprises intensity information for a plurality of pixels;
an alternate view camera configured to capture sequences of frames of video data, where each frame of video data comprises intensity information for a plurality of pixels; and
memory containing a hand tracking application; and
wherein the hand tracking application configures the processor to:
obtain a reference frame of video data from the reference camera where the reference frame of video data is part of a sequence of frames of video data obtained from the reference camera;
obtain an alternate view frame of video data from the alternate view camera;
identify moving pixels by comparing a previous frame of video data from the sequence of frames of video data with the reference frame of video data to identify pixel value differences exceeding a predetermined threshold;
generate a depth map containing distances from the reference camera for pixels in the reference frame of video data using information including the disparity between corresponding pixels within the reference and alternate view frames of video data;
identify at least one bounded region within the reference frame of video data containing moving pixels having distances from the reference camera that are within a specific range of distances from the reference camera by:
identifying at least one preliminary bounded region within the reference frame of video data containing pixels that are moving,
generating the depth map based upon the identified at least one preliminary bounded region in the reference frame of video data so that the depth map contains distances from the reference camera for pixels within the at least one preliminary bounded region in the reference frame of video data, and
identifying the at least one bounded region within the at least one preliminary bounded region in the reference frame of video data using the depth map;
determine whether any of the pixels within the at least one bounded region within the reference frame are part of a human hand;
obtain the sequence of frames of video data from the reference camera;
track the motion of the part of the human hand visible in the sequence of frames of video data;
confirm that the tracked motion of the part of the human hand visible in the sequence of frames of video data corresponds to a predetermined initialization gesture; and
commence tracking the human hand as part of a gesture based interactive session.
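The moving-pixel and preliminary-bounded-region steps recited above can be illustrated with a minimal sketch. It assumes frames are plain lists of rows of integer intensities and that a preliminary bounded region is simply the bounding box of all moving pixels; the function names and the threshold value are hypothetical and are not taken from the patent.

```python
def moving_pixel_mask(previous, current, threshold):
    """Mark pixels whose intensity changed by more than `threshold` between frames."""
    return [
        [abs(cur - prev) > threshold for prev, cur in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous, current)
    ]


def bounding_box(mask):
    """Return (top, left, bottom, right), inclusive, around all True pixels, or None."""
    coords = [(r, c) for r, row in enumerate(mask) for c, flag in enumerate(row) if flag]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


# Two tiny 3x4 intensity frames; three pixels change by more than the threshold.
previous = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
current = [
    [10, 10, 10, 10],
    [10, 90, 95, 10],
    [10, 12, 80, 10],
]

mask = moving_pixel_mask(previous, current, threshold=20)
box = bounding_box(mask)
print(box)  # (1, 1, 2, 2)
```

Restricting later depth estimation to such a box is one way to read the claim's requirement that the depth map be generated based upon the preliminary bounded region, since disparity then only needs to be computed for the pixels inside it.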
Abstract
Systems and methods for initializing motion tracking of human hands within bounded regions are disclosed. One embodiment includes: a processor; reference and alternate view cameras; and memory containing a plurality of templates that are rotated and scaled versions of a base template. In addition, a hand tracking application configures the processor to: obtain reference and alternate view frames of video data; generate a depth map; identify at least one bounded region within the reference frame of video data containing pixels having distances from the reference camera that are within a specific range of distances; determine whether any of the pixels within the at least one bounded region are part of a human hand; track the motion of the part of the human hand in a sequence of frames of video data obtained from the reference camera; and confirm that the tracked motion corresponds to a predetermined initialization gesture.
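As a rough illustration of the depth map and depth-range steps summarized in the abstract, the sketch below assumes a rectified stereo pair and the standard pinhole relation Z = f * B / d between disparity and distance. The focal length, baseline, and range limits are made-up values, and the function names are hypothetical.

```python
def depth_from_disparity(disparity, focal_length_px, baseline_m):
    """Convert per-pixel disparity (pixels) to distance (meters) using Z = f * B / d.

    Pixels with zero disparity have no usable correspondence and map to None.
    """
    return [
        [focal_length_px * baseline_m / d if d > 0 else None for d in row]
        for row in disparity
    ]


def in_range_mask(depth, near_m, far_m):
    """Mark pixels whose distance lies within [near_m, far_m]."""
    return [
        [z is not None and near_m <= z <= far_m for z in row]
        for row in depth
    ]


# Hypothetical disparities for a 2x3 patch of the reference frame.
disparity = [
    [0, 4, 40],
    [8, 50, 80],
]

depth = depth_from_disparity(disparity, focal_length_px=500.0, baseline_m=0.08)
mask = in_range_mask(depth, near_m=0.4, far_m=1.2)
print(mask)  # [[False, False, True], [False, True, True]]
```

Only the three pixels closer than 1.2 m survive, which is the kind of region the subsequent hand detection step would search.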
28 Claims
1. A real-time gesture based interactive system, comprising:
a processor;
a reference camera configured to capture sequences of frames of video data, where each frame of video data comprises intensity information for a plurality of pixels;
an alternate view camera configured to capture sequences of frames of video data, where each frame of video data comprises intensity information for a plurality of pixels; and
memory containing a hand tracking application; and
wherein the hand tracking application configures the processor to:
obtain a reference frame of video data from the reference camera where the reference frame of video data is part of a sequence of frames of video data obtained from the reference camera;
obtain an alternate view frame of video data from the alternate view camera;
identify moving pixels by comparing a previous frame of video data from the sequence of frames of video data with the reference frame of video data to identify pixel value differences exceeding a predetermined threshold;
generate a depth map containing distances from the reference camera for pixels in the reference frame of video data using information including the disparity between corresponding pixels within the reference and alternate view frames of video data;
identify at least one bounded region within the reference frame of video data containing moving pixels having distances from the reference camera that are within a specific range of distances from the reference camera by:
identifying at least one preliminary bounded region within the reference frame of video data containing pixels that are moving,
generating the depth map based upon the identified at least one preliminary bounded region in the reference frame of video data so that the depth map contains distances from the reference camera for pixels within the at least one preliminary bounded region in the reference frame of video data, and
identifying the at least one bounded region within the at least one preliminary bounded region in the reference frame of video data using the depth map;
determine whether any of the pixels within the at least one bounded region within the reference frame are part of a human hand;
obtain the sequence of frames of video data from the reference camera;
track the motion of the part of the human hand visible in the sequence of frames of video data;
confirm that the tracked motion of the part of the human hand visible in the sequence of frames of video data corresponds to a predetermined initialization gesture; and
commence tracking the human hand as part of a gesture based interactive session.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
19. A real-time gesture based interactive system comprising:
a processor;
a reference camera configured to capture sequences of frames of video data, where each frame of video data comprises intensity information for a plurality of pixels;
an alternate view camera configured to capture sequences of frames of video data, where each frame of video data comprises intensity information for a plurality of pixels; and
memory containing a hand tracking application and a plurality of templates that are rotated and scaled versions of a base template; and
wherein the hand tracking application configures the processor to:
obtain a reference frame of video data from the reference camera;
obtain an alternate view frame of video data from the alternate view camera;
generate a depth map containing distances from the reference camera for pixels in the reference frame of video data using information including the disparity between corresponding pixels within the reference and alternate view frames of video data; and
identify at least one bounded region within the reference frame of video data containing pixels having distances from the reference camera that are within a specific range of distances from the reference camera;
determine whether any of the pixels within the at least one bounded region within the reference frame are part of a human hand by searching the frame of video data for a grouping of pixels that match one of the plurality of templates;
obtain a sequence of frames of video data from the reference camera;
track the motion of the part of the human hand visible in the sequence of frames of video data;
confirm that the tracked motion of the part of the human hand visible in the sequence of frames of video data corresponds to a predetermined initialization gesture; and
commence tracking the human hand as part of a gesture based interactive session.
Dependent claims: 20, 21, 22, 23, 24, 25, 26
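One possible way to build "a plurality of templates that are rotated and scaled versions of a base template" is sketched below. It assumes a template is a list of integer (x, y) offsets relative to an anchor point; that representation, the chosen angles and scales, and the function names are all illustrative assumptions rather than details from the patent.

```python
import math


def transform_template(points, angle_deg, scale):
    """Rotate template points about the origin, scale them, and snap to the pixel grid."""
    angle = math.radians(angle_deg)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [
        (
            round(scale * (x * cos_a - y * sin_a)),
            round(scale * (x * sin_a + y * cos_a)),
        )
        for x, y in points
    ]


def build_template_set(base_points, angles_deg, scales):
    """Return a dict keyed by (angle, scale) holding each transformed template."""
    return {
        (angle, scale): transform_template(base_points, angle, scale)
        for angle in angles_deg
        for scale in scales
    }


# A crude vertical stroke standing in for a finger-shaped base template.
base = [(0, 0), (0, -1), (0, -2), (0, -3)]
templates = build_template_set(base, angles_deg=[-90, 0, 90], scales=[1, 2])

print(len(templates))      # 6
print(templates[(90, 1)])  # [(0, 0), (1, 0), (2, 0), (3, 0)]
```

Precomputing the set once and storing it in memory, as the claim describes, trades storage for search speed: matching then only compares stored templates against pixel groupings instead of transforming the base template for every candidate location.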
27. A real-time gesture based interactive system, comprising:
a processor;
a reference camera configured to capture sequences of frames of video data, where each frame of video data comprises color information for a plurality of pixels;
an alternate view camera configured to capture sequences of frames of video data, where each frame of video data comprises color information for a plurality of pixels; and
memory containing:
a hand tracking application; and
a set of edge feature templates comprising a plurality of edge feature templates that are rotated and scaled versions of a base template;
wherein the hand tracking application configures the processor to:
obtain a reference frame of video data from the reference camera;
obtain an alternate view frame of video data from the alternate view camera;
generate a depth map containing distances from the reference camera for pixels in the reference frame of video data using information including the disparity between corresponding pixels within the reference and alternate view frames of video data;
identify at least one bounded region within the reference frame of video data containing pixels having distances from the reference camera that are within a specific range of distances from the reference camera;
determine whether any of the pixels within the at least one bounded region within the reference frame are part of a human hand visible in the sequence of frames of video data, where a part of a human hand is identified by searching the frame of video data for a grouping of pixels that have image gradient orientations that match the edge features of one of the plurality of edge feature templates;
obtain a sequence of frames of video data from the reference camera;
track the motion of the part of the human hand visible in the sequence of frames of video data;
confirm that the tracked motion of the part of the human hand visible in the sequence of frames of video data corresponds to a predetermined initialization gesture, where the predetermined initialization gesture comprises a finger oscillating from side to side within a predetermined gesture range;
initialize the image capture settings of the reference camera used during the gesture based interactive session by adjusting the exposure and gain of the reference camera as additional frames of video data are captured by the reference camera so that the brightness of at least one pixel that is part of a human hand visible in the additional frames of video data satisfies a predetermined criterion; and
commence tracking the human hand as part of a gesture based interactive session.
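The initialization gesture in this claim, a finger oscillating from side to side within a predetermined gesture range, could be checked along the following lines. This is a minimal sketch that assumes the tracker already yields one horizontal fingertip position per frame; the reversal count, step size, and range threshold are hypothetical parameters, not values from the patent.

```python
def count_reversals(xs, min_step=1):
    """Count changes in horizontal direction, ignoring steps smaller than `min_step`."""
    reversals = 0
    last_direction = 0
    for prev, cur in zip(xs, xs[1:]):
        step = cur - prev
        if abs(step) < min_step:
            continue  # treat tiny movements as jitter
        direction = 1 if step > 0 else -1
        if last_direction and direction != last_direction:
            reversals += 1
        last_direction = direction
    return reversals


def is_initialization_gesture(xs, gesture_range, min_reversals=3):
    """True when the track stays within `gesture_range` and reverses often enough."""
    if not xs:
        return False
    within_range = max(xs) - min(xs) <= gesture_range
    return within_range and count_reversals(xs) >= min_reversals


wag = [100, 110, 120, 110, 100, 110, 120, 110, 100]  # side-to-side wag
sweep = [100, 120, 140, 160, 180]                    # one-directional swipe
wide = [0, 100, 0, 100, 0]                           # oscillates, but too widely

print(is_initialization_gesture(wag, gesture_range=40))    # True
print(is_initialization_gesture(sweep, gesture_range=40))  # False
print(is_initialization_gesture(wide, gesture_range=40))   # False
```

Requiring both a bounded extent and several reversals helps separate a deliberate wag from a swipe or from large arm movements, which is presumably why the claim ties the oscillation to a gesture range.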
28. A method of commencing tracking of a human hand using a real-time gesture based interactive system, comprising:
capturing a reference frame of video data using a reference camera, where the reference frame of video data comprises intensity information for a plurality of pixels;
capturing an alternate view frame of video data using an alternate view camera, where the alternate view frame of video data comprises intensity information for a plurality of pixels;
generating a depth map containing distances from the reference camera for pixels in the reference frame of video data using a processor configured by a hand tracking application and information including the disparity between corresponding pixels within the reference and alternate view frames of video data;
identifying at least one bounded region within the reference frame of video data containing pixels having distances from the reference camera that are within a specific range of distances from the reference camera using the processor configured by the hand tracking application;
determining whether any of the pixels within the at least one bounded region within the reference frame are part of a human hand visible in the reference frame of video data using the processor configured using the hand tracking application, where a part of a human hand is identified by searching the reference frame of video data for a grouping of pixels that have image gradient orientations that match the edge features of one of the plurality of edge feature templates;
obtaining a sequence of frames of video data from the reference camera;
tracking the motion of the part of the human hand visible in the sequence of frames of video data using the processor configured using the hand tracking application;
confirming that the tracked motion of the part of the human hand visible in the sequence of frames of video data corresponds to a predetermined initialization gesture using the processor configured using the hand tracking application; and
commence tracking the human hand as part of a gesture based interactive session using the processor configured using the hand tracking application.
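To make the gradient-orientation matching step more concrete, the sketch below scores how well image gradient orientations agree with the orientations stored in an edge feature template. It assumes a template is a list of (dx, dy, orientation) entries, uses central differences for the gradient, and uses the mean absolute cosine of the angle difference as the similarity. That similarity measure is a stand-in chosen for illustration; the claim does not specify how a match is scored.

```python
import math


def gradient_orientation(image, x, y):
    """Central-difference gradient orientation in radians.

    Returns None on the image border or where the gradient is zero.
    """
    if not (0 < y < len(image) - 1 and 0 < x < len(image[0]) - 1):
        return None
    gx = image[y][x + 1] - image[y][x - 1]
    gy = image[y + 1][x] - image[y - 1][x]
    if gx == 0 and gy == 0:
        return None
    return math.atan2(gy, gx)


def match_score(image, template, origin):
    """Mean |cos(angle difference)| over template points; 1.0 is a perfect match."""
    ox, oy = origin
    total = 0.0
    for dx, dy, orientation in template:
        angle = gradient_orientation(image, ox + dx, oy + dy)
        if angle is not None:
            total += abs(math.cos(angle - orientation))
    return total / len(template)


# 5x5 image with a vertical dark-to-bright edge between columns 1 and 2.
image = [[0, 0, 100, 100, 100] for _ in range(5)]

# Hypothetical templates: three stacked points with a shared gradient orientation.
vertical_edge = [(0, 0, 0.0), (0, 1, 0.0), (0, 2, 0.0)]
horizontal_edge = [(0, 0, math.pi / 2), (0, 1, math.pi / 2), (0, 2, math.pi / 2)]

score_v = match_score(image, vertical_edge, origin=(1, 1))
score_h = match_score(image, horizontal_edge, origin=(1, 1))
print(round(score_v, 3), round(score_h, 3))  # 1.0 0.0
```

Taking the absolute value of the cosine makes the score insensitive to gradient polarity, so a bright hand on a dark background and a dark hand on a bright background score the same; a search would evaluate this score for each template at each candidate origin inside the bounded region and accept matches above some threshold.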
Specification