Performing hand gesture recognition using 2D image data
Abstract
Systems and methods may provide for determining a skin tone distribution for a plurality of pixels in a video signal and using the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal. In one example, the video signal includes two-dimensional (2D) image data, and the skin tone distribution has an execution time budget that is greater than an execution time budget of the blob-based hand gesture determinations.
78 Claims
1. An apparatus to recognize hand gestures, comprising:
an offline module to determine a skin tone distribution for a plurality of pixels in a video signal, wherein the offline module includes an edge detection unit to receive a color image associated with a frame of the video signal and conduct an edge analysis on the color image for each of a plurality of channels and wherein the edge detection unit includes: box logic to, for each channel in the plurality of channels, determine a set of Gaussian derivatives; convolution logic to perform a convolution between the set of Gaussian derivatives and each pixel in the color image to obtain a gradient magnitude and a gradient angle for each pixel in the color image on a per channel basis; and threshold logic to use a low threshold and a high threshold to determine whether each gradient magnitude and associated gradient angle corresponds to an edge, wherein the low threshold and the high threshold are channel-specific; and an online module to use the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal. (Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
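Claims 1, 27 and 53 describe the per-channel edge analysis as Gaussian derivatives, a gradient magnitude and angle for every pixel, and a channel-specific pair of low/high thresholds. The following is a minimal sketch of one Canny-style reading of those steps, not the patentee's implementation. The helper names, the default sigma, non-maximum suppression along the gradient angle, 8-connected hysteresis and OR-combining the channels are all assumptions; images are plain nested lists so the example has no dependencies.

```python
import math
from collections import deque


def gaussian_kernels(sigma):
    """Sampled 1-D Gaussian g and its first derivative dg on [-r, r]."""
    r = max(1, math.ceil(3 * sigma))
    g = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-r, r + 1)]
    total = sum(g)
    g = [v / total for v in g]
    dg = [-x / (sigma * sigma) * g[x + r] for x in range(-r, r + 1)]
    return g, dg, r


def sep_conv(img, kx, ky, r):
    """Separable 2-D convolution with replicated borders."""
    h, w = len(img), len(img[0])

    def clamp(v, n):
        return min(max(v, 0), n - 1)

    rows = [[sum(kx[k + r] * img[y][clamp(x - k, w)] for k in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(ky[k + r] * rows[clamp(y - k, h)][x] for k in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


def gradients(channel, sigma):
    """Per-pixel gradient magnitude and angle from Gaussian-derivative filters."""
    g, dg, r = gaussian_kernels(sigma)
    gx = sep_conv(channel, dg, g, r)  # differentiate along x, smooth along y
    gy = sep_conv(channel, g, dg, r)  # smooth along x, differentiate along y
    mag = [[math.hypot(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(gx, gy)]
    ang = [[math.atan2(b, a) for a, b in zip(ra, rb)] for ra, rb in zip(gx, gy)]
    return mag, ang


def non_max_suppress(mag, ang):
    """Keep a pixel only if it is not weaker than its two neighbours along the gradient."""
    h, w = len(mag), len(mag[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = round(math.cos(ang[y][x])), round(math.sin(ang[y][x]))
            fwd = mag[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
            back = mag[min(max(y - dy, 0), h - 1)][min(max(x - dx, 0), w - 1)]
            if mag[y][x] >= fwd and mag[y][x] >= back:
                out[y][x] = mag[y][x]
    return out


def hysteresis(mag, low, high):
    """Pixels >= high seed edges; pixels >= low join when 8-connected to an edge."""
    h, w = len(mag), len(mag[0])
    edge = [[v >= high for v in row] for row in mag]
    queue = deque((y, x) for y in range(h) for x in range(w) if edge[y][x])
    while queue:
        y, x = queue.popleft()
        for ny in range(max(y - 1, 0), min(y + 2, h)):
            for nx in range(max(x - 1, 0), min(x + 2, w)):
                if not edge[ny][nx] and mag[ny][nx] >= low:
                    edge[ny][nx] = True
                    queue.append((ny, nx))
    return edge


def detect_edges(channels, thresholds, sigma=1.0):
    """channels: {name: 2-D list}; thresholds: {name: (low, high)}, one pair per channel."""
    first = next(iter(channels.values()))
    h, w = len(first), len(first[0])
    edges = [[False] * w for _ in range(h)]
    for name, channel in channels.items():
        mag, ang = gradients(channel, sigma)
        low, high = thresholds[name]
        ch = hysteresis(non_max_suppress(mag, ang), low, high)
        # A pixel is an edge if it is an edge in any channel (assumed combination rule).
        edges = [[a or b for a, b in zip(ra, rb)] for ra, rb in zip(edges, ch)]
    return edges
```

Because each channel receives its own `(low, high)` pair, a channel with weak hand/background contrast can be given lower thresholds without lowering them for the other channels.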
12. An apparatus to recognize hand gestures, comprising:
an offline module to determine a skin tone distribution for a plurality of pixels in a video signal, wherein the offline module includes an edge detection unit to receive a color image associated with a frame of the video signal and conduct an edge analysis on the color image for each of a plurality of channels and further includes a distance unit to identify an edge map associated with the edge analysis and iteratively propagate nearest neighbor information between pixels in the edge map to obtain a distance map; and an online module to use the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal, wherein the distance unit includes: first initialization logic to initialize edge pixels in the edge map as being their own nearest edges and having an edge distance of zero, add the initialized edge pixels to a first queue, and designate the first queue as an active queue; second initialization logic to initialize non-edge pixels in the edge map as having unknown nearest edges and an edge distance of infinity and designate a second queue as an inactive queue; comparison logic to, for each pixel in the active queue, conduct a distance determination as to whether a first distance between a neighboring pixel and a nearest edge of the pixel in the active queue is less than or equal to a second distance between the neighboring pixel and a current nearest edge of the neighboring pixel; broadcast logic to conduct a transfer of a state of the pixel in the active queue to a state of the neighboring pixel if the first distance is less than or equal to the second distance, and replace the second distance in the state of the neighboring pixel with the first distance; queue logic to conduct a removal of the pixel in the active queue from the active queue and an addition of the neighboring pixel to the inactive queue if the first distance is less than or equal to the second distance; first iteration logic to repeat a first invocation of the comparison logic, the broadcast logic and the queue logic for each neighboring pixel of the pixel in the active queue; and second iteration logic to conduct a first designation of the first queue as the inactive queue, a second designation of the second queue as the active queue, and repeat a subsequent invocation of the comparison logic, the broadcast logic, the queue logic and the first iteration logic until the active queue is empty.
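Claims 12, 37 and 63 describe a distance unit that seeds an active queue with edge pixels, pushes each pixel's nearest-edge state to its neighbours, and swaps the active and inactive queues until nothing changes. The sketch below follows that two-queue structure under stated assumptions: 8-connected neighbours, Euclidean distance, and a strict `<` in place of the claimed "less than or equal", because a literal `<=` lets two pixels that already share a nearest edge re-enqueue each other indefinitely. The function name and return format are hypothetical, and propagating nearest edges this way gives an approximate (not always exact) Euclidean distance map.

```python
import math
from collections import deque


def distance_map(edge_map):
    """edge_map: 2-D list of bools. Returns (dist, nearest), where nearest[y][x]
    is the (y, x) of the edge pixel that was propagated to that location."""
    h, w = len(edge_map), len(edge_map[0])
    dist = [[math.inf] * w for _ in range(h)]       # non-edge: distance infinity
    nearest = [[None] * w for _ in range(h)]        # non-edge: unknown nearest edge
    active, inactive = deque(), deque()
    # Edge pixels are their own nearest edge at distance 0 and seed the active queue.
    for y in range(h):
        for x in range(w):
            if edge_map[y][x]:
                dist[y][x] = 0.0
                nearest[y][x] = (y, x)
                active.append((y, x))
    while active:
        while active:
            y, x = active.popleft()                 # remove pixel from active queue
            ey, ex = nearest[y][x]
            for ny in range(max(y - 1, 0), min(y + 2, h)):
                for nx in range(max(x - 1, 0), min(x + 2, w)):
                    if (ny, nx) == (y, x):
                        continue
                    d = math.hypot(ny - ey, nx - ex)
                    if d < dist[ny][nx]:            # strict "<" (assumption, see above)
                        dist[ny][nx] = d            # replace the neighbour's distance
                        nearest[ny][nx] = (ey, ex)  # transfer the nearest-edge state
                        inactive.append((ny, nx))
        # The queue that was just filled becomes the active queue.
        active, inactive = inactive, active
    return dist, nearest
```

Every update strictly lowers a finite distance, so the outer loop is guaranteed to stop once no neighbour can be improved.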
13. An apparatus to recognize hand gestures, comprising:
an offline module to determine a skin tone distribution for a plurality of pixels in a video signal, wherein the offline module includes an edge detection unit to receive a color image associated with a frame of the video signal and conduct an edge analysis on the color image for each of a plurality of channels and further includes a distance unit to identify an edge map associated with the edge analysis and iteratively propagate nearest neighbor information between pixels in the edge map to obtain a distance map and further includes a fingertip unit to identify a set of contour line pixels that surround a plurality of fingertips in the color image based on the edge map and the distance map; and an online module to use the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal, wherein the fingertip unit includes: local logic to use a set of finger segment curves to identify a plurality of local edge distance minima corresponding to the plurality of fingertips, wherein the plurality of fingertips includes one or more of an index fingertip, a middle fingertip, a ring fingertip, or a pinky fingertip; and global logic to use the set of finger segment curves to identify four global edge distance minima for contour line pixels associated with each local edge distance minimum and with each of the plurality of fingertips. (Dependent claim: 14)
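Claims 13, 38 and 64 do not say how a finger segment curve is represented, so the sketch below rests on a loose, hypothetical reading: each curve is a list of pixel positions, the distance map is sampled along it, interior local minima of that profile are fingertip candidates, and the candidates with the smallest edge distance across all curves are kept (`k=4` echoes the "four global edge distance minima" wording). All names are illustrative, not from the source.

```python
def sample_along_curve(dist_map, curve):
    """Edge distances at each (y, x) position of a (hypothetical) finger segment curve."""
    return [dist_map[y][x] for y, x in curve]


def local_minima(values):
    """Indices of interior points strictly below both neighbours."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] < values[i + 1]]


def fingertip_candidates(dist_map, curves, k=4):
    """Pool local minima from every curve and keep the k with the smallest distance."""
    pooled = []
    for ci, curve in enumerate(curves):
        values = sample_along_curve(dist_map, curve)
        pooled += [(values[i], ci, curve[i]) for i in local_minima(values)]
    # Sort by edge distance; return (curve index, pixel position) pairs.
    return [(ci, pt) for _, ci, pt in sorted(pooled)[:k]]
```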
15. An apparatus to recognize hand gestures, comprising:
an offline module to determine a skin tone distribution for a plurality of pixels in a video signal, wherein the offline module includes an edge detection unit to receive a color image associated with a frame of the video signal and conduct an edge analysis on the color image for each of a plurality of channels and further includes a distance unit to identify an edge map associated with the edge analysis and iteratively propagate nearest neighbor information between pixels in the edge map to obtain a distance map and further includes a fingertip unit to identify a set of contour line pixels that surround a plurality of fingertips in the color image based on the edge map and the distance map; and an online module to use the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal, wherein the skin tone distribution is to be determined based on color values for pixels inside the set of contour line pixels.
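Claim 15 bases the skin tone distribution on the colors of pixels inside the contour line pixels, and claims 16 and 21 then use that distribution to remove non-skin pixels and sub-sample the frame. A minimal sketch, assuming a per-channel (diagonal) Gaussian skin model, a squared-distance cut-off of 9 (an ellipsoid of three standard deviations), black as the "removed" color and plain 2x decimation for sub-sampling; none of these choices comes from the source, and the function names are hypothetical.

```python
def fit_skin_model(image, mask):
    """Per-channel mean and variance of the colours inside the contour mask."""
    samples = [p for row, mrow in zip(image, mask) for p, m in zip(row, mrow) if m]
    n = len(samples)
    mean = [sum(p[c] for p in samples) / n for c in range(3)]
    # A small floor keeps a perfectly uniform channel from dividing by zero.
    var = [sum((p[c] - mean[c]) ** 2 for p in samples) / n + 1e-6 for c in range(3)]
    return mean, var


def is_skin(pixel, model, max_sq_dist=9.0):
    """Squared distance to the mean, normalised per channel, within the cut-off."""
    mean, var = model
    return sum((pixel[c] - mean[c]) ** 2 / var[c] for c in range(3)) <= max_sq_dist


def remove_non_skin(image, model):
    """Replace every non-skin pixel with black."""
    return [[p if is_skin(p, model) else (0, 0, 0) for p in row] for row in image]


def subsample(image, levels=3):
    """The input frame plus successive 2x decimations (a plurality of modified frames)."""
    frames = [image]
    for _ in range(levels - 1):
        frames.append([row[::2] for row in frames[-1][::2]])
    return frames
```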
16. An apparatus to recognize hand gestures, comprising:
an offline module to determine a skin tone distribution for a plurality of pixels in a video signal, wherein the offline module is to remove non-skin pixels from an input frame associated with the video signal based on the skin tone distribution and sub-sample the input frame to obtain a plurality of modified frames, and wherein the online module includes a feature extraction unit to identify a plurality of blobs in the plurality of modified frames; and an online module to use the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal, wherein the feature extraction unit includes: trace logic to determine a Hessian trace function; convolution logic to, for each pixel in a modified frame, perform a convolution between the Hessian trace function and a set of non-adjacent pixels associated with the pixel in the modified frame to obtain a convolution score; scale logic to invoke the convolution logic for a plurality of variance parameter values to obtain a plurality of convolution scores for the pixel in the modified frame; and selection logic to identify a blob corresponding to a highest score in the plurality of convolution scores. (Dependent claims: 17, 18, 19, 20)
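The Hessian trace of a Gaussian is its Laplacian, $\operatorname{tr} H = G_{xx} + G_{yy} = \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4}\,G(x, y; \sigma)$. A minimal sketch of claims 16, 41 and 67 under assumptions: "non-adjacent pixels" is read as sampling the kernel on a grid with stride 2, scores are scale-normalised by $\sigma^2$ and sign-flipped so bright blobs on a black (skin-removed) background score positive, and the best variance parameter is the one with the highest score. The stride, normalisation and function names are illustrative choices, not taken from the source.

```python
import math


def hessian_trace(dy, dx, sigma):
    """Trace of the Hessian (Gxx + Gyy) of a 2-D Gaussian at offset (dy, dx)."""
    r2 = dx * dx + dy * dy
    s2 = sigma * sigma
    g = math.exp(-r2 / (2 * s2)) / (2 * math.pi * s2)
    return (r2 - 2 * s2) / (s2 * s2) * g


def blob_score(frame, y, x, sigma, stride=2):
    """Sparse convolution of the Hessian trace with non-adjacent pixels around (y, x)."""
    h, w = len(frame), len(frame[0])
    r = math.ceil(3 * sigma)
    offsets = range(-(r // stride) * stride, r + 1, stride)  # includes 0, skips neighbours
    total = 0.0
    for dy in offsets:
        for dx in offsets:
            yy = min(max(y + dy, 0), h - 1)
            xx = min(max(x + dx, 0), w - 1)
            total += hessian_trace(dy, dx, sigma) * frame[yy][xx]
    # stride**2 approximates the area each sample stands for; -sigma**2 normalises
    # across scales and makes bright blobs score positive.
    return -sigma * sigma * total * stride * stride


def best_scale(frame, y, x, sigmas):
    """Evaluate every variance parameter; return (highest score, its sigma)."""
    return max((blob_score(frame, y, x, s), s) for s in sigmas)
```

To scan a whole modified frame, `best_scale` would be evaluated at every pixel, keeping pixels whose best score is a local maximum above some threshold; larger skin regions are expected to select larger sigmas.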
21. An apparatus to recognize hand gestures, comprising:
an offline module to determine a skin tone distribution for a plurality of pixels in a video signal; and an online module to use the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal, wherein the online module is to remove non-skin pixels from an input frame associated with the video signal based on the skin tone distribution and sub-sample the input frame to obtain a plurality of modified frames, and wherein the online module includes a feature extraction unit to identify a plurality of blobs in the plurality of modified frames and wherein the online module further includes a pose unit to match one or more poses associated with the plurality of blobs to one or more poses stored in a library, wherein the pose unit includes: cluster logic to group the plurality of blobs into a plurality of clusters; descriptor logic to form a density map based on the plurality of clusters; and match logic to use the density map to identify the one or more poses. (Dependent claims: 22, 23, 24, 25)
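The pose unit of claims 21, 46 and 72 clusters blobs, forms a density map from the clusters and matches that map against a pose library. The claims name no clustering method, descriptor layout or distance, so this minimal sketch assumes k-means on blob centres (initialised with the first k points for reproducibility), a coarse grid over the clusters' bounding box in which each cell holds the fraction of blobs whose cluster centre falls there, and nearest-neighbour matching by L1 distance. All names are hypothetical.

```python
def kmeans(points, k, iters=10):
    """Group (x, y) blob centres into k clusters; returns (centres, groups)."""
    centers = [list(p) for p in points[:k]]  # deterministic initialisation (assumption)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                    + (p[1] - centers[c][1]) ** 2)
            groups[j].append(p)
        centers = [[sum(q[0] for q in g) / len(g), sum(q[1] for q in g) / len(g)]
                   if g else centers[i] for i, g in enumerate(groups)]
    return centers, groups


def density_map(centers, groups, grid=4):
    """grid x grid map; each cell holds the fraction of blobs whose cluster lands there."""
    xs, ys = [c[0] for c in centers], [c[1] for c in centers]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    dmap = [[0.0] * grid for _ in range(grid)]
    total = sum(len(g) for g in groups)
    for c, g in zip(centers, groups):
        gx = min(grid - 1, int((c[0] - x0) / ((x1 - x0) or 1) * grid))
        gy = min(grid - 1, int((c[1] - y0) / ((y1 - y0) or 1) * grid))
        dmap[gy][gx] += len(g) / total
    return dmap


def match_pose(dmap, library):
    """library: {pose name: density map}; return the closest pose by L1 distance."""
    def l1(a, b):
        return sum(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))
    return min(library, key=lambda name: l1(dmap, library[name]))
```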
26. An apparatus to recognize hand gestures, comprising:
an offline module to determine a skin tone distribution for a plurality of pixels in a video signal; and an online module to use the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal, wherein the online module is to remove non-skin pixels from an input frame associated with the video signal based on the skin tone distribution and sub-sample the input frame to obtain a plurality of modified frames, and wherein the online module includes a feature extraction unit to identify a plurality of blobs in the plurality of modified frames and wherein the online module further includes a pose unit to match one or more poses associated with the plurality of blobs to one or more poses stored in a library and wherein the online module further includes a temporal recognition unit to identify a plurality of observation trajectories for the one or more poses, maintain scores for the plurality of observation trajectories simultaneously, and use the scores to conduct the one or more blob-based hand gesture determinations, wherein the temporal recognition unit includes: specification logic to identify a set of valid transitions; compliance logic to identify a plurality of observation sequences in training data and remove one or more observation sequences that are non-compliant with the set of valid transitions; Hidden Markov Model (HMM) initialization logic to identify one or more clusters of values associated with compliant observation sequences, take a Cartesian product of the one or more clusters of values and use the Cartesian product to define a plurality of HMM states; and Viterbi logic to determine the scores for the plurality of observation trajectories based on the plurality of HMM states, wherein the blob-based hand gesture determinations are to distinguish between ongoing trajectories, killed trajectories and completed trajectories based on drops in the scores.
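The training-side steps of claim 26 (drop sequences that violate the valid transitions, cluster the remaining values, then take a Cartesian product to define HMM states) can be sketched as follows. Assumptions: each observation is a tuple whose first field is a pose label, only changes of pose label are checked against the valid transitions (repeating a pose is always allowed), and each distinct value of each field stands in for a real clustering step. Function names are hypothetical.

```python
from itertools import product


def is_compliant(sequence, valid_transitions):
    """True if every change of pose label (field 0) is a valid transition."""
    labels = [obs[0] for obs in sequence]
    return all(a == b or (a, b) in valid_transitions
               for a, b in zip(labels, labels[1:]))


def remove_non_compliant(sequences, valid_transitions):
    return [s for s in sequences if is_compliant(s, valid_transitions)]


def value_clusters(sequences):
    """Stand-in clustering: each distinct value of each observation field is one cluster."""
    n_fields = len(sequences[0][0])
    return [sorted({obs[i] for seq in sequences for obs in seq}) for i in range(n_fields)]


def hmm_states(sequences, valid_transitions):
    """HMM states = Cartesian product of the per-field clusters of compliant data."""
    compliant = remove_non_compliant(sequences, valid_transitions)
    return list(product(*value_clusters(compliant)))
```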
27. A method of recognizing hand gestures, comprising:
determining a skin tone distribution for a plurality of pixels in a video signal; using the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal; receiving a color image associated with a frame of the video signal; conducting an edge analysis on the color image for each of a plurality of channels; determining, for each channel in the plurality of channels, a set of Gaussian derivatives; performing a convolution between the set of Gaussian derivatives and each pixel in the color image to obtain a gradient magnitude and a gradient angle for each pixel in the color image on a per channel basis; and using a low threshold and a high threshold to determine whether each gradient magnitude and associated gradient angle corresponds to an edge, wherein the low threshold and the high threshold are channel-specific. (Dependent claims: 28, 29, 30, 31, 32, 33, 34, 35, 36)
37. A method of recognizing hand gestures, comprising:
determining a skin tone distribution for a plurality of pixels in a video signal; using the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal; receiving a color image associated with a frame of the video signal; conducting an edge analysis on the color image for each of a plurality of channels; identifying an edge map associated with the edge analysis; iteratively propagating nearest neighbor information between pixels in the edge map to obtain a distance map, further including: initializing edge pixels in the edge map as being their own nearest edges and having an edge distance of zero; adding the initialized edge pixels to a first queue; designating the first queue as an active queue; initializing non-edge pixels in the edge map as having unknown nearest edges and an edge distance of infinity; designating a second queue as an inactive queue; conducting, for each pixel in the active queue, a distance determination as to whether a first distance between a neighboring pixel and a nearest edge of the pixel in the active queue is less than or equal to a second distance between the neighboring pixel and a current nearest edge of the neighboring pixel; conducting a transfer of a state of the pixel in the active queue to a state of the neighboring pixel if the first distance is less than or equal to the second distance; replacing the second distance in the state of the neighboring pixel with the first distance; conducting a removal of the pixel in the active queue from the active queue; conducting an addition of the neighboring pixel to the inactive queue if the first distance is less than or equal to the second distance; conducting a first repeat of the distance determination, the transfer of the state and the addition of the neighboring pixel for each neighboring pixel of the pixel in the active queue; conducting a first designation of the first queue as the inactive queue; conducting a second designation of the second queue as the active queue; and conducting a subsequent repeat of the first repeat, the first designation and the second designation until the active queue is empty. (Dependent claim: 68)
38. A method of recognizing hand gestures, comprising:
determining a skin tone distribution for a plurality of pixels in a video signal; using the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal; receiving a color image associated with a frame of the video signal; conducting an edge analysis on the color image for each of a plurality of channels; identifying an edge map associated with the edge analysis; iteratively propagating nearest neighbor information between pixels in the edge map to obtain a distance map; identifying a set of contour line pixels that surround a plurality of fingertips in the color image based on the edge map and the distance map, further including: using a set of finger segment curves to identify a plurality of local edge distance minima corresponding to the plurality of fingertips, wherein the plurality of fingertips includes one or more of an index fingertip, a middle fingertip, a ring fingertip, or a pinky fingertip; and using the set of finger segment curves to identify four global edge distance minima for contour line pixels associated with each local edge distance minimum, and with the plurality of fingertips. (Dependent claim: 39)
40. A method of recognizing hand gestures, comprising:
determining a skin tone distribution for a plurality of pixels in a video signal; using the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal; receiving a color image associated with a frame of the video signal; conducting an edge analysis on the color image for each of a plurality of channels; identifying an edge map associated with the edge analysis; iteratively propagating nearest neighbor information between pixels in the edge map to obtain a distance map; identifying a set of contour line pixels that surround a plurality of fingertips in the color image based on the edge map and the distance map, wherein the skin tone distribution is determined based on color values for pixels inside the set of contour line pixels.
41. A method of recognizing hand gestures, comprising:
determining a skin tone distribution for a plurality of pixels in a video signal; using the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal; removing non-skin pixels from an input frame associated with the video signal based on the skin tone distribution; sub-sampling the input frame to obtain a plurality of modified frames; identifying a plurality of blobs in the plurality of modified frames, further including: determining a Hessian trace function; performing, for each pixel in a modified frame, a convolution between the Hessian trace function and a set of non-adjacent pixels associated with the pixel in the modified frame to obtain a convolution score; invoking the convolution for a plurality of variance parameter values to obtain a plurality of convolution scores for the pixel in the modified frame; and identifying a blob corresponding to a highest score in the plurality of convolution scores. (Dependent claims: 42, 43, 44, 45)
46. A method of recognizing hand gestures, comprising:
determining a skin tone distribution for a plurality of pixels in a video signal; using the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal; removing non-skin pixels from an input frame associated with the video signal based on the skin tone distribution; sub-sampling the input frame to obtain a plurality of modified frames; identifying a plurality of blobs in the plurality of modified frames; matching one or more poses associated with the plurality of blobs to one or more poses stored in a library, further including: grouping the plurality of blobs into a plurality of clusters; forming a density map based on the plurality of clusters; and using the density map to identify the one or more poses. (Dependent claims: 47, 48, 49, 50)
51. A method of recognizing hand gestures, comprising:
determining a skin tone distribution for a plurality of pixels in a video signal; using the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal; removing non-skin pixels from an input frame associated with the video signal based on the skin tone distribution; sub-sampling the input frame to obtain a plurality of modified frames; identifying a plurality of blobs in the plurality of modified frames; matching one or more poses associated with the plurality of blobs to one or more poses stored in a library, further including: identifying a plurality of observation trajectories for the one or more poses; maintaining scores for the plurality of observation trajectories simultaneously; and using the scores to conduct the one or more blob-based hand gesture determinations. (Dependent claim: 52)
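Claims 26, 51 and 77 keep scores for several observation trajectories at once and, per claim 26, separate ongoing, killed and completed trajectories by drops in Viterbi scores. A minimal log-space sketch, assuming each gesture is its own discrete-output HMM, a trajectory is killed when its best score falls by more than `kill_drop` nats in a single step, and it counts as completed when the designated final state scores best at the end of the stream. These rules and all names are assumptions, not the patented method.

```python
import math

NEG_INF = float("-inf")


def log_prob(p):
    """Natural log that maps probability 0 to -inf."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi_step(prev, log_trans, log_emit_obs):
    """One Viterbi update: best log-score of any path ending in each state."""
    n = len(prev)
    return [max(prev[i] + log_trans[i][j] for i in range(n)) + log_emit_obs[j]
            for j in range(n)]


def track(observations, log_start, log_trans, log_emit, final_state, kill_drop):
    """Classify one trajectory as 'completed', 'killed' or 'ongoing'."""
    scores = [s + log_emit[j][observations[0]] for j, s in enumerate(log_start)]
    best_prev = max(scores)
    for obs in observations[1:]:
        scores = viterbi_step(scores, log_trans, [row[obs] for row in log_emit])
        best = max(scores)
        if best_prev - best > kill_drop:  # assumed rule: a sharp drop kills it
            return "killed", scores
        best_prev = best
    # Assumed rule: complete once the final state holds the best score.
    if max(range(len(scores)), key=scores.__getitem__) == final_state:
        return "completed", scores
    return "ongoing", scores


def run_models(observations, models, kill_drop=2.0):
    """Score the same observation stream against every gesture model at once."""
    return {name: track(observations, *model, kill_drop=kill_drop)[0]
            for name, model in models.items()}
```

For example, with a two-state left-to-right model in which state 0 mostly emits symbol 0 and state 1 mostly emits symbol 1, the stream `[0, 0, 1, 1]` ends in the final state, whereas `[0, 1, 0, 0]` suffers a large one-step drop when symbol 0 reappears after the model has moved to state 1.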
53. At least one non-transitory computer readable storage medium comprising a set of instructions which, if executed by a computing device, cause the computing device to:
determine a skin tone distribution for a plurality of pixels in a video signal; use the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal; receive a color image associated with a frame of the video signal; conduct an edge analysis on the color image for each of a plurality of channels; determine, for each channel in the plurality of channels, a set of Gaussian derivatives; perform a convolution between the set of Gaussian derivatives and each pixel in the color image to obtain a gradient magnitude and a gradient angle for each pixel in the color image on a per channel basis; and use a low threshold and a high threshold to determine whether each gradient magnitude and associated gradient angle corresponds to an edge, wherein the low threshold and the high threshold are channel-specific. (Dependent claims: 54, 55, 56, 57, 58, 59, 60, 61, 62)
63. At least one non-transitory computer readable storage medium comprising a set of instructions which, if executed by a computing device, cause the computing device to:
determine a skin tone distribution for a plurality of pixels in a video signal; use the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal; receive a color image associated with a frame of the video signal; conduct an edge analysis on the color image for each of a plurality of channels; identify an edge map associated with the edge analysis; iteratively propagate nearest neighbor information between pixels in the edge map to obtain a distance map; initialize edge pixels in the edge map as being their own nearest edges and having an edge distance of zero; add the initialized edge pixels to a first queue; designate the first queue as an active queue; conduct, for each pixel in the active queue, a distance determination as to whether a first distance between a neighboring pixel and a nearest edge of the pixel in the active queue is less than or equal to a second distance between the neighboring pixel and a current nearest edge of the neighboring pixel; conduct a transfer of a state of the pixel in the active queue to a state of the neighboring pixel if the first distance is less than or equal to the second distance; replace the second distance in the state of the neighboring pixel with the first distance; conduct a removal of the pixel in the active queue from the active queue; conduct an addition of the neighboring pixel to the inactive queue if the first distance is less than or equal to the second distance; conduct a first repeat of the distance determination, the transfer of the state and the addition of the neighboring pixel for each neighboring pixel of the pixel in the active queue; conduct a first designation of the first queue as the inactive queue; conduct a second designation of the second queue as the active queue; and conduct a subsequent repeat of the first repeat, the first designation and the second designation until the active queue is empty.
64. At least one non-transitory computer readable storage medium comprising a set of instructions which, if executed by a computing device, cause the computing device to:
determine a skin tone distribution for a plurality of pixels in a video signal; use the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal; receive a color image associated with a frame of the video signal; conduct an edge analysis on the color image for each of a plurality of channels; identify an edge map associated with the edge analysis; identify a set of contour line pixels that surround a plurality of fingertips in the color image based on the edge map and the distance map; use a set of finger segment curves to identify a plurality of local edge distance minima corresponding to the plurality of fingertips, wherein the plurality of fingertips is to include one or more of an index fingertip, a middle fingertip, a ring fingertip, or a pinky fingertip; and use the set of finger segment curves to identify four global edge distance minima for contour line pixels associated with each local edge distance minimum and with each of the plurality of fingertips. (Dependent claim: 65)
66. At least one non-transitory computer readable storage medium comprising a set of instructions which, if executed by a computing device, cause the computing device to:
determine a skin tone distribution for a plurality of pixels in a video signal; use the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal; receive a color image associated with a frame of the video signal; conduct an edge analysis on the color image for each of a plurality of channels; identify an edge map associated with the edge analysis; identify a set of contour line pixels that surround a plurality of fingertips in the color image based on the edge map and the distance map, wherein the skin tone distribution is to be determined based on color values for pixels inside the set of contour line pixels.
67. At least one non-transitory computer readable storage medium comprising a set of instructions which, if executed by a computing device, cause the computing device to:
determine a skin tone distribution for a plurality of pixels in a video signal; use the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal; remove non-skin pixels from an input frame associated with the video signal based on the skin tone distribution; sub-sample the input frame to obtain a plurality of modified frames; identify a plurality of blobs in the plurality of modified frames; determine a Hessian trace function; perform, for each pixel in a modified frame, a convolution between the Hessian trace function and a set of non-adjacent pixels associated with the pixel in the modified frame to obtain a convolution score; invoke the convolution for a plurality of variance parameters to obtain a plurality of convolution scores for the pixel in the modified frame; and identify a blob corresponding to a highest score in the plurality of convolution scores. (Dependent claims: 69, 70, 71)
72. At least one non-transitory computer readable storage medium comprising a set of instructions which, if executed by a computing device, cause the computing device to:
determine a skin tone distribution for a plurality of pixels in a video signal; use the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal; remove non-skin pixels from an input frame associated with the video signal based on the skin tone distribution; sub-sample the input frame to obtain a plurality of modified frames; identify a plurality of blobs in the plurality of modified frames; match one or more poses associated with the plurality of blobs to one or more poses stored in a library; group the plurality of blobs into a plurality of clusters; form a density map based on the plurality of clusters; and use the density map to identify the one or more poses. (Dependent claims: 73, 74, 75, 76)
77. At least one non-transitory computer readable storage medium comprising a set of instructions which, if executed by a computing device, cause the computing device to:
determine a skin tone distribution for a plurality of pixels in a video signal; use the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal; remove non-skin pixels from an input frame associated with the video signal based on the skin tone distribution; sub-sample the input frame to obtain a plurality of modified frames; identify a plurality of blobs in the plurality of modified frames; match one or more poses associated with the plurality of blobs to one or more poses stored in a library; identify a plurality of observation trajectories for the one or more poses; maintain scores for the plurality of observation trajectories simultaneously; and use the scores to conduct the one or more blob-based hand gesture determinations. (Dependent claim: 78)
Specification