PERFORMING HAND GESTURE RECOGNITION USING 2D IMAGE DATA
Abstract
Systems and methods may provide for determining a skin tone distribution for a plurality of pixels in a video signal and using the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal. In one example, the video signal includes two-dimensional (2D) image data, and the skin tone distribution has an execution time budget that is greater than an execution time budget of the blob-based hand gesture determinations.
16 Citations
168 Claims
-
1-84. (canceled)
-
85. An apparatus to recognize hand gestures, comprising:
-
an offline module to determine a skin tone distribution for a plurality of pixels in a video signal; and
an online module to use the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal.
-
86. The apparatus of claim 85, wherein the video signal is to include two-dimensional (2D) image data.
-
87. The apparatus of claim 85, wherein the skin tone distribution is to have an execution time budget that is greater than an execution time budget of the blob-based hand gesture determinations.
-
88. The apparatus of claim 85, wherein the offline module includes an edge detection unit to receive a color image associated with a frame of the video signal and conduct an edge analysis on the color image for each of a plurality of channels.
-
89. The apparatus of claim 88, wherein the edge detection unit includes:
-
box logic to, for each channel in the plurality of channels, determine a set of Gaussian derivatives;
convolution logic to perform a convolution between the set of Gaussian derivatives and each pixel in the color image to obtain a gradient magnitude and a gradient angle for each pixel in the color image on a per channel basis; and
threshold logic to use a low threshold and a high threshold to determine whether each gradient magnitude and associated gradient angle corresponds to an edge, wherein the low threshold and the high threshold are channel-specific.
-
-
90. The apparatus of claim 89, wherein the threshold logic is to, for each channel in the plurality of channels, build a histogram of gradient magnitudes and determine the low threshold and the high threshold based on the histogram.
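The histogram-driven hysteresis thresholding of claims 89-90 can be sketched in Python as follows. This is an illustrative rendering only: the percentile cut-offs, bin count, and single relaxation pass are assumptions of the sketch, not values taken from the claims.

```python
import numpy as np

def channel_edge_thresholds(gradient_magnitude, low_pct=0.6, high_pct=0.9, bins=64):
    """Derive per-channel hysteresis thresholds from a histogram of gradient
    magnitudes (claim 90). The percentile choices are illustrative."""
    hist, edges = np.histogram(gradient_magnitude, bins=bins)
    cdf = np.cumsum(hist) / hist.sum()
    low = edges[np.searchsorted(cdf, low_pct)]
    high = edges[np.searchsorted(cdf, high_pct)]
    return low, high

def hysteresis_edges(mag, low, high):
    """Classify pixels: magnitudes above `high` are edges; pixels between the
    thresholds are kept only if an 8-neighbor is a strong edge (one pass)."""
    strong = mag >= high
    weak = (mag >= low) & ~strong
    keep = strong.copy()
    ys, xs = np.nonzero(weak)
    for y, x in zip(ys, xs):
        y0, y1 = max(y - 1, 0), min(y + 2, mag.shape[0])
        x0, x1 = max(x - 1, 0), min(x + 2, mag.shape[1])
        if strong[y0:y1, x0:x1].any():
            keep[y, x] = True
    return keep
```

Run once per color channel to obtain the channel-specific thresholds of claim 89.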
-
91. The apparatus of claim 89, wherein the edge detection unit further includes stack logic to identify one or more edge pixels and determine whether a neighborhood of pixels around the one or more edge pixels includes additional edge pixels, wherein the neighborhood of pixels is to include one or more pixels that are non-adjacent to the one or more edge pixels.
-
92. The apparatus of claim 89, wherein the box logic is to set a variance parameter of the set of Gaussian derivatives to a value greater than one.
-
93. The apparatus of claim 88, wherein the offline module further includes a distance unit to identify an edge map associated with the edge analysis and iteratively propagate nearest neighbor information between pixels in the edge map to obtain a distance map.
-
94. The apparatus of claim 93, wherein the distance unit includes:
-
first initialization logic to initialize edge pixels in the edge map as being their own nearest edges and having an edge distance of zero, add the initialized edge pixels to a first queue, and designate the first queue as an active queue;
second initialization logic to initialize non-edge pixels in the edge map as having unknown nearest edges and an edge distance of infinity and designate a second queue as an inactive queue;
comparison logic to, for each pixel in the active queue, conduct a distance determination as to whether a first distance between a neighboring pixel and a nearest edge of the pixel in the active queue is less than or equal to a second distance between the neighboring pixel and a current nearest edge of the neighboring pixel;
broadcast logic to conduct a transfer of a state of the pixel in the active queue to a state of the neighboring pixel if the first distance is less than or equal to the second distance, and replace the second distance in the state of the neighboring pixel with the first distance;
queue logic to conduct a removal of the pixel in the active queue from the active queue and an addition of the neighboring pixel to the inactive queue if the first distance is less than or equal to the second distance;
first iteration logic to repeat a first invocation of the comparison logic, the broadcast logic and the queue logic for each neighboring pixel of the pixel in the active queue; and
second iteration logic to conduct a first designation of the first queue as the inactive queue, a second designation of the second queue as the active queue, and repeat a subsequent invocation of the comparison logic, the broadcast logic, the queue logic and the first iteration logic until the active queue is empty.
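Claim 94's two-queue nearest-edge propagation can be sketched as below. This is a simplified rendering: a neighbor is re-queued only on strict improvement (to guarantee termination), whereas the claim transfers state on less-than-or-equal.

```python
import numpy as np
from math import hypot, inf

def propagate_distance_map(edge_map):
    """Iteratively propagate nearest-edge states between pixels (claim 94).
    Each pixel carries (nearest edge, edge distance); edge pixels start as
    their own nearest edges at distance zero."""
    h, w = edge_map.shape
    nearest = {}                                 # pixel -> its nearest edge
    dist = np.full((h, w), inf)
    active = [(y, x) for y in range(h) for x in range(w) if edge_map[y, x]]
    for p in active:
        nearest[p] = p
        dist[p] = 0.0
    while active:
        inactive = []                            # second queue
        for (y, x) in active:
            ey, ex = nearest[(y, x)]
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy or dx) and 0 <= ny < h and 0 <= nx < w:
                        d = hypot(ny - ey, nx - ex)  # neighbor -> my nearest edge
                        if d <= dist[ny, nx]:
                            if d < dist[ny, nx]:
                                inactive.append((ny, nx))
                            nearest[(ny, nx)] = (ey, ex)
                            dist[ny, nx] = d
        active = inactive                        # swap queue roles and repeat
    return dist
```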
-
-
95. The apparatus of claim 93, wherein the offline module further includes a fingertip unit to identify a set of contour line pixels that surround a plurality of fingertips in the color image based on the edge map and the distance map.
-
96. The apparatus of claim 95, wherein the fingertip unit includes:
-
local logic to use a set of finger segment curves to identify a plurality of local edge distance minima corresponding to the plurality of fingertips, wherein the plurality of fingertips includes one or more of an index fingertip, a middle fingertip, a ring fingertip, or a pinky fingertip; and
global logic to use the set of finger segment curves to identify four global edge distance minima for contour line pixels associated with each local edge distance minimum and with each of the plurality of fingertips.
-
-
97. The apparatus of claim 96, wherein the set of finger segment curves is to include a concatenation of two line segments and two ellipse segments.
-
98. The apparatus of claim 95, wherein the skin tone distribution is to be determined based on color values for pixels inside the set of contour line pixels.
-
99. The apparatus of claim 85, wherein the online module is to remove non-skin pixels from an input frame associated with the video signal based on the skin tone distribution and sub-sample the input frame to obtain a plurality of modified frames, and wherein the online module includes a feature extraction unit to identify a plurality of blobs in the plurality of modified frames.
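The online-module front end of claim 99 (skin filtering followed by sub-sampling) can be sketched as below. The Gaussian skin-tone model form and the Mahalanobis threshold are assumptions of the sketch; the claim only states that the skin tone distribution is used.

```python
import numpy as np

def skin_filter_and_pyramid(frame, skin_mean, skin_cov, threshold=6.0, levels=3):
    """Zero out non-skin pixels against a Gaussian skin model, then
    sub-sample the result into a pyramid of modified frames (claim 99)."""
    h, w, _ = frame.shape
    flat = frame.reshape(-1, 3).astype(float)
    inv = np.linalg.inv(skin_cov)
    diff = flat - skin_mean
    # squared Mahalanobis distance of each pixel's color to the skin model
    d2 = np.einsum('ij,jk,ik->i', diff, inv, diff)
    masked = np.where(d2.reshape(h, w)[..., None] <= threshold, frame, 0)
    return [masked[::2 ** i, ::2 ** i] for i in range(levels)]
```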
-
100. The apparatus of claim 99, wherein the feature extraction unit includes:
-
trace logic to determine a Hessian trace function;
convolution logic to, for each pixel in a modified frame, perform a convolution between the Hessian trace function and a set of non-adjacent pixels associated with the pixel in the modified frame to obtain a convolution score;
scale logic to invoke the convolution logic for a plurality of variance parameter values to obtain a plurality of convolution scores for the pixel in the modified frame; and
selection logic to identify a blob corresponding to a highest score in the plurality of convolution scores.
-
-
101. The apparatus of claim 100, wherein the convolution logic is to use a 9×9 convolution box to perform the convolution.
-
102. The apparatus of claim 100, wherein the set of non-adjacent pixels are to have a spacing of a closest integer to two thirds the variance parameter of the Hessian trace function.
-
103. The apparatus of claim 100, wherein one or more variance parameter values in the plurality of variance parameter values is to be a one quarter increment of a preceding variance parameter value.
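Claims 100-103 together describe multi-scale blob scoring with a Hessian trace (Laplacian-of-Gaussian) function: a 9×9 box (claim 101), a sample spacing of the closest integer to two thirds the variance parameter (claim 102), and quarter-increment scale steps (claim 103). A Python sketch; the base scale, scale count, and selection by largest absolute response are illustrative assumptions.

```python
import numpy as np

def hessian_trace_kernel(sigma, size=9):
    """Sampled Laplacian of Gaussian, i.e. the trace of the Hessian of a
    Gaussian, on a 9x9 box (claim 101)."""
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    g = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return ((x ** 2 + y ** 2 - 2 * sigma ** 2) / sigma ** 4) * g

def blob_score(image, y, x, sigma):
    """Convolution score at one pixel over non-adjacent samples spaced by
    round(2*sigma/3) (claim 102)."""
    step = max(1, round(2 * sigma / 3))
    k = hessian_trace_kernel(sigma)
    r = k.shape[0] // 2
    score = 0.0
    for ky in range(-r, r + 1):
        for kx in range(-r, r + 1):
            py, px = y + ky * step, x + kx * step
            if 0 <= py < image.shape[0] and 0 <= px < image.shape[1]:
                score += k[ky + r, kx + r] * image[py, px]
    return score

def best_scale(image, y, x, sigma0=1.2, n_scales=5):
    """Scan variance parameter values in one-quarter increments (claim 103)
    and keep the scale with the strongest absolute response."""
    sigmas = [sigma0 * 1.25 ** i for i in range(n_scales)]
    scores = [blob_score(image, y, x, s) for s in sigmas]
    i = int(np.argmax(np.abs(scores)))
    return sigmas[i], scores[i]
```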
-
104. The apparatus of claim 100, wherein the convolution logic is to use one or more single instruction multiple data (SIMD) commands and a SIMD convolution method to perform the convolution.
-
105. The apparatus of claim 99, wherein the online module further includes a pose unit to match one or more poses associated with the plurality of blobs to one or more poses stored in a library.
-
106. The apparatus of claim 105, wherein the pose unit includes:
-
cluster logic to group the plurality of blobs into a plurality of clusters; descriptor logic to form a density map based on the plurality of clusters; and match logic to use the density map to identify the one or more poses.
-
-
107. The apparatus of claim 106, wherein the cluster logic is to weight the plurality of blobs according to blob size, and wherein the plurality of clusters are to be k-means clusters.
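The size-weighted k-means grouping of claims 106-107 can be sketched as below: each blob pulls its cluster centroid in proportion to its size. The fixed iteration count and random initialization are assumptions of the sketch.

```python
import numpy as np

def weighted_kmeans(points, weights, k, iters=20, seed=0):
    """Lloyd-style k-means over blob centers, with blobs weighted by size
    when centroids are recomputed (claims 106-107)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # assign each blob to its nearest cluster center
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each center as the size-weighted mean of its blobs
        for j in range(k):
            m = labels == j
            if m.any():
                centers[j] = np.average(points[m], axis=0, weights=weights[m])
    return centers, labels
```

An objective function over the resulting clusters (claim 108) could then test compactness, disjointedness, and size before the clusters feed the density map.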
-
108. The apparatus of claim 106, wherein the cluster logic is to use an objective function to identify clusters that satisfy one or more of a compactness condition, a disjointedness condition, or a size threshold condition.
-
109. The apparatus of claim 106, wherein the descriptor logic is to normalize one or more of the blobs with respect to a cluster radius, scale-up one or more of the blobs based on a size of the density map and normalize the density map to obtain a byte grid.
-
110. The apparatus of claim 106, wherein the match logic is to conduct one or more distance calculation operations to identify the one or more poses.
-
111. The apparatus of claim 105, wherein the online module further includes a temporal recognition unit to identify a plurality of observation trajectories for the one or more poses, maintain scores for the plurality of observation trajectories simultaneously, and use the scores to conduct the one or more blob-based hand gesture determinations.
-
112. The apparatus of claim 111, wherein the temporal recognition unit includes:
-
specification logic to identify a set of valid transitions;
compliance logic to identify a plurality of observation sequences in training data and remove one or more observation sequences that are non-compliant with the set of valid transitions;
Hidden Markov Model (HMM) initialization logic to identify one or more clusters of values associated with compliant observation sequences, take a Cartesian product of the one or more clusters of values and use the Cartesian product to define a plurality of HMM states; and
Viterbi logic to determine the scores for the plurality of observation trajectories based on the plurality of HMM states, wherein the blob-based hand gesture determinations are to distinguish between ongoing trajectories, killed trajectories and completed trajectories based on drops in the scores.
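The Viterbi scoring of claim 112 can be illustrated with a generic sketch: for each observation step, the best-path log score over the HMM states is recorded, and a sharp drop in the running score is the signal for killing a trajectory rather than completing it. The uniform log-space inputs are assumptions; the claim does not specify the model parameters.

```python
import numpy as np

def viterbi_scores(obs_loglik, log_trans, log_init):
    """Running best-path log scores for one observation trajectory under an
    HMM (claim 112). obs_loglik is (T, N): per-step, per-state emission
    log-likelihoods; log_trans is (N, N); log_init is (N,)."""
    T, N = obs_loglik.shape
    delta = log_init + obs_loglik[0]
    scores = [delta.max()]
    for t in range(1, T):
        # best predecessor for each state, then add the emission term
        delta = (delta[:, None] + log_trans).max(axis=0) + obs_loglik[t]
        scores.append(delta.max())
    return np.array(scores)
```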
-
113. A method of recognizing hand gestures, comprising:
-
determining a skin tone distribution for a plurality of pixels in a video signal; and
using the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal.
-
114. The method of claim 113, wherein the video signal includes two-dimensional (2D) image data.
-
115. The method of claim 113, wherein the skin tone distribution has an execution time budget that is greater than an execution time budget of the blob-based hand gesture determinations.
-
116. The method of claim 113, further including:
-
receiving a color image associated with a frame of the video signal; and conducting an edge analysis on the color image for each of a plurality of channels.
-
-
117. The method of claim 116, further including:
-
determining, for each channel in the plurality of channels, a set of Gaussian derivatives;
performing a convolution between the set of Gaussian derivatives and each pixel in the color image to obtain a gradient magnitude and a gradient angle for each pixel in the color image on a per channel basis; and
using a low threshold and a high threshold to determine whether each gradient magnitude and associated gradient angle corresponds to an edge, wherein the low threshold and the high threshold are channel-specific.
-
-
118. The method of claim 117, further including:
-
building, for each channel in the plurality of channels, a histogram of gradient magnitudes; and determining the low threshold and the high threshold based on the histogram.
-
-
119. The method of claim 117, further including:
-
identifying one or more edge pixels; and determining whether a neighborhood of pixels around the one or more edge pixels includes additional edge pixels, wherein the neighborhood of pixels includes one or more pixels that are non-adjacent to the one or more edge pixels.
-
-
120. The method of claim 117, further including setting a variance parameter of the set of Gaussian derivatives to a value greater than one.
-
121. The method of claim 116, further including:
-
identifying an edge map associated with the edge analysis; and iteratively propagating nearest neighbor information between pixels in the edge map to obtain a distance map.
-
-
122. The method of claim 121, further including:
-
initializing edge pixels in the edge map as being their own nearest edges and having an edge distance of zero;
adding the initialized edge pixels to a first queue;
designating the first queue as an active queue;
initializing non-edge pixels in the edge map as having unknown nearest edges and an edge distance of infinity;
designating a second queue as an inactive queue;
conducting, for each pixel in the active queue, a distance determination as to whether a first distance between a neighboring pixel and a nearest edge of the pixel in the active queue is less than or equal to a second distance between the neighboring pixel and a current nearest edge of the neighboring pixel;
conducting a transfer of a state of the pixel in the active queue to a state of the neighboring pixel if the first distance is less than or equal to the second distance;
replacing the second distance in the state of the neighboring pixel with the first distance;
conducting a removal of the pixel in the active queue from the active queue;
conducting an addition of the neighboring pixel to the inactive queue if the first distance is less than or equal to the second distance;
conducting a first repeat of the distance determination, the transfer of the state and the addition of the neighboring pixel for each neighboring pixel of the pixel in the active queue;
conducting a first designation of the first queue as the inactive queue;
conducting a second designation of the second queue as the active queue; and
conducting a subsequent repeat of the first repeat, the first designation and the second designation until the active queue is empty.
-
-
123. The method of claim 121, further including identifying a set of contour line pixels that surround a plurality of fingertips in the color image based on the edge map and the distance map.
-
124. The method of claim 123, further including:
-
using a set of finger segment curves to identify a plurality of local edge distance minima corresponding to the plurality of fingertips, wherein the plurality of fingertips includes one or more of an index fingertip, a middle fingertip, a ring fingertip, or a pinky fingertip; and using the set of finger segment curves to identify four global edge distance minima for contour line pixels associated with each local edge distance minimum and with each of the plurality of fingertips.
-
-
125. The method of claim 124, wherein the set of finger segment curves includes a concatenation of two line segments and two ellipse segments.
-
126. The method of claim 123, wherein the skin tone distribution is determined based on color values for pixels inside the set of contour line pixels.
-
127. The method of claim 113, further including:
-
removing non-skin pixels from an input frame associated with the video signal based on the skin tone distribution; sub-sampling the input frame to obtain a plurality of modified frames; and identifying a plurality of blobs in the plurality of modified frames.
-
-
128. The method of claim 127, further including:
-
determining a Hessian trace function;
performing, for each pixel in a modified frame, a convolution between the Hessian trace function and a set of non-adjacent pixels associated with the pixel in the modified frame to obtain a convolution score;
invoking the convolution for a plurality of variance parameter values to obtain a plurality of convolution scores for the pixel in the modified frame; and
identifying a blob corresponding to a highest score in the plurality of convolution scores.
-
-
129. The method of claim 128, further including using a 9×9 convolution box to perform the convolution.
-
130. The method of claim 128, wherein the set of non-adjacent pixels have a spacing of a closest integer to two thirds the variance parameter of the Hessian trace function.
-
131. The method of claim 128, wherein one or more variance parameter values in the plurality of variance parameter values is a one quarter increment of a preceding variance parameter value.
-
132. The method of claim 128, further including using one or more single instruction multiple data (SIMD) commands and a SIMD convolution method to perform the convolution.
-
133. The method of claim 127, further including matching one or more poses associated with the plurality of blobs to one or more poses stored in a library.
-
134. The method of claim 133, further including:
-
grouping the plurality of blobs into a plurality of clusters; forming a density map based on the plurality of clusters; and using the density map to identify the one or more poses.
-
-
135. The method of claim 134, further including weighting the plurality of blobs according to blob size, wherein the plurality of clusters are k-means clusters.
-
136. The method of claim 134, further including using an objective function to identify clusters that satisfy one or more of a compactness condition, a disjointedness condition, or a size threshold condition.
-
137. The method of claim 134, further including:
-
normalizing one or more of the blobs with respect to a cluster radius; scaling up one or more of the blobs based on a size of the density map; and normalizing the density map to obtain a byte grid.
-
-
138. The method of claim 134, further including conducting one or more distance calculation operations to identify the one or more poses.
-
139. The method of claim 133, further including:
-
identifying a plurality of observation trajectories for the one or more poses; maintaining scores for the plurality of observation trajectories simultaneously; and using the scores to conduct the one or more blob-based hand gesture determinations.
-
-
140. The method of claim 139, further including:
-
identifying a set of valid transitions;
identifying a plurality of observation sequences in training data;
removing one or more observation sequences that are non-compliant with the set of valid transitions;
identifying one or more clusters of values associated with compliant observation sequences;
taking a Cartesian product of the one or more clusters of values;
using the Cartesian product to define a plurality of Hidden Markov Model (HMM) states; and
determining the scores for the plurality of observation trajectories based on the plurality of HMM states, wherein the blob-based hand gesture determinations distinguish between ongoing trajectories, killed trajectories and completed trajectories based on drops in the scores.
-
141. At least one computer readable storage medium comprising a set of instructions which, if executed by a computing device, cause the computing device to:
-
determine a skin tone distribution for a plurality of pixels in a video signal; and
use the skin tone distribution to conduct one or more blob-based hand gesture determinations with respect to the video signal.
-
142. The at least one computer readable storage medium of claim 141, wherein the video signal is to include two-dimensional (2D) image data.
-
143. The at least one computer readable storage medium of claim 141, wherein the skin tone distribution is to have an execution time budget that is greater than an execution time budget of the blob-based hand gesture determinations.
-
144. The at least one computer readable storage medium of claim 141, wherein the instructions, if executed, cause a computing device to:
-
receive a color image associated with a frame of the video signal; and conduct an edge analysis on the color image for each of a plurality of channels.
-
-
145. The at least one computer readable storage medium of claim 144, wherein the instructions, if executed, cause a computing device to:
-
determine, for each channel in the plurality of channels, a set of Gaussian derivatives;
perform a convolution between the set of Gaussian derivatives and each pixel in the color image to obtain a gradient magnitude and a gradient angle for each pixel in the color image on a per channel basis; and
use a low threshold and a high threshold to determine whether each gradient magnitude and associated gradient angle corresponds to an edge, wherein the low threshold and the high threshold are channel-specific.
-
-
146. The at least one computer readable storage medium of claim 145, wherein the instructions, if executed, cause a computing device to:
-
build, for each channel in the plurality of channels, a histogram of gradient magnitudes; and determine the low threshold and the high threshold based on the histogram.
-
-
147. The at least one computer readable storage medium of claim 145, wherein the instructions, if executed, cause a computing device to:
-
identify one or more edge pixels; and determine whether a neighborhood of pixels around the one or more edge pixels includes additional edge pixels, wherein the neighborhood of pixels is to include one or more pixels that are non-adjacent to the one or more edge pixels.
-
-
148. The at least one computer readable storage medium of claim 145, wherein the instructions, if executed, cause a computing device to set a variance parameter of the set of Gaussian derivatives to a value greater than one.
-
149. The at least one computer readable storage medium of claim 144, wherein the instructions, if executed, cause a computing device to:
-
identify an edge map associated with the edge analysis; and iteratively propagate nearest neighbor information between pixels in the edge map to obtain a distance map.
-
-
150. The at least one computer readable storage medium of claim 149, wherein the instructions, if executed, cause a computing device to:
-
initialize edge pixels in the edge map as being their own nearest edges and having an edge distance of zero;
add the initialized edge pixels to a first queue;
designate the first queue as an active queue;
initialize non-edge pixels in the edge map as having unknown nearest edges and an edge distance of infinity;
designate a second queue as an inactive queue;
conduct, for each pixel in the active queue, a distance determination as to whether a first distance between a neighboring pixel and a nearest edge of the pixel in the active queue is less than or equal to a second distance between the neighboring pixel and a current nearest edge of the neighboring pixel;
conduct a transfer of a state of the pixel in the active queue to a state of the neighboring pixel if the first distance is less than or equal to the second distance;
replace the second distance in the state of the neighboring pixel with the first distance;
conduct a removal of the pixel in the active queue from the active queue;
conduct an addition of the neighboring pixel to the inactive queue if the first distance is less than or equal to the second distance;
conduct a first repeat of the distance determination, the transfer of the state and the addition of the neighboring pixel for each neighboring pixel of the pixel in the active queue;
conduct a first designation of the first queue as the inactive queue;
conduct a second designation of the second queue as the active queue; and
conduct a subsequent repeat of the first repeat, the first designation and the second designation until the active queue is empty.
-
-
151. The at least one computer readable storage medium of claim 149, wherein the instructions, if executed, cause a computing device to identify a set of contour line pixels that surround a plurality of fingertips in the color image based on the edge map and the distance map.
-
152. The at least one computer readable storage medium of claim 151, wherein the instructions, if executed, cause a computing device to:
-
use a set of finger segment curves to identify a plurality of local edge distance minima corresponding to the plurality of fingertips, wherein the plurality of fingertips is to include one or more of an index fingertip, a middle fingertip, a ring fingertip, or a pinky fingertip; and use the set of finger segment curves to identify four global edge distance minima for contour line pixels associated with each local edge distance minimum and with each of the plurality of fingertips.
-
-
153. The at least one computer readable storage medium of claim 152, wherein the set of finger segment curves is to include a concatenation of two line segments and two ellipse segments.
-
154. The at least one computer readable storage medium of claim 151, wherein the skin tone distribution is to be determined based on color values for pixels inside the set of contour line pixels.
-
155. The at least one computer readable storage medium of claim 141, wherein the instructions, if executed, cause a computing device to:
-
remove non-skin pixels from an input frame associated with the video signal based on the skin tone distribution; sub-sample the input frame to obtain a plurality of modified frames; and identify a plurality of blobs in the plurality of modified frames.
-
-
156. The at least one computer readable storage medium of claim 155, wherein the instructions, if executed, cause a computing device to:
-
determine a Hessian trace function;
perform, for each pixel in a modified frame, a convolution between the Hessian trace function and a set of non-adjacent pixels associated with the pixel in the modified frame to obtain a convolution score;
invoke the convolution for a plurality of variance parameter values to obtain a plurality of convolution scores for the pixel in the modified frame; and
identify a blob corresponding to a highest score in the plurality of convolution scores.
-
-
157. The at least one computer readable storage medium of claim 156, wherein the instructions, if executed, cause a computing device to use a 9×9 convolution box to perform the convolution.
-
158. The at least one computer readable storage medium of claim 156, wherein the set of non-adjacent pixels have a spacing of a closest integer to two thirds the variance parameter of the Hessian trace function.
-
159. The at least one computer readable storage medium of claim 156, wherein one or more variance parameter values in the plurality of variance parameter values is a one quarter increment of a preceding variance parameter value.
-
160. The at least one computer readable storage medium of claim 156, wherein the instructions, if executed, cause a computing device to use one or more single instruction multiple data (SIMD) commands and a SIMD convolution method to perform the convolution.
-
161. The at least one computer readable storage medium of claim 155, wherein the instructions, if executed, cause a computing device to match one or more poses associated with the plurality of blobs to one or more poses stored in a library.
-
162. The at least one computer readable storage medium of claim 161, wherein the instructions, if executed, cause a computing device to:
-
group the plurality of blobs into a plurality of clusters; form a density map based on the plurality of clusters; and use the density map to identify the one or more poses.
-
-
163. The at least one computer readable storage medium of claim 162, wherein the instructions, if executed, cause a computing device to weight the plurality of blobs according to blob size, wherein the plurality of clusters are to be k-means clusters.
-
164. The at least one computer readable storage medium of claim 162, wherein the instructions, if executed, cause a computing device to use an objective function to identify clusters that satisfy one or more of a compactness condition, a disjointedness condition, or a size threshold condition.
-
165. The at least one computer readable storage medium of claim 162, wherein the instructions, if executed, cause a computing device to:
-
normalize one or more of the blobs with respect to a cluster radius; scale-up one or more of the blobs based on a size of the density map; and normalize the density map to obtain a byte grid.
-
-
166. The at least one computer readable storage medium of claim 162, wherein the instructions, if executed, cause a computing device to conduct one or more distance calculation operations to identify the one or more poses.
-
167. The at least one computer readable storage medium of claim 161, wherein the instructions, if executed, cause a computing device to:
-
identify a plurality of observation trajectories for the one or more poses; maintain scores for the plurality of observation trajectories simultaneously; and use the scores to conduct the one or more blob-based hand gesture determinations.
-
-
168. The at least one computer readable storage medium of claim 167, wherein the instructions, if executed, cause a computing device to:
-
identify a set of valid transitions;
identify a plurality of observation sequences in training data;
remove one or more observation sequences that are non-compliant with the set of valid transitions;
identify one or more clusters of values associated with compliant observation sequences;
take a Cartesian product of the one or more clusters of values;
use the Cartesian product to define a plurality of Hidden Markov Model (HMM) states; and
determine the scores for the plurality of observation trajectories based on the plurality of HMM states, wherein the blob-based hand gesture determinations are to distinguish between ongoing trajectories, killed trajectories and completed trajectories based on drops in the scores.
-
Current Assignee: Intel Corporation
-
Original Assignee: Intel Corporation
-
Inventors: Kounavis, Michael; Yavatkar, Rajendra; Schoinas, Ioannis; Abad Vazquez, Carlos
-
Granted Patent
-
Field of Search
-
US Class (Current): 1/1
-
CPC Class Codes
- G06F 18/2321 using statistics or functio...
- G06F 18/23213 with fixed number of cluste...
- G06F 3/017 Gesture based interaction, ...
- G06F 3/0304 Detection arrangements usin...
- G06T 7/246 using feature-based methods...
- G06V 10/255 Detecting or recognising po...
- G06V 10/267 by performing operations on...
- G06V 10/34 Smoothing or thinning of th...
- G06V 10/449 Biologically inspired filte...
- G06V 10/96 Management of image or vide...
- G06V 2201/07 Target detection
- G06V 40/113 Recognition of static hand ...
- G06V 40/28 Recognition of hand or arm ...