Motion-assisted visual language for human computer interfaces
First Claim
1. A computer-implemented method for recognizing a visual gesture, the method comprising:
receiving a visual gesture formed by a part of a human body, the visual gesture being captured in a video having a plurality of video frames;
determining a region of interest (ROI) in the plurality of video frames of the video based on motion vectors associated with the part of the human body, a centroid of the ROI aligned to be a centroid of a cluster of the motion vectors;
selecting a visual gesture recognition process based on a user selection of a visual gesture recognition process from a plurality of visual gesture recognition processes;
applying the selected visual gesture recognition process to the plurality of video frames to recognize the visual gesture;
determining variations in the centroid, shape, and size of an object within the ROI of the plurality of video frames, the centroid, shape, and size of the object changing according to motion of the object in the plurality of video frames in an affine motion model, wherein said determination of the variations in the centroid, shape, and size of the object within the ROI is performed by a track-learning-detection-type (TLD-type) process, wherein the TLD-type process is a signal processing scheme in which the following functions are performed simultaneously:
object tracking, by use of motion estimation in the affine motion model, using either optical flow or block-based motion estimation, and employing estimation error metrics comprising a sum of absolute differences (SAD) and a normalized correlation coefficient (NCC);
object feature learning, which automatically learns features of objects within the ROI, the features including size, centroids, statistics, and edges; and
object detection, comprising:
feature extraction employing edge analysis, spatial transforms, and background subtraction,
feature analysis employing clustering and vector quantization, and
feature matching employing signal matching using similarity metrics, neural networks, support vector machines, and maximum a posteriori probability; and
deriving three or more dimensional information and relationships of objects contained in the visual gesture from the plurality of video frames capturing the visual gesture based on the analysis of the variations in the centroid, shape, and size of the object within the ROI.
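The object-tracking step above names two estimation error metrics, a sum of absolute differences (SAD) and a normalized correlation coefficient (NCC). A minimal NumPy sketch of the two metrics (the function names and block handling are illustrative, not drawn from the patent):

```python
import numpy as np

def sad(block_a: np.ndarray, block_b: np.ndarray) -> float:
    """Sum of absolute differences between two same-sized pixel blocks."""
    diff = block_a.astype(np.int64) - block_b.astype(np.int64)
    return float(np.abs(diff).sum())

def ncc(block_a: np.ndarray, block_b: np.ndarray) -> float:
    """Normalized correlation coefficient: 1.0 means identical up to gain/offset."""
    a = block_a.astype(np.float64).ravel()
    b = block_b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

In block-based motion estimation, a candidate displacement would be scored by minimizing SAD or maximizing NCC between a reference block and the displaced block in the next frame.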
Abstract
Embodiments of the invention recognize human visual gestures, as captured by image and video sensors, to develop a visual language for a variety of human computer interfaces. One embodiment of the invention provides a computer-implemented method for recognizing a visual gesture portrayed by a part of the human body, such as a human hand, face, or body. The method includes steps of receiving the visual gesture captured in a video having multiple video frames, and determining a gesture recognition type from multiple gesture recognition types, including shape-based gesture, position-based gesture, motion-assisted gesture, and mixed gesture that combines two different gesture types. The method further includes steps of selecting a visual gesture recognition process based on the determined gesture type and applying the selected visual gesture recognition process to the multiple video frames capturing the visual gesture to recognize the visual gesture.
150 Citations
104 Claims
1. A computer-implemented method for recognizing a visual gesture, the method comprising:
receiving a visual gesture formed by a part of a human body, the visual gesture being captured in a video having a plurality of video frames;
determining a region of interest (ROI) in the plurality of video frames of the video based on motion vectors associated with the part of the human body, a centroid of the ROI aligned to be a centroid of a cluster of the motion vectors;
selecting a visual gesture recognition process based on a user selection of a visual gesture recognition process from a plurality of visual gesture recognition processes;
applying the selected visual gesture recognition process to the plurality of video frames to recognize the visual gesture;
determining variations in the centroid, shape, and size of an object within the ROI of the plurality of video frames, the centroid, shape, and size of the object changing according to motion of the object in the plurality of video frames in an affine motion model, wherein said determination of the variations in the centroid, shape, and size of the object within the ROI is performed by a track-learning-detection-type (TLD-type) process, wherein the TLD-type process is a signal processing scheme in which the following functions are performed simultaneously:
object tracking, by use of motion estimation in the affine motion model, using either optical flow or block-based motion estimation, and employing estimation error metrics comprising a sum of absolute differences (SAD) and a normalized correlation coefficient (NCC);
object feature learning, which automatically learns features of objects within the ROI, the features including size, centroids, statistics, and edges; and
object detection, comprising:
feature extraction employing edge analysis, spatial transforms, and background subtraction,
feature analysis employing clustering and vector quantization, and
feature matching employing signal matching using similarity metrics, neural networks, support vector machines, and maximum a posteriori probability; and
deriving three or more dimensional information and relationships of objects contained in the visual gesture from the plurality of video frames capturing the visual gesture based on the analysis of the variations in the centroid, shape, and size of the object within the ROI.
Dependent claims: 2-44.
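Claim 1's ROI step aligns the centroid of the ROI with the centroid of a cluster of the motion vectors. One way this could be sketched, assuming per-block motion vectors and a simple magnitude threshold to separate moving foreground from static background (the threshold, the clustering rule, and the function name are our assumptions, not the patent's):

```python
import numpy as np

def roi_centroid(origins: np.ndarray, vectors: np.ndarray, min_mag: float = 1.0):
    """Centroid of the cluster of significant motion vectors.

    origins: (N, 2) positions where motion vectors were measured.
    vectors: (N, 2) motion vectors at those positions.
    Vectors with magnitude below min_mag are treated as background noise.
    Returns the mean position of the moving cluster, or None if nothing moves.
    """
    mags = np.linalg.norm(vectors, axis=1)
    moving = origins[mags >= min_mag]
    if moving.size == 0:
        return None  # no coherent motion -> no ROI in this frame
    return moving.mean(axis=0)
```

The ROI would then be a window centered at the returned point, tracked frame to frame as the cluster of motion vectors moves.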
45. A computer-implemented method for recognizing a visual gesture, the method comprising:
receiving a visual gesture formed by a part of a human body, the visual gesture being captured in a video having a plurality of video frames;
determining a region of interest (ROI) in the plurality of video frames of the video based on motion vectors associated with the part of the human body, a centroid of the ROI aligned to be a centroid of a cluster of the motion vectors;
selecting a visual gesture recognition process from a plurality of visual gesture recognition processes based on a type of the visual gesture formed by the part of the human body;
applying the selected visual gesture recognition process to the plurality of video frames to recognize the visual gesture;
determining variations in the centroid, shape, and size of an object within the ROI of the plurality of video frames, the centroid, shape, and size of the object changing according to motion of the object in the plurality of video frames in an affine motion model, wherein said determination of the variations in the centroid, shape, and size of the object within the ROI is performed by a track-learning-detection-type (TLD-type) process, wherein the TLD-type process is a signal processing scheme in which the following functions are performed simultaneously:
object tracking, by use of motion estimation in the affine motion model, using either optical flow or block-based motion estimation, and employing estimation error metrics comprising a sum of absolute differences (SAD) and a normalized correlation coefficient (NCC);
object feature learning, which automatically learns features of objects within the ROI, the features including size, centroids, statistics, and edges; and
object detection, comprising:
feature extraction employing edge analysis, spatial transforms, and background subtraction,
feature analysis employing clustering and vector quantization, and
feature matching employing signal matching using similarity metrics, neural networks, support vector machines, and maximum a posteriori probability; and
deriving three or more dimensional information and relationships of objects contained in the visual gesture from the plurality of video frames capturing the visual gesture based on the determined variations in the centroid, shape, and size of the object within the ROI.
Dependent claims: 46-50.
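Claim 45 differs from claim 1 in selecting the recognition process by the type of the gesture rather than by user selection. A minimal sketch of such type-based dispatch, with stand-in recognizers and type names taken from the abstract's gesture categories (the table structure and function names are illustrative only):

```python
def recognize_shape(frames):
    return "shape-result"       # stand-in for a shape-based recognizer

def recognize_position(frames):
    return "position-result"    # stand-in for a position-based recognizer

def recognize_motion(frames):
    return "motion-result"      # stand-in for a motion-assisted recognizer

# Plurality of visual gesture recognition processes, keyed by gesture type.
RECOGNIZERS = {
    "shape": recognize_shape,
    "position": recognize_position,
    "motion-assisted": recognize_motion,
}

def recognize(frames, gesture_type: str):
    """Select the recognition process by gesture type, then apply it."""
    try:
        process = RECOGNIZERS[gesture_type]
    except KeyError:
        raise ValueError(f"unknown gesture type: {gesture_type}")
    return process(frames)
```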
51. A computer-implemented method for recognizing a visual gesture, the method comprising:
receiving a visual gesture formed by a part of a human body, the visual gesture being captured in a video having a plurality of video frames;
determining a region of interest (ROI) in the plurality of video frames of the video based on motion vectors associated with the part of the human body, a centroid of the ROI aligned to be a centroid of a cluster of the motion vectors;
applying different visual gesture recognition processes to the plurality of video frames in parallel;
merging results of the different visual gesture recognition processes to recognize the visual gesture;
determining variations in the centroid, shape, and size of an object within the ROI of the plurality of video frames, the centroid, shape, and size of the object changing according to motion of the object in the plurality of video frames in an affine motion model, wherein said determination of the variations in the centroid, shape, and size of the object within the ROI is performed by a track-learning-detection-type (TLD-type) process, wherein the TLD-type process is a signal processing scheme in which the following functions are performed simultaneously:
object tracking, by use of motion estimation in the affine motion model, using either optical flow or block-based motion estimation, and employing estimation error metrics comprising a sum of absolute differences (SAD) and a normalized correlation coefficient (NCC);
object feature learning, which automatically learns features of objects within the ROI, the features including size, centroids, statistics, and edges; and
object detection, comprising:
feature extraction employing edge analysis, spatial transforms, and background subtraction,
feature analysis employing clustering and vector quantization, and
feature matching employing signal matching using similarity metrics, neural networks, support vector machines, and maximum a posteriori probability; and
deriving three or more dimensional information and relationships of objects contained in the visual gesture from the plurality of video frames capturing the visual gesture based on the merged results of the different visual gesture recognition processes.
Dependent claims: 52-77.
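Claim 51 applies different recognition processes to the same frames in parallel and merges their results. One plausible reading, sketched with a thread pool and a majority vote over the predicted labels (the majority-vote merge rule is our assumption; the claim does not specify how results are merged):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def recognize_parallel(frames, processes):
    """Apply several recognition processes to the same frames in parallel,
    then merge their outputs by majority vote over the predicted labels."""
    with ThreadPoolExecutor(max_workers=len(processes)) as pool:
        results = list(pool.map(lambda p: p(frames), processes))
    label, _count = Counter(results).most_common(1)[0]
    return label
```

Other merge rules would fit the claim language equally well, e.g. weighting each process by a confidence score instead of counting votes.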
78. A non-transitory computer-readable storage medium storing executable computer program instructions for recognizing a visual gesture, the computer program instructions comprising code for:
receiving a visual gesture formed by a part of a human body, the visual gesture being captured in a video having a plurality of video frames;
determining a region of interest (ROI) in the plurality of video frames of the video based on motion vectors associated with the part of the human body, a centroid of the ROI aligned to be a centroid of a cluster of the motion vectors;
applying different visual gesture recognition processes to the plurality of video frames in parallel;
merging results of the different visual gesture recognition processes to recognize the visual gesture;
determining variations in the centroid, shape, and size of an object within the ROI of the plurality of video frames, the centroid, shape, and size of the object changing according to motion of the object in the plurality of video frames in an affine motion model, wherein said determination of the variations in the centroid, shape, and size of the object within the ROI is performed by a track-learning-detection-type (TLD-type) process, wherein the TLD-type process is a signal processing scheme in which the following functions are performed simultaneously:
object tracking, by use of motion estimation in the affine motion model, using either optical flow or block-based motion estimation, and employing estimation error metrics comprising a sum of absolute differences (SAD) and a normalized correlation coefficient (NCC);
object feature learning, which automatically learns features of objects within the ROI, the features including size, centroids, statistics, and edges; and
object detection, comprising:
feature extraction employing edge analysis, spatial transforms, and background subtraction,
feature analysis employing clustering and vector quantization, and
feature matching employing signal matching using similarity metrics, neural networks, support vector machines, and maximum a posteriori probability; and
deriving three or more dimensional information and relationships of objects contained in the visual gesture from the plurality of video frames capturing the visual gesture based on the merged results of the different visual gesture recognition processes.
Dependent claims: 79-104.
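Each independent claim tracks variations in the centroid, shape, and size of an object under an affine motion model. As an illustration of why one model captures all three at once: an affine map p -> Ap + t translates the object's centroid along with its points and scales any region's area by |det A| (the helper names below are ours, not the patent's):

```python
import numpy as np

def apply_affine(points: np.ndarray, A: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Map object points p -> A @ p + t under a 2-D affine motion model."""
    return points @ A.T + t

def centroid_and_area_change(points: np.ndarray, A: np.ndarray, t: np.ndarray):
    """Variation in centroid and size under the affine map: the centroid
    moves with the points, and area scales by |det A| for any region."""
    moved = apply_affine(points, A, t)
    centroid_shift = moved.mean(axis=0) - points.mean(axis=0)
    area_scale = abs(np.linalg.det(A))
    return centroid_shift, area_scale
```

Shape variation is carried by the off-diagonal and anisotropic parts of A (shear and unequal axis scaling), which a pure translation model could not represent.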
Specification