Motion-assisted visual language for human computer interfaces
First Claim
1. A computer-implemented method for recognizing a visual gesture, the method comprising:
receiving a visual gesture formed by a part of a human body, the visual gesture being captured in a video having a plurality of video frames;
determining a region of interest (ROI) in the plurality of video frames of the video based on motion vectors associated with the part of the human body, a centroid of the ROI aligned to be a centroid of a cluster of the motion vectors;
selecting a visual gesture recognition process based on a user selection of a visual gesture recognition process from a plurality of visual gesture recognition processes;
applying the selected visual gesture recognition process to the plurality of video frames to recognize the visual gesture;
determining variations in the centroid, shape, and size of an object within the ROI of the plurality of video frames, the centroid, shape, and size of the object changing according to motion of the object in the plurality of video frames in an affine motion model, wherein said determination of the variations in the centroid, shape, and size of the object within the ROI is performed by a track-learning-detection-type (TLD-type) process, wherein the TLD-type process is a signal processing scheme in which the following functions are performed simultaneously:
object tracking, by use of motion estimation in the affine motion model, using either optical flow or block-based motion estimation, and employing estimation error metrics comprising a sum of absolute differences (SAD) and a normalized correlation coefficient (NCC);
object feature learning, which automatically learns features of objects within the ROI, the features including size, centroids, statistics, and edges; and
object detection, comprising:
feature extraction employing edge analysis, spatial transforms, and background subtraction,
feature analysis employing clustering and vector quantization, and
feature matching employing signal matching using similarity metrics, neural networks, support vector machines, and maximum a posteriori probability; and
deriving three or more dimensional information and relationships of objects contained in the visual gesture from the plurality of video frames capturing the visual gesture based on the analysis of the variations in the centroid, shape, and size of the object within the ROI.
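The object-tracking step above names two estimation error metrics, a sum of absolute differences (SAD) and a normalized correlation coefficient (NCC). A minimal NumPy sketch of the two metrics (the function names and block handling are illustrative, not drawn from the patent):

```python
import numpy as np

def sad(block_a: np.ndarray, block_b: np.ndarray) -> float:
    """Sum of absolute differences between two same-sized pixel blocks."""
    diff = block_a.astype(np.int64) - block_b.astype(np.int64)
    return float(np.abs(diff).sum())

def ncc(block_a: np.ndarray, block_b: np.ndarray) -> float:
    """Normalized correlation coefficient: 1.0 means identical up to gain/offset."""
    a = block_a.astype(np.float64).ravel()
    b = block_b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

In block-based motion estimation, a candidate displacement would be scored by minimizing SAD or maximizing NCC between a reference block and the displaced block in the next frame.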
Abstract
Embodiments of the invention recognize human visual gestures, as captured by image and video sensors, to develop a visual language for a variety of human computer interfaces. One embodiment of the invention provides a computer-implemented method for recognizing a visual gesture portrayed by a part of the human body, such as a human hand, face, or body. The method includes steps of receiving the visual gesture captured in a video having multiple video frames, and determining a gesture recognition type from multiple gesture recognition types, including shape-based gesture, position-based gesture, motion-assisted gesture, and mixed gesture that combines two different gesture types. The method further includes steps of selecting a visual gesture recognition process based on the determined gesture type and applying the selected visual gesture recognition process to the multiple video frames capturing the visual gesture to recognize the visual gesture.
150 Citations
104 Claims
1. A computer-implemented method for recognizing a visual gesture, the method comprising:
receiving a visual gesture formed by a part of a human body, the visual gesture being captured in a video having a plurality of video frames;
determining a region of interest (ROI) in the plurality of video frames of the video based on motion vectors associated with the part of the human body, a centroid of the ROI aligned to be a centroid of a cluster of the motion vectors;
selecting a visual gesture recognition process based on a user selection of a visual gesture recognition process from a plurality of visual gesture recognition processes;
applying the selected visual gesture recognition process to the plurality of video frames to recognize the visual gesture;
determining variations in the centroid, shape, and size of an object within the ROI of the plurality of video frames, the centroid, shape, and size of the object changing according to motion of the object in the plurality of video frames in an affine motion model, wherein said determination of the variations in the centroid, shape, and size of the object within the ROI is performed by a track-learning-detection-type (TLD-type) process, wherein the TLD-type process is a signal processing scheme in which the following functions are performed simultaneously:
object tracking, by use of motion estimation in the affine motion model, using either optical flow or block-based motion estimation, and employing estimation error metrics comprising a sum of absolute differences (SAD) and a normalized correlation coefficient (NCC);
object feature learning, which automatically learns features of objects within the ROI, the features including size, centroids, statistics, and edges; and
object detection, comprising:
feature extraction employing edge analysis, spatial transforms, and background subtraction,
feature analysis employing clustering and vector quantization, and
feature matching employing signal matching using similarity metrics, neural networks, support vector machines, and maximum a posteriori probability; and
deriving three or more dimensional information and relationships of objects contained in the visual gesture from the plurality of video frames capturing the visual gesture based on the analysis of the variations in the centroid, shape, and size of the object within the ROI.
Dependent claims: 2-44.
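Claim 1's ROI step aligns the centroid of the ROI with the centroid of a cluster of the motion vectors. One way this could be sketched, assuming per-block motion vectors and a simple magnitude threshold to separate moving foreground from static background (the threshold, the clustering rule, and the function name are our assumptions, not the patent's):

```python
import numpy as np

def roi_centroid(origins: np.ndarray, vectors: np.ndarray, min_mag: float = 1.0):
    """Centroid of the cluster of significant motion vectors.

    origins: (N, 2) positions where motion vectors were measured.
    vectors: (N, 2) motion vectors at those positions.
    Vectors with magnitude below min_mag are treated as background noise.
    Returns the mean position of the moving cluster, or None if nothing moves.
    """
    mags = np.linalg.norm(vectors, axis=1)
    moving = origins[mags >= min_mag]
    if moving.size == 0:
        return None  # no coherent motion -> no ROI in this frame
    return moving.mean(axis=0)
```

The ROI would then be a window centered at the returned point, tracked frame to frame as the cluster of motion vectors moves.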
45. A computer-implemented method for recognizing a visual gesture, the method comprising:
receiving a visual gesture formed by a part of a human body, the visual gesture being captured in a video having a plurality of video frames;
determining a region of interest (ROI) in the plurality of video frames of the video based on motion vectors associated with the part of the human body, a centroid of the ROI aligned to be a centroid of a cluster of the motion vectors;
selecting a visual gesture recognition process from a plurality of visual gesture recognition processes based on a type of the visual gesture formed by the part of the human body;
applying the selected visual gesture recognition process to the plurality of video frames to recognize the visual gesture;
determining variations in the centroid, shape, and size of an object within the ROI of the plurality of video frames, the centroid, shape, and size of the object changing according to motion of the object in the plurality of video frames in an affine motion model, wherein said determination of the variations in the centroid, shape, and size of the object within the ROI is performed by a track-learning-detection-type (TLD-type) process, wherein the TLD-type process is a signal processing scheme in which the following functions are performed simultaneously:
object tracking, by use of motion estimation in the affine motion model, using either optical flow or block-based motion estimation, and employing estimation error metrics comprising a sum of absolute differences (SAD) and a normalized correlation coefficient (NCC);
object feature learning, which automatically learns features of objects within the ROI, the features including size, centroids, statistics, and edges; and
object detection, comprising:
feature extraction employing edge analysis, spatial transforms, and background subtraction,
feature analysis employing clustering and vector quantization, and
feature matching employing signal matching using similarity metrics, neural networks, support vector machines, and maximum a posteriori probability; and
deriving three or more dimensional information and relationships of objects contained in the visual gesture from the plurality of video frames capturing the visual gesture based on the determined variations in the centroid, shape, and size of the object within the ROI.
Dependent claims: 46-50.
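Claim 45 differs from claim 1 in selecting the recognition process by the type of the gesture rather than by user selection. A minimal sketch of such type-based dispatch, with stand-in recognizers and type names taken from the abstract's gesture categories (the table structure and function names are illustrative only):

```python
def recognize_shape(frames):
    return "shape-result"       # stand-in for a shape-based recognizer

def recognize_position(frames):
    return "position-result"    # stand-in for a position-based recognizer

def recognize_motion(frames):
    return "motion-result"      # stand-in for a motion-assisted recognizer

# Plurality of visual gesture recognition processes, keyed by gesture type.
RECOGNIZERS = {
    "shape": recognize_shape,
    "position": recognize_position,
    "motion-assisted": recognize_motion,
}

def recognize(frames, gesture_type: str):
    """Select the recognition process by gesture type, then apply it."""
    try:
        process = RECOGNIZERS[gesture_type]
    except KeyError:
        raise ValueError(f"unknown gesture type: {gesture_type}")
    return process(frames)
```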
51. A computer-implemented method for recognizing a visual gesture, the method comprising:
receiving a visual gesture formed by a part of a human body, the visual gesture being captured in a video having a plurality of video frames;
determining a region of interest (ROI) in the plurality of video frames of the video based on motion vectors associated with the part of the human body, a centroid of the ROI aligned to be a centroid of a cluster of the motion vectors;
applying different visual gesture recognition processes to the plurality of video frames in parallel;
merging results of the different visual gesture recognition processes to recognize the visual gesture;
determining variations in the centroid, shape, and size of an object within the ROI of the plurality of video frames, the centroid, shape, and size of the object changing according to motion of the object in the plurality of video frames in an affine motion model, wherein said determination of the variations in the centroid, shape, and size of the object within the ROI is performed by a track-learning-detection-type (TLD-type) process, wherein the TLD-type process is a signal processing scheme in which the following functions are performed simultaneously:
object tracking, by use of motion estimation in the affine motion model, using either optical flow or block-based motion estimation, and employing estimation error metrics comprising a sum of absolute differences (SAD) and a normalized correlation coefficient (NCC);
object feature learning, which automatically learns features of objects within the ROI, the features including size, centroids, statistics, and edges; and
object detection, comprising:
feature extraction employing edge analysis, spatial transforms, and background subtraction,
feature analysis employing clustering and vector quantization, and
feature matching employing signal matching using similarity metrics, neural networks, support vector machines, and maximum a posteriori probability; and
deriving three or more dimensional information and relationships of objects contained in the visual gesture from the plurality of video frames capturing the visual gesture based on the merged results of the different visual gesture recognition processes.
Dependent claims: 52-77.
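Claim 51 applies different recognition processes to the same frames in parallel and merges their results. One plausible reading, sketched with a thread pool and a majority vote over the predicted labels (the majority-vote merge rule is our assumption; the claim does not specify how results are merged):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def recognize_parallel(frames, processes):
    """Apply several recognition processes to the same frames in parallel,
    then merge their outputs by majority vote over the predicted labels."""
    with ThreadPoolExecutor(max_workers=len(processes)) as pool:
        results = list(pool.map(lambda p: p(frames), processes))
    label, _count = Counter(results).most_common(1)[0]
    return label
```

Other merge rules would fit the claim language equally well, e.g. weighting each process by a confidence score instead of counting votes.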
78. A non-transitory computer-readable storage medium storing executable computer program instructions for recognizing a visual gesture, the computer program instructions comprising code for:
receiving a visual gesture formed by a part of a human body, the visual gesture being captured in a video having a plurality of video frames;
determining a region of interest (ROI) in the plurality of video frames of the video based on motion vectors associated with the part of the human body, a centroid of the ROI aligned to be a centroid of a cluster of the motion vectors;
applying different visual gesture recognition processes to the plurality of video frames in parallel;
merging results of the different visual gesture recognition processes to recognize the visual gesture;
determining variations in the centroid, shape, and size of an object within the ROI of the plurality of video frames, the centroid, shape, and size of the object changing according to motion of the object in the plurality of video frames in an affine motion model, wherein said determination of the variations in the centroid, shape, and size of the object within the ROI is performed by a track-learning-detection-type (TLD-type) process, wherein the TLD-type process is a signal processing scheme in which the following functions are performed simultaneously:
object tracking, by use of motion estimation in the affine motion model, using either optical flow or block-based motion estimation, and employing estimation error metrics comprising a sum of absolute differences (SAD) and a normalized correlation coefficient (NCC);
object feature learning, which automatically learns features of objects within the ROI, the features including size, centroids, statistics, and edges; and
object detection, comprising:
feature extraction employing edge analysis, spatial transforms, and background subtraction,
feature analysis employing clustering and vector quantization, and
feature matching employing signal matching using similarity metrics, neural networks, support vector machines, and maximum a posteriori probability; and
deriving three or more dimensional information and relationships of objects contained in the visual gesture from the plurality of video frames capturing the visual gesture based on the merged results of the different visual gesture recognition processes.
Dependent claims: 79-104.
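Each independent claim tracks variations in the centroid, shape, and size of an object under an affine motion model. As an illustration of why one model captures all three at once: an affine map p -> Ap + t translates the object's centroid along with its points and scales any region's area by |det A| (the helper names below are ours, not the patent's):

```python
import numpy as np

def apply_affine(points: np.ndarray, A: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Map object points p -> A @ p + t under a 2-D affine motion model."""
    return points @ A.T + t

def centroid_and_area_change(points: np.ndarray, A: np.ndarray, t: np.ndarray):
    """Variation in centroid and size under the affine map: the centroid
    moves with the points, and area scales by |det A| for any region."""
    moved = apply_affine(points, A, t)
    centroid_shift = moved.mean(axis=0) - points.mean(axis=0)
    area_scale = abs(np.linalg.det(A))
    return centroid_shift, area_scale
```

Shape variation is carried by the off-diagonal and anisotropic parts of A (shear and unequal axis scaling), which a pure translation model could not represent.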
Specification