Architecture for controlling a computer using hand gestures
Abstract
Architecture for implementing a perceptual user interface. The architecture comprises alternative modalities for controlling computer application programs and manipulating on-screen objects through hand gestures or a combination of hand gestures and verbal commands. The perceptual user interface system includes a tracking component that detects object characteristics of at least one of a plurality of objects within a scene, and tracks the respective object. Detection of object characteristics is based at least in part upon image comparison of a plurality of images relative to a course mapping of the images. A seeding component iteratively seeds the tracking component with object hypotheses based upon the presence of the object characteristics and the image comparison. A filtering component selectively removes the tracked object from the object hypotheses and/or at least one object hypothesis from the set of object hypotheses based upon predetermined removal criteria.
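To make the component structure described in the abstract concrete, here is a minimal Python sketch of how a tracking component, a seeding component, and a filtering component might fit together. It is an illustrative assumption, not the patented implementation: every class and method name (ObjectHypothesis, TrackingComponent.update, and so on) is invented for the example, and the confidence scoring is a placeholder.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectHypothesis:
    """One candidate object (e.g., a hand) currently under consideration."""
    position: tuple     # (x, y) image coordinates
    confidence: float   # evidence accumulated from image comparison

@dataclass
class TrackingComponent:
    """Carries every live hypothesis forward from frame to frame."""
    hypotheses: list = field(default_factory=list)

    def update(self):
        # Placeholder scoring: a real tracker would re-score each hypothesis
        # against the latest image comparison before decaying stale ones.
        for h in self.hypotheses:
            h.confidence *= 0.9

class SeedingComponent:
    """Iteratively seeds the tracker with new hypotheses where motion appears."""
    def seed(self, tracker, detections):
        for pos in detections:
            tracker.hypotheses.append(ObjectHypothesis(position=pos, confidence=1.0))

class FilteringComponent:
    """Selectively removes hypotheses that fail the removal criteria."""
    def prune(self, tracker, min_confidence=0.2):
        tracker.hypotheses = [h for h in tracker.hypotheses
                              if h.confidence >= min_confidence]
```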
Claims
1. A system that facilitates a user interface, comprising:
a tracking component that detects at least one of a plurality of objects within a scene and tracks the respective object, detection of the object based at least in part upon image comparison of a plurality of images relative to a course mapping of the images;
a seeding component that iteratively seeds the tracking component with object hypotheses based upon the presence of the object and the image comparison; and
a filtering component that selectively removes the tracked object from the object hypotheses and/or at least one object hypothesis from the set of object hypotheses, the tracked object removed based at least in part upon a region-based approach in determining depth to cursors and move windows. (Dependent claims 2-16)
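Claim 1 keys detection to "image comparison of a plurality of images relative to a course mapping of the images." One plausible reading, offered here only as an illustration, is block-level frame differencing over a coarse grid, so that detection is driven by region-level change rather than per-pixel noise. The grid size and threshold below are assumed values, not taken from the claim.

```python
import numpy as np

def coarse_motion_cells(frame_a, frame_b, grid=(16, 16), threshold=12.0):
    """Return centers of coarse-grid cells whose mean intensity changed.

    frame_a and frame_b are assumed to be 2-D grayscale arrays of equal shape.
    """
    h, w = frame_a.shape
    gh, gw = grid
    ch, cw = h // gh, w // gw
    centers = []
    for i in range(gh):
        for j in range(gw):
            block_a = frame_a[i*ch:(i+1)*ch, j*cw:(j+1)*cw]
            block_b = frame_b[i*ch:(i+1)*ch, j*cw:(j+1)*cw]
            if abs(block_a.mean() - block_b.mean()) > threshold:
                centers.append((j*cw + cw // 2, i*ch + ch // 2))
    return centers
```

The returned cell centers are the kind of detections a seeding component could use to propose new object hypotheses.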
17. A system that facilitates a user interface, comprising:
means for tracking and detecting at least one of a plurality of objects within a scene based at least in part upon image comparison of a plurality of images relative to a course mapping of the images;
means for iteratively seeding the tracking component with object hypotheses based upon the presence of the object characteristics and the image comparison; and
means for filtering that selectively removes the tracked object from the object hypotheses and/or at least one object hypothesis from the set of object hypotheses, the tracked object removed based at least in part upon a region-based approach in determining depth to cursors and move windows.
18. A system that facilitates a user interface, comprising:
a detecting component that detects at least one of a plurality of objects within a scene;
a tracking component that tracks the detected object;
a seeding component that iteratively seeds the tracking component with object hypotheses based upon the detected objects that are tracked;
a filtering component that selectively removes the tracked object from the object hypotheses or at least one object hypothesis from the set of object hypotheses; and
an interpreting component that interprets an object characteristic of the tracked object and executes a command in response thereto. (Dependent claims 19-49)
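Claim 18 adds an interpreting component that turns an object characteristic of the tracked object into an executed command. The sketch below maps a tracked object's recent displacement and dwell time onto a tiny command vocabulary; the gesture names and thresholds are assumptions for illustration and are not recited in the claims.

```python
def interpret_motion(displacement, dwell_frames, move_threshold=40, dwell_needed=15):
    """Map a tracked object's recent motion onto an assumed command vocabulary."""
    dx, dy = displacement
    if dwell_frames >= dwell_needed:
        return "CLICK"                 # hand held still over a target
    if abs(dx) > move_threshold and abs(dx) > abs(dy):
        return "SCROLL_RIGHT" if dx > 0 else "SCROLL_LEFT"
    if abs(dy) > move_threshold:
        return "SCROLL_DOWN" if dy > 0 else "SCROLL_UP"
    return "MOVE_CURSOR"               # default: drive the on-screen cursor
```

A caller would feed it the displacement between the object's positions in successive frames, along with how long the object has stayed put.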
50. A computer system that facilitates interaction with a user, comprising:
an object processing system for processing object information of one or more objects in a scene, the object processing system including:
a tracking component that at least tracks the one or more objects;
a seeding component that iteratively seeds the tracking component with object hypotheses based upon the objects that are tracked;
a filtering component that selectively removes the tracked object and corresponding object hypothesis from a set of object hypotheses in accordance with predetermined criteria; and
an interpreting component that interprets object characteristics of the tracked object and causes a command to be executed in response thereto;
an input system for receiving user input separately or in combination with the object processing system; and
a presentation system for presenting information to the user in response to at least one of the command being executed and receiving user input via the input system. (Dependent claims 51-57)
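Claim 50 layers an input system and a presentation system around the object processing system, matching the abstract's mention of combining hand gestures with verbal commands. The sketch below shows one assumed fusion rule; the command strings and the DRAG_WINDOW behavior are illustrative only, not claim language.

```python
def fuse_inputs(gesture_command, verbal_command=None):
    """Combine a gesture-derived command with an optional verbal command."""
    # Assumed rule: a pointing gesture qualified by the spoken phrase
    # "move window" becomes a window drag; otherwise speech takes priority.
    if gesture_command == "MOVE_CURSOR" and verbal_command == "move window":
        return "DRAG_WINDOW"
    return verbal_command or gesture_command

def present(command):
    """Placeholder presentation system: report the command being executed."""
    print(f"Executing: {command}")
```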
58. A method of facilitating a human-computer interface, comprising:
acquiring gesture characteristics of at least one of a plurality of gestures within a volume of space with an acquisition component, the gesture characteristics acquired based at least in part upon image comparison of a plurality of images relative to a course mapping of the images;
iteratively seeding the acquisition component with at least one gesture hypothesis based upon the presence of the gesture characteristics in the volume of space and the image comparison; and
automatically controlling a graphical representation of a graphical interface in response to acquiring the at least one gesture. (Dependent claims 59-67)
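Claim 58 acquires gesture characteristics within a volume of space and uses them to control a graphical representation. A simple way to picture that last step, assuming a box-shaped engagement volume and a linear mapping, is to project the hand position onto screen coordinates; the volume bounds and screen size below are assumptions for the example.

```python
def hand_to_screen(hand_xyz, volume_min, volume_max, screen=(1920, 1080)):
    """Map a 3-D hand position inside an assumed engagement volume to screen space."""
    x, y, _ = hand_xyz
    x0, y0, _ = volume_min
    x1, y1, _ = volume_max
    u = (x - x0) / (x1 - x0)          # normalize horizontally within the volume
    v = (y - y0) / (y1 - y0)          # normalize vertically within the volume
    u = min(max(u, 0.0), 1.0)          # clamp so the cursor stays on screen
    v = min(max(v, 0.0), 1.0)
    return int(u * (screen[0] - 1)), int(v * (screen[1] - 1))
```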
68. A method of facilitating a user interface, comprising:
capturing an image of moving objects with a video source;
determining whether one or more of the moving objects exist within the image;
analyzing whether the one or more moving objects are within an engagement volume;
calculating a distance from the video source to the moving objects within the engagement volume;
selecting a closest moving object from one or more moving objects within the engagement volume;
tracking the closest moving object;
evaluating whether the closest moving object remains within the engagement volume;
determining whether the closest moving object remains in motion;
interpreting the motion of the closest moving object;
determining whether the closest moving object's motion is a recognized command; and
executing the recognized command to control a graphical representation of a graphical user interface. (Dependent claims 69 and 70)
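Claim 68 reads almost like a control loop: capture frames, find moving objects, keep those inside the engagement volume, pick the closest one, track and interpret it, and execute any recognized command. The sketch below composes those steps in simplified form; the depth bounds, the centroid-based detector with its fixed stand-in depth, and the recognize/execute callables are all assumptions made for the example, not claim language.

```python
import numpy as np

ENGAGEMENT_NEAR, ENGAGEMENT_FAR = 0.5, 1.5   # assumed depth bounds, in meters

def detect_moving_objects(prev, curr, threshold=15.0):
    """Very coarse motion detection: centroid of changed pixels, if any."""
    diff = np.abs(curr.astype(float) - prev.astype(float)) > threshold
    if not diff.any():
        return []
    ys, xs = np.nonzero(diff)
    return [(xs.mean(), ys.mean(), 1.0)]     # (x, y, assumed depth from the camera)

def run_loop(frames, recognize, execute):
    """Step through the claimed method for each captured frame."""
    prev = None
    for curr in frames:
        if prev is not None:
            # keep only moving objects inside the engagement volume
            candidates = [o for o in detect_moving_objects(prev, curr)
                          if ENGAGEMENT_NEAR <= o[2] <= ENGAGEMENT_FAR]
            if candidates:
                closest = min(candidates, key=lambda o: o[2])   # nearest to the camera
                command = recognize(closest)                     # interpret its motion
                if command is not None:                          # recognized command?
                    execute(command)                             # drive the GUI
        prev = curr
```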
Specification