Method and apparatus for real-time gesture recognition

US 6,072,494 A
Filed: 10/15/1997
Issued: 06/06/2000
Est. Priority Date: 10/15/1997
Status: Expired due to Term


Abstract

A system and method are disclosed for providing a gesture recognition system for recognizing gestures made by a moving subject within an image and performing an operation based on the semantic meaning of the gesture. A subject, such as a human being, enters the viewing field of a camera connected to a computer and performs a gesture, such as flapping of the arms. The gesture is then examined by the system one image frame at a time. Positional data is derived from the input frames and compared to data representing gestures already known to the system. The comparisons are done in real-time and the system can be trained to better recognize known gestures or to recognize new gestures. A frame of the input image containing the subject is obtained after a background image model has been created. An input frame is used to derive a frame data set that contains particular coordinates of the subject at a given moment in time. This series of frame data sets is examined to determine whether it conveys a gesture that is known to the system. If the subject gesture is recognizable to the system, an operation based on the semantic meaning of the gesture can be performed by a computer.
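
The front end of the pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the per-pixel mean background, the fixed `threshold`, the use of a single centroid as the "particular coordinates", and the function names `build_background` and `subject_coordinates` are all assumptions made for the example. Frames are plain nested lists of grayscale values.

```python
from statistics import mean

def build_background(empty_frames):
    """Average each pixel over frames that show no subject (assumed model)."""
    height, width = len(empty_frames[0]), len(empty_frames[0][0])
    return [[mean(f[y][x] for f in empty_frames) for x in range(width)]
            for y in range(height)]

def subject_coordinates(frame, background, threshold=30):
    """Centroid (x, y) of pixels that differ from the background, or None."""
    changed = [(x, y)
               for y, row in enumerate(frame)
               for x, value in enumerate(row)
               if abs(value - background[y][x]) > threshold]
    if not changed:
        return None
    return (mean(x for x, _ in changed), mean(y for _, y in changed))

# Usage: two empty frames build the model, one frame contains a "subject".
empty = [[0] * 4 for _ in range(4)]
background = build_background([empty, empty])
frame = [row[:] for row in empty]
frame[2][1] = 255
frame[2][3] = 255
frame_data_set = [subject_coordinates(frame, background)]
```

Appending one coordinate entry per input frame yields the series of frame data sets that the later matching stage would examine.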

58 Claims

1. A computer-implemented method of storing and recognizing gestures made by a moving subject within an image, the method including:
- a) building a background model by obtaining at least one frame of an image;
  
  b) obtaining a data frame containing a subject performing part of a gesture;
  
  c) analyzing the data frame thereby determining particular coordinates of the subject at a particular time while the subject is performing the gesture;
  
  d) adding the particular coordinates to a frame data set;
  
  e) examining the particular coordinates such that the particular coordinates are compared to positional data making up a plurality of recognizable gestures, wherein a recognizable gesture is made up of at least one dimension such that the positional data describes dimensions of the recognized gesture;
  
  f) repeating b through e for a plurality of data frames; and
  
  g) determining whether the plurality of the data frames when examined in a particular sequence, conveys a subject gesture by the subject that resembles a recognizable gesture, thereby causing an operation based on a predetermined meaning of the recognizable gesture be performed by a computer.
- Dependent claims (2 through 19):
- - 2. A method as recited in claim 1 wherein building a background model further includes determining whether there is significant activity in the background image thereby restarting the process for building the background model.
  - 3. A method as recited in claim 1 wherein obtaining a data frame further includes separating the subject in the data frame into a plurality of identifiable parts wherein an identifiable part is assigned particular coordinates.
  - 4. A method as recited in claim 1 wherein a dimension represents a movement space and has at least one key point wherein a key point represents a significant direction change.
  - 5. A method as recited in claim 4 wherein analyzing the data frame further includes:
    - determining a certainty score at a key point in a dimension wherein the certainty score represents a probability that another key point in the dimension has been reached; and
      
      using the certainty score at each key point to determine whether a subject gesture matches a recognizable gesture.
  - 6. A method as recited in claim 1 wherein determining whether the plurality of the data frames conveys a subject gesture further includes comparing the frame data set to positional data corresponding to a dimensional pattern for a recognizable gesture.
  - 7. A method as recited in claim 1 further including:
    - obtaining a next data frame thereby determining whether the subject gesture has reached a next key point; and
      
      updating a status report containing data on key points reached in a dimension.
  - 8. A method as recited in claim 7 further including checking the status report to determine if the subject gesture is a partial completion of a recognizable gesture by comparing a previous data frame to the positional data for a recognizable gesture and determining how many key points have been reached.
  - 9. A method as recited in claim 1 further including:
    - determining whether the particular coordinates in the frame data set match the positional data for a potential gesture;
      
      resetting a data array representative of the potential gesture and resetting a status report if the particular coordinates in the frame data set severely mismatch the positional data making up a plurality of recognizable gestures;
      
      discarding data in the data array representative of the potential gesture if the particular coordinates in the frame data set mismatch the positional data to a degree lesser than a severe mismatch; and
      
      signaling if the particular coordinates in the frame data set match positional data for a recognizable gesture thereby indicating that requirements for a recognizable gesture have been met.
  - 10. A method as recited in claim 9 further including discarding the data in the data array representative of the potential gesture if a predetermined amount of time has passed.
  - 11. A method as recited in claim 1 wherein the step of examining the particular coordinates further includes extracting data from the frame data set based on characteristics of the recognizable gesture being checked.
  - 12. A method as recited in claim 1 wherein the recognizable gesture that matches the subject gesture first is the recognizable gesture that causes an operation to be performed in a computer.
  - 13. A method as recited in claim 1 wherein adding the particular coordinates to a frame data set further includes storing the frame data set in a plurality of arrays wherein an array corresponds to one dimension for each recognizable gesture.
  - 14. A method as recited in claim 1 further including:
    - storing a plurality of samples of a subject gesture;
      
      inputting a number of key points that fit in the subject gesture and a time value representing the time for the subject gesture to complete;
      
      inputting a number of dimensions of the subject gesture;
      
      determining locations of key points in a model representative of the subject gesture; and
      
      calculating a probability distribution for key points indicating the likelihood that a certain output will be observed.
  - 15. A method as recited in claim 14 further including refining the model such that the plurality of samples of the subject gesture fit within the model.
  - 16. A method as recited in claim 14 further including calculating a confusion matrix wherein the subject gesture is compared with previously stored recognizable gestures so that similarities between the new gesture to previously stored recognizable gestures can be determined.
  - 17. A method as recited in claim 1 further including pre-processing the data frame such that the subject is visually displayed on a computer display monitor.
  - 18. A method as recited in claim 17 wherein the subject is composited onto a destination image such that the background image is subtracted from the data frame thereby isolating the subject to be composited.
  - 19. A computer readable medium including program instructions implementing the process of claim 1.
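
One loose reading of claims 4, 7, 9 and 10 is a per-dimension tracker that counts key points reached, resets on a severe mismatch, and discards stale data after a timeout. The sketch below illustrates that reading only. The class name `DimensionTracker`, the distance-based matching rule, and the `tolerance`, `severe` and `timeout` values are hypothetical; the claims do not specify how a match or a severe mismatch is measured.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DimensionTracker:
    key_points: List[float]          # expected value at each direction change
    tolerance: float = 10.0          # assumed: max distance that counts as reached
    severe: float = 100.0            # assumed: distance treated as severe mismatch
    timeout: float = 2.0             # assumed: seconds before stale data is dropped
    reached: int = 0                 # the "status report": key points reached
    samples: List[float] = field(default_factory=list)  # the "data array"
    started_at: Optional[float] = None

    def reset(self):
        """Clear both the data array and the status report."""
        self.reached = 0
        self.samples.clear()
        self.started_at = None

    def update(self, value, t):
        """Feed one coordinate at time t; return True when the gesture completes."""
        if self.started_at is not None and t - self.started_at > self.timeout:
            self.reset()                      # too much time has passed
        error = abs(value - self.key_points[self.reached])
        if error > self.severe:
            self.reset()                      # severe mismatch
            return False
        self.samples.append(value)
        if error <= self.tolerance:           # next key point reached
            if self.started_at is None:
                self.started_at = t
            self.reached += 1
            if self.reached == len(self.key_points):
                self.reset()
                return True                   # signal a recognized gesture
        return False

# Usage: vertical hand position for one arm flap (down, up, down).
tracker = DimensionTracker(key_points=[100, 20, 100])
results = [tracker.update(v, i * 0.1) for i, v in enumerate([100, 60, 20, 60, 100])]
```

In this run only the final frame returns `True`, because it is the frame on which the last key point is reached.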

20. A computer-implemented system for storing and recognizing gestures made by a moving subject within an image, the system comprising:
- an image modeller for creating a background model by examining a plurality of frames of an input image that does not contain a subject;
  
  a frame capturer for obtaining a data frame containing the subject performing part of a subject gesture;
  
  a frame analyzer for analyzing the data frame thereby determining relevant coordinates of the subject at a particular time while the subject is performing the subject gesture;
  
  a data set creator for creating a frame data set by collecting the relevant coordinates;
  
  a data set analyzer for examining the particular coordinates in the frame data set such that the particular coordinates are compared to positional data making up a plurality of recognizable gestures, wherein each recognizable gesture is made up of at least one dimension such that the positional data describes dimensions of the recognized gesture; and
  
  a gesture recognizer for determining whether a plurality of the data frames, wherein a data frame is represented by a frame data set, when examined in a particular sequence, conveys a gesture by the subject that resembles a recognizable gesture, thereby causing an operation based on a predetermined meaning of the recognizable gesture be performed by a computer.
- Dependent claims (21 through 37):
- - 21. A system as recited in claim 20 wherein the image modeller further comprises an image initializer for initializing the input image that does not contain the subject.
  - 22. A system as recited in claim 20 wherein the frame capturer further comprises a frame separator for categorizing the subject represented in the data frame into a plurality of identifiable parts wherein an identifiable part is assigned particular coordinates.
  - 23. A system as recited in claim 20 wherein a dimension represents a movement space and has at least one key point wherein a key point represents a significant direction change.
  - 24. A system as recited in claim 23 wherein the frame analyzer further comprises:
    - a probability evaluator for determining a certainty score at a key point in a dimension wherein the certainty score represents a probability that a sequence of outputs observed belongs to a gesture model; and
      
      a gesture recognizer for determining whether a subject gesture matches a recognizable gesture by using the certainty score at each key point.
  - 25. A system as recited in claim 20 wherein the gesture recognizer further comprises a data comparator for comparing the frame data set to positional data corresponding to a dimension of a recognizable gesture.
  - 26. A system as recited in claim 20 further comprising a status updater for updating a status report containing data on key points reached in a dimension after obtaining a next data frame thereby determining whether the subject gesture has reached a next key point.
  - 27. A system as recited in claim 26 further comprising a status checker for checking the status report to determine if the subject gesture is a partial completion of a recognizable gesture by comparing a previous data frame to the positional data for a recognizable gesture and determining how many key points have been reached.
  - 28. A system as recited in claim 20 further comprising:
    - a position comparator for determining whether the particular coordinates in the frame data set match the positional data for a potential gesture;
      
      a data resetter for resetting a data array representative of the potential gesture and resetting a status report if the particular coordinates in the frame data set severely mismatch the positional data making up a plurality of recognizable gestures;
      
      a data discarder for discarding data in the data array representative of the potential gesture if the particular coordinates in the frame data set mismatch the positional data to a degree lesser than a severe mismatch; and
      
      a match indicator for signaling if the particular coordinates in the frame data set match positional data for a recognizable gesture thereby indicating that requirements for a recognizable gesture have been met.
  - 29. A system as recited in claim 28 wherein the data discarder further comprising a timer for discarding the data in the data array representative of the potential gesture if a predetermined amount of time has passed.
  - 30. A system as recited in claim 20 wherein the data set analyzer further comprises a data extractor for extracting data from the frame data set based on characteristics of the recognizable gesture being checked.
  - 31. A system as recited in claim 20 wherein the recognizable gesture that matches the subject gesture first is the recognizable gesture that causes an operation to be performed in a computer.
  - 32. A system as recited in claim 20 wherein the data set creator further comprises a data set allocator for storing the frame data set in a plurality of arrays wherein an array corresponds to one dimension for a recognizable gesture.
  - 33. A system as recited in claim 20 further comprising:
    - a sample receiver for storing a plurality of samples of a subject gesture;
      
      a gesture data intaker for accepting a plurality of key points that fits in the subject gesture, a time value representing the time for the subject gesture to complete and a plurality of dimensions of the subject gesture;
      
      a key point locator for determining locations of key points in a model representative of the subject gesture; and
      
      a probability evaluator for calculating a probability distribution at the key points indicating the likelihood of observing a particular output.
  - 34. A system as recited in claim 33 further including refining the model such that the plurality of samples of the subject gesture fit within the model.
  - 35. A system as recited in claim 33 further comprising a gesture confusion evaluator for calculating a confusion matrix wherein the subject gesture is compared with previously stored recognizable gestures so that similarities between the subject gesture to previously stored recognizable gestures can be determined.
  - 36. A system as recited in claim 20 further comprising a data frame processor for pre-processing the data frame such that the subject is visually displayed on a computer display monitor.
  - 37. A system as recited in claim 36 further comprising a subject compositor for compositing the subject onto a destination image such that the background image is subtracted from the data frame thereby isolating the subject to be composited.

38. A computer-implemented system for storing and recognizing gestures made by a moving subject within an image, the system comprising:
- means for building a background model by obtaining at least one frame of an image;
  
  means for obtaining a data frame containing a subject performing a part of a subject gesture;
  
  means for analyzing the data frame thereby determining particular coordinates of the subject at a particular time while the subject is performing the subject gesture;
  
  means for adding the particular coordinates to a frame data set;
  
  means for examining the particular coordinates such that the particular coordinates are compared to positional data making up a plurality of recognizable gestures, wherein a recognizable gesture is made up of at least one dimension such that the positional data describes dimensions of the recognized gesture; and
  
  means for determining whether a plurality of data frames, where a data frame is represented by the frame data set, when examined in a particular sequence, conveys a subject gesture by the subject that resembles a recognizable gesture, thereby causing an operation based on a predetermined meaning of the recognizable gesture be performed by a computer.
- Dependent claims (39 through 55):
- - 39. A system as recited in claim 38 wherein means for building a background model further includes means for determining whether there is significant activity in the background image thereby restarting the process for building the background model.
  - 40. A system as recited in claim 38 wherein means for obtaining a data frame further includes means for separating the subject in the data frame into a plurality of identifiable parts wherein an identifiable part is assigned particular coordinates.
  - 41. A system as recited in claim 38 wherein a dimension represents a movement space and has at least one key point wherein a key point represents a significant direction change.
  - 42. A system as recited in claim 38 wherein means for analyzing the data frame further includes:
    - means for determining a certainty score at a key point in a dimension wherein the certainty score represents a probability that a sequence of previous data frames fit a gesture model; and
      
      means for using the certainty score at each key point to determine whether a subject gesture matches a recognizable gesture.
  - 43. A system as recited in claim 38 wherein means for determining whether the plurality of the data frames conveys a subject gesture further includes means for comparing the frame data set to positional data corresponding to a dimension for a recognizable gesture.
  - 44. A system as recited in claim 38 further including:
    - means for obtaining a next data frame thereby determining whether the subject gesture has reached a next key point; and
      
      means for updating a status report containing data on key points reached in a dimension.
  - 45. A system as recited in claim 44 further including means for checking the status report to determine if a subject gesture is a partial completion of a recognizable gesture by comparing a previous data frame to the positional data for a recognizable gesture and determining how many key points have been reached.
  - 46. A system as recited in claim 38 further including:
    - means for determining whether the particular coordinates in the frame data set match the positional data for a potential gesture;
      
      means for resetting a data array representative of the potential gesture and resetting a status report if the particular coordinates in the frame data set severely mismatch the positional data making up a plurality of recognizable gestures;
      
      means for discarding data in the data array representative of the potential gesture if the particular coordinates in the frame data set mismatch the positional data to a degree lesser than a severe mismatch; and
      
      means for signaling if the particular coordinates in the frame data set match positional data for a recognizable gesture thereby indicating that requirements for a recognizable gesture have been met.
  - 47. A system as recited in claim 46 further including means for discarding the data in the data array representative of the potential gesture if a predetermined amount of time has passed.
  - 48. A system as recited in claim 38 wherein means for examining the particular coordinates further includes means for extracting data from the frame data set based on characteristics of the recognizable gesture being checked.
  - 49. A system as recited in claim 38 wherein the recognizable gesture that matches the subject gesture first is the recognizable gesture that causes an operation to be performed in a computer.
  - 50. A system as recited in claim 38 wherein means for adding the particular coordinates to a frame data set further includes means for storing the frame data set in a plurality of arrays wherein an array corresponds to one dimension for each recognizable gesture.
  - 51. A system as recited in claim 38 further including:
    - means for storing a plurality of samples of a subject gesture;
      
      means for inputting a number of key points that fit in the gesture and a time value representing the time for the subject gesture to complete;
      
      means for inputting a number of dimensions of the subject gesture;
      
      means for determining locations of key points in a model representative of the subject gesture; and
      
      means for calculating a probability distribution for key points indicating the likelihood of observing a particular output.
  - 52. A system as recited in claim 51 further including means for refining the model such that the plurality of samples of the subject gesture fit within the model.
  - 53. A system as recited in claim 51 further including means for calculating a confusion matrix wherein the subject gesture is compared with previously stored recognizable gestures so that similarities between the subject gesture to previously stored recognizable gestures can be determined.
  - 54. A system as recited in claim 38 further including means for preprocessing the data frame such that the subject is visually displayed on a computer display monitor.
  - 55. A system as recited in claim 54 wherein the subject is composited onto a destination image such that the background image is subtracted from the data frame thereby isolating the subject to be composited.
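
The compositing step in claims 18, 37 and 55 (subtract the background to isolate the subject, then place the subject on a destination image) might look like the following sketch. The function name `composite`, the grayscale nested-list images, and the fixed `threshold` are assumptions for illustration; the claims do not say how the subtraction is thresholded.

```python
def composite(frame, background, destination, threshold=30):
    """Copy subject pixels (those unlike the background) onto the destination."""
    return [
        [
            frame[y][x]
            if abs(frame[y][x] - background[y][x]) > threshold
            else destination[y][x]
            for x in range(len(frame[0]))
        ]
        for y in range(len(frame))
    ]

# Usage: a single bright subject pixel is carried over to a flat destination.
background = [[0, 0], [0, 0]]
frame = [[0, 200], [0, 0]]
destination = [[7, 7], [7, 7]]
output = composite(frame, background, destination)
```

Pixels that match the background keep the destination value, so only the isolated subject appears in the output image.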

56. A computer-implemented method of storing and recognizing gestures made by a moving subject within an image, the method including:
- a) building a background model by obtaining at least one frame of an image including determining whether there is significant activity in the background image thereby restarting the process for building the background model;
  
  b) obtaining a data frame containing a subject performing part of a gesture including separating the subject in the data frame into a plurality of identifiable parts wherein an identifiable part is assigned particular coordinates;
  
  c) analyzing the data frame thereby determining particular coordinates of the subject at a particular time while the subject is performing the gesture;
  
  d) adding the particular coordinates to a frame data set;
  
  e) examining the particular coordinates such that the particular coordinates are compared to positional data making up a plurality of recognizable gestures, wherein a recognizable gesture is made up of at least one dimension such that the positional data describes dimensions of the recognized gesture;
  
  f) repeating b through e for a plurality of data frames;
  
  g) determining whether the plurality of the data frames when examined in a particular sequence, conveys a subject gesture by the subject that resembles a recognizable gesture, thereby causing an operation based on a predetermined meaning of the recognizable gesture be performed by a computer;
  
  h) storing a plurality of samples of a subject gesture;
  
  i) inputting a number of key points that fit in the subject gesture and a time value representing the time for the subject gesture to complete;
  
  j) inputting a number of dimensions of the subject gesture;
  
  k) determining locations of key points in a model representative of the subject gesture;
  
  l) calculating a probability distribution for key points indicating the likelihood that a certain output will be observed; and
  
  m) refining the model such that the plurality of samples of the subject gesture fit within the model.
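
Training steps h) through l) can be illustrated for a single dimension as follows. This is a sketch under stated assumptions, not the method of the specification: key points are taken to be the first value, each reversal of direction, and the last value of a sample, and the "probability distribution" is reduced to a mean and population standard deviation per key point. The names `direction_changes` and `fit_model` are hypothetical.

```python
from statistics import mean, pstdev

def direction_changes(values):
    """First value, every value where the slope changes sign, and last value."""
    points = [values[0]]
    previous = 0
    for a, b in zip(values, values[1:]):
        direction = (b > a) - (b < a)       # +1 rising, -1 falling, 0 flat
        if direction != 0:
            if previous != 0 and direction != previous:
                points.append(a)            # a is where the direction reversed
            previous = direction
    points.append(values[-1])
    return points

def fit_model(samples, n_key_points):
    """Per key point (mean, standard deviation) over the usable samples."""
    located = [direction_changes(s) for s in samples]
    usable = [p for p in located if len(p) == n_key_points]
    if not usable:
        raise ValueError("no sample has the requested number of key points")
    return [(mean(col), pstdev(col)) for col in zip(*usable)]

# Usage: two recorded samples of the same one-dimensional gesture.
samples = [[100, 60, 20, 60, 100], [98, 55, 22, 70, 104]]
model = fit_model(samples, n_key_points=3)
```

Step m), refining the model until all samples fit, is not shown; here a sample whose key point count differs from the requested number is simply skipped.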

57. A computer-implemented system for storing and recognizing gestures made by a moving subject within an image, the system comprising:
- an image modeller for creating a background model by examining a plurality of frames of an input image that does not contain a subject comprising an image initializer for initializing the input image that does not contain the subject;
  
  a frame capturer for obtaining a data frame containing the subject performing part of a subject gesture comprising a frame separator for categorizing the subject represented in the data frame into a plurality of identifiable parts wherein an identifiable part is assigned particular coordinates;
  
  a frame analyzer for analyzing the data frame thereby determining relevant coordinates of the subject at a particular time while the subject is performing the subject gesture;
  
  a data set creator for creating a frame data set by collecting the relevant coordinates;
  
  a data set analyzer for examining the particular coordinates in the frame data set such that the particular coordinates are compared to positional data making up a plurality of recognizable gestures, wherein each recognizable gesture is made up of at least one dimension such that the positional data describes dimensions of the recognized gesture;
  
  a gesture recognizer for determining whether a plurality of the data frames, wherein a data frame is represented by a frame data set, when examined in a particular sequence, conveys a gesture by the subject that resembles a recognizable gesture, thereby causing an operation based on a predetermined meaning of the recognizable gesture be performed by a computer;
  
  a sample receiver for storing a plurality of samples of a subject gesture;
  
  a gesture data intaker for accepting a plurality of key points that fits in the subject gesture, a time value representing the time for the subject gesture to complete and a plurality of dimensions of the subject gesture;
  
  a key point locator for determining locations of key points in a model representative of the subject gesture;
  
  a probability evaluator for calculating a probability distribution at the key points indicating the likelihood of observing a particular output; and
  
  a model refiner for refining the model such that the plurality of samples of the subject gesture fit within the model.

58. A computer-implemented system for storing and recognizing gestures made by a moving subject within an image, the system comprising:
- means for building a background model by obtaining at least one frame of an image including means for determining whether there is significant activity in the background image thereby restarting the process for building the background model;
  
  means for obtaining a data frame containing a subject performing a part of a subject gesture including means for separating the subject in the data frame into a plurality of identifiable parts wherein an identifiable part is assigned particular coordinates;
  
  means for analyzing the data frame thereby determining particular coordinates of the subject at a particular time while the subject is performing the subject gesture;
  
  means for adding the particular coordinates to a frame data set;
  
  means for examining the particular coordinates such that the particular coordinates are compared to positional data making up a plurality of recognizable gestures, wherein a recognizable gesture is made up of at least one dimension such that the positional data describes dimensions of the recognized gesture;
  
  means for determining whether a plurality of data frames, where a data frame is represented by the frame data set, when examined in a particular sequence, conveys a subject gesture by the subject that resembles a recognizable gesture, thereby causing an operation based on a predetermined meaning of the recognizable gesture be performed by a computer;
  
  means for storing a plurality of samples of a subject gesture;
  
  means for inputting a number of key points that fit in the gesture and a time value representing the time for the subject gesture to complete;
  
  means for inputting a number of dimensions of the subject gesture;
  
  means for determining locations of key points in a model representative of the subject gesture;
  
  means for calculating a probability distribution for key points indicating the likelihood of observing a particular output; and
  
  means for refining the model such that the plurality of samples of the subject gesture fit within the model.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Planet Electric Incorporated
Inventors: Nguyen, Katerina H.
Primary Examiner(s): Bayerl, Raymond J.
Assistant Examiner(s): Thai, Cuong T.
Application Number: US08/951,070
Time in Patent Office: 965 Days
Field of Search: 345/327, 345/358, 345/156, 345/326, 382/107, 382/209, 382/218, 382/217
US Class Current: 715/863
CPC Class Codes:
G06F 3/017   Gesture based interaction, ...
G06F 3/0304   Detection arrangements usin...
G06V 40/20   Movements or behaviour, e.g...
