Multi media computing or entertainment system for responding to user presence and activity
First Claim
1. A non-transitory program storage device, readable by a processor and comprising instructions stored thereon to cause one or more processors to:
acquire a depth image of a scene in a vicinity of a device, the depth image having a first plurality of pixels, each pixel having a value indicative of a distance;
store the depth image in a memory;
develop a scene geometry based upon the depth image;
determine that a user is engaging the device;
identify a human hand in a region of space based on the values indicative of the distances of the first plurality of pixels;
identify a three-dimensional region about the human hand, wherein the three-dimensional region includes at least some of the first plurality of pixels;
partition the three-dimensional region about the human hand into a second plurality of sub-regions, each sub-region having a corresponding value and size, wherein the value of a particular sub-region comprises a number of human hand pixels within the particular sub-region, wherein the sizes of the sub-regions are configured so that the number of human hand pixels within each sub-region is approximately equal, and wherein the sizes of the sub-regions are non-uniform;
generate a feature vector for the human hand based on the values of the second plurality of sub-regions;
apply the feature vector to a classifier;
determine that the human hand is making an identified gesture based on output from the classifier; and
cause an action to be taken by the device, based, at least in part, upon the identified gesture and the scene geometry.
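The non-uniform, equal-count partitioning recited in the claim can be illustrated with a short sketch. The following is a minimal Python sketch, not the patented implementation: it assumes the hand pixels have already been isolated as (x, y, depth) coordinates, and uses per-axis quantiles so each band along an axis holds roughly the same number of hand pixels (which only approximates the per-sub-region equal-count property); `partition_features` and `bins_per_axis` are hypothetical names.

```python
import numpy as np

def partition_features(hand_xyz, bins_per_axis=4):
    """Split the 3-D region about a hand into non-uniform sub-regions whose
    boundaries are per-axis quantiles, so each band holds roughly the same
    number of hand pixels, then count hand pixels per sub-region.

    hand_xyz: (N, 3) array of (x, y, depth) coordinates of hand pixels.
    Returns a normalized feature vector of length bins_per_axis ** 3.
    """
    quantiles = np.linspace(0.0, 1.0, bins_per_axis + 1)
    counts = np.zeros((bins_per_axis,) * 3, dtype=np.int64)

    # Map each pixel to its (i, j, k) sub-region index along each axis.
    # Quantile edges give non-uniform sizes with ~equal counts per band.
    idx = np.empty_like(hand_xyz, dtype=np.int64)
    for axis in range(3):
        edges = np.quantile(hand_xyz[:, axis], quantiles)
        edges[-1] += 1e-6  # make the top edge inclusive of the max value
        idx[:, axis] = np.searchsorted(edges, hand_xyz[:, axis], side="right") - 1
    np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)

    # The flattened, normalized sub-region counts form the feature vector.
    return counts.ravel() / len(hand_xyz)
```

For a 4x4x4 grid this yields a 64-dimensional vector, playing the role of the claim's feature vector "based on the values of the second plurality of sub-regions."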
Abstract
Varying embodiments of intelligent systems are disclosed that respond to user intent and desires based upon activity that may or may not be expressly directed at the intelligent system. In some embodiments, the intelligent system acquires a depth image of a scene surrounding the system. A scene geometry may be extracted from the depth image, and elements of the scene, such as walls, furniture, and humans, may be evaluated and monitored. In certain embodiments, user activity in the scene is monitored and analyzed to infer user desires or intent with respect to the system. The interpretation of the user's intent or desire, as well as the system's response, may be affected by the scene geometry surrounding the user and/or the system. In some embodiments, techniques and systems are disclosed for interpreting express user communication, for example, expressed through fine hand gesture movements. In some embodiments, such gesture movements may be interpreted based on real-time depth information obtained from, for example, optical or non-optical type depth sensors. The depth information may be interpreted in “slices” (three-dimensional regions of space having a relatively small depth) until one or more candidate hand structures are detected. Once detected, each candidate hand structure may be confirmed or rejected based on its own unique physical properties (e.g., shape, size, and continuity to an arm structure). Each confirmed hand structure may be submitted to a depth-aware filtering process before its own unique three-dimensional features are quantified into a high-dimensional feature vector. A two-step classification scheme may be applied to the feature vectors to identify a candidate gesture (step 1), and to reject candidate gestures that do not meet a gesture-specific identification operation (step 2). The identified gesture may be used to initiate some action controlled by a computer system.
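The abstract's "slices" idea can be sketched in a few lines. This is a hypothetical Python illustration, not the disclosed implementation: it assumes a metric depth image and uses simple pixel-count bounds as a crude stand-in for the shape, size, and arm-continuity checks described above; all names are made up for the sketch.

```python
import numpy as np
from scipy import ndimage

def candidate_hands(depth, slice_depth=0.10, min_px=500, max_px=5000):
    """Scan a depth image in thin 'slices' (small depth ranges) and yield
    connected pixel blobs whose area is plausible for a hand.

    depth: 2-D array of per-pixel distances in meters (0 = no reading).
    Yields boolean masks, one per candidate hand structure.
    """
    near, far = depth[depth > 0].min(), depth.max()
    for lo in np.arange(near, far, slice_depth):
        # Keep only pixels falling inside this thin 3-D slab of space.
        in_slice = (depth >= lo) & (depth < lo + slice_depth)
        labels, n = ndimage.label(in_slice)       # connected components
        for i in range(1, n + 1):
            blob = labels == i
            if min_px <= blob.sum() <= max_px:    # hand-sized blob?
                yield blob
```

In the disclosure, each such candidate would then be confirmed or rejected against its physical properties before feature extraction.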
20 Claims
1. A non-transitory program storage device, readable by a processor and comprising instructions stored thereon to cause one or more processors to:
acquire a depth image of a scene in a vicinity of a device, the depth image having a first plurality of pixels, each pixel having a value indicative of a distance;
store the depth image in a memory;
develop a scene geometry based upon the depth image;
determine that a user is engaging the device;
identify a human hand in a region of space based on the values indicative of the distances of the first plurality of pixels;
identify a three-dimensional region about the human hand, wherein the three-dimensional region includes at least some of the first plurality of pixels;
partition the three-dimensional region about the human hand into a second plurality of sub-regions, each sub-region having a corresponding value and size, wherein the value of a particular sub-region comprises a number of human hand pixels within the particular sub-region, wherein the sizes of the sub-regions are configured so that the number of human hand pixels within each sub-region is approximately equal, and wherein the sizes of the sub-regions are non-uniform;
generate a feature vector for the human hand based on the values of the second plurality of sub-regions;
apply the feature vector to a classifier;
determine that the human hand is making an identified gesture based on output from the classifier; and
cause an action to be taken by the device, based, at least in part, upon the identified gesture and the scene geometry.
View Dependent Claims (2, 3, 4, 5, 6)
7. A method comprising:
acquiring a depth image of a scene in a vicinity of a device, the depth image having a first plurality of pixels, each pixel having a value indicative of a distance;
storing the depth image in a memory;
developing a scene geometry based upon the depth image;
determining that a user is engaging the device;
identifying a human hand in a region of space based on the values indicative of the distances of the first plurality of pixels;
identifying a three-dimensional region about the human hand, wherein the three-dimensional region includes at least some of the first plurality of pixels;
partitioning the three-dimensional region about the human hand into a second plurality of sub-regions, each sub-region having a corresponding value and size, wherein the value of a particular sub-region comprises a number of human hand pixels within the particular sub-region, wherein the sizes of the sub-regions are configured so that the number of human hand pixels within each sub-region is approximately equal, and wherein the sizes of the sub-regions are non-uniform;
generating a feature vector for the human hand based on the values of the second plurality of sub-regions;
applying the feature vector to a classifier;
determining that the human hand is making an identified gesture based on output from the classifier; and
causing an action to be taken by the device, based, at least in part, upon the identified gesture and the scene geometry.
View Dependent Claims (8, 9, 10, 11, 12, 13)
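Claims 7 and 14 recite the same pipeline as claim 1 in method and device form; the classifier step they share is described in the abstract as a two-step scheme. A minimal sketch of that scheme, assuming an sklearn-style multi-class model exposing `predict_proba` plus a per-gesture verification callable (all names hypothetical):

```python
def classify_gesture(feature_vec, multiclass_model, verifiers):
    """Two-step scheme: (1) a multi-class classifier proposes a candidate
    gesture; (2) a gesture-specific verifier accepts or rejects it.

    multiclass_model: object with predict_proba(feature_vecs) -> (n, k) scores.
    verifiers: dict mapping gesture label -> callable(feature_vec) -> bool.
    Returns the accepted gesture label, or None if the candidate is rejected.
    """
    scores = multiclass_model.predict_proba([feature_vec])[0]
    candidate = int(scores.argmax())                # step 1: best candidate

    verify = verifiers.get(candidate)
    if verify is not None and verify(feature_vec):  # step 2: gesture-specific test
        return candidate
    return None
```

The second step is what lets the system discard lookalike gestures that the first-stage classifier scores highly but that fail the gesture-specific identification operation.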
14. An electronic device comprising:
a memory;
a depth sensor;
one or more processors, communicatively coupled to the memory, wherein the memory stores instructions to cause the one or more processors to:
acquire a depth image of a scene in a vicinity of the electronic device using the depth sensor, the depth image having a first plurality of pixels, each pixel having a value indicative of a distance;
store the depth image in the memory;
develop a scene geometry based upon the depth image;
determine that a user is engaging the electronic device;
identify a human hand in a region of space based on the values indicative of the distances of the first plurality of pixels;
identify a three-dimensional region about the human hand, wherein the three-dimensional region includes at least some of the first plurality of pixels;
partition the three-dimensional region about the human hand into a second plurality of sub-regions, each sub-region having a corresponding value and size, wherein the value of a particular sub-region comprises a number of human hand pixels within the particular sub-region, wherein the sizes of the sub-regions are configured so that the number of human hand pixels within each sub-region is approximately equal, and wherein the sizes of the sub-regions are non-uniform;
generate a feature vector for the human hand based on the values of the second plurality of sub-regions;
apply the feature vector to a classifier;
determine that the human hand is making an identified gesture based on output from the classifier; and
cause an action to be taken by the electronic device, based, at least in part, upon the identified gesture and the scene geometry.
View Dependent Claims (15, 16, 17, 18, 19, 20)
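Putting the recited elements together, the device of claim 14 might orchestrate the steps along these lines. This is a hypothetical sketch reusing the helper functions sketched earlier, not the patent's implementation: the scene-geometry and user-engagement steps are elided to a comment, and `depth_sensor.read()` and the `actions` mapping are assumptions.

```python
import numpy as np

def run_device_loop(depth_sensor, model, verifiers, actions):
    """Hypothetical loop for the claimed device: acquire a depth image,
    find a hand, featurize it, classify the gesture, then act on it."""
    while True:
        depth = depth_sensor.read()              # acquire depth image (assumed API)
        # Scene-geometry development and user-engagement checks would go here.
        for hand_mask in candidate_hands(depth):
            ys, xs = np.nonzero(hand_mask)
            hand_xyz = np.column_stack([xs, ys, depth[ys, xs]])
            vec = partition_features(hand_xyz)   # non-uniform sub-region counts
            gesture = classify_gesture(vec, model, verifiers)
            if gesture is not None and gesture in actions:
                actions[gesture]()               # device takes the mapped action
```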
Specification