Multi media computing or entertainment system for responding to user presence and activity
First Claim
1. A non-transitory program storage device, readable by a processor and comprising instructions stored thereon to cause one or more processors to:
acquire a depth image of a scene in a vicinity of a device, the depth image having a first plurality of pixels, each pixel having a value indicative of a distance;
store the depth image in a memory;
develop a scene geometry based upon the depth image;
determine that a user is engaging the device;
identify a human hand in a region of space based on the values indicative of the distances of the first plurality of pixels;
identify a three-dimensional region about the human hand, wherein the three-dimensional region includes at least some of the first plurality of pixels;
partition the three-dimensional region about the human hand into a second plurality of sub-regions, each sub-region having a corresponding value and size, wherein the value of a particular sub-region comprises a number of human hand pixels within the particular sub-region, wherein the sizes of the sub-regions are configured so that the number of human hand pixels within each sub-region is approximately equal, and wherein the sizes of the sub-regions are non-uniform;
generate a feature vector for the human hand based on the values of the second plurality of sub-regions;
apply the feature vector to a classifier;
determine that the human hand is making an identified gesture based on output from the classifier; and
cause an action to be taken by the device, based, at least in part, upon the identified gesture and the scene geometry.
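The non-uniform, equal-count partitioning recited in the claim can be illustrated with a short sketch. The following is a minimal Python sketch, not the patented implementation: it assumes the hand pixels have already been isolated as (x, y, depth) coordinates, and uses per-axis quantiles so each band along an axis holds roughly the same number of hand pixels (which only approximates the per-sub-region equal-count property); `partition_features` and `bins_per_axis` are hypothetical names.

```python
import numpy as np

def partition_features(hand_xyz, bins_per_axis=4):
    """Split the 3-D region about a hand into non-uniform sub-regions whose
    boundaries are per-axis quantiles, so each band holds roughly the same
    number of hand pixels, then count hand pixels per sub-region.

    hand_xyz: (N, 3) array of (x, y, depth) coordinates of hand pixels.
    Returns a normalized feature vector of length bins_per_axis ** 3.
    """
    quantiles = np.linspace(0.0, 1.0, bins_per_axis + 1)
    counts = np.zeros((bins_per_axis,) * 3, dtype=np.int64)

    # Map each pixel to its (i, j, k) sub-region index along each axis.
    # Quantile edges give non-uniform sizes with ~equal counts per band.
    idx = np.empty_like(hand_xyz, dtype=np.int64)
    for axis in range(3):
        edges = np.quantile(hand_xyz[:, axis], quantiles)
        edges[-1] += 1e-6  # make the top edge inclusive of the max value
        idx[:, axis] = np.searchsorted(edges, hand_xyz[:, axis], side="right") - 1
    np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)

    # The flattened, normalized sub-region counts form the feature vector.
    return counts.ravel() / len(hand_xyz)
```

For a 4x4x4 grid this yields a 64-dimensional vector, playing the role of the claim's feature vector "based on the values of the second plurality of sub-regions."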
Abstract
Varying embodiments of intelligent systems are disclosed that respond to user intent and desires based upon activity that may or may not be expressly directed at the intelligent system. In some embodiments, the intelligent system acquires a depth image of a scene surrounding the system. A scene geometry may be extracted from the depth image, and elements of the scene, such as walls, furniture, and humans, may be evaluated and monitored. In certain embodiments, user activity in the scene is monitored and analyzed to infer user desires or intent with respect to the system. The interpretation of the user's intent or desire, as well as the system's response, may be affected by the scene geometry surrounding the user and/or the system. In some embodiments, techniques and systems are disclosed for interpreting express user communication, for example, expressed through fine hand gesture movements. In some embodiments, such gesture movements may be interpreted based on real-time depth information obtained from, for example, optical or non-optical type depth sensors. The depth information may be interpreted in “slices” (three-dimensional regions of space having a relatively small depth) until one or more candidate hand structures are detected. Once detected, each candidate hand structure may be confirmed or rejected based on its own unique physical properties (e.g., shape, size, and continuity to an arm structure). Each confirmed hand structure may be submitted to a depth-aware filtering process before its own unique three-dimensional features are quantified into a high-dimensional feature vector. A two-step classification scheme may be applied to the feature vectors to identify a candidate gesture (step 1), and to reject candidate gestures that do not meet a gesture-specific identification operation (step 2). The identified gesture may be used to initiate some action controlled by a computer system.
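The abstract's "slices" idea can be sketched in a few lines. This is a hypothetical Python illustration, not the disclosed implementation: it assumes a metric depth image and uses simple pixel-count bounds as a crude stand-in for the shape, size, and arm-continuity checks described above; all names are made up for the sketch.

```python
import numpy as np
from scipy import ndimage

def candidate_hands(depth, slice_depth=0.10, min_px=500, max_px=5000):
    """Scan a depth image in thin 'slices' (small depth ranges) and yield
    connected pixel blobs whose area is plausible for a hand.

    depth: 2-D array of per-pixel distances in meters (0 = no reading).
    Yields boolean masks, one per candidate hand structure.
    """
    near, far = depth[depth > 0].min(), depth.max()
    for lo in np.arange(near, far, slice_depth):
        # Keep only pixels falling inside this thin 3-D slab of space.
        in_slice = (depth >= lo) & (depth < lo + slice_depth)
        labels, n = ndimage.label(in_slice)       # connected components
        for i in range(1, n + 1):
            blob = labels == i
            if min_px <= blob.sum() <= max_px:    # hand-sized blob?
                yield blob
```

In the disclosure, each such candidate would then be confirmed or rejected against its physical properties before feature extraction.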
20 Claims
1. A non-transitory program storage device, readable by a processor and comprising instructions stored thereon to cause one or more processors to:
acquire a depth image of a scene in a vicinity of a device, the depth image having a first plurality of pixels, each pixel having a value indicative of a distance;
store the depth image in a memory;
develop a scene geometry based upon the depth image;
determine that a user is engaging the device;
identify a human hand in a region of space based on the values indicative of the distances of the first plurality of pixels;
identify a three-dimensional region about the human hand, wherein the three-dimensional region includes at least some of the first plurality of pixels;
partition the three-dimensional region about the human hand into a second plurality of sub-regions, each sub-region having a corresponding value and size, wherein the value of a particular sub-region comprises a number of human hand pixels within the particular sub-region, wherein the sizes of the sub-regions are configured so that the number of human hand pixels within each sub-region is approximately equal, and wherein the sizes of the sub-regions are non-uniform;
generate a feature vector for the human hand based on the values of the second plurality of sub-regions;
apply the feature vector to a classifier;
determine that the human hand is making an identified gesture based on output from the classifier; and
cause an action to be taken by the device, based, at least in part, upon the identified gesture and the scene geometry.
View Dependent Claims (2, 3, 4, 5, 6)
7. A method comprising:
acquiring a depth image of a scene in a vicinity of a device, the depth image having a first plurality of pixels, each pixel having a value indicative of a distance;
storing the depth image in a memory;
developing a scene geometry based upon the depth image;
determining that a user is engaging the device;
identifying a human hand in a region of space based on the values indicative of the distances of the first plurality of pixels;
identifying a three-dimensional region about the human hand, wherein the three-dimensional region includes at least some of the first plurality of pixels;
partitioning the three-dimensional region about the human hand into a second plurality of sub-regions, each sub-region having a corresponding value and size, wherein the value of a particular sub-region comprises a number of human hand pixels within the particular sub-region, wherein the sizes of the sub-regions are configured so that the number of human hand pixels within each sub-region is approximately equal, and wherein the sizes of the sub-regions are non-uniform;
generating a feature vector for the human hand based on the values of the second plurality of sub-regions;
applying the feature vector to a classifier;
determining that the human hand is making an identified gesture based on output from the classifier; and
causing an action to be taken by the device, based, at least in part, upon the identified gesture and the scene geometry.
View Dependent Claims (8, 9, 10, 11, 12, 13)
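Claims 7 and 14 recite the same pipeline as claim 1 in method and device form; the classifier step they share is described in the abstract as a two-step scheme. A minimal sketch of that scheme, assuming an sklearn-style multi-class model exposing `predict_proba` plus a per-gesture verification callable (all names hypothetical):

```python
def classify_gesture(feature_vec, multiclass_model, verifiers):
    """Two-step scheme: (1) a multi-class classifier proposes a candidate
    gesture; (2) a gesture-specific verifier accepts or rejects it.

    multiclass_model: object with predict_proba(feature_vecs) -> (n, k) scores.
    verifiers: dict mapping gesture label -> callable(feature_vec) -> bool.
    Returns the accepted gesture label, or None if the candidate is rejected.
    """
    scores = multiclass_model.predict_proba([feature_vec])[0]
    candidate = int(scores.argmax())                # step 1: best candidate

    verify = verifiers.get(candidate)
    if verify is not None and verify(feature_vec):  # step 2: gesture-specific test
        return candidate
    return None
```

The second step is what lets the system discard lookalike gestures that the first-stage classifier scores highly but that fail the gesture-specific identification operation.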
14. An electronic device comprising:
a memory;
a depth sensor;
one or more processors, communicatively coupled to the memory, wherein the memory stores instructions to cause the one or more processors to:
acquire a depth image of a scene in a vicinity of the electronic device using the depth sensor, the depth image having a first plurality of pixels, each pixel having a value indicative of a distance;
store the depth image in the memory;
develop a scene geometry based upon the depth image;
determine that a user is engaging the electronic device;
identify a human hand in a region of space based on the values indicative of the distances of the first plurality of pixels;
identify a three-dimensional region about the human hand, wherein the three-dimensional region includes at least some of the first plurality of pixels;
partition the three-dimensional region about the human hand into a second plurality of sub-regions, each sub-region having a corresponding value and size, wherein the value of a particular sub-region comprises a number of human hand pixels within the particular sub-region, wherein the sizes of the sub-regions are configured so that the number of human hand pixels within each sub-region is approximately equal, and wherein the sizes of the sub-regions are non-uniform;
generate a feature vector for the human hand based on the values of the second plurality of sub-regions;
apply the feature vector to a classifier;
determine that the human hand is making an identified gesture based on output from the classifier; and
cause an action to be taken by the electronic device, based, at least in part, upon the identified gesture and the scene geometry.
View Dependent Claims (15, 16, 17, 18, 19, 20)
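Putting the recited elements together, the device of claim 14 might orchestrate the steps along these lines. This is a hypothetical sketch reusing the helper functions sketched earlier, not the patent's implementation: the scene-geometry and user-engagement steps are elided to a comment, and `depth_sensor.read()` and the `actions` mapping are assumptions.

```python
import numpy as np

def run_device_loop(depth_sensor, model, verifiers, actions):
    """Hypothetical loop for the claimed device: acquire a depth image,
    find a hand, featurize it, classify the gesture, then act on it."""
    while True:
        depth = depth_sensor.read()              # acquire depth image (assumed API)
        # Scene-geometry development and user-engagement checks would go here.
        for hand_mask in candidate_hands(depth):
            ys, xs = np.nonzero(hand_mask)
            hand_xyz = np.column_stack([xs, ys, depth[ys, xs]])
            vec = partition_features(hand_xyz)   # non-uniform sub-region counts
            gesture = classify_gesture(vec, model, verifiers)
            if gesture is not None and gesture in actions:
                actions[gesture]()               # device takes the mapped action
```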
Specification