COMPOUND GESTURE-SPEECH COMMANDS
Abstract
A multimedia entertainment system combines gestures and voice commands to provide an enhanced control scheme. A user's body position or motion may be recognized as a gesture and used to provide context for recognizing user-generated sounds, such as speech input. Likewise, speech input may be recognized as a voice command and used to provide context for recognizing a body position or motion as a gesture. Weights may be assigned to the inputs to facilitate processing. When a gesture is recognized, a limited set of voice commands associated with the recognized gesture is loaded for use. Further, additional sets of voice commands may be structured in a hierarchical manner such that speaking a voice command from one set of voice commands leads to the system loading a next set of voice commands.
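The gesture-gated, hierarchical vocabulary described in the abstract can be sketched roughly as follows. This is a minimal illustration only; the class, gesture, phrase, and action names are assumptions, not taken from the patent:

```python
# Minimal sketch of gesture-gated, hierarchical voice-command sets.
# All gesture/phrase/action names are illustrative assumptions.

class CommandSet:
    def __init__(self, commands):
        # Map each spoken phrase either to an action name (str)
        # or to a nested CommandSet that becomes the active set next.
        self.commands = dict(commands)

    def resolve(self, phrase):
        return self.commands.get(phrase)

# Hierarchy: speaking "music" loads a narrower set of music commands.
MUSIC_MENU = CommandSet({"play": "action:play", "stop": "action:stop"})
ROOT_MENU = CommandSet({"music": MUSIC_MENU, "mute": "action:mute"})

# A recognized gesture gates which vocabulary is loaded at all.
GESTURE_TO_COMMANDS = {"raise_hand": ROOT_MENU}

def handle(gesture, phrases):
    """Walk the command hierarchy using successive spoken phrases."""
    active = GESTURE_TO_COMMANDS.get(gesture)
    for phrase in phrases:
        if active is None:
            return None
        result = active.resolve(phrase)
        if isinstance(result, CommandSet):
            active = result          # next set of voice commands loaded
        else:
            return result            # a concrete command was identified
    return None
```

Because each set holds only the phrases valid in its context, the recognizer at any moment matches against a small vocabulary rather than the full command space, which is the efficiency the hierarchical structure is aiming at.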
40 Claims
1-20. (canceled)
21. An automated method of initiating a machine action based on a combination of sounds and gestures made by one or more users, the method comprising:
using a depth determining camera to capture a respective three-dimensional body pose made and/or a three-dimensional body action performed by a respective at least one of the one or more users;

identifying a pre-specified three-dimensional gesture based on the camera captured three-dimensional body pose and/or three-dimensional body action of the respective at least one user;

detecting one or more sounds made by the respective at least one or at least another of the one or more users, the detected one or more sounds being made in combination with the captured respective three-dimensional pose and/or three-dimensional action of the respective at least one user;

using the identified three-dimensional gesture to automatically identify a command pre-associated with a compound combination of the identified pre-specified three-dimensional gesture and one or more pre-specified aspects of at least part of the detected one or more sounds; and

in response to the automatically identified command, initiating performance by an instructable machine of a machine action that has been predetermined to be commanded by the automatically identified command.

Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37.
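The claimed method (identify a pre-specified 3-D gesture, detect accompanying sounds, resolve the command pre-associated with the compound combination, then initiate the machine action) can be sketched as below. The lookup-table contents and function names are illustrative assumptions, not from the claims:

```python
# Sketch of resolving a command from the compound combination of a
# pre-specified gesture and a pre-specified aspect of detected sound.
# The specific gestures, sound aspects, and actions are hypothetical.

COMPOUND_COMMANDS = {
    ("point_at_screen", "select"): "select_item",
    ("swipe_left", "next"): "advance_page",
}

def identify_command(gesture, sound_aspects):
    """Return the action pre-associated with gesture + sound, if any."""
    for aspect in sound_aspects:
        action = COMPOUND_COMMANDS.get((gesture, aspect))
        if action is not None:
            return action
    return None

def run_pipeline(captured_gesture, detected_sounds, machine):
    """Identify the compound command and 'initiate' the machine action."""
    command = identify_command(captured_gesture, detected_sounds)
    if command is not None:
        machine.append(command)  # stand-in for instructing the machine
    return command
```

The point of the compound key is that neither input alone triggers an action: "select" spoken without the pointing gesture, or the gesture made in silence, resolves to nothing.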
38. A machine system comprising:
a display configured to display virtual reality content;

a depth sensor configured to capture depth information about real world objects;

a sound sensor configured to capture sounds made by real world objects; and

at least one processor in operative communication with the display, with the depth sensor, and with the sound sensor, the processor being configured to:

use the depth sensor to capture depth aspects of respective three-dimensional body poses made and/or three-dimensional body actions performed by a respective at least one of one or more users present in a field of view of the depth sensor;

identify a pre-specified three-dimensional gesture based on the depth aspects captured by the depth sensor with respect to the three-dimensional body poses and/or three-dimensional body actions of the respective at least one user;

use the sound sensor to detect one or more sounds made by the respective at least one or at least another of the one or more users present in the field of view of the depth sensor, the detected one or more sounds being made in combination with the respective three-dimensional poses and/or three-dimensional actions of the respective at least one user;

use the identified three-dimensional gesture to automatically identify a command pre-associated with a compound combination of the identified pre-specified three-dimensional gesture and one or more pre-specified aspects of at least part of the detected one or more sounds; and

in response to the automatically identified command, initiate performance by the at least one processor or an instructable other machine of a machine action that has been predetermined to be commanded by the automatically identified command.
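The abstract notes that weights may be assigned to the inputs to facilitate processing. One plausible reading is a weighted fusion of the gesture recognizer's and speech recognizer's confidence scores before a compound command is accepted; the weights and threshold below are illustrative assumptions, not values from the patent:

```python
# Sketch of weighting the gesture and speech inputs before deciding
# whether to accept a compound command. Weights and threshold are
# illustrative assumptions only.

GESTURE_WEIGHT = 0.6
SPEECH_WEIGHT = 0.4
ACCEPT_THRESHOLD = 0.5

def fused_confidence(gesture_conf, speech_conf):
    """Weighted combination of the two recognizers' confidences."""
    return GESTURE_WEIGHT * gesture_conf + SPEECH_WEIGHT * speech_conf

def accept(gesture_conf, speech_conf):
    """Accept the compound command only if fused confidence clears the bar."""
    return fused_confidence(gesture_conf, speech_conf) >= ACCEPT_THRESHOLD
```

Under this reading, a strong gesture match can compensate for marginal speech recognition (and vice versa), while two weak inputs together still fall below the acceptance threshold.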
39. One or more articles of manufacture which alone or in combination have machine-readable instructions embedded therein, the instructions being executable to provide an automated method of initiating a machine action based on a combination of sounds and gestures made by one or more users, the method comprising:
using a depth determining camera to capture a respective three-dimensional body pose made and/or a three-dimensional body action performed by a respective at least one of the one or more users;

identifying a pre-specified three-dimensional gesture based on the camera captured three-dimensional body pose and/or three-dimensional body action of the respective at least one user;

detecting one or more sounds made by the respective at least one or at least another of the one or more users, the detected one or more sounds being made in combination with the captured respective three-dimensional pose and/or three-dimensional action of the respective at least one user;

using the identified three-dimensional gesture to automatically identify a command pre-associated with a compound combination of the identified pre-specified three-dimensional gesture and one or more pre-specified aspects of at least part of the detected one or more sounds; and

in response to the automatically identified command, initiating performance by an instructable machine of a machine action that has been predetermined to be commanded by the automatically identified command.

Dependent claims: 40.
Specification