COMPOUND GESTURE-SPEECH COMMANDS
Abstract
A multimedia entertainment system combines gestures and voice commands to provide an enhanced control scheme. A user's body position or motion may be recognized as a gesture and used to provide context for recognizing user-generated sounds, such as speech input. Likewise, speech input may be recognized as a voice command and used to provide context for recognizing a body position or motion as a gesture. Weights may be assigned to the inputs to facilitate processing. When a gesture is recognized, a limited set of voice commands associated with the recognized gesture is loaded for use. Further, additional sets of voice commands may be structured in a hierarchical manner such that speaking a voice command from one set of voice commands leads to the system loading a next set of voice commands.
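The gesture-gated, hierarchical vocabulary described in the abstract can be sketched roughly as follows. This is a minimal illustration only; the class, gesture, phrase, and action names are assumptions, not taken from the patent:

```python
# Minimal sketch of gesture-gated, hierarchical voice-command sets.
# All gesture/phrase/action names are illustrative assumptions.

class CommandSet:
    def __init__(self, commands):
        # Map each spoken phrase either to an action name (str)
        # or to a nested CommandSet that becomes the active set next.
        self.commands = dict(commands)

    def resolve(self, phrase):
        return self.commands.get(phrase)

# Hierarchy: speaking "music" loads a narrower set of music commands.
MUSIC_MENU = CommandSet({"play": "action:play", "stop": "action:stop"})
ROOT_MENU = CommandSet({"music": MUSIC_MENU, "mute": "action:mute"})

# A recognized gesture gates which vocabulary is loaded at all.
GESTURE_TO_COMMANDS = {"raise_hand": ROOT_MENU}

def handle(gesture, phrases):
    """Walk the command hierarchy using successive spoken phrases."""
    active = GESTURE_TO_COMMANDS.get(gesture)
    for phrase in phrases:
        if active is None:
            return None
        result = active.resolve(phrase)
        if isinstance(result, CommandSet):
            active = result          # next set of voice commands loaded
        else:
            return result            # a concrete command was identified
    return None
```

Because each set holds only the phrases valid in its context, the recognizer at any moment matches against a small vocabulary rather than the full command space, which is the efficiency the hierarchical structure is aiming at.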
40 Claims
1-20. (canceled)
21. An automated method of initiating a machine action based on a combination of sounds and gestures made by one or more users, the method comprising:
using a depth determining camera to capture a respective three-dimensional body pose made and/or a three-dimensional body action performed by a respective at least one of the one or more users;

identifying a pre-specified three-dimensional gesture based on the camera captured three-dimensional body pose and/or three-dimensional body action of the respective at least one user;

detecting one or more sounds made by the respective at least one or at least another of the one or more users, the detected one or more sounds being made in combination with the captured respective three-dimensional pose and/or three-dimensional action of the respective at least one user;

using the identified three-dimensional gesture to automatically identify a command pre-associated with a compound combination of the identified pre-specified three-dimensional gesture and one or more pre-specified aspects of at least part of the detected one or more sounds; and

in response to the automatically identified command, initiating performance by an instructable machine of a machine action that has been predetermined to be commanded by the automatically identified command.

Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37.
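The claimed method (identify a pre-specified 3-D gesture, detect accompanying sounds, resolve the command pre-associated with the compound combination, then initiate the machine action) can be sketched as below. The lookup-table contents and function names are illustrative assumptions, not from the claims:

```python
# Sketch of resolving a command from the compound combination of a
# pre-specified gesture and a pre-specified aspect of detected sound.
# The specific gestures, sound aspects, and actions are hypothetical.

COMPOUND_COMMANDS = {
    ("point_at_screen", "select"): "select_item",
    ("swipe_left", "next"): "advance_page",
}

def identify_command(gesture, sound_aspects):
    """Return the action pre-associated with gesture + sound, if any."""
    for aspect in sound_aspects:
        action = COMPOUND_COMMANDS.get((gesture, aspect))
        if action is not None:
            return action
    return None

def run_pipeline(captured_gesture, detected_sounds, machine):
    """Identify the compound command and 'initiate' the machine action."""
    command = identify_command(captured_gesture, detected_sounds)
    if command is not None:
        machine.append(command)  # stand-in for instructing the machine
    return command
```

The point of the compound key is that neither input alone triggers an action: "select" spoken without the pointing gesture, or the gesture made in silence, resolves to nothing.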
38. A machine system comprising:
a display configured to display virtual reality content;

a depth sensor configured to capture depth information about real world objects;

a sound sensor configured to capture sounds made by real world objects; and

at least one processor in operative communication with the display, with the depth sensor, and with the sound sensor, the processor being configured to:

use the depth sensor to capture depth aspects of respective three-dimensional body poses made and/or three-dimensional body actions performed by a respective at least one of one or more users present in a field of view of the depth sensor;

identify a pre-specified three-dimensional gesture based on the depth aspects captured by the depth sensor with respect to the three-dimensional body poses and/or three-dimensional body actions of the respective at least one user;

use the sound sensor to detect one or more sounds made by the respective at least one or at least another of the one or more users present in the field of view of the depth sensor, the detected one or more sounds being made in combination with the respective three-dimensional poses and/or three-dimensional actions of the respective at least one user;

use the identified three-dimensional gesture to automatically identify a command pre-associated with a compound combination of the identified pre-specified three-dimensional gesture and one or more pre-specified aspects of at least part of the detected one or more sounds; and

in response to the automatically identified command, initiate performance by the at least one processor or an instructable other machine of a machine action that has been predetermined to be commanded by the automatically identified command.
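The abstract notes that weights may be assigned to the inputs to facilitate processing. One plausible reading is a weighted fusion of the gesture recognizer's and speech recognizer's confidence scores before a compound command is accepted; the weights and threshold below are illustrative assumptions, not values from the patent:

```python
# Sketch of weighting the gesture and speech inputs before deciding
# whether to accept a compound command. Weights and threshold are
# illustrative assumptions only.

GESTURE_WEIGHT = 0.6
SPEECH_WEIGHT = 0.4
ACCEPT_THRESHOLD = 0.5

def fused_confidence(gesture_conf, speech_conf):
    """Weighted combination of the two recognizers' confidences."""
    return GESTURE_WEIGHT * gesture_conf + SPEECH_WEIGHT * speech_conf

def accept(gesture_conf, speech_conf):
    """Accept the compound command only if fused confidence clears the bar."""
    return fused_confidence(gesture_conf, speech_conf) >= ACCEPT_THRESHOLD
```

Under this reading, a strong gesture match can compensate for marginal speech recognition (and vice versa), while two weak inputs together still fall below the acceptance threshold.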
39. One or more articles of manufacture which alone or in combination have machine-readable instructions embedded therein, the instructions being executable to provide an automated method of initiating a machine action based on a combination of sounds and gestures made by one or more users, the method comprising:
using a depth determining camera to capture a respective three-dimensional body pose made and/or a three-dimensional body action performed by a respective at least one of the one or more users;

identifying a pre-specified three-dimensional gesture based on the camera captured three-dimensional body pose and/or three-dimensional body action of the respective at least one user;

detecting one or more sounds made by the respective at least one or at least another of the one or more users, the detected one or more sounds being made in combination with the captured respective three-dimensional pose and/or three-dimensional action of the respective at least one user;

using the identified three-dimensional gesture to automatically identify a command pre-associated with a compound combination of the identified pre-specified three-dimensional gesture and one or more pre-specified aspects of at least part of the detected one or more sounds; and

in response to the automatically identified command, initiating performance by an instructable machine of a machine action that has been predetermined to be commanded by the automatically identified command.

Dependent claims: 40.
Specification