COMPOUND GESTURE-SPEECH COMMANDS

US 20110313768A1
Filed: 06/18/2010
Published: 12/22/2011
Est. Priority Date: 06/18/2010
Status: Active Grant

2 Assignments

0 Petitions

Abstract

A multimedia entertainment system combines both gestures and voice commands to provide an enhanced control scheme. A user's body position or motion may be recognized as a gesture, and may be used to provide context to recognize user generated sounds, such as speech input. Likewise, speech input may be recognized as a voice command, and may be used to provide context to recognize a body position or motion as a gesture. Weights may be assigned to the inputs to facilitate processing. When a gesture is recognized, a limited set of voice commands associated with the recognized gesture are loaded for use. Further, additional sets of voice commands may be structured in a hierarchical manner such that speaking a voice command from one set of voice commands leads to the system loading a next set of voice commands.
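
As a rough illustration of the last two sentences of the abstract, the following Python sketch models command sets as a small tree: a recognized gesture selects a root set, and a recognized command may point to a next set. All names here (`CommandSet`, `CommandContext`, `GESTURE_ROOTS`, and the gesture and command labels) are hypothetical and do not come from the patent. Speech recognition is replaced by exact string matching, which is an assumption made only to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CommandSet:
    """A limited vocabulary; each command may lead to a follow-up set."""
    name: str
    # Maps a spoken command to the next CommandSet, or None for a leaf command.
    commands: Dict[str, Optional["CommandSet"]] = field(default_factory=dict)


# Hypothetical hierarchy: saying "play" leads to a playback vocabulary.
PLAYBACK = CommandSet("playback", {"pause": None, "stop": None})
MOVIE_ROOT = CommandSet("movie", {"play": PLAYBACK, "info": None})
VOLUME_ROOT = CommandSet("volume", {"up": None, "down": None, "mute": None})

# Hypothetical mapping from a recognized gesture to the set it activates.
GESTURE_ROOTS = {"point_at_movie": MOVIE_ROOT, "raise_hand": VOLUME_ROOT}


class CommandContext:
    """Tracks which command set is currently loaded."""

    def __init__(self) -> None:
        self.active: Optional[CommandSet] = None

    def on_gesture(self, gesture: str) -> None:
        # Load only the set associated with the recognized gesture.
        self.active = GESTURE_ROOTS.get(gesture)

    def on_speech(self, utterance: str) -> bool:
        """Return True if the utterance is a command in the loaded set."""
        if self.active is None or utterance not in self.active.commands:
            return False
        next_set = self.active.commands[utterance]
        if next_set is not None:
            # Speaking this command loads the next set in the hierarchy.
            self.active = next_set
        return True
```

In this sketch, "pause" is rejected immediately after the `point_at_movie` gesture because only the movie root set is loaded; it is accepted once "play" has been spoken and the playback set has replaced it.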

130 Citations

20 Claims

1. A method for controlling a computing system using a set of voice commands, comprising:
- displaying one or more objects on a display monitor;
- receiving body position data from a sensor;
- recognizing a gesture in relation to the one or more objects based on the received body position data;
- choosing a subset of the set of sound commands based on the recognized gesture, the set of sound commands includes multiple subsets, each subset is associated with one or more gestures and sound command recognition data for the respective subset;
- loading sound command recognition data for the chosen subset of sound commands;
- receiving sound input from a microphone;
- recognizing a sound command from the sound input using the loaded sound command recognition data; and
- performing an action in response to the recognized sound command.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The method of claim 1, wherein:
    - the sound command recognition data for the chosen subset of sound commands does not have data to recognize sound commands in the set of sound commands that are not in the chosen subset.
  - 3. The method of claim 1, further comprising:
    - displaying the chosen subset of sound commands.
  - 4. The method of claim 3, wherein the recognized gesture selects a displayed object, and wherein the chosen subset of sound commands is displayed proximate to the selected displayed object.
  - 5. The method of claim 1, further comprising:
    - after performing the action in response to the recognized sound command, loading additional sound command recognition data for a related subset of the chosen subset of sound commands;
      
      receiving a further sound command;
      
      recognizing the further sound command using the loaded additional sound command recognition data; and
      
      performing an action in response to the recognized further sound command.
  - 6. The method of claim 1, further comprising:
    - changing the state of the computing system based on the recognized gesture; and
      
      providing a hierarchical subset of sound commands related to the changed state of the computing system.
  - 7. The method of claim 6, wherein each of the recognized gestures corresponds to a different state of the computing system, and wherein sound command recognition data for each state of the computing system defines a hierarchical subset of sound commands, wherein each hierarchical subset of sound commands is only loaded when required by the recognized gesture.
  - 8. The method of claim 1, wherein the gesture and the sound command are received substantially simultaneously.
  - 9. The method of claim 1, wherein the gesture provides context for the sound command or the sound command provides context for the gesture.
  - 10. The method of claim 1, wherein the step of recognizing a sound command from the sound input further includes:
    - assigning a weighted confidence value to the step of recognizing a sound command; and
      
      increasing the weighted confidence value when the recognized gesture agrees with the sound command.
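
Claim 10 describes assigning a weighted confidence value to the sound-command recognition step and increasing it when the recognized gesture agrees with the sound command. The patent text quoted here gives no formula, so the following Python sketch is only one possible reading: the weight, the boost, the acceptance threshold, and the `AGREEING_COMMANDS` table are all assumptions introduced for illustration.

```python
from typing import Dict, Set

# Hypothetical table of which sound commands "agree" with which gesture.
AGREEING_COMMANDS: Dict[str, Set[str]] = {
    "point_at_movie": {"play", "info"},
    "raise_hand": {"up", "down", "mute"},
}


def weighted_confidence(
    speech_score: float,
    command: str,
    gesture: str,
    speech_weight: float = 0.8,    # assumed weight on the recognizer score
    agreement_boost: float = 0.2,  # assumed increase when gesture agrees
) -> float:
    """Weight the recognizer score, then raise it if the gesture agrees."""
    if not 0.0 <= speech_score <= 1.0:
        raise ValueError("speech_score must be in [0, 1]")
    confidence = speech_weight * speech_score
    if command in AGREEING_COMMANDS.get(gesture, set()):
        confidence += agreement_boost
    # Clamp so the result stays a valid confidence value.
    return min(confidence, 1.0)


def accept(confidence: float, threshold: float = 0.6) -> bool:
    """Accept the command only if confidence reaches the assumed threshold."""
    return confidence >= threshold
```

With these assumed numbers, a recognizer score of 0.6 for "play" is accepted when the user is pointing at a movie (weighted score 0.48 plus the 0.2 boost) and rejected when the gesture is unrelated (0.48 alone falls below the threshold).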

11. An interface system for controlling a multimedia system, comprising:
- a monitor for displaying multimedia content;
- a sensor for capturing user gestures;
- a microphone for capturing user sounds; and
- a computer connected to the sensor, the microphone and the monitor, the computer driving the monitor to display a group of objects, the computer receives image data representing a gesture from the sensor, the computer recognizes the gesture as selecting a first object from the group of objects, the computer updates the monitor to display a first contextual menu that shows a subset of sound commands that may be used with regard to the first object, the computer receives sound data representing a sound command from the microphone, the computer recognizes the sound command as being from the subset of sound commands, the sound command indicates a desired action with regard to the first object, the computer executes the desired action.
- Dependent claims (12, 13, 14, 15, 16):
  - 12. A system as in claim 11, wherein:
    - the subset of sound commands is organized in one or more hierarchical levels, wherein each hierarchical level is only loaded into the computer as necessary to process a gesture.
  - 13. A system as in claim 12, wherein:
    - a subsequent hierarchical level of sound commands is loaded and displayed only after a sound command from a previous hierarchical level has been recognized and the desired action associated with the sound command taken.
  - 14. A system as in claim 11, wherein:
    - the gesture provides context, and the sound commands are contextual.
  - 15. A system as in claim 11, wherein:
    - the gesture is used by the computer to facilitate recognition of the sound command, and the sound command is used by the computer to facilitate recognition of the gesture.
  - 16. A system as in claim 11, wherein:
    - the computer displays a progressive user interface on the monitor, wherein the user interface is updated after the computer executes the desired action to include a relevant subset of sound commands.
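
The flow in claim 11 (a gesture selects an object, a contextual menu lists the sound commands usable with that object, and a recognized command is executed against it) can be sketched as below. This is a simplified illustration, not the claimed system: it assumes the gesture has already been reduced to a screen coordinate, uses rectangle hit-testing to pick the object, and treats sound recognition as string matching. `DisplayedObject`, `select_object`, `contextual_menu`, and `handle_sound` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class DisplayedObject:
    """An on-screen object with its bounding box and usable sound commands."""
    name: str
    x: int
    y: int
    width: int
    height: int
    commands: Tuple[str, ...]

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def select_object(objects: Iterable[DisplayedObject],
                  px: int, py: int) -> Optional[DisplayedObject]:
    """Return the first object under the pointed-at coordinate, if any."""
    for obj in objects:
        if obj.contains(px, py):
            return obj
    return None


def contextual_menu(obj: DisplayedObject) -> str:
    """Text of the menu listing the commands usable with the selected object."""
    return f"{obj.name}: say " + ", ".join(obj.commands)


def handle_sound(obj: DisplayedObject, heard: str) -> Optional[str]:
    """Return a description of the action, or None if the command is not in the subset."""
    if heard in obj.commands:
        return f"{heard} {obj.name}"
    return None
```

For example, with a "Movie" tile at (0, 0) and a "Song" tile at (100, 0), each 100 by 100 pixels, pointing at (150, 50) selects "Song", and only the commands listed for "Song" are then accepted.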

17. A processor readable storage device having instructions encoded thereon, the instructions for programming one or more processors to perform a method for controlling a multimedia system, comprising:
- displaying a group of one or more objects on a monitor;
- receiving body position data from a sensor;
- recognizing a gesture from the received body position data;
- updating the monitor display to list a set of sound commands available in response to the recognized gesture;
- receiving sound data from a microphone;
- recognizing a sound command from the set of sound commands based on the received sound data; and
- executing an action associated with the recognized sound command.
- Dependent claims (18, 19, 20):
  - 18. A processor readable storage device as in claim 17, wherein the gesture provides context for the sound commands thereby permitting a simpler and more limited set of contextual sound commands.
  - 19. A processor readable storage device as in claim 17, wherein the gesture enhances confidence that the sound command is properly recognized, and wherein the sound command enhances confidence that the gesture is properly recognized.
  - 20. A processor readable storage device as in claim 17, wherein a plurality of gestures are defined, and wherein each gesture is associated with a unique subset of sound commands.
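
Claim 20 associates each defined gesture with a unique subset of sound commands. One way to read "unique" is that no two gestures map to an identical subset; the Python sketch below enforces that reading at registration time. Both the interpretation and the `GestureCommandRegistry` class are assumptions made for illustration and are not drawn from the patent.

```python
from typing import Dict, FrozenSet, Iterable


class GestureCommandRegistry:
    """Maps each gesture to a subset of commands, rejecting duplicate subsets."""

    def __init__(self) -> None:
        self._by_gesture: Dict[str, FrozenSet[str]] = {}

    def register(self, gesture: str, commands: Iterable[str]) -> None:
        subset = frozenset(commands)
        if not subset:
            raise ValueError("a gesture needs at least one sound command")
        for other, existing in self._by_gesture.items():
            # Assumed meaning of "unique": no other gesture has the same subset.
            if other != gesture and existing == subset:
                raise ValueError(
                    f"subset already registered for gesture {other!r}")
        self._by_gesture[gesture] = subset

    def commands_for(self, gesture: str) -> FrozenSet[str]:
        """Return the subset for a gesture, or an empty set if it is undefined."""
        return self._by_gesture.get(gesture, frozenset())
```

Subsets may still overlap under this reading (two gestures can both allow "play"); only identical subsets are rejected.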

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Klein, Christian; Vassigh, Ali M.; Flaks, Jason S.; Soemo, Thomas M.; Larco, Vanessa

Granted Patent: US 8,296,151 B2
US Class Current: 704/251
CPC Class Codes

G06F 2203/0381   Multimodal input, i.e. inte...

G06F 3/017   Gesture based interaction, ...

G06F 3/038   Control and interface arran...

G06F 3/167   Audio in a user interface, ...

G06T 7/521   from laser ranging, e.g. us...

G06V 40/107   Static hand or arm

G10L 2015/223   Execution procedure of a sp...

G10L 2015/226   using non-speech characteri...
