Satisfying specified intent(s) based on multimodal request(s)
First Claim
1. A method comprising:
determining that a camera of a processing system is pointed at one or more objects or a scene;
turning on speech understanding functionality of the processing system, using at least one processor of the processing system, in response to determining that the camera is pointed at the one or more objects or the scene, the speech understanding functionality enabling the processing system to understand natural language requests; and
automatically monitoring audio signals received via an audio interface of the processing system for speech requests from a user of the processing system to be processed using the speech understanding functionality in response to determining that the camera is pointed at the one or more objects or the scene.
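The control flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patentee's implementation. The class `SpeechGate` and the `detect` callback (a stand-in for whatever determines that the camera is pointed at objects or a scene) are hypothetical. The sketch also assumes that audio signals have already been transcribed to text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SpeechGate:
    # Hypothetical detector: returns labels of objects/scene found in a camera frame.
    detect: Callable[[object], List[str]]
    speech_on: bool = False
    handled: List[str] = field(default_factory=list)

    def on_frame(self, frame: object) -> None:
        # Step 1: determine whether the camera is pointed at objects or a scene.
        if self.detect(frame):
            # Step 2: turn on speech understanding in response.
            self.speech_on = True

    def on_audio(self, transcript: Optional[str]) -> None:
        # Step 3: accept speech requests only once speech understanding is on.
        # Assumption: audio has already been transcribed to text upstream.
        if self.speech_on and transcript:
            self.handled.append(transcript)


gate = SpeechGate(detect=lambda frame: ["mug"] if frame == "kitchen" else [])
gate.on_audio("what is this?")  # ignored: speech understanding still off
gate.on_frame("blank wall")     # detector finds nothing, gate stays closed
gate.on_frame("kitchen")        # detector finds a mug, gate opens
gate.on_audio("what is this?")  # now accepted for processing
print(gate.speech_on, gate.handled)  # True ['what is this?']
```

The gate is one-way here because claim 1 recites only turning speech understanding on. When and how it is turned off again is left open.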
3 Assignments
0 Petitions
Abstract
Techniques are described herein that are capable of satisfying specified intent(s) based on multimodal request(s). A multimodal request is a request that includes at least one request of a first type and at least one request of a second type that is different from the first type. Example request types include, but are not limited to, a speech request, a text command, a tactile command, and a visual command. A determination is made that one or more entities in visual content are selected in accordance with an explicit scoping command from a user. In response, speech understanding functionality is automatically activated, and audio signals are automatically monitored for speech requests from the user to be processed using the speech understanding functionality.
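The abstract's definition of a multimodal request can be restated as a small data model: a request is multimodal when its parts span at least two different request types. This is a hypothetical illustration only. The names `RequestType`, `Request`, and `is_multimodal` and the string payloads are assumptions, and the four enum members are just the example types the abstract lists, not an exhaustive set.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List


class RequestType(Enum):
    # Example request types named in the abstract; not an exhaustive list.
    SPEECH = auto()
    TEXT = auto()
    TACTILE = auto()
    VISUAL = auto()


@dataclass(frozen=True)
class Request:
    kind: RequestType
    payload: str  # hypothetical: transcript, typed text, touch target, or gaze target


def is_multimodal(requests: List[Request]) -> bool:
    """True when the requests include at least two different request types."""
    return len({r.kind for r in requests}) >= 2


combined = [
    Request(RequestType.VISUAL, "selected: red sneakers"),
    Request(RequestType.SPEECH, "find these in size 10"),
]
speech_only = [
    Request(RequestType.SPEECH, "hello"),
    Request(RequestType.SPEECH, "again"),
]
print(is_multimodal(combined))     # True
print(is_multimodal(speech_only))  # False
```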
27 Citations
20 Claims
1. A method comprising:
determining that a camera of a processing system is pointed at one or more objects or a scene;
turning on speech understanding functionality of the processing system, using at least one processor of the processing system, in response to determining that the camera is pointed at the one or more objects or the scene, the speech understanding functionality enabling the processing system to understand natural language requests; and
automatically monitoring audio signals received via an audio interface of the processing system for speech requests from a user of the processing system to be processed using the speech understanding functionality in response to determining that the camera is pointed at the one or more objects or the scene.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. A system comprising:
a display configured to display visual content;
a camera configured to capture visual information; and
at least one processor in electronic communication with the display, the camera, and a computer readable storage device storing instructions that when executed cause the processor to:
determine whether one or more entities in the visual content are selected in accordance with a visual command from a user, the visual command identifying the one or more entities in the visual content;
understand natural language requests;
turn on speech understanding functionality in response to a determination that the one or more entities are selected in accordance with the visual command;
receive audio signals;
monitor the visual information for visual commands from the user;
monitor the audio signals for speech requests from the user in response to the determination that the one or more entities are selected; and
process the speech requests.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17.
18. A computer-readable storage device storing instructions that, when executed, perform a method, the method comprising:
determining that a camera of a processing system is pointed at one or more objects or a scene;
turning on speech understanding functionality of the processing system, using at least one processor of the processing system, in response to determining that the camera is pointed at the one or more objects or the scene, the speech understanding functionality enabling the processing system to understand natural language requests; and
automatically monitoring audio signals received via an audio interface of the processing system for speech requests from a user of the processing system to be processed using the speech understanding functionality in response to determining that the camera is pointed at the one or more objects or the scene.
Dependent claims: 19, 20.
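Independent claim 9 recites a variant in which a visual command from the user, rather than camera orientation, selects entities in displayed content and thereby turns on speech understanding. The following Python sketch is likewise hypothetical. The class `ScopedAssistant` and its methods are assumptions. So is returning the selected entities alongside each utterance as its scope, which loosely follows the abstract's "explicit scoping command." The visual command (for example a gaze or pointing gesture) is also assumed to be already resolved to entity names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Set, Tuple


@dataclass
class ScopedAssistant:
    # Entities currently shown on the display (hypothetical string names).
    displayed: Set[str]
    selected: List[str] = field(default_factory=list)
    speech_on: bool = False

    def on_visual_command(self, targets: Sequence[str]) -> None:
        # Assumption: the visual command is already resolved to entity names;
        # keep only those that are actually in the displayed content.
        hits = [t for t in targets if t in self.displayed]
        if hits:
            self.selected = hits
            self.speech_on = True  # selection turns on speech understanding

    def on_speech(self, utterance: str) -> Optional[Tuple[str, List[str]]]:
        # Speech requests are processed only after a visual selection,
        # and are paired with the selected entities as their scope.
        if not self.speech_on:
            return None
        return (utterance, list(self.selected))


ui = ScopedAssistant(displayed={"Eiffel Tower", "Louvre"})
print(ui.on_speech("how tall is it?"))  # None
ui.on_visual_command(["Eiffel Tower"])
print(ui.on_speech("how tall is it?"))  # ('how tall is it?', ['Eiffel Tower'])
```

Filtering targets against `displayed` reflects the claim's requirement that the selected entities be in the visual content. A command naming nothing on screen leaves speech understanding off.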
Specification