SATISFYING SPECIFIED INTENT(S) BASED ON MULTIMODAL REQUEST(S)
Abstract
Techniques are described herein that are capable of satisfying specified intent(s) based on multimodal request(s). A multimodal request is a request that includes at least one request of a first type and at least one request of a second type that is different from the first type. Example types of request include but are not limited to a speech request, a text command, a tactile command, and a visual command. A determination is made that one or more entities in visual content are selected in accordance with an explicit scoping command from a user. In response, speech understanding functionality is automatically activated, and audio signals are automatically monitored for speech requests from the user to be processed using the speech understanding functionality.
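The activation flow in the abstract can be sketched in a few lines. This is only an illustrative sketch, not the patented implementation: the class `SpeechGate` and its methods are hypothetical names, and audio is simplified to already-transcribed utterance strings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpeechGate:
    """Hypothetical gate: speech understanding stays off until entities are selected."""

    speech_active: bool = False
    handled: List[str] = field(default_factory=list)

    def on_selection(self, selected_entities: List[str]) -> None:
        # Activate speech understanding only if the scoping command
        # actually selected at least one entity in the visual content.
        if selected_entities:
            self.speech_active = True

    def on_audio(self, utterance: str) -> bool:
        # Audio is forwarded for processing only while monitoring is active.
        # (Assumption: the utterance is already transcribed text.)
        if not self.speech_active:
            return False
        self.handled.append(utterance)
        return True
```

In this sketch, an utterance that arrives before any selection is ignored, while one that arrives after `on_selection(["lamp"])` is recorded for processing.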
20 Claims
1. A method comprising:
determining that a camera of a processing system is pointed at one or more objects or a scene;
turning on speech understanding functionality of the processing system, using at least one processor of the processing system, in response to determining that the camera is pointed at the one or more objects or the scene, the speech understanding functionality enabling the processing system to understand natural language requests; and
automatically monitoring audio signals received via an audio interface of the processing system for speech requests from a user of the processing system to be processed using the speech understanding functionality in response to determining that the camera is pointed at the one or more objects or the scene.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A processing system comprising:
a display configured to display visual content;
a camera configured to capture visual information;
determination logic configured to determine whether one or more entities in the visual content are selected in accordance with a visual command from a user, the visual command identifying the one or more entities in the visual content;
speech understanding logic configured to understand natural language requests;
activation logic configured to turn on speech understanding functionality of the speech understanding logic in response to a determination that the one or more entities are selected in accordance with the visual command;
an audio interface configured to receive audio signals; and
monitoring logic configured to monitor the visual information for visual commands from the user, the monitoring logic further configured to automatically monitor the audio signals for speech requests from the user in response to the determination that the one or more entities are selected, the monitoring logic further configured to provide the speech requests to the speech understanding logic for processing.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17
18. A processing system comprising:
determination logic that includes electrical circuitry and is configured to determine an intent of a user;
intent logic configured to satisfy the intent;
multimodal logic configured to present one or more representations that correspond to the intent of the user via an interface in response to satisfaction of the intent, each of the one or more representations including at least one carrier phrase and at least one slot; and
speech understanding logic configured to receive a spoken response via a sensor in response to presentation of the one or more representations, the spoken response including one or more carrier phrases and further including one or more words in lieu of one or more slots, the spoken response indicating a task to be performed, the intent logic further configured to perform the task in response to receipt of the spoken response.
Dependent claims: 19, 20
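The carrier-phrase and slot language of claim 18 may be easier to follow with a small example. The sketch below is one possible reading, not the claimed implementation: it assumes a representation is written as a template such as `call {contact}`, where the literal text is the carrier phrase and each `{name}` is a slot. The brace syntax and the function names `compile_representation` and `fill_slots` are hypothetical.

```python
import re
from typing import Dict, Optional


def compile_representation(template: str) -> "re.Pattern[str]":
    """Turn a template such as 'call {contact}' into a regular expression."""
    # Splitting on a capturing group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)      # carrier phrase text, matched literally
        else:
            pattern += f"(?P<{part}>.+?)"   # slot, filled by the user's own words
    return re.compile(rf"^{pattern}$", re.IGNORECASE)


def fill_slots(template: str, spoken: str) -> Optional[Dict[str, str]]:
    """Return the words spoken in lieu of each slot, or None if no match."""
    match = compile_representation(template).match(spoken.strip())
    return match.groupdict() if match else None
```

Under these assumptions, `fill_slots("call {contact}", "Call Alice")` yields `{"contact": "Alice"}`, which the intent logic could then use to identify the task to perform. A spoken response that lacks the carrier phrase, such as "text Alice", yields `None`.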
Specification