Reducing the need for manual start/end-pointing and trigger phrases
First Claim
1. A method for operating a virtual assistant on an electronic device, the method comprising:
receiving, at the electronic device, an audio input;
monitoring the audio input to identify a first spoken user input, wherein the first spoken user input comprises a user request;
identifying the first spoken user input in the audio input;
determining whether to respond to the first spoken user input based on contextual information associated with the first spoken user input, wherein the contextual information comprises a direction of the user's gaze when the first spoken user input was received, wherein the determining comprises:
calculating a likelihood score that the virtual assistant should provide an audible response to the first spoken user input based on the contextual information associated with the first spoken user input, wherein the audible response at least partially satisfies the user request;
increasing the likelihood score in response to the direction of the user's gaze being pointed at the electronic device when the first spoken user input was received; and
decreasing the likelihood score in response to the direction of the user's gaze being pointed away from the electronic device when the first spoken user input was received;
in response to a determination to respond to the first spoken user input:
generating the audible response to the first spoken user input; and
monitoring the audio input to identify a second spoken user input; and
in response to a determination not to respond to the first spoken user input, monitoring the audio input to identify the second spoken user input without generating the audible response to the first spoken user input.
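The gaze-dependent likelihood-score step recited above can be sketched as follows. This is a hypothetical illustration only: the base score, the adjustment amounts, and the response threshold are assumed values for demonstration, not figures taken from the patent.

```python
# Illustrative sketch of claim 1's likelihood-score step: the score is
# increased when the user's gaze is pointed at the device and decreased
# when it is pointed away. All constants here are assumptions.

BASE_SCORE = 0.5            # assumed prior likelihood the input is for the assistant
GAZE_AT_DEVICE_BONUS = 0.3  # assumed increase for gaze at the device
GAZE_AWAY_PENALTY = 0.3     # assumed decrease for gaze away from the device
RESPONSE_THRESHOLD = 0.6    # assumed threshold for generating an audible response

def likelihood_score(gaze_at_device: bool) -> float:
    """Adjust a base likelihood using the gaze direction captured when
    the first spoken user input was received."""
    score = BASE_SCORE
    if gaze_at_device:
        score += GAZE_AT_DEVICE_BONUS   # increasing the likelihood score
    else:
        score -= GAZE_AWAY_PENALTY      # decreasing the likelihood score
    return score

def should_respond(gaze_at_device: bool) -> bool:
    """Determine whether to respond based on the adjusted score."""
    return likelihood_score(gaze_at_device) >= RESPONSE_THRESHOLD
```

With these assumed constants, gaze at the device yields a score of 0.8 (respond) and gaze away yields 0.2 (do not respond).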
Abstract
Systems and processes for selectively processing and responding to a spoken user input are provided. In one example, audio input containing a spoken user input can be received at a user device. The spoken user input can be identified from the audio input by identifying start and end-points of the spoken user input. It can be determined whether or not the spoken user input was intended for a virtual assistant based on contextual information. The determination can be made using a rule-based system or a probabilistic system. If it is determined that the spoken user input was intended for the virtual assistant, the spoken user input can be processed and an appropriate response can be generated. If it is instead determined that the spoken user input was not intended for the virtual assistant, the spoken user input can be ignored and/or no response can be generated.
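The abstract names two ways the intent determination can be made: a rule-based system and a probabilistic system. The sketch below contrasts minimal versions of each under assumed contextual signals; the specific rules, feature names, and weights are illustrative assumptions, not details from the specification.

```python
# Two toy intent-determination strategies, per the abstract: rule-based
# and probabilistic. Signal names and weights are assumptions.

def rule_based_intended(ctx: dict) -> bool:
    """Rule-based system: a fixed logical test over contextual signals.
    Example rule (assumed): intended if the user was looking at the
    device or had recently addressed the assistant."""
    return ctx.get("gaze_at_device", False) or ctx.get("recently_addressed", False)

def probabilistic_intended(ctx: dict, threshold: float = 0.5) -> bool:
    """Probabilistic system: sum assumed weights of the signals that
    are present and compare against a threshold."""
    weights = {
        "gaze_at_device": 0.6,
        "recently_addressed": 0.3,
        "device_held": 0.2,
    }
    score = sum(w for name, w in weights.items() if ctx.get(name, False))
    return score >= threshold
```

The probabilistic form degrades gracefully when signals conflict, whereas the rule-based form is easier to audit; the patent's abstract allows either.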
73 Claims
1. A method for operating a virtual assistant on an electronic device, the method comprising:
receiving, at the electronic device, an audio input;
monitoring the audio input to identify a first spoken user input, wherein the first spoken user input comprises a user request;
identifying the first spoken user input in the audio input;
determining whether to respond to the first spoken user input based on contextual information associated with the first spoken user input, wherein the contextual information comprises a direction of the user's gaze when the first spoken user input was received, wherein the determining comprises:
calculating a likelihood score that the virtual assistant should provide an audible response to the first spoken user input based on the contextual information associated with the first spoken user input, wherein the audible response at least partially satisfies the user request;
increasing the likelihood score in response to the direction of the user's gaze being pointed at the electronic device when the first spoken user input was received; and
decreasing the likelihood score in response to the direction of the user's gaze being pointed away from the electronic device when the first spoken user input was received;
in response to a determination to respond to the first spoken user input:
generating the audible response to the first spoken user input; and
monitoring the audio input to identify a second spoken user input; and
in response to a determination not to respond to the first spoken user input, monitoring the audio input to identify the second spoken user input without generating the audible response to the first spoken user input.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35)
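A notable feature of the claim's branch structure is that monitoring continues toward a second spoken user input on both branches; only the audible response is conditional. A minimal control-flow sketch, with hypothetical placeholder callables for the decision and response steps:

```python
# Control-flow sketch of claim 1's two branches: respond or ignore, but
# keep monitoring either way. decide_fn and respond_fn are hypothetical
# placeholders for the claim's determination and response-generation steps.

def handle_spoken_input(spoken_input, contextual_info, decide_fn, respond_fn):
    """Return the audible response (or None if the input is ignored).
    In both cases the caller resumes monitoring the audio input to
    identify the second spoken user input."""
    if decide_fn(spoken_input, contextual_info):
        response = respond_fn(spoken_input)   # generate the audible response
    else:
        response = None                       # ignored: no response generated
    return response
```

Because the function returns in both branches, the surrounding monitoring loop proceeds identically whether or not a response was produced.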
36. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device, cause the device to:
receive an audio input;
monitor the audio input to identify a first spoken user input, wherein the first spoken user input comprises a user request;
identify the first spoken user input in the audio input;
determine whether to respond to the first spoken user input based on contextual information associated with the first spoken user input, wherein the contextual information comprises a direction of the user's gaze when the first spoken user input was received, wherein the determining comprises:
calculating a likelihood score that the virtual assistant should provide an audible response to the first spoken user input based on the contextual information associated with the first spoken user input, wherein the audible response at least partially satisfies the user request;
increasing the likelihood score in response to the direction of the user's gaze being pointed at the electronic device when the first spoken user input was received; and
decreasing the likelihood score in response to the direction of the user's gaze being pointed away from the electronic device when the first spoken user input was received;
responsive to a determination to respond to the first spoken user input:
generate the audible response to the first spoken user input; and
monitor the audio input to identify a second spoken user input; and
responsive to a determination not to respond to the first spoken user input, monitor the audio input to identify the second spoken user input without generating the audible response to the first spoken user input.
- View Dependent Claims (38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 73)
37. A system comprising:
one or more processors;
memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
receiving an audio input;
monitoring the audio input to identify a first spoken user input, wherein the first spoken user input comprises a user request;
identifying the first spoken user input in the audio input;
determining whether to respond to the first spoken user input based on contextual information associated with the first spoken user input, wherein the contextual information comprises a direction of the user's gaze when the first spoken user input was received, wherein the determining comprises:
calculating a likelihood score that the virtual assistant should provide an audible response to the first spoken user input based on the contextual information associated with the first spoken user input, wherein the audible response at least partially satisfies the user request;
increasing the likelihood score in response to the direction of the user's gaze being pointed at the electronic device when the first spoken user input was received; and
decreasing the likelihood score in response to the direction of the user's gaze being pointed away from the electronic device when the first spoken user input was received;
responsive to a determination to respond to the first spoken user input:
generating the audible response to the first spoken user input; and
monitoring the audio input to identify a second spoken user input; and
responsive to a determination not to respond to the first spoken user input, monitoring the audio input to identify the second spoken user input without generating the audible response to the first spoken user input.
- View Dependent Claims (56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72)
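The title and abstract frame the invention as reducing manual start/end-pointing: the spoken input must be located within a continuous audio stream. A very simple approximation of that endpoint-identification step is an energy-based detector, sketched below; this toy threshold approach is an assumption for illustration, not the detector described in the specification.

```python
# Toy start/end-pointing sketch: the spoken input is taken to span the
# first through last audio frame whose energy exceeds a threshold.
# The threshold and the energy representation are assumptions.

def find_endpoints(frame_energies, threshold=0.1):
    """Return (start_index, end_index) of the voiced segment in a list
    of per-frame energies, or None if no frame exceeds the threshold."""
    voiced = [i for i, e in enumerate(frame_energies) if e > threshold]
    if not voiced:
        return None
    return voiced[0], voiced[-1]
```

For example, a frame-energy sequence that rises above the threshold only in its middle frames yields start and end indices bracketing just that middle span, so the assistant need not rely on a trigger phrase or a button press to delimit the utterance.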
Specification