Natural assistant interaction
First Claim
1. An electronic device, comprising:
one or more processors;
a microphone; and
memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for:
receiving, via the microphone, a first audio stream including one or more utterances;
determining whether the first audio stream includes a lexical trigger;
in accordance with a determination that the first audio stream includes the lexical trigger, generating one or more candidate text representations of the one or more utterances;
determining whether at least one candidate text representation of the one or more candidate text representations is to be disregarded by the virtual assistant;
in accordance with a determination that at least one candidate text representation is to be disregarded by the virtual assistant, generating one or more candidate intents based on candidate text representations of the one or more candidate text representations other than the to be disregarded at least one candidate text representation, wherein generating the one or more candidate intents comprises:
obtaining one or more pre-mitigation intents corresponding to the one or more candidate text representations of the one or more utterances, including obtaining a pre-mitigation intent corresponding to the to be disregarded at least one candidate text representation; and
selecting, from the one or more pre-mitigation intents, the one or more candidate intents corresponding to the one or more candidate text representations other than the to be disregarded at least one candidate text representation, wherein pre-mitigation intent corresponding to the to be disregarded at least one candidate text representation indicates that an utterance corresponding to the to be disregarded at least one candidate text representation is not directed to the virtual assistant;
determining whether the one or more candidate intents include at least one actionable intent;
in accordance with a determination that the one or more candidate intents include at least one actionable intent, executing the at least one actionable intent;
outputting a result of the execution of the at least one actionable intent.
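The sequence of steps recited in the claim can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the audio stream is stood in for by already-transcribed utterance strings, and the trigger phrase `TRIGGER`, the `ASIDE_PREFIXES` heuristic, the `Intent` type, and the functions `should_disregard`, `pre_mitigation_intent`, and `handle_audio` are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional

TRIGGER = "hey assistant"            # hypothetical lexical trigger
ASIDE_PREFIXES = ("mom", "dad")      # hypothetical cue that speech is aimed at a person


@dataclass
class Intent:
    text: str                 # candidate text representation the intent came from
    action: Optional[str]     # None when no actionable task was inferred
    directed_to_assistant: bool


def should_disregard(text: str) -> bool:
    """Toy rule: treat text addressed to a person as not meant for the assistant."""
    return text.startswith(ASIDE_PREFIXES)


def pre_mitigation_intent(text: str) -> Intent:
    """Infer an intent for every candidate, including ones later dropped."""
    if should_disregard(text):
        return Intent(text, None, directed_to_assistant=False)
    action = "set_timer" if "timer" in text else None
    return Intent(text, action, directed_to_assistant=True)


def handle_audio(utterances: List[str]) -> List[str]:
    # Lexical trigger check; without a trigger nothing further happens.
    if not any(TRIGGER in u.lower() for u in utterances):
        return []
    # Candidate text representations (trigger phrase stripped).
    candidates = [u.lower().replace(TRIGGER, "").strip(" ,.") for u in utterances]
    candidates = [c for c in candidates if c]
    # Decide which candidates are to be disregarded.
    disregarded = {c for c in candidates if should_disregard(c)}
    # Pre-mitigation intents cover all candidates, disregarded ones included.
    pre_mitigation = [pre_mitigation_intent(c) for c in candidates]
    # Candidate intents: only those whose text was not disregarded.
    candidate_intents = [i for i in pre_mitigation if i.text not in disregarded]
    # Execute the actionable intents and return their results.
    actionable = [i for i in candidate_intents if i.action is not None]
    return [f"done: {i.action}" for i in actionable]
```

Calling `handle_audio(["Hey assistant, set a timer for five minutes", "Mom, I'll be right there"])` acts on the timer request only; the remark to another person still receives a pre-mitigation intent, but that intent is marked as not directed to the assistant and is left out of the candidate intents.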
Abstract
Systems and processes for operating a virtual assistant to provide natural assistant interaction are provided. In accordance with one or more examples, a method includes, at an electronic device with one or more processors and memory: receiving a first audio stream including one or more utterances; determining whether the first audio stream includes a lexical trigger; generating one or more candidate text representations of the one or more utterances; determining whether at least one candidate text representation of the one or more candidate text representations is to be disregarded by the virtual assistant. If at least one candidate text representation is to be disregarded, one or more candidate intents are generated based on candidate text representations of the one or more candidate text representations other than the to be disregarded at least one candidate text representation.
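The lexical-trigger determination named in the abstract can be illustrated on text alone. The sketch below is an assumption-laden simplification: it matches a trigger phrase in a transcript string, whereas the source does not say how the trigger is detected in the audio stream. The function name `includes_lexical_trigger` and the default phrase are hypothetical.

```python
import re
from typing import Sequence


def includes_lexical_trigger(transcript: str,
                             triggers: Sequence[str] = ("hey assistant",)) -> bool:
    """Return True when any trigger phrase occurs as whole words in the transcript."""
    for phrase in triggers:
        # Allow any run of whitespace between the words of the phrase and
        # require word boundaries so that "they assistant" does not match.
        words = (re.escape(word) for word in phrase.split())
        pattern = r"\b" + r"\s+".join(words) + r"\b"
        if re.search(pattern, transcript, flags=re.IGNORECASE):
            return True
    return False
```

Matching on word boundaries rather than raw substrings avoids firing on words that merely contain the trigger text.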
2568 Citations
49 Claims
1. An electronic device (independent claim, recited in full under First Claim above).
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27
28. A method for providing natural language interaction by a virtual assistant, the method comprising:
at an electronic device with one or more processors, memory, and a microphone:
receiving, via a microphone, a first audio stream including one or more utterances;
determining whether the first audio stream includes a lexical trigger;
in accordance with a determination that the first audio stream includes the lexical trigger, generating one or more candidate text representations of the one or more utterances;
determining whether at least one candidate text representation of the one or more candidate text representations is to be disregarded by the virtual assistant;
in accordance with a determination that at least one candidate text representation is to be disregarded by the virtual assistant, generating one or more candidate intents based on candidate text representations of the one or more candidate text representations other than the to be disregarded at least one candidate text representation, wherein generating the one or more candidate intents comprises:
obtaining one or more pre-mitigation intents corresponding to the one or more candidate text representations of the one or more utterances, including obtaining a pre-mitigation intent corresponding to the to be disregarded at least one candidate text representation; and
selecting, from the one or more pre-mitigation intents, the one or more candidate intents corresponding to the one or more candidate text representations other than the to be disregarded at least one candidate text representation, wherein pre-mitigation intent corresponding to the to be disregarded at least one candidate text representation indicates that an utterance corresponding to the to be disregarded at least one candidate text representation is not directed to the virtual assistant;
determining whether the one or more candidate intents include at least one actionable intent;
in accordance with a determination that the one or more candidate intents include at least one actionable intent, executing the at least one actionable intent;
outputting a result of the execution of the at least one actionable intent.
Dependent claims: 29, 30, 31, 32, 33, 34, 35, 36, 37, 38
39. A non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of an electronic device, the one or more programs including instructions for:
receiving, via a microphone, a first audio stream including one or more utterances;
determining whether the first audio stream includes a lexical trigger;
in accordance with a determination that the first audio stream includes the lexical trigger, generating one or more candidate text representations of the one or more utterances;
determining whether at least one candidate text representation of the one or more candidate text representations is to be disregarded by the virtual assistant;
in accordance with a determination that at least one candidate text representation is to be disregarded by the virtual assistant, generating one or more candidate intents based on candidate text representations of the one or more candidate text representations other than the to be disregarded at least one candidate text representation, wherein generating the one or more candidate intents comprises:
obtaining one or more pre-mitigation intents corresponding to the one or more candidate text representations of the one or more utterances, including obtaining a pre-mitigation intent corresponding to the to be disregarded at least one candidate text representation; and
selecting, from the one or more pre-mitigation intents, the one or more candidate intents corresponding to the one or more candidate text representations other than the to be disregarded at least one candidate text representation, wherein pre-mitigation intent corresponding to the to be disregarded at least one candidate text representation indicates that an utterance corresponding to the to be disregarded at least one candidate text representation is not directed to the virtual assistant;
determining whether the one or more candidate intents include at least one actionable intent;
in accordance with a determination that the one or more candidate intents include at least one actionable intent, executing the at least one actionable intent;
outputting a result of the execution of the at least one actionable intent.
Dependent claims: 40, 41, 42, 43, 44, 45, 46, 47, 48, 49
Specification