Multi-command single utterance input method
First Claim
1. An electronic device, comprising:
one or more processors; and
memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for:
receiving speech input, wherein the speech input comprises a single utterance having two or more actionable commands;
generating a text string based on the speech input using a speech transcription process, wherein the speech transcription process is performed using one or more speech recognition models;
identifying a first keyword in the text string;
identifying a second keyword in the text string;
parsing the text string into at least a first candidate substring and a second candidate substring based at least in part on positions of the first keyword and the second keyword in the text string;
determining a first intent associated with the first candidate substring and a second intent associated with the second candidate substring, wherein the first intent corresponds to a first actionable command in the speech input and the second intent corresponds to a second actionable command in the speech input, wherein the first intent and the second intent are determined based on one or more nodes of an ontology; and
executing a first process identified by the first intent and a second process identified by the second intent.
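The parsing and intent steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation: the keyword list, the flat `ONTOLOGY` mapping, and the function names `parse_candidates` and `determine_intent` are all hypothetical, and the transcription step is assumed to have already produced the text string. The sketch splits the string at the position of each keyword after the first and maps each candidate substring to an intent through its keyword.

```python
import re

# Hypothetical keywords (imperative verbs); not taken from the patent.
KEYWORDS = ("navigate", "send", "play", "call")

# Hypothetical stand-in for ontology nodes: keyword -> intent name.
ONTOLOGY = {
    "navigate": "get_directions",
    "send": "send_message",
    "play": "play_media",
    "call": "place_call",
}

_KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)
# Connectors such as "and" / "then" left dangling at the end of a substring.
_TRAILING_JOINER_RE = re.compile(
    r"(?:[\s,]*\b(?:and|then)\b)+[\s,]*$", re.IGNORECASE
)


def parse_candidates(text):
    """Split text into candidate substrings based on keyword positions."""
    starts = [m.start() for m in _KEYWORD_RE.finditer(text)]
    # The first substring always begins at 0 so leading words are not lost;
    # each later keyword position opens a new substring.
    bounds = [0] + starts[1:] + [len(text)]
    pieces = (text[a:b] for a, b in zip(bounds, bounds[1:]))
    cleaned = (_TRAILING_JOINER_RE.sub("", p).strip(" ,") for p in pieces)
    return [c for c in cleaned if c]


def determine_intent(candidate):
    """Return the intent for the first keyword found, or None if there is none."""
    match = _KEYWORD_RE.search(candidate)
    return ONTOLOGY[match.group(1).lower()] if match else None
```

Calling `parse_candidates` on an utterance such as "Navigate to Jessica's house and send her a message that I'm on my way" yields two candidate substrings, one per keyword, and `determine_intent` can then be applied to each before the corresponding processes are executed.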
Abstract
Systems and processes are disclosed for handling a multi-part voice command for a virtual assistant. Speech input can be received from a user that includes multiple actionable commands within a single utterance. A text string can be generated from the speech input using a speech transcription process. The text string can be parsed into multiple candidate substrings based on domain keywords, imperative verbs, predetermined substring lengths, or the like. For each candidate substring, a probability can be determined indicating whether the candidate substring corresponds to an actionable command. Such probabilities can be determined based on semantic coherence, similarity to user request templates, querying services to determine manageability, or the like. If the probabilities exceed a threshold, the user intent of each substring can be determined, processes associated with the user intents can be executed, and an acknowledgment can be provided to the user.
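The thresholding step described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the disclosed method: the score is a simple token-overlap ratio against hypothetical request templates (standing in for "similarity to user request templates"), it is not a calibrated probability, and the names `TEMPLATES`, `THRESHOLD`, `actionability_score`, and `accept_parse` are invented for the example.

```python
# Hypothetical request templates, reduced to their required tokens.
TEMPLATES = [
    {"navigate", "to"},
    {"send", "message"},
    {"play"},
]

# Hypothetical acceptance threshold; the abstract does not give a value.
THRESHOLD = 0.5


def actionability_score(candidate, templates=TEMPLATES):
    """Score a candidate by its best token overlap with any template (0.0 to 1.0)."""
    tokens = set(candidate.lower().split())
    return max(len(template & tokens) / len(template) for template in templates)


def accept_parse(candidates, threshold=THRESHOLD):
    """Accept the parse only if every candidate's score exceeds the threshold."""
    return all(actionability_score(c) > threshold for c in candidates)
```

With these assumptions, a parse whose substrings each resemble a known request is accepted, while a parse containing a fragment such as "that I'm on my way" is rejected, which would signal that the text string should be split differently or handled as a single command.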
4802 Citations
45 Claims
1. An electronic device, comprising:
one or more processors; and
memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for:
receiving speech input, wherein the speech input comprises a single utterance having two or more actionable commands;
generating a text string based on the speech input using a speech transcription process, wherein the speech transcription process is performed using one or more speech recognition models;
identifying a first keyword in the text string;
identifying a second keyword in the text string;
parsing the text string into at least a first candidate substring and a second candidate substring based at least in part on positions of the first keyword and the second keyword in the text string;
determining a first intent associated with the first candidate substring and a second intent associated with the second candidate substring, wherein the first intent corresponds to a first actionable command in the speech input and the second intent corresponds to a second actionable command in the speech input, wherein the first intent and the second intent are determined based on one or more nodes of an ontology; and
executing a first process identified by the first intent and a second process identified by the second intent.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A method, comprising:
at an electronic device:
receiving speech input, wherein the speech input comprises a single utterance having two or more actionable commands;
generating a text string based on the speech input using a speech transcription process, wherein the speech transcription process is performed using one or more speech recognition models;
identifying a first keyword in the text string;
identifying a second keyword in the text string;
parsing the text string into at least a first candidate substring and a second candidate substring based at least in part on positions of the first keyword and the second keyword in the text string;
determining a first intent associated with the first candidate substring and a second intent associated with the second candidate substring, wherein the first intent corresponds to a first actionable command in the speech input and the second intent corresponds to a second actionable command in the speech input, wherein the first intent and the second intent are determined based on one or more nodes of an ontology; and
executing a first process identified by the first intent and a second process identified by the second intent.
Dependent claims: 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
17. A non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of an electronic device, the one or more programs including instructions for:
receiving speech input, wherein the speech input comprises a single utterance having two or more actionable commands;
generating a text string based on the speech input using a speech transcription process, wherein the speech transcription process is performed using one or more speech recognition models;
identifying a first keyword in the text string;
identifying a second keyword in the text string;
parsing the text string into at least a first candidate substring and a second candidate substring based at least in part on positions of the first keyword and the second keyword in the text string;
determining a first intent associated with the first candidate substring and a second intent associated with the second candidate substring, wherein the first intent corresponds to a first actionable command in the speech input and the second intent corresponds to a second actionable command in the speech input, wherein the first intent and the second intent are determined based on one or more nodes of an ontology; and
executing a first process identified by the first intent and a second process identified by the second intent.
Dependent claims: 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45
Specification