Low-latency intelligent automated assistant
Abstract
Systems and processes for operating a digital assistant are provided. In an example process, low-latency operation of a digital assistant is provided. In this example, natural language processing, task flow processing, dialogue flow processing, speech synthesis, or any combination thereof can be at least partially performed while awaiting detection of a speech end-point condition. Upon detection of a speech end-point condition, results obtained from performing the operations can be presented to the user. In another example, robust operation of a digital assistant is provided. In this example, task flow processing by the digital assistant can include selecting a candidate task flow from a plurality of candidate task flows based on determined task flow scores. The task flow scores can be based on speech recognition confidence scores, intent confidence scores, flow parameter scores, or any combination thereof. The selected candidate task flow is executed and corresponding results presented to the user.
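The two examples in the abstract can be illustrated together with a minimal sketch. It is not the patented implementation: the names `CandidateTaskFlow`, `task_flow_score`, `select_task_flow`, and `SpeculativeAssistant` are hypothetical, and combining the three scores by multiplication is an assumption, since the abstract only states that the task flow score is "based on" them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CandidateTaskFlow:
    """Hypothetical container for one candidate task flow and its scores."""
    name: str
    asr_confidence: float        # speech recognition confidence score
    intent_confidence: float     # intent confidence score
    flow_parameter_score: float  # flow parameter score
    run: Callable[[], str]       # executes the flow and returns its result


def task_flow_score(flow: CandidateTaskFlow) -> float:
    # Assumption: the combined score is the product of the three scores.
    return (flow.asr_confidence
            * flow.intent_confidence
            * flow.flow_parameter_score)


def select_task_flow(flows: List[CandidateTaskFlow]) -> CandidateTaskFlow:
    # Pick the candidate task flow with the highest task flow score.
    return max(flows, key=task_flow_score)


class SpeculativeAssistant:
    """Executes a task flow early and withholds the result until end-point."""

    def __init__(self) -> None:
        self._pending_result: Optional[str] = None

    def on_partial_utterance(self, flows: List[CandidateTaskFlow]) -> None:
        # Called while audio is still being received: execute the selected
        # flow, but do not present anything to the user yet.
        self._pending_result = select_task_flow(flows).run()

    def on_end_point(self) -> Optional[str]:
        # Called once a speech end-point condition is detected: the result
        # computed earlier can now be presented.
        return self._pending_result
```

In this sketch, `on_partial_utterance` would be invoked while the assistant is still awaiting the end-point, and `on_end_point` hands back the already-computed result, which is where the latency saving would come from.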
2772 Citations
39 Claims
1. An electronic device, comprising:
one or more processors; and
memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for:
receiving a stream of audio, comprising:
receiving, from a first time to a second time, a first portion of the stream of audio containing at least a portion of a user utterance, wherein one or more candidate text representations are determined based on the at least a portion of the user utterance while receiving the first portion of the stream of audio; and
receiving, from the second time to a third time, a second portion of the stream of audio, wherein the electronic device stops receiving the stream of audio at the third time;
after determining the one or more candidate text representations, determining whether the first portion of the stream of audio satisfies a predetermined condition;
in response to determining that the first portion of the stream of audio satisfies the predetermined condition, performing, at least partially between the second time and the third time, operations comprising:
determining, based on the one or more candidate text representations of the at least a portion of the user utterance, a plurality of candidate user intents for the at least a portion of the user utterance, wherein each candidate user intent of the plurality of candidate user intents corresponds to a respective candidate task flow of a plurality of candidate task flows;
selecting a first candidate task flow of the plurality of candidate task flows; and
executing the first candidate task flow without providing an output to a user of the device;
determining whether a speech end-point condition is detected between the second time and the third time; and
in response to determining that a speech end-point condition is detected between the second time and the third time, presenting, to the user, results from executing the selected first candidate task flow.
Dependent claims: 2-21
22. A method for operating a digital assistant, the method comprising:
at an electronic device having one or more processors and memory:
receiving a stream of audio, comprising:
receiving, from a first time to a second time, a first portion of the stream of audio containing at least a portion of a user utterance, wherein one or more candidate text representations are determined based on the at least a portion of the user utterance while receiving the first portion of the stream of audio; and
receiving, from the second time to a third time, a second portion of the stream of audio, wherein the electronic device stops receiving the stream of audio at the third time;
after determining the one or more candidate text representations, determining whether the first portion of the stream of audio satisfies a predetermined condition;
in response to determining that the first portion of the stream of audio satisfies the predetermined condition, performing, at least partially between the second time and the third time, operations comprising:
determining, based on the one or more candidate text representations of the at least a portion of the user utterance, a plurality of candidate user intents for the at least a portion of the user utterance, wherein each candidate user intent of the plurality of candidate user intents corresponds to a respective candidate task flow of a plurality of candidate task flows;
selecting a first candidate task flow of the plurality of candidate task flows; and
executing the first candidate task flow without providing an output to a user of the device;
determining whether a speech end-point condition is detected between the second time and the third time; and
in response to determining that a speech end-point condition is detected between the second time and the third time, presenting, to the user, results from executing the selected first candidate task flow.
Dependent claims: 23-30
31. A non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of an electronic device, the one or more programs including instructions for:
receiving a stream of audio, comprising:
receiving, from a first time to a second time, a first portion of the stream of audio containing at least a portion of a user utterance, wherein one or more candidate text representations are determined based on the at least a portion of the user utterance while receiving the first portion of the stream of audio; and
receiving, from the second time to a third time, a second portion of the stream of audio, wherein the electronic device stops receiving the stream of audio at the third time;
after determining the one or more candidate text representations, determining whether the first portion of the stream of audio satisfies a predetermined condition;
in response to determining that the first portion of the stream of audio satisfies the predetermined condition, performing, at least partially between the second time and the third time, operations comprising:
determining, based on the one or more candidate text representations of the at least a portion of the user utterance, a plurality of candidate user intents for the at least a portion of the user utterance, wherein each candidate user intent of the plurality of candidate user intents corresponds to a respective candidate task flow of a plurality of candidate task flows;
selecting a first candidate task flow of the plurality of candidate task flows; and
executing the first candidate task flow without providing an output to a user of the device;
determining whether a speech end-point condition is detected between the second time and the third time; and
in response to determining that a speech end-point condition is detected between the second time and the third time, presenting, to the user, results from executing the selected first candidate task flow.
Dependent claims: 32-39
Specification