Full-duplex utterance processing in a natural language virtual assistant
First Claim
1. A computer-implemented method of generating a response to a spoken input, the method comprising:
obtaining an audio input stream;
detecting in the audio input stream a beginning of a first utterance;
detecting in the audio input stream an end of the first utterance;
responsive to detecting the end of the first utterance, initiating processing of the first utterance to recognize a first query;
while processing the first utterance:
continuing to receive the audio input stream; and
detecting a beginning of a second utterance in the audio input stream;
executing the first query to determine a first response;
detecting an end of the second utterance in the audio input stream;
processing the second utterance to recognize a second query;
identifying a serial dependency between the first query and the second query;
responsive to the identification of the serial dependency, delaying execution of the second query until the executing of the first query has completed;
executing the second query to determine a second response; and
outputting the second response;
wherein detecting ends of utterances comprises identifying at least one of:
a pause in speech in the audio input stream, non-speech in the audio input stream, or a user input event performed by a user.
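The serial-dependency step of the claim can be illustrated with a minimal sketch. It assumes the dependency between the two queries has already been identified (the claim does not say how), and it uses Python's asyncio tasks as a stand-in for query execution. The names `execute_query`, `depends_on`, and `run_demo` are hypothetical and do not come from the patent.

```python
import asyncio

async def execute_query(name, duration, log, depends_on=None):
    """Execute a query; if it depends on an earlier one, wait for that first."""
    if depends_on is not None:
        # Serial dependency: delay execution until the earlier query completes.
        await depends_on
    log.append(f"start {name}")
    await asyncio.sleep(duration)  # stand-in for real query execution
    log.append(f"end {name}")
    return f"response to {name}"

async def run_demo():
    log = []
    # The first query starts executing as soon as it is recognized.
    first = asyncio.create_task(execute_query("q1", 0.05, log))
    # The second query is recognized while the first is still executing.
    # Assumption: a dependency on q1 was identified, so q1's task is passed in.
    second = asyncio.create_task(
        execute_query("q2", 0.01, log, depends_on=first)
    )
    response = await second
    return log, response

if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

Without the `depends_on` argument the two tasks would run concurrently, which corresponds to the parallel case in which no serial constraint applies.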
Abstract
A query-processing system processes an input audio stream that represents a succession of queries spoken by a user. The query-processing system listens continuously to the input audio stream, parses queries and takes appropriate actions in mid-stream. In some embodiments, the system processes queries in parallel, limited by serial constraints. In some embodiments, the system parses and executes queries while a previous query's execution is still in progress. To accommodate users who tend to speak slowly and express a thought in separate parts, the query-processing system halts the outputting of results corresponding to a previous query if it detects that a new speech utterance modifies the meaning of the previous query.
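The output-halting behavior described in the abstract might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `speak` stands in for audio output, the timings are arbitrary, and the `modifies_previous` flag is a hypothetical placeholder for whatever analysis decides that the new utterance changes the meaning of the previous query.

```python
import asyncio

async def speak(response_words, spoken):
    """Output a response one word at a time (stand-in for audio playback)."""
    for word in response_words:
        spoken.append(word)
        await asyncio.sleep(0.05)  # arbitrary playback time per word

async def run_demo():
    spoken = []
    words = ["it", "is", "sunny", "in", "Paris"]
    output = asyncio.create_task(speak(words, spoken))

    # Meanwhile the system keeps listening; the user speaks again mid-output.
    await asyncio.sleep(0.12)
    modifies_previous = True  # hypothetical result of analysing the new utterance

    if modifies_previous:
        output.cancel()  # halt the output for the previous query
    try:
        await output
    except asyncio.CancelledError:
        pass  # expected when the output was halted
    return spoken, output.cancelled()

if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

Cancelling the output task leaves the listening side untouched, so the modified query can then be processed and answered in place of the halted response.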
8 Claims
1. A computer-implemented method of generating a response to a spoken input, the method comprising:
obtaining an audio input stream;
detecting in the audio input stream a beginning of a first utterance;
detecting in the audio input stream an end of the first utterance;
responsive to detecting the end of the first utterance, initiating processing of the first utterance to recognize a first query;
while processing the first utterance:
continuing to receive the audio input stream; and
detecting a beginning of a second utterance in the audio input stream;
executing the first query to determine a first response;
detecting an end of the second utterance in the audio input stream;
processing the second utterance to recognize a second query;
identifying a serial dependency between the first query and the second query;
responsive to the identification of the serial dependency, delaying execution of the second query until the executing of the first query has completed;
executing the second query to determine a second response; and
outputting the second response;
wherein detecting ends of utterances comprises identifying at least one of:
a pause in speech in the audio input stream, non-speech in the audio input stream, or a user input event performed by a user.
Dependent claims: 2, 3, 4
5. A non-transitory computer-readable storage medium storing instructions for generating a response to a spoken input, the instructions when executed by a computer processor performing actions comprising:
obtaining an audio input stream;
detecting in the audio input stream a beginning of a first utterance;
detecting in the audio input stream an end of the first utterance;
responsive to detecting the end of the first utterance, initiating processing of the first utterance to recognize a first query;
while processing the first utterance:
continuing to receive the audio input stream; and
detecting a beginning of a second utterance in the audio input stream;
executing the first query to determine a first response;
detecting an end of the second utterance in the audio input stream;
processing the second utterance to recognize a second query;
identifying a serial dependency between the first query and the second query;
responsive to the identification of the serial dependency, delaying execution of the second query until the executing of the first query has completed;
executing the second query to determine a second response; and
outputting the second response;
wherein detecting ends of utterances comprises identifying at least one of:
a pause in speech in the audio input stream, or non-speech in the audio input stream.
Dependent claims: 6, 7, 8
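The pause-based end-of-utterance detection named in the final clause can be illustrated with a simple energy threshold over audio frames. This is only a sketch: the claims do not specify a detection technique, and `find_utterances`, `threshold`, and `min_pause_frames` are hypothetical names and values chosen for the example.

```python
def find_utterances(frame_energies, threshold=0.1, min_pause_frames=3):
    """Return (begin, end) frame-index pairs for each utterance; end is exclusive.

    Assumption: a frame with energy below `threshold` counts as a pause or
    non-speech, and `min_pause_frames` such frames in a row end the utterance.
    """
    utterances = []
    begin = None      # index where the current utterance started
    silent_run = 0    # consecutive low-energy frames inside the utterance
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            if begin is None:
                begin = i  # beginning of an utterance detected
            silent_run = 0
        elif begin is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                # Pause is long enough: the utterance ended where the pause began.
                utterances.append((begin, i - silent_run + 1))
                begin, silent_run = None, 0
    if begin is not None:
        # The stream ended mid-utterance; close it at the last speech frame.
        utterances.append((begin, len(frame_energies) - silent_run))
    return utterances

if __name__ == "__main__":
    energies = [0, 0, 0.5, 0.6, 0.0, 0.7, 0, 0, 0, 0.4, 0.4, 0, 0, 0]
    print(find_utterances(energies))
```

The single low-energy frame at index 4 is shorter than `min_pause_frames`, so it is treated as a brief hesitation inside the first utterance rather than as its end.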
Specification