Low latency audio interface
First Claim
1. A method comprising:
receiving, from an electronic device, audio input data representing a request;
performing speech recognition on the audio input data to obtain word data;
using natural language understanding (NLU) techniques on the word data to determine a topic associated with the request;
generating first audio output data including first words that are related to the topic;
determining that additional processing is needed to generate information responsive to the request;
generating second audio output data including second words that are related to the topic, the second words including transitional words between the first audio output data and the second audio output data, wherein the first audio output data and the second audio output data are generated at least partially in parallel;
prior to all of the audio input data being received from the electronic device, sending at least a portion of the first audio output data to the electronic device;
sending, at least partially in parallel to the first audio output data being sent, a communication to an interface associated with a skill to determine the information responsive to the request;
receiving, from the interface, the information responsive to the request;
generating third audio output data that includes the information responsive to the request and at least one additional transitional word between the second audio output data and the third audio output data, wherein the third audio output data and the second audio output data are generated at least partially in parallel;
prior to all of the audio input data being received from the electronic device, sending at least a portion of the second audio output data to the electronic device, wherein sending the at least a portion of the second audio output data occurs at least partially in parallel with generating the third audio output data; and
sending the third audio output data to the electronic device.
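The ordering in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: `synthesize`, `query_skill`, `respond`, and `run` are hypothetical names, the sleeps stand in for real text-to-speech and skill latency, and the example phrases are invented. The sketch also omits the streaming of audio input, so it does not show output being sent before all input has arrived.

```python
import asyncio


async def synthesize(text):
    """Hypothetical stand-in for text-to-speech; returns fake audio bytes."""
    await asyncio.sleep(0.01)
    return text.encode("utf-8")


async def query_skill(topic):
    """Hypothetical stand-in for the call to the skill's interface."""
    await asyncio.sleep(0.05)
    return "sunny"


async def respond(topic, send):
    # The first and second segments need no skill data, so both start now.
    first = asyncio.create_task(synthesize(f"The {topic} forecast"))
    second = asyncio.create_task(synthesize("for today is"))

    async def third_segment():
        # Waits for the skill, then adds transitional words and the answer.
        info = await query_skill(topic)
        return await synthesize(f"expected to be {info}")

    # All three tasks are scheduled up front and run concurrently on the
    # event loop; the third spends most of its time waiting on the skill.
    third = asyncio.create_task(third_segment())

    # Segments are sent in order as each becomes available.
    for segment in (first, second, third):
        send(await segment)


def run(topic):
    sent = []
    asyncio.run(respond(topic, sent.append))
    return sent
```

Calling `run("weather")` collects the three segments in the order they would be sent to the device. The first two segments are ready long before the skill replies, which is the source of the latency saving.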
Abstract
Systems and methods for utilizing incremental processing of portions of output data to limit the time required to provide a response to a user request are provided herein. In some embodiments, portions of the user request for information can be analyzed using techniques such as automatic speech recognition (ASR), speech-to-text (STT), and natural language understanding (NLU) to determine the overall topic of the user request. Once the topic has been determined, portions of the anticipated audio output data can be synthesized independently instead of waiting for the complete response. The synthesized portions can then be provided to the electronic device in anticipation of being output through one or more speakers on the electronic device, which shortens the time needed to provide the response to the user.
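As a rough illustration of the incremental idea in the abstract, the following Python sketch picks an opening phrase as soon as a topic keyword appears in a partial word stream. Everything here is assumed for illustration: the keyword table, the opener phrases, and the function name `early_opener` are hypothetical, and a simple dictionary lookup stands in for real NLU.

```python
# Hypothetical keyword-to-topic table standing in for an NLU component.
TOPIC_KEYWORDS = {"weather": "weather", "forecast": "weather", "time": "time"}

# Hypothetical opening phrases that can be synthesized before the answer is known.
OPENERS = {"weather": "Today's weather", "time": "The current time"}


def early_opener(words):
    """Return (words_consumed, opener) once a topic keyword is seen, else None."""
    for count, word in enumerate(words, start=1):
        topic = TOPIC_KEYWORDS.get(word.lower().strip("?.,!"))
        if topic is not None:
            # The rest of the request has not been examined at this point.
            return count, OPENERS[topic]
    return None
```

For the request "what is the weather in Seattle today", the opener is chosen after the fourth word, so synthesis of that portion could begin while the remaining words are still arriving.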
13 Claims
3. A method, comprising:
receiving, from an electronic device, audio input data representing a first series of words associated with a request;
determining, using at least one natural language understanding (NLU) component, a topic to which the request relates;
generating first audio output data representing at least a first word, the at least first word being associated with the topic;
accessing an interface associated with a skill to determine information responsive to the request, the skill being associated with the topic, wherein the accessing and generating of the first audio output data are performed at least partially in parallel;
prior to all of the audio input data being completely received, sending the first audio output data to the electronic device;
generating second audio output data that includes at least a second word based at least in part on the received information responsive to the request, wherein the second audio output data is generated at least partially in parallel with sending the first audio output data to the electronic device; and
sending the second audio output data to the electronic device.
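The two overlaps recited in claim 3 (skill access alongside first-segment generation, and second-segment generation alongside sending the first segment) can be sketched with a thread pool. This is only an assumption-laden illustration: `generate_audio`, `access_skill`, and `handle_request` are hypothetical names, the skill is a hard-coded lookup, and the arrival of audio input is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_audio(text):
    """Hypothetical stand-in for speech synthesis; returns fake audio bytes."""
    return text.encode("utf-8")


def access_skill(topic):
    """Hypothetical stand-in for a skill interface lookup."""
    return {"weather": "sunny"}.get(topic, "unknown")


def handle_request(topic, send):
    with ThreadPoolExecutor(max_workers=3) as pool:
        # Overlap 1: generate the first segment while the skill is queried.
        first = pool.submit(generate_audio, f"The {topic} is")
        info = pool.submit(access_skill, topic)

        # Overlap 2: send the first segment on a worker thread while the
        # main thread builds the second segment from the skill's answer.
        sending = pool.submit(lambda: send(first.result()))
        second_audio = generate_audio(info.result())

        # Preserve output order: the first send must finish before the second.
        sending.result()
        send(second_audio)
```

Passing a list's `append` method as `send` makes the ordering easy to inspect: the first segment always precedes the second, even though their preparation overlaps.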
11. A system comprising:
communications circuitry that receives, from an electronic device, audio input data representing a first series of words associated with a request; and
at least one processor operable to:
use natural language understanding (NLU) techniques on word data to determine a topic associated with the request;
generate first audio output data representing at least a first word, the first word being associated with the topic;
communicate with an interface associated with a skill to determine information responsive to the request, wherein the generation of the first audio output data is performed at least partially in parallel to the communication with the interface;
prior to all portions of the audio input data being received from the electronic device, initiate the communications circuitry to send the first audio output data to the electronic device;
generate second audio output data that includes at least a second word based at least in part on the received information responsive to the request, wherein the second audio output data is generated at least partially in parallel with sending the first audio output data to the electronic device; and
initiate the communications circuitry to send the second audio output data to the electronic device.
Specification