Device selection for providing a response
First Claim
1. A system, comprising:
a first speech processing pipeline instance that receives a first audio signal from a first speech interface device, the first audio signal representing a speech utterance, the first speech processing pipeline instance also receiving a first timestamp indicating a first time at which a wakeword was detected by the first speech interface device;
a second speech processing pipeline instance that receives a second audio signal from a second speech interface device, the second audio signal representing the speech utterance, the second speech processing pipeline also receiving a second timestamp indicating a second time at which the wakeword was detected by the second speech interface device;
the first speech processing pipeline instance having a series of processing components comprising:
an automatic speech recognition (ASR) component configured to analyze the first audio signal to determine words of the speech utterance;
a natural language understanding (NLU) component positioned in the first speech processing pipeline instance after the ASR component, the NLU component being configured to analyze the words of the speech utterance to determine an intent expressed by the speech utterance;
a response dispatcher positioned in the first speech processing pipeline instance after the NLU component, the response dispatcher being configured to specify a speech response to the speech utterance;
a first source arbiter positioned in the first speech processing pipeline instance before the ASR component, the first source arbiter being configured (a) to determine that an amount of time represented by a difference between the first timestamp and the second timestamp is less than a threshold;
(b) to determine that the first timestamp is greater than the second timestamp; and
(c) to abort the first speech processing pipeline instance.
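The source arbiter logic recited in claim 1 can be illustrated with a minimal Python sketch. The class `PipelineInstance`, the function `source_arbiter`, and the 0.5-second default threshold are hypothetical names and values chosen for illustration; the claim does not specify a threshold value, and the use of an absolute difference between the two timestamps is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PipelineInstance:
    """Hypothetical stand-in for one speech processing pipeline instance."""
    device_id: str
    wakeword_timestamp: float  # seconds; time the device detected the wakeword
    aborted: bool = False

    def abort(self) -> None:
        # Stop further processing (ASR, NLU, response dispatch) for this instance.
        self.aborted = True


def source_arbiter(own: PipelineInstance, other: PipelineInstance,
                   threshold_s: float = 0.5) -> bool:
    """Abort `own` if both devices heard the same wakeword and `own` heard it later.

    Returns True when `own` has been aborted.
    """
    # (a) The timestamps differ by less than the threshold (assumed: absolute difference).
    close_in_time = abs(own.wakeword_timestamp - other.wakeword_timestamp) < threshold_s
    # (b) This instance's timestamp is the later of the two.
    heard_later = own.wakeword_timestamp > other.wakeword_timestamp
    # (c) Abort this instance before it reaches the ASR component.
    if close_in_time and heard_later:
        own.abort()
    return own.aborted


first = PipelineInstance("kitchen", wakeword_timestamp=10.20)
second = PipelineInstance("living-room", wakeword_timestamp=10.05)
first_aborted = source_arbiter(first, second)
second_aborted = source_arbiter(second, first)
```

In this example the first instance is aborted because its device detected the wakeword 0.15 seconds after the second device, while the second instance continues processing.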
Abstract
A system may use multiple speech interface devices to interact with a user by speech. All or a portion of the speech interface devices may detect a user utterance and may initiate speech processing to determine a meaning or intent of the utterance. Within the speech processing, arbitration is employed to select one of the multiple speech interface devices to respond to the user utterance. Arbitration may be based in part on metadata that directly or indirectly indicates the proximity of the user to the devices, and the device that is deemed to be nearest the user may be selected to respond to the user utterance.
19 Claims
6. A method, comprising:
receiving, by a first speech processing pipeline, a first digital audio signal produced by a first device after the first device detects a wakeword, the first speech processing pipeline including a first series of speech processing components that process the first digital audio signal;
receiving, by a second speech processing pipeline, a second digital audio signal produced by a second device after the second device detects the wakeword, the second speech processing pipeline including a second series of speech processing components that process the second digital audio signal;
receiving one or more first attributes associated with the first digital audio signal;
receiving one or more second attributes associated with the second digital audio signal;
determining that the first digital audio signal represents an utterance;
determining that the second digital audio signal represents the utterance based at least in part on the first digital audio signal being received within a threshold amount of time of the second digital audio signal being received;
determining, based at least in part on the one or more first attributes and the one or more second attributes, that the first speech processing pipeline will process the first digital audio signal;
determining, based at least in part on the one or more first attributes and the one or more second attributes, that the first device will respond to the utterance; and
sending, to the first device, audio data representing a speech response to the utterance.
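The determining steps of claim 6 can be sketched in Python as follows. This is only an illustration under stated assumptions: `AudioSignal`, `is_same_utterance`, `select_responders`, the single `proximity_score` attribute, and the 0.5-second threshold are all hypothetical. The claim speaks only of "one or more attributes" and does not say how they are compared; the abstract suggests metadata that indicates the user's proximity to each device.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AudioSignal:
    """Hypothetical record of one received digital audio signal and its attribute."""
    device_id: str
    received_at: float      # seconds; when the pipeline received the signal
    proximity_score: float  # assumed attribute; higher means closer to the user


def is_same_utterance(a: AudioSignal, b: AudioSignal, threshold_s: float = 0.5) -> bool:
    # Treat both signals as the same utterance if they arrived within the threshold.
    return abs(a.received_at - b.received_at) <= threshold_s


def select_responders(a: AudioSignal, b: AudioSignal, threshold_s: float = 0.5) -> List[str]:
    """Return the device ids whose pipelines continue and which will respond."""
    if not is_same_utterance(a, b, threshold_s):
        # Different utterances: no arbitration needed, both devices respond.
        return [a.device_id, b.device_id]
    # Same utterance: compare attributes and keep only the device deemed nearest.
    winner = a if a.proximity_score >= b.proximity_score else b
    return [winner.device_id]


first = AudioSignal("kitchen", received_at=3.00, proximity_score=0.9)
second = AudioSignal("living-room", received_at=3.10, proximity_score=0.4)
responders = select_responders(first, second)
```

Here `responders` contains only `"kitchen"`, so audio data for the speech response would be sent to that device alone.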
15. A system, comprising:
one or more processors;
one or more non-transitory computer-readable media storing computer-executable instructions that, when executed on the one or more processors, cause the one or more processors to perform actions comprising:
receiving, by a first speech processing pipeline, a first digital audio signal produced by a first device after the first device detects a wakeword, the first speech processing pipeline including a first series of speech processing components that process the first digital audio signal;
receiving, by a second speech processing pipeline, a second digital audio signal produced by a second device after the second device detects the wakeword, the second speech processing pipeline including a second series of speech processing components that process the second digital audio signal;
receiving a first attribute associated with the first digital audio signal;
receiving a second attribute associated with the second digital audio signal;
determining that the first digital audio signal represents an utterance;
determining that the second digital audio signal represents the utterance based at least in part on the first digital audio signal being received within a threshold amount of time of the second digital audio signal being received;
determining, based at least in part on the one or more first attributes and the one or more second attributes, that the first speech processing pipeline will process the first digital audio signal;
determining, based at least in part on the first attribute and the second attribute, that the first device will respond to the utterance; and
sending, to the first device, audio data representing a speech response to the utterance.
Specification