Accessory for a voice-controlled device
First Claim
1. A method comprising:
receiving first audio data generated by a microphone of a device, the device residing in an environment that includes an accessory device, the first audio data representing speech of a user in the environment;
receiving an identifier associated with the device;
performing automatic speech recognition (ASR) on the first audio data to generate text representing the speech of the user;
analyzing the text to identify an intent associated with the text;
determining a uniform resource locator (URL) for acquiring primary content based at least in part on the intent associated with the text;
generating second audio data for output on a speaker of the device based at least in part on the intent associated with the text, the second audio data introducing the primary content;
sending the URL to the device;
sending the second audio data to the device for output on the speaker of the device;
determining, based at least in part on the identifier associated with the device, that the accessory device resides in the environment with the device;
identifying supplemental content to output on the accessory device, the supplemental content being associated with the primary content;
generating third audio data for output on the speaker of the device, the third audio data having a frequency of at least 20 kHz such that the third audio data is inaudible to the user, the third audio data encoding information identifying the supplemental content; and
sending the third audio data to the device for output on the speaker of the device.
1 Assignment
0 Petitions
Abstract
This disclosure describes techniques and systems for encoding instructions in audio data that, when output on a speaker of a first device in an environment, cause a second device to output content in the environment. In some instances, the audio data has a frequency that is inaudible to users in the environment. Thus, the first device is able to cause the second device to output the content without users in the environment hearing the instructions. In some instances, the first device also outputs content, and the content output by the second device is played at an offset relative to a position of the content output by the first device.
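As an illustration of how information might be carried in audio at or above 20 kHz, the following is a minimal sketch using binary frequency-shift keying. The abstract does not specify a modulation scheme, so the scheme itself, the 48 kHz sample rate, the two tone frequencies, the 10 ms symbol length, and the function names `encode_fsk`, `decode_fsk`, and `tone_power` are all assumptions made for this example.

```python
import math

SAMPLE_RATE = 48_000    # Hz (assumed); the Nyquist limit of 24 kHz leaves room above 20 kHz
FREQ_ZERO = 20_500      # Hz tone standing for bit 0 (hypothetical choice)
FREQ_ONE = 21_500       # Hz tone standing for bit 1 (hypothetical choice)
SYMBOL_SAMPLES = 480    # 10 ms per bit (hypothetical choice)


def encode_fsk(payload: bytes) -> list[float]:
    """Turn each bit of the payload into a short tone at or above 20 kHz."""
    samples = []
    for byte in payload:
        for bit_index in range(7, -1, -1):  # most significant bit first
            freq = FREQ_ONE if (byte >> bit_index) & 1 else FREQ_ZERO
            for n in range(SYMBOL_SAMPLES):
                samples.append(math.sin(2 * math.pi * freq * n / SAMPLE_RATE))
    return samples


def tone_power(block: list[float], freq: float) -> float:
    """Goertzel algorithm: signal power in the block at a single frequency."""
    coeff = 2 * math.cos(2 * math.pi * freq / SAMPLE_RATE)
    s_prev, s_prev2 = 0.0, 0.0
    for x in block:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


def decode_fsk(samples: list[float]) -> bytes:
    """Recover the payload by comparing the power of the two tones per symbol."""
    bits = []
    for start in range(0, len(samples) - SYMBOL_SAMPLES + 1, SYMBOL_SAMPLES):
        block = samples[start:start + SYMBOL_SAMPLES]
        is_one = tone_power(block, FREQ_ONE) > tone_power(block, FREQ_ZERO)
        bits.append(1 if is_one else 0)
    out = bytearray()
    for i in range(0, len(bits) - 7, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)
```

In this sketch, a first device would play the samples returned by `encode_fsk` through its speaker, and a second device would run something like `decode_fsk` on its microphone input. A real acoustic channel would also need synchronization, filtering, and error detection, which are omitted here.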
122 Citations
20 Claims
1. A method comprising:
receiving first audio data generated by a microphone of a device, the device residing in an environment that includes an accessory device, the first audio data representing speech of a user in the environment;
receiving an identifier associated with the device;
performing automatic speech recognition (ASR) on the first audio data to generate text representing the speech of the user;
analyzing the text to identify an intent associated with the text;
determining a uniform resource locator (URL) for acquiring primary content based at least in part on the intent associated with the text;
generating second audio data for output on a speaker of the device based at least in part on the intent associated with the text, the second audio data introducing the primary content;
sending the URL to the device;
sending the second audio data to the device for output on the speaker of the device;
determining, based at least in part on the identifier associated with the device, that the accessory device resides in the environment with the device;
identifying supplemental content to output on the accessory device, the supplemental content being associated with the primary content;
generating third audio data for output on the speaker of the device, the third audio data having a frequency of at least 20 kHz such that the third audio data is inaudible to the user, the third audio data encoding information identifying the supplemental content; and
sending the third audio data to the device for output on the speaker of the device.
Dependent claims: 2, 3, 20
4. A method comprising:
receiving first audio data from a first device, the first device residing in an environment that includes the first device and a second device, the first audio data representing speech of a user in the environment;
determining, based at least in part on the first audio data, to instruct the second device to output content in the environment;
generating second audio data representing first audio and second audio, wherein:
the first audio has a frequency below 20 kHz; and
the second audio has a frequency of at least 20 kHz and represents one or more instructions for causing the second device to output the content in the environment; and
sending the second audio data to the first device for output on one or more speakers of the first device.
Dependent claims: 5, 6, 7, 8, 9, 10, 11, 12
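To illustrate the idea of a single audio stream that represents both audio below 20 kHz and audio at or above 20 kHz, here is a minimal sketch that mixes two sample buffers and checks that both frequency components are present in the result. The claim does not say how the two are combined. The additive mix, the 48 kHz sample rate, the 0.2 gain on the high-frequency part, and the names `tone`, `mix`, `to_pcm16`, and `amplitude_at` are all assumptions for this example.

```python
import math

SAMPLE_RATE = 48_000  # Hz (assumed); must exceed 40 kHz to represent tones above 20 kHz


def tone(freq_hz: float, num_samples: int, amplitude: float = 1.0) -> list[float]:
    """Generate a sine tone as floating-point samples in [-1, 1]."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(num_samples)]


def mix(audible: list[float], ultrasonic: list[float],
        ultrasonic_gain: float = 0.2) -> list[float]:
    """Add the two signals, weighting them so the sum stays within [-1, 1]."""
    length = max(len(audible), len(ultrasonic))
    mixed = []
    for n in range(length):
        a = audible[n] if n < len(audible) else 0.0
        u = ultrasonic[n] if n < len(ultrasonic) else 0.0
        mixed.append(a * (1.0 - ultrasonic_gain) + u * ultrasonic_gain)
    return mixed


def to_pcm16(samples: list[float]) -> list[int]:
    """Quantize floating-point samples to signed 16-bit PCM values."""
    return [max(-32768, min(32767, round(s * 32767))) for s in samples]


def amplitude_at(samples: list[float], freq_hz: float) -> float:
    """Estimate the amplitude of one frequency component (single-bin DFT)."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / SAMPLE_RATE)
             for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
             for n, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)


# Usage: a 1 kHz tone stands in for the audible audio, and a 21 kHz tone
# stands in for the inaudible instruction audio (both are placeholders).
audible_part = tone(1_000, 4_800)
instruction_part = tone(21_000, 4_800)
combined = mix(audible_part, instruction_part)
pcm = to_pcm16(combined)
```

Keeping the high-frequency gain low is a design choice made for this sketch: it leaves most of the headroom for the audible audio, while the second device only needs enough signal to detect the tone.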
13. A device comprising:
one or more microphones;
one or more speakers;
one or more processors; and
one or more computer-readable media storing computer-executable instructions that, when executed, cause the one or more processors to perform acts comprising:
generating first audio data based at least in part on speech of a user in an environment, the speech captured by the one or more microphones;
sending the first audio data to one or more remote computing devices;
receiving, from the one or more remote computing devices, second audio data for output by the one or more speakers, the second audio data including:
first audio representing audio that is below 20 kHz, and
second audio representing audio that is above 20 kHz, the second audio comprising one or more instructions for instructing an accessory device in the environment to acquire supplemental content; and
outputting the second audio data by the one or more speakers, the second audio data including the first audio for the user to hear and the second audio for instructing the accessory device.
Dependent claims: 14, 15, 16, 17, 18, 19
Specification