Connected accessory for a voice-controlled device
First Claim
1. A method comprising:
receiving, from a voice-controlled device in an environment that includes the voice-controlled device and an accessory device, an indication that the voice-controlled device has established a wireless connection with the accessory device;
storing the indication in a user profile associated with the voice-controlled device;
receiving, from the voice-controlled device, audio data generated based at least in part on sound captured by a microphone of the voice-controlled device;
receiving a device identifier from the voice-controlled device;
accessing the user profile based at least in part on the device identifier;
determining, based at least in part on the indication in the user profile, that the accessory device is present in the environment;
determining to identify multiple domains of a natural language understanding (NLU) system based at least in part on determining that the accessory device is present in the environment;
generating, by performing automatic speech recognition (ASR) on the audio data, text data corresponding to the audio data;
sending the text data to the multiple domains of the NLU system;
identifying a first intent associated with a first domain of the multiple domains;
identifying a second intent associated with a second domain of the multiple domains;
identifying a named entity within the text data;
sending, to the voice-controlled device: first information about a first storage location where audio content associated with the named entity is stored, and a first instruction corresponding to the first intent;
sending, to the accessory device in the environment: second information about a second storage location where control information associated with the audio content is stored, the control information comprising at least viseme information, the viseme information comprising a series of timestamped mouth movement instructions, and a second instruction corresponding to the second intent;
at a first time based at least in part on the first instruction, initiating output of the audio content via a speaker of the voice-controlled device; and
at a second time based at least in part on the second instruction, operating a movable mouth of the accessory device or presenting mouth-related animations on a display of the accessory device according to the viseme information.
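The steps of claim 1 can be sketched as a single request handler: the stored indication of the accessory connection gates whether multiple NLU domains are consulted, and two messages are dispatched, one per device. This is a minimal illustrative sketch; every name in it (`run_asr`, `nlu_intent`, the example URLs, the profile store) is an assumption for illustration, not anything recited in the patent:

```python
from dataclasses import dataclass

def run_asr(audio_data: bytes) -> str:
    # Stand-in for a real ASR model ("performing automatic speech recognition").
    return "play the alphabet song"

def nlu_intent(domain: str, text: str) -> str:
    # Stand-in for per-domain NLU: each consulted domain yields its own intent.
    return {"music": "PlayMusicIntent", "accessory": "LipSyncIntent"}[domain]

def extract_named_entity(text: str) -> str:
    # Stand-in for named-entity recognition on the text data.
    return "the-alphabet-song"

@dataclass
class UserProfile:
    device_id: str
    accessory_connected: bool = False  # the stored "indication" of the wireless connection

PROFILES = {"echo-123": UserProfile("echo-123", accessory_connected=True)}

def handle_utterance(device_id: str, audio_data: bytes) -> dict:
    profile = PROFILES[device_id]       # access the user profile by device identifier
    text = run_asr(audio_data)          # generate text data from the audio data
    domains = ["music"]
    if profile.accessory_connected:     # accessory determined present in the environment,
        domains.append("accessory")     # so consult multiple NLU domains
    intents = {d: nlu_intent(d, text) for d in domains}
    entity = extract_named_entity(text)
    messages = {
        "voice_device": {               # first information + first instruction
            "content_url": f"https://content.example/{entity}.mp3",
            "instruction": intents["music"],
        }
    }
    if "accessory" in intents:
        messages["accessory"] = {       # second information + second instruction
            "control_url": f"https://content.example/{entity}.visemes",
            "instruction": intents["accessory"],
        }
    return messages
```

When the accessory is not marked as connected in the profile, only the `music` domain is consulted and only the voice-controlled device receives a message.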
1 Assignment
0 Petitions
Abstract
Coordinated operation of a voice-controlled device and an accessory device in an environment is described. A remote system processes audio data it receives from the voice-controlled device in the environment to identify a first intent associated with a first domain, a second intent associated with a second domain, and a named entity associated with the audio data. The remote system sends, to the voice-controlled device, first information for accessing main content associated with the named entity, and a first instruction corresponding to the first intent. The remote system also sends, to the accessory device, second information for accessing control information or supplemental content associated with the main content, and a second instruction corresponding to the second intent. The first and second instructions, when processed by the devices in the environment, cause coordinated operation of the voice-controlled device and the accessory device.
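One plausible shape for the control information the abstract describes, a series of timestamped mouth-movement instructions, is a list of `(offset_ms, mouth_shape)` pairs that the accessory samples against the audio playback clock. The tuple layout and shape names below are assumptions for illustration:

```python
# Offsets are measured from the start of audio playback; shape names are invented.
VISEME_TRACK = [
    (0, "closed"),
    (120, "open-wide"),
    (260, "rounded"),
    (400, "closed"),
]

def viseme_at(track, offset_ms):
    """Return the mouth shape active at a given playback offset (ms), so the
    accessory can drive its movable mouth or display in time with the audio."""
    current = track[0][1]
    for ts, shape in track:
        if ts <= offset_ms:
            current = shape
        else:
            break
    return current
```

Because each instruction carries its own timestamp, the accessory only needs the audio start time (the "first time") to stay synchronized, rather than a continuous stream from the voice-controlled device.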
182 Citations
19 Claims
5. A method comprising:
receiving, from a voice-controlled device in an environment, audio data generated based at least in part on sound captured by the voice-controlled device;
generating, by performing automatic speech recognition (ASR) on the audio data, text data corresponding to the audio data;
identifying, based at least in part on the text data, a first intent associated with a first domain of multiple domains of a natural language understanding (NLU) system;
identifying, based at least in part on the text data, a second intent associated with a second domain of the multiple domains, wherein the second intent causes the second domain to generate viseme information configured to cause a lip synch response by a second device;
identifying a named entity within the text data;
sending, to the voice-controlled device, a first instruction corresponding to the first intent, wherein the first instruction causes the voice-controlled device to output audio content at a first time on a speaker of the voice-controlled device; and
sending, to the second device in the environment, a second instruction corresponding to the second intent, wherein the second instruction causes the second device to process the viseme information.
View Dependent Claims (6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
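Claim 5 has the second domain generate the viseme information itself. One way such generation could work, sketched under the assumption of an ARPAbet-style phoneme inventory and a fixed per-phoneme duration (a real system would take phoneme timings from the TTS engine or forced alignment), is a lookup table from phonemes to mouth shapes:

```python
# Hypothetical phoneme-to-viseme table; both the phoneme labels and the
# shape names are illustrative assumptions.
PHONEME_TO_VISEME = {
    "AA": "open-wide", "AE": "open-wide",
    "OW": "rounded", "UW": "rounded",
    "M": "closed", "B": "closed", "P": "closed",
    "F": "teeth-on-lip", "V": "teeth-on-lip",
}

def generate_visemes(phonemes, ms_per_phoneme=90):
    """Produce timestamped mouth-movement instructions from a phoneme sequence."""
    track = []
    for i, ph in enumerate(phonemes):
        shape = PHONEME_TO_VISEME.get(ph, "neutral")
        if not track or track[-1][1] != shape:  # collapse runs of the same shape
            track.append((i * ms_per_phoneme, shape))
    return track
```

Collapsing consecutive identical shapes keeps the instruction series short, which matters if the accessory receives it over a low-bandwidth link.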
16. A system comprising:
at least one processor; and
memory storing computer-executable instructions that, when executed by the at least one processor, cause the at least one processor to:
receive, from a voice-controlled device in an environment, audio data generated based at least in part on sound captured by the voice-controlled device;
generate, by performing automatic speech recognition (ASR) on the audio data, text data corresponding to the audio data;
identify, based at least in part on the text data, a first intent associated with a first domain of multiple domains of a natural language understanding (NLU) system;
identify, based at least in part on the text data, a second intent associated with a second domain of the multiple domains, wherein the second intent causes the second domain to generate viseme information configured to cause a lip synch response by a second device;
identify a named entity within the text data;
send, to the voice-controlled device, a first instruction corresponding to the first intent, wherein the first instruction causes the voice-controlled device to output audio content at a first time on a speaker of the voice-controlled device; and
send, to the second device in the environment, a second instruction corresponding to the second intent, wherein the second instruction causes the second device to process the viseme information.
View Dependent Claims (17, 18, 19)
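The "identify a first intent ... identify a second intent" steps imply that the same text data is evaluated independently in each consulted domain. A toy keyword-overlap recognizer illustrates that per-domain structure; real NLU domains would use trained models, and the domain names, intent names, and keyword sets here are all invented for the sketch:

```python
# Each domain maps candidate intents to keyword sets (illustrative only).
DOMAIN_INTENTS = {
    "music": {"PlayMusicIntent": {"play", "song", "music"}},
    "accessory": {"LipSyncIntent": {"sing", "play", "song"}},
}

def recognize(text, domains):
    """Identify the best-scoring intent in each consulted domain."""
    tokens = set(text.lower().split())
    results = {}
    for domain in domains:
        best, best_score = None, 0
        for intent, keywords in DOMAIN_INTENTS[domain].items():
            score = len(tokens & keywords)
            if score > best_score:
                best, best_score = intent, score
        if best is not None:
            results[domain] = best
    return results
```

A single utterance such as "play the alphabet song" can thus yield two different intents, one per domain, which is what lets the system send distinct instructions to the two devices.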
Specification