Collaborative voice controlled devices
First Claim
1. A computer-implemented method, comprising:
identifying, by a first computing device, a second computing device that is configured to respond to a particular, predefined hotword;
receiving, by the first computing device, audio data of an utterance spoken by a user;
determining, by the first computing device, that the utterance includes the particular, predefined hotword;
receiving, by the first computing device, a transcription of an initial response to the utterance provided by the second computing device, wherein the initial response comprises synthesized speech audibly outputted by the second computing device;
based on the transcription of the initial response and based on the utterance, generating, by the first computing device, a subsequent response to the initial response; and
providing, for output by the first computing device, audio data of the subsequent response or a transcription of the subsequent response.
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for collaboration between multiple voice controlled devices are disclosed. In one aspect, a method includes the actions of identifying, by a first computing device, a second computing device that is configured to respond to a particular, predefined hotword; receiving audio data that corresponds to an utterance; receiving a transcription of additional audio data outputted by the second computing device in response to the utterance; based on the transcription of the additional audio data and based on the utterance, generating a transcription that corresponds to a response to the additional audio data; and providing, for output, the transcription that corresponds to the response.
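The sequence of actions in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `PeerDevice`, `HOTWORD`, `contains_hotword`, `generate_subsequent_response`, and `handle_utterance` are hypothetical, the utterance is represented as already-recognized text rather than raw audio, and the response generator is a trivial placeholder for whatever logic a real device would use.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical hotword; the source only says "a particular, predefined hotword".
HOTWORD = "ok computer"


@dataclass
class PeerDevice:
    """A second device identified by the first device (hypothetical model)."""
    name: str
    hotword: str


def contains_hotword(utterance_text: str, hotword: str = HOTWORD) -> bool:
    """Assumption: the hotword opens the utterance; text stands in for audio."""
    return utterance_text.lower().startswith(hotword)


def generate_subsequent_response(utterance_text: str, initial_response: str) -> str:
    """Placeholder generator that depends on both inputs, as the abstract requires."""
    return (
        f"Following up on the other device's answer ({initial_response!r}) "
        f"to your request ({utterance_text!r})."
    )


def handle_utterance(
    peer: PeerDevice,
    utterance_text: str,
    initial_response_transcription: Optional[str],
) -> Optional[str]:
    """Run the first device's side of the exchange; return a transcription or None."""
    # Step 1: the peer must be configured to respond to the same hotword.
    if peer.hotword != HOTWORD:
        return None
    # Steps 2-3: receive the utterance and check that it includes the hotword.
    if not contains_hotword(utterance_text):
        return None
    # Step 4: a transcription of the peer's initial response must have arrived.
    if initial_response_transcription is None:
        return None
    # Steps 5-6: generate the subsequent response and provide it for output.
    return generate_subsequent_response(utterance_text, initial_response_transcription)


if __name__ == "__main__":
    peer = PeerDevice(name="kitchen speaker", hotword=HOTWORD)
    print(handle_utterance(peer, "OK computer, what's the weather?", "It is sunny."))
```

In this sketch the first device stays silent (returns `None`) whenever a precondition fails, which keeps the two devices from talking over each other; a real system would make that decision with its own policy.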
19 Claims
1. A computer-implemented method, comprising:
identifying, by a first computing device, a second computing device that is configured to respond to a particular, predefined hotword;
receiving, by the first computing device, audio data of an utterance spoken by a user;
determining, by the first computing device, that the utterance includes the particular, predefined hotword;
receiving, by the first computing device, a transcription of an initial response to the utterance provided by the second computing device, wherein the initial response comprises synthesized speech audibly outputted by the second computing device;
based on the transcription of the initial response and based on the utterance, generating, by the first computing device, a subsequent response to the initial response; and
providing, for output by the first computing device, audio data of the subsequent response or a transcription of the subsequent response.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
18. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
identifying, by a first computing device, a second computing device that is configured to respond to a particular, predefined hotword;
receiving, by the first computing device, audio data of an utterance spoken by a user;
determining, by the first computing device, that the utterance includes the particular, predefined hotword;
receiving, by the first computing device, a transcription of an initial response to the utterance provided by the second computing device, wherein the initial response comprises synthesized speech audibly outputted by the second computing device;
based on the transcription of the initial response and based on the utterance, generating, by the first computing device, a subsequent response to the initial response; and
providing, for output by the first computing device, audio data of the subsequent response or a transcription of the subsequent response.
19. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
identifying, by a first computing device, a second computing device that is configured to respond to a particular, predefined hotword;
receiving, by the first computing device, audio data of an utterance spoken by a user;
determining, by the first computing device, that the utterance includes the particular, predefined hotword;
receiving, by the first computing device, a transcription of an initial response to the utterance provided by the second computing device, wherein the initial response comprises synthesized speech audibly outputted by the second computing device;
based on the transcription of the initial response and based on the utterance, generating, by the first computing device, a subsequent response to the initial response; and
providing, for output by the first computing device, audio data of the subsequent response or a transcription of the subsequent response.
Specification