Preventing of audio attacks using an input and an output hotword detection model
First Claim
1. A method comprising:
receiving, at a processing module of a device, output audio data that is provided to a speaker of the device and that represents audio for output by the device;
receiving, by the processing module and after the output audio data is provided to the speaker of the device, input audio data that represents audio detected by a microphone of the device;
determining, by an output hotword detection model of the processing module, that the output audio data that is provided to the speaker of the device includes a representation of a hotword, wherein the hotword is a word or phrase previously designated to precede a voice command;
determining, by an input hotword detection model that is less accepting of hotwords than the output hotword detection model, that the input audio data that represents audio detected by a microphone of the device includes a representation of a hotword; and
in response to determining, by the output hotword detection model, that the output audio data that is provided to the speaker of the device includes the representation of the hotword and, by the input hotword detection model that is less accepting of hotwords than the output hotword detection model, that the input audio data that represents input audio detected by the microphone of the device includes the representation of the hotword, blocking, by the processing module, use of the input audio data to initiate a command.
Abstract
In some implementations, a method includes receiving output audio data that is provided to a speaker of a device and that represents audio for output by the device, receiving, after the output audio data is provided to the speaker of the device, input audio data that represents audio detected by a microphone of the device, determining, by an output hotword detection model, that the output audio data that is provided to the speaker of the device includes a representation of a hotword, determining, by an input hotword detection model that is less accepting of hotwords than the output hotword detection model, that the input audio data that represents audio detected by a microphone of the device includes a representation of a hotword, and, in response, blocking use of the input audio data to initiate a command.
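The blocking decision described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the source does not say how a model is made "less accepting", so the sketch assumes each model produces a confidence score and that the input model simply uses a higher threshold. The names `HotwordModel`, `ProcessingModule`, `should_block`, and `toy_scorer`, and the threshold values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical scorer type: maps audio samples to a hotword confidence in [0, 1].
Scorer = Callable[[Sequence[float]], float]


@dataclass
class HotwordModel:
    scorer: Scorer
    threshold: float  # assumption: a higher threshold means "less accepting"

    def detects(self, audio: Sequence[float]) -> bool:
        """Return True when the score reaches this model's threshold."""
        return self.scorer(audio) >= self.threshold


@dataclass
class ProcessingModule:
    output_model: HotwordModel  # inspects audio sent to the speaker
    input_model: HotwordModel   # inspects audio captured by the microphone

    def should_block(self, output_audio: Sequence[float],
                     input_audio: Sequence[float]) -> bool:
        """Block the input audio from initiating a command only when BOTH
        the output model and the stricter input model detect a hotword."""
        return (self.output_model.detects(output_audio)
                and self.input_model.detects(input_audio))


def toy_scorer(audio: Sequence[float]) -> float:
    # Stand-in for a real detector (assumption): mean absolute amplitude,
    # clipped to [0, 1]. A real system would use a trained model.
    if not audio:
        return 0.0
    return min(1.0, sum(abs(s) for s in audio) / len(audio))


module = ProcessingModule(
    output_model=HotwordModel(toy_scorer, threshold=0.3),  # more accepting
    input_model=HotwordModel(toy_scorer, threshold=0.6),   # less accepting
)

# Both models fire: the microphone audio is treated as the device's own output.
blocked = module.should_block([0.8] * 4, [0.7] * 4)
# Only the microphone side fires: likely a real user, so nothing is blocked.
allowed = module.should_block([0.1] * 4, [0.9] * 4)
```

With these assumed thresholds, the output side flags borderline audio readily, while the input side must be more confident before the pair triggers a block, which mirrors the asymmetry stated in the claims.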
36 Citations
16 Claims
1. A method comprising:
receiving, at a processing module of a device, output audio data that is provided to a speaker of the device and that represents audio for output by the device;
receiving, by the processing module and after the output audio data is provided to the speaker of the device, input audio data that represents audio detected by a microphone of the device;
determining, by an output hotword detection model of the processing module, that the output audio data that is provided to the speaker of the device includes a representation of a hotword, wherein the hotword is a word or phrase previously designated to precede a voice command;
determining, by an input hotword detection model that is less accepting of hotwords than the output hotword detection model, that the input audio data that represents audio detected by a microphone of the device includes a representation of a hotword; and
in response to determining, by the output hotword detection model, that the output audio data that is provided to the speaker of the device includes the representation of the hotword and, by the input hotword detection model that is less accepting of hotwords than the output hotword detection model, that the input audio data that represents input audio detected by the microphone of the device includes the representation of the hotword, blocking, by the processing module, use of the input audio data to initiate a command.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A device comprising:
a processing module; and
one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the secure processing module to perform operations comprising;
receiving, at the processing module of the device, output audio data that is provided to a speaker of the device and that represents audio for output by the device;
receiving, by the processing module and after the output audio data is provided to the speaker of the device, input audio data that represents audio detected by a microphone of the device;
determining, by the processing module, that the output audio data that is provided to the speaker of the device includes a representation of a hotword, wherein the hotword is a word or phrase previously designated to precede a voice command;
determining, by an input hotword detection model that is less accepting of hotwords than an output hotword detection model, that the input audio data that represents audio detected by a microphone of the device includes a representation of a hotword; and
in response to determining, by the output hotword detection model, that the output audio data that is provided to the speaker of the device includes the representation of the hotword and, by the input hotword detection model that is less accepting of hotwords than the output hotword detection model, that the input audio data that represents input audio detected by the microphone of the device includes the representation of the hotword, blocking, by the processing module, use of the input audio data to initiate a command.
16. A computer-readable storage device storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
receiving, at a processing module of a device, output audio data that is provided to a speaker of the device and that represents audio for output by the device;
receiving, by the processing module and after the output audio data is provided to the speaker of the device, input audio data that represents audio detected by a microphone of the device;
determining, by an output hotword detection model of the processing module, that the output audio data that is provided to the speaker of the device includes a representation of a hotword, wherein the hotword is a word or phrase previously designated to precede a voice command;
determining, by an input hotword detection model that is less accepting of hotwords than the output hotword detection model, that the input audio data that represents audio detected by a microphone of the device includes a representation of a hotword; and
in response to determining, by the output hotword detection model, that the output audio data that is provided to the speaker of the device includes the representation of the hotword and, by the input hotword detection model that is less accepting of hotwords than the output hotword detection model, that the input audio data that represents input audio detected by the microphone of the device includes the representation of the hotword, blocking, by the processing module, use of the input audio data to initiate a command.
Specification