Preventing of audio attacks using an input and an output hotword detection model

US 10,242,673 B2
Filed: 12/07/2016
Issued: 03/26/2019
Est. Priority Date: 12/07/2016
Status: Active Grant

First Claim

1. A method comprising:

receiving, at a processing module of a device, output audio data that is provided to a speaker of the device and that represents audio for output by the device;

receiving, by the processing module and after the output audio data is provided to the speaker of the device, input audio data that represents audio detected by a microphone of the device;

determining, by an output hotword detection model of the processing module, that the output audio data that is provided to the speaker of the device includes a representation of a hotword, wherein the hotword is a word or phrase previously designated to precede a voice command;

determining, by an input hotword detection model that is less accepting of hotwords than the output hotword detection model, that the input audio data that represents audio detected by a microphone of the device includes a representation of a hotword; and

in response to determining, by the output hotword detection model, that the output audio data that is provided to the speaker of the device includes the representation of the hotword and, by the input hotword detection model that is less accepting of hotwords than the output hotword detection model, that the input audio data that represents input audio detected by the microphone of the device includes the representation of the hotword, blocking, by the processing module, use of the input audio data to initiate a command.

2 Assignments

0 Petitions

Abstract

In some implementations, a method includes receiving output audio data that is provided to a speaker of a device and that represents audio for output by the device, receiving, after the output audio data is provided to the speaker of the device, input audio data that represents audio detected by a microphone of the device, determining, by an output hotword detection model, that the output audio data that is provided to the speaker of the device includes a representation of a hotword, determining, by an input hotword detection model that is less accepting of hotwords than the output hotword detection model, that the input audio data that represents audio detected by a microphone of the device includes a representation of a hotword, and, in response, blocking use of the input audio data to initiate a command.
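The decision rule in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the names `should_block`, `Detector`, and `toy_detector` are hypothetical, and the toy detector only searches for a marker in a byte string where a real device would run trained hotword models.

```python
from typing import Callable

# Hypothetical detector type (assumption): takes raw audio bytes and returns
# True when it finds a representation of the hotword.
Detector = Callable[[bytes], bool]


def should_block(output_audio: bytes,
                 input_audio: bytes,
                 output_detector: Detector,
                 input_detector: Detector) -> bool:
    """Return True when the input audio must not be used to initiate a command.

    Blocking requires BOTH determinations: the output detector finds the
    hotword in audio provided to the speaker, and the input detector finds
    the hotword in audio captured by the microphone afterwards.
    """
    return output_detector(output_audio) and input_detector(input_audio)


def toy_detector(audio: bytes) -> bool:
    # Stand-in for a real model (assumption for illustration only).
    return b"HOTWORD" in audio


if __name__ == "__main__":
    # The device played the hotword and the microphone heard it: block.
    print(should_block(b"..HOTWORD..", b"..HOTWORD..", toy_detector, toy_detector))
    # The device played ordinary audio, so a heard hotword is not blocked.
    print(should_block(b"music", b"..HOTWORD..", toy_detector, toy_detector))
```

In this sketch a hotword heard by the microphone is blocked only when the device itself has just played one, which is the case the abstract describes.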

36 Citations

16 Claims

1. A method comprising:
- receiving, at a processing module of a device, output audio data that is provided to a speaker of the device and that represents audio for output by the device;
  
  receiving, by the processing module and after the output audio data is provided to the speaker of the device, input audio data that represents audio detected by a microphone of the device;
  
  determining, by an output hotword detection model of the processing module, that the output audio data that is provided to the speaker of the device includes a representation of a hotword, wherein the hotword is a word or phrase previously designated to precede a voice command;
  
  determining, by an input hotword detection model that is less accepting of hotwords than the output hotword detection model, that the input audio data that represents audio detected by a microphone of the device includes a representation of a hotword; and
  
  in response to determining, by the output hotword detection model, that the output audio data that is provided to the speaker of the device includes the representation of the hotword and, by the input hotword detection model that is less accepting of hotwords than the output hotword detection model, that the input audio data that represents input audio detected by the microphone of the device includes the representation of the hotword, blocking, by the processing module, use of the input audio data to initiate a command.
- - 2. The method of claim 1, wherein the determining that the output audio includes a representation of a hotword comprises:
    - generating by the output hotword detection model a hotword score for the output audio data,
      
      comparing, by the output hotword detection model, the hotword score to a predetermined threshold; and
      
      determining, by the output hotword detection model and based on the comparing, that the output audio includes a representation of a hotword.
  - 3. The method of claim 2, further comprising:
    - generating, by the input hotword detection model, a separate hotword score for the output audio data;
      
      comparing by the input hotword detection model, the separate hotword score to a separate predetermined threshold;
      
      confirming by the input hotword detection model and based on the comparing, that the output audio data includes a representation of a hotword; and
      
      based on the confirming that the output audio data includes the presentation of the hotword, blocking, by the processing module, use of the input audio data to initiate a command.
  - 4. The method of claim 3, wherein the predetermined threshold is different from the separate predetermined threshold.
  - 5. The method of claim 3, wherein the output hotword detection model is a trained neural network, and wherein the input hotword detection model is a trained neural network.
  - 6. The method of claim 5, wherein the predetermined threshold is determined by the output hotword detection model during training, and wherein the separate predetermined threshold is determined by the output hotword detection model during training.
  - 7. The method of claim 3, wherein the input hotword detection model generates the separate hotword score after the determining that the output audio data includes the representation of the hotword.
  - 8. The method of claim 2, wherein blocking, by the processing module, use of the input audio data to initiate a command comprises blocking the command from being executed.
  - 9. The method of claim 2, further comprising outputting, by the processing module, data indicating that the device has been compromised.
  - 10. The method of claim 1, wherein the hotword is a predetermined word that has been designated to signal the beginning of a voice query or voice command that immediately follows the hotword.
  - 11. The method of claim 1, wherein the output hotword detection model and the input hotword detection model operate in parallel.
  - 12. The method of claim 1, wherein blocking, by the processing module, use of the input audio data to initiate a command comprises blocking use of the input audio data to initiate the command by preventing the device from transmitting the input audio data as a command to a remote server.
  - 13. The method of claim 1, wherein receiving, at a processing module of a device, output audio data that is provided to a speaker of the device and that represents audio for output by the device comprises:
    - receiving, at the processing module of the device, the output audio data before the audio is audibly output by the speaker.
  - 14. The method of claim 1, wherein determining, by the output hotword detection model of the processing model, that the output audio data that is provided to the speaker of the device includes the representation of the hotword occurs before the output audio data is audibly output by the speaker of the device.
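The score-and-threshold steps of claims 2 through 4 and 7 can be sketched as follows. Several details here are assumptions rather than anything the claims state: scores lie in [0, 1], "less accepting" is read as a higher threshold on the input model, a score equal to the threshold counts as a detection, and the names `HotwordModel`, `confirm_and_block`, and `toy_score` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical scoring function (assumption): audio bytes -> score in [0, 1].
ScoreFn = Callable[[bytes], float]


@dataclass
class HotwordModel:
    score_fn: ScoreFn
    threshold: float  # the "predetermined threshold" for this model

    def detects(self, audio: bytes) -> bool:
        # Generate a hotword score and compare it to the threshold.
        # Treating equality as a detection is an assumption.
        return self.score_fn(audio) >= self.threshold


def confirm_and_block(output_audio: bytes,
                      output_model: HotwordModel,
                      input_model: HotwordModel) -> bool:
    """Return True when use of the input audio should be blocked."""
    # Claim 2: the output model scores the output audio first.
    if not output_model.detects(output_audio):
        return False
    # Claims 3 and 7: only after the output model fires does the input
    # model produce a separate score for the same output audio.
    return input_model.detects(output_audio)


def toy_score(audio: bytes) -> float:
    # Stand-in score (assumption): more "HOT" markers give a higher score.
    return min(1.0, audio.count(b"HOT") / 4)


# Claim 4: the two thresholds differ. The values are illustrative only.
output_model = HotwordModel(toy_score, threshold=0.5)
input_model = HotwordModel(toy_score, threshold=0.75)

if __name__ == "__main__":
    print(confirm_and_block(b"HOT" * 3, output_model, input_model))
    print(confirm_and_block(b"HOT" * 2, output_model, input_model))
```

With these illustrative thresholds, audio scoring 0.5 passes the output model but is not confirmed by the stricter input model, so it is not blocked on the confirmation path.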

15. A device comprising:
- a processing module; and
  
  one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the secure processing module to perform operations comprising;
  
  receiving, at the processing module of the device, output audio data that is provided to a speaker of the device and that represents audio for output by the device;
  
  receiving, by the processing module and after the output audio data is provided to the speaker of the device, input audio data that represents audio detected by a microphone of the device;
  
  determining, by the processing module, that the output audio data that is provided to the speaker of the device includes a representation of a hotword, wherein the hotword is a word or phrase previously designated to precede a voice command;
  
  determining, by an input hotword detection model that is less accepting of hotwords than an output hotword detection model, that the input audio data that represents audio detected by a microphone of the device includes a representation of a hotword; and
  
  in response to determining, by the output hotword detection model, that the output audio data that is provided to the speaker of the device includes the representation of the hotword and, by the input hotword detection model that is less accepting of hotwords than the output hotword detection model, that the input audio data that represents input audio detected by the microphone of the device includes the representation of the hotword, blocking, by the processing module, use of the input audio data to initiate a command.

16. A computer-readable storage device storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving, at a processing module of a device, output audio data that is provided to a speaker of the device and that represents audio for output by the device;
  
  receiving, by the processing module and after the output audio data is provided to the speaker of the device, input audio data that represents audio detected by a microphone of the device;
  
  determining, by an output hotword detection model of the processing module, that the output audio data that is provided to the speaker of the device includes a representation of a hotword, wherein the hotword is a word or phrase previously designated to precede a voice command;
  
  determining, by an input hotword detection model that is less accepting of hotwords than the output hotword detection model, that the input audio data that represents audio detected by a microphone of the device includes a representation of a hotword; and
  
  in response to determining, by the output hotword detection model, that the output audio data that is provided to the speaker of the device includes the representation of the hotword and, by the input hotword detection model that is less accepting of hotwords than the output hotword detection model, that the input audio data that represents input audio detected by the microphone of the device includes the representation of the hotword, blocking, by the processing module, use of the input audio data to initiate a command.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Campbell, Lee, Beder, Samuel Kramer
Primary Examiner(s): Kazeminezhad, Farzad
Application Number: US15/371,907
Publication Number: US 20180158453A1
Time in Patent Office: 839 Days
Field of Search: 704235, 704233, 4554141
US Class Current
CPC Class Codes

G06F 21/6218   to a system of files or obj...

G10L 15/16   using artificial neural net...

G10L 15/20   Speech recognition techniqu...

G10L 15/22   Procedures used during a sp...

G10L 15/222   Barge in, i.e. overridable ...

G10L 15/30   Distributed recognition, e....

G10L 17/24   the user being prompted to ...

G10L 2015/088   Word spotting

G10L 2015/223   Execution procedure of a sp...
