Hotword suppression

US 10,692,496 B2
Filed: 05/21/2019
Issued: 06/23/2020
Est. Priority Date: 05/22/2018
Status: Active Grant


Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for suppressing hotwords are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to playback of an utterance. The actions further include providing the audio data as an input to a model (i) that is configured to determine whether a given audio data sample includes an audio watermark and (ii) that was trained using watermarked audio data samples that each include an audio watermark sample and non-watermarked audio data samples that do not each include an audio watermark sample. The actions further include receiving, from the model, data indicating whether the audio data includes the audio watermark. The actions further include, based on the data indicating whether the audio data includes the audio watermark, determining to continue or cease processing of the audio data.
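
The flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `WatermarkModel` callable type, the `Decision` record, and the `toy_model` placeholder are all hypothetical names, and the rule "watermark present means cease" follows the variant recited in claims 2 and 3.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical interface: any callable mapping audio samples to True
# (watermark present) or False. The source does not specify the model's API.
WatermarkModel = Callable[[Sequence[float]], bool]


@dataclass
class Decision:
    has_watermark: bool
    continue_processing: bool


def decide(audio: Sequence[float], model: WatermarkModel) -> Decision:
    """Provide the audio to the model, then choose to continue or cease."""
    has_watermark = model(audio)
    # Assumption: a detected watermark means processing should cease.
    return Decision(has_watermark, continue_processing=not has_watermark)


def toy_model(audio: Sequence[float]) -> bool:
    """Placeholder detector, NOT a trained model: flags audio whose first
    sample equals an arbitrary marker value."""
    return len(audio) > 0 and audio[0] == 0.5


if __name__ == "__main__":
    print(decide([0.5, 0.1, -0.2], toy_model))
    print(decide([0.0, 0.1, -0.2], toy_model))
```

Passing the model in as a callable keeps the decision step independent of how the detector was trained.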

108 Citations

20 Claims

1. A computer-implemented method comprising:
- receiving, by a computing device, audio data corresponding to playback of an utterance;
- providing, by the computing device, the audio data as an input to a model (i) that is configured to determine whether a given audio data sample includes an audio watermark and (ii) that was trained using watermarked audio data samples that each include an audio watermark sample and non-watermarked audio data samples that do not each include an audio watermark sample;
- receiving, by the computing device and from the model (i) that is configured to determine whether the given audio data sample includes the audio watermark and (ii) that was trained using the watermarked audio data samples that include the audio watermark and the non-watermarked audio data samples that do not include the audio watermark, data indicating whether the audio data includes the audio watermark; and
- based on the data indicating whether the audio data includes the audio watermark, determining, by the computing device, to continue or cease processing of the audio data.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11):
  - 2. The method of claim 1, wherein:
    - receiving the data indicating whether the audio data includes the audio watermark comprises receiving the data indicating that the audio data includes the audio watermark,
    - determining to continue or cease processing of the audio data comprises determining to cease processing of the audio data based on receiving the data indicating that the audio data includes the audio watermark, and
    - the method further comprises, based on determining to cease processing of the audio data, ceasing, by the computing device, processing of the audio data.
  - 3. The method of claim 1, wherein:
    - receiving the data indicating whether the audio data includes the audio watermark comprises receiving the data indicating that the audio data does not include the audio watermark,
    - determining to continue or cease processing of the audio data comprises determining to continue processing of the audio data based on receiving the data indicating that the audio data does not include the audio watermark, and
    - the method further comprises, based on determining to continue processing of the audio data, continuing, by the computing device, processing of the audio data.
  - 4. The method of claim 1, wherein the processing of the audio data comprises:
    - generating a transcription of the utterance by performing speech recognition on the audio data.
  - 5. The method of claim 1, wherein the processing of the audio data comprises:
    - determining whether the audio data includes an utterance of a particular, predefined hotword.
  - 6. The method of claim 1, comprising:
    - before providing the audio data as an input to the model (i) that is configured to determine whether a given audio data sample includes an audio watermark and (ii) that was trained using watermarked audio data samples that each include an audio watermark sample and non-watermarked audio data samples that do not each include an audio watermark sample, determining, by the computing device, that the audio data includes an utterance of a particular, predefined hotword.
  - 7. The method of claim 1, comprising:
    - determining, by the computing device, that the audio data includes an utterance of a particular, predefined hotword,
    - wherein providing the audio data as an input to the model (i) that is configured to determine whether a given audio data sample includes an audio watermark and (ii) that was trained using watermarked audio data samples that each include an audio watermark sample and non-watermarked audio data samples that do not each include an audio watermark sample is in response to determining that the audio data includes an utterance of a particular, predefined hotword.
  - 8. The method of claim 1, comprising:
    - receiving, by the computing device, the watermarked audio data samples that each include an audio watermark, the non-watermarked audio data samples that do not each include an audio watermark, and data indicating whether each watermarked and non-watermarked audio sample includes an audio watermark; and
    - training, by the computing device and using machine learning, the model using the watermarked audio data samples that each include an audio watermark, the non-watermarked audio data samples that do not each include the audio watermark, and the data indicating whether each watermarked and non-watermarked audio sample includes an audio watermark.
  - 9. The method of claim 8, wherein at least a portion of the watermarked audio data samples each include an audio watermark at multiple, periodic locations.
  - 10. The method of claim 8, wherein audio watermarks in one of the watermarked audio data samples are different to audio watermark in another of the watermarked audio data samples.
  - 11. The method of claim 1, comprising:
    - determining, by the computing device, a first time of receipt of the audio data corresponding to playback of an utterance;
    - receiving, by the computing device, a second time that an additional computing device provided, for output, the audio data corresponding to playback of an utterance and data indicating whether the audio data included a watermark;
    - determining, by the computing device, that the first time matches the second time; and
    - based on determining that the first time matches the second time, updating, by the computing device, the model using the data indicating whether the audio data included a watermark.
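
Claims 8 and 9 describe training the model on labeled watermarked and non-watermarked samples, with watermarks at periodic locations. The following is a minimal sketch of that idea on synthetic data. Everything specific in it is an assumption made for illustration: the clip length `N`, the `PERIOD`, the impulse-shaped `PATTERN`, the Gaussian noise standing in for audio, and the use of a one-feature logistic regression as a stand-in for whatever machine-learning model the claims contemplate.

```python
import math
import random
from typing import List, Tuple

N = 64        # samples per synthetic clip (hypothetical)
PERIOD = 8    # watermark repeats every PERIOD samples (hypothetical)
PATTERN = [0.3 if i % PERIOD == 0 else 0.0 for i in range(N)]
ENERGY = sum(p * p for p in PATTERN)


def make_sample(rng: random.Random, watermarked: bool) -> List[float]:
    """Gaussian noise standing in for audio, plus the periodic pattern if watermarked."""
    noise = [rng.gauss(0.0, 0.05) for _ in range(N)]
    if not watermarked:
        return noise
    return [n + p for n, p in zip(noise, PATTERN)]


def make_dataset(rng: random.Random, n_per_class: int) -> Tuple[List[List[float]], List[int]]:
    """Build labeled data: label 1 for watermarked clips, 0 for non-watermarked."""
    samples = [make_sample(rng, True) for _ in range(n_per_class)]
    samples += [make_sample(rng, False) for _ in range(n_per_class)]
    labels = [1] * n_per_class + [0] * n_per_class
    return samples, labels


def feature(sample: List[float]) -> float:
    """Normalized correlation with the pattern: near 1 if watermarked, near 0 if not."""
    return sum(s * p for s, p in zip(sample, PATTERN)) / ENERGY


def train(samples, labels, epochs: int = 300, lr: float = 1.0) -> Tuple[float, float]:
    """Fit a one-feature logistic regression by batch gradient descent."""
    xs = [feature(s) for s in samples]
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / len(xs)
        b -= lr * grad_b / len(xs)
    return w, b


def predict(sample: List[float], w: float, b: float) -> bool:
    """Return True if the trained model judges the sample to be watermarked."""
    return w * feature(sample) + b > 0.0
```

A typical use would be `w, b = train(*make_dataset(random.Random(0), 50))` followed by `predict(clip, w, b)` on new clips. A production detector would not know the watermark pattern in advance in this way; the correlation feature is only a shortcut to keep the sketch small.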

12. A system comprising:
- one or more computers; and
- one or more non-transitory storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  - receiving, by a computing device, audio data corresponding to playback of an utterance;
  - providing, by the computing device, the audio data as an input to a model (i) that is configured to determine whether a given audio data sample includes an audio watermark and (ii) that was trained using watermarked audio data samples that each include an audio watermark sample and non-watermarked audio data samples that do not each include an audio watermark sample;
  - receiving, by the computing device and from the model (i) that is configured to determine whether the given audio data sample includes the audio watermark and (ii) that was trained using the watermarked audio data samples that include the audio watermark and the non-watermarked audio data samples that do not include the audio watermark, data indicating whether the audio data includes the audio watermark; and
  - based on the data indicating whether the audio data includes the audio watermark, determining, by the computing device, to continue or cease processing of the audio data.
- Dependent claims (13, 14, 15, 16, 17, 18, 19):
  - 13. The system of claim 12, wherein:
    - receiving the data indicating whether the audio data includes the audio watermark comprises receiving the data indicating that the audio data includes the audio watermark,
    - determining to continue or cease processing of the audio data comprises determining to cease processing of the audio data based on receiving the data indicating that the audio data includes the audio watermark, and
    - the method further comprises, based on determining to cease processing of the audio data, ceasing, by the computing device, processing of the audio data.
  - 14. The system of claim 12, wherein the processing of the audio data comprises:
    - generating a transcription of the utterance by performing speech recognition on the audio data.
  - 15. The system of claim 12, wherein the processing of the audio data comprises:
    - determining whether the audio data includes an utterance of a particular, predefined hotword.
  - 16. The system of claim 12, wherein the operations comprise:
    - before providing the audio data as an input to the model (i) that is configured to determine whether a given audio data sample includes an audio watermark and (ii) that was trained using watermarked audio data samples that each include an audio watermark sample and non-watermarked audio data samples that do not each include an audio watermark sample, determining, by the computing device, that the audio data includes an utterance of a particular, predefined hotword.
  - 17. The system of claim 12, wherein the operations comprise:
    - determining, by the computing device, that the audio data includes an utterance of a particular, predefined hotword,
    - wherein providing the audio data as an input to the model (i) that is configured to determine whether a given audio data sample includes an audio watermark and (ii) that was trained using watermarked audio data samples that each include an audio watermark sample and non-watermarked audio data samples that do not each include an audio watermark sample is in response to determining that the audio data includes an utterance of a particular, predefined hotword.
  - 18. The system of claim 12, wherein the operations comprise:
    - receiving, by the computing device, the watermarked audio data samples that each include an audio watermark, the non-watermarked audio data samples that do not each include an audio watermark, and data indicating whether each watermarked and non-watermarked audio sample includes an audio watermark; and
    - training, by the computing device and using machine learning, the model using the watermarked audio data samples that each include an audio watermark, the non-watermarked audio data samples that do not each include the audio watermark, and the data indicating whether each watermarked and non-watermarked audio sample includes an audio watermark.
  - 19. The system of claim 12, wherein the operations comprise:
    - determining, by the computing device, a first time of receipt of the audio data corresponding to playback of an utterance;
    - receiving, by the computing device, a second time that an additional computing device provided, for output, the audio data corresponding to playback of an utterance and data indicating whether the audio data included a watermark;
    - determining, by the computing device, that the first time matches the second time; and
    - based on determining that the first time matches the second time, updating, by the computing device, the model using the data indicating whether the audio data included a watermark.
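
The time-matching update recited in claims 11 and 19 can be sketched as below. This is an illustration under assumptions, not the claimed implementation: `PlaybackReport`, `UpdateBuffer`, and `maybe_update` are hypothetical names, the claims do not say how closely the two times must agree (the `tolerance_s` parameter is invented here), and "updating the model" is reduced to queuing a labeled example for later retraining.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class PlaybackReport:
    """Hypothetical message from the additional computing device."""
    output_time: float    # the "second time": when the audio was output, in seconds
    had_watermark: bool   # whether the audio it output included a watermark


@dataclass
class UpdateBuffer:
    """Collects labeled examples to be used in a later model update."""
    examples: List[Tuple[Sequence[float], bool]] = field(default_factory=list)


def maybe_update(
    audio: Sequence[float],
    receipt_time: float,          # the "first time": when this device received the audio
    report: PlaybackReport,
    buffer: UpdateBuffer,
    tolerance_s: float = 0.5,     # assumed definition of "matches"
) -> bool:
    """Queue the audio as a labeled example only if the two times match."""
    if abs(receipt_time - report.output_time) <= tolerance_s:
        buffer.examples.append((audio, report.had_watermark))
        return True
    return False
```

The time check serves as evidence that the audio the device heard is the same audio the other device played, so the reported label can be trusted for that clip.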

20. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving, by a computing device, audio data corresponding to playback of an utterance;
- providing, by the computing device, the audio data as an input to a model (i) that is configured to determine whether a given audio data sample includes an audio watermark and (ii) that was trained using watermarked audio data samples that each include an audio watermark sample and non-watermarked audio data samples that do not each include an audio watermark sample;
- receiving, by the computing device and from the model (i) that is configured to determine whether the given audio data sample includes the audio watermark and (ii) that was trained using the watermarked audio data samples that include the audio watermark and the non-watermarked audio data samples that do not include the audio watermark, data indicating whether the audio data includes the audio watermark; and
- based on the data indicating whether the audio data includes the audio watermark, determining, by the computing device, to continue or cease processing of the audio data.
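
Claims 7 and 17 recite providing the audio to the watermark model in response to detecting the hotword. A minimal sketch of that ordering follows. The function `handle_audio`, its three callable parameters, and the returned status strings are hypothetical and chosen only to make the control flow visible.

```python
from typing import Callable, Sequence

Audio = Sequence[float]


def handle_audio(
    audio: Audio,
    detect_hotword: Callable[[Audio], bool],    # hypothetical hotword detector
    detect_watermark: Callable[[Audio], bool],  # hypothetical watermark model
    process: Callable[[Audio], None],           # e.g. speech recognition
) -> str:
    """Run the watermark model only after a hotword is detected."""
    if not detect_hotword(audio):
        return "ignored"        # no hotword, so the watermark model never runs
    if detect_watermark(audio):
        return "suppressed"     # watermark found: cease processing
    process(audio)              # no watermark: continue processing
    return "processed"
```

Checking for the hotword first means the watermark model runs only on audio that would otherwise trigger the device.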

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Gruenstein, Alexander H.; Joglekar, Taral Pradeep; Peddinti, Vijayaditya; Bacchiani, Michiel A. U.
Primary Examiner(s): Chawan, Vijay B
Application Number: US16/418,415
Publication Number: US 20190362719A1
Time in Patent Office: 399 Days
Field of Search: 704/273, 704/275, 704/270.1, 704/270, 704/251
CPC Class Codes

G06F 3/167   Audio in a user interface, ...

G10L 15/063   Training

G10L 15/08   Speech classification or se...

G10L 15/22   Procedures used during a sp...

G10L 15/30   Distributed recognition, e....

G10L 17/00   Speaker identification or v...

G10L 17/22   Interactive procedures; Man...

G10L 19/018   Audio watermarking, i.e. em...

G10L 2015/088   Word spotting

G10L 2015/223   Execution procedure of a sp...

G10L 25/51   for comparison or discrimin...