Hotword detection on multiple devices

US 10,242,676 B2
Filed: 04/13/2018
Issued: 03/26/2019
Est. Priority Date: 08/24/2016
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving audio data that corresponds to an utterance. The actions further include determining that the utterance likely includes a particular, predefined hotword. The actions further include transmitting (i) data indicating that the computing device likely received the particular, predefined hotword, (ii) data identifying the computing device, and (iii) data identifying a group of nearby computing devices that includes the computing device. The actions further include receiving an instruction to commence speech recognition processing on the audio data. The actions further include in response to receiving the instruction to commence speech recognition processing on the audio data, processing at least a portion of the audio data using an automated speech recognizer on the computing device.
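
The abstract does not say how the sender of the instruction chooses which device proceeds. A minimal sketch of one possible arbitration step, assuming reports are collected per group of nearby devices and the loudest report wins: the `HotwordReport` type, the `arbitrate` function, the `"commence"`/`"suppress"` strings, and the loudness-based rule are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HotwordReport:
    """One device's report that it likely heard the hotword (hypothetical shape)."""
    device_id: str
    group_id: str
    loudness: float  # assumed selection signal; the abstract does not name one


def _rank(report: HotwordReport) -> tuple[float, str]:
    # Assumed rule: louder is better; ties go to the smaller device ID.
    return (-report.loudness, report.device_id)


def arbitrate(reports: list[HotwordReport]) -> dict[str, str]:
    """Return an instruction ("commence" or "suppress") for each device ID.

    Within each group, exactly one device is told to commence speech
    recognition; every other device in that group is told to suppress it.
    """
    best_by_group: dict[str, HotwordReport] = {}
    for report in reports:
        current = best_by_group.get(report.group_id)
        if current is None or _rank(report) < _rank(current):
            best_by_group[report.group_id] = report
    return {
        r.device_id: (
            "commence"
            if best_by_group[r.group_id].device_id == r.device_id
            else "suppress"
        )
        for r in reports
    }
```

With two reports from the same group, the device with the higher `loudness` receives `"commence"` and the other receives `"suppress"`, which corresponds to the instruction the claimed device acts on.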

110 Citations

20 Claims

1. A computer-implemented method comprising:
- receiving, by a first computing device that is configured to respond to a particular, predefined hotword and from a second computing device that is in a vicinity of the first computing device, data indicating that the second computing device is configured to respond to the particular, predefined hotword;
  
  transmitting, to the second computing device and by the first computing device, data indicating that the first computing device is configured to respond to the particular, predefined hotword;
  
  receiving, by the first computing device, audio data that corresponds to an utterance;
  
  determining that the utterance likely includes a particular, predefined hotword;
  
  in response to determining that the utterance likely includes the particular, predefined hotword, transmitting, to a server, (i) data indicating that the first computing device likely received the particular, predefined hotword, and (ii) data identifying the first computing device;
  
  receiving, from the server, an instruction to suppress speech recognition processing on the audio data; and
  
  in response to receiving the instruction to suppress speech recognition processing on the audio data, suppressing, by the first computing device, processing of the audio data using the automated speech recognizer.
- - 2. The method of claim 1, comprising:
    - determining a loudness of the audio data associated with the particular, predefined hotword; and
      
      in response to determining that the utterance likely includes the particular, predefined hotword, transmitting, to the server, the loudness of the audio data associated with the particular, predefined hotword.
  - 3. The method of claim 2, wherein determining a loudness of the audio data associated with the particular, predefined hotword comprises:
    - determining a power of the audio data associated with the particular, predefined hotword; and
      
      determining a power of audio data that is not associated with the particular, predefined hotword and that the first computing device received before the audio data associated with the particular, predefined hotword, wherein the loudness of the audio data associated with the particular, predefined hotword is based on the power of the audio data associated with the particular, predefined hotword and the power of the audio data that is not associated with the particular, predefined hotword and that the first computing device received before the audio data associated with the particular, predefined hotword.
  - 4. The method of claim 1, comprising:
    - determining a confidence score that reflects a likelihood that the audio data associated with the particular, predefined hotword corresponds to the particular, predefined hotword; and
      
      in response to determining that the utterance likely includes the particular, predefined hotword, transmitting, to the server, the confidence score.
  - 5. The method of claim 4, wherein determining a confidence score that reflects a likelihood that the audio data associated with the particular, predefined hotword corresponds to the particular, predefined hotword comprises:
    - determining audio features from the audio data associated with the particular, predefined hotword; and
      
      based on the audio features, determining, using a neural network, the confidence score.
  - 6. The method of claim 1, comprising:
    - in response to determining that the utterance likely includes the particular, predefined hotword, transmitting, to the server, data indicating a location of the first computing device.
  - 7. The method of claim 1, comprising:
    - in response to determining that the utterance likely includes the particular, predefined hotword, transmitting, to the server, data indicating an elapsed time since a previous use of the first computing device.
  - 8. The method of claim 1, comprising:
    - in response to determining that the utterance likely includes the particular, predefined hotword, transmitting, to the server, data indicating a previous action performed by the first computing device.
  - 9. The method of claim 1, comprising:
    - determining a group identifier that identifies the first computing device and the second computing device; and
      
      in response to determining that the utterance likely includes the particular, predefined hotword, transmitting, to the server, the group identifier that identifies the first computing device and the second computing device.
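
The device-side steps of claim 1 and the optional data of claims 2 and 4 through 9 can be sketched as a single notification plus a handler. This is an illustrative sketch only: the `HotwordNotification` type, its field names, the `handle_utterance` function, and the `"suppress"` instruction string are hypothetical. The claims do not define a wire format, and the branch that runs the recognizer follows the abstract rather than claim 1.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Optional


@dataclass
class HotwordNotification:
    """Data a device sends to the server after a likely hotword (assumed shape)."""
    device_id: str                                  # claim 1: identifies the device
    hotword_detected: bool = True                   # claim 1: hotword likely received
    loudness: Optional[float] = None                # claim 2
    confidence: Optional[float] = None              # claim 4
    location: Optional[str] = None                  # claim 6
    seconds_since_last_use: Optional[float] = None  # claim 7
    previous_action: Optional[str] = None           # claim 8
    group_id: Optional[str] = None                  # claim 9

    def to_payload(self) -> dict:
        # Send only the fields that were actually populated.
        return {k: v for k, v in asdict(self).items() if v is not None}


def handle_utterance(
    notification: HotwordNotification,
    hotword_likely: bool,
    send_to_server: Callable[[dict], str],
    run_recognizer: Callable[[bytes], None],
    audio: bytes,
) -> str:
    """Report a likely hotword, then obey the server's instruction."""
    if not hotword_likely:
        return "ignored"
    instruction = send_to_server(notification.to_payload())
    if instruction == "suppress":
        # Claim 1: do not process the audio with the speech recognizer.
        return "suppressed"
    # Per the abstract: otherwise process the audio on this device.
    run_recognizer(audio)
    return "recognized"
```

Passing the transport and the recognizer as callables keeps the sketch independent of any particular network stack or speech library.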

10. A system comprising:
- one or more computers; and
  
  one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving, by a first computing device that is configured to respond to a particular, predefined hotword and from a second computing device that is in a vicinity of the first computing device, data indicating that the second computing device is configured to respond to the particular, predefined hotword;
  
  transmitting, to the second computing device and by the first computing device, data indicating that the first computing device is configured to respond to the particular, predefined hotword;
  
  receiving, by the first computing device, audio data that corresponds to an utterance;
  
  determining that the utterance likely includes a particular, predefined hotword;
  
  in response to determining that the utterance likely includes the particular, predefined hotword, transmitting, to a server, (i) data indicating that the first computing device likely received the particular, predefined hotword, and (ii) data identifying the first computing device;
  
  receiving, from the server, an instruction to suppress speech recognition processing on the audio data; and
  
  in response to receiving the instruction to suppress speech recognition processing on the audio data, suppressing, by the first computing device, processing of the audio data using the automated speech recognizer.
- - 11. The system of claim 10, wherein the operations comprise:
    - determining a loudness of the audio data associated with the particular, predefined hotword; and
      
      in response to determining that the utterance likely includes the particular, predefined hotword, transmitting, to the server, the loudness of the audio data associated with the particular, predefined hotword.
  - 12. The system of claim 11, wherein determining a loudness of the audio data associated with the particular, predefined hotword comprises:
    - determining a power of the audio data associated with the particular, predefined hotword; and
      
      determining a power of audio data that is not associated with the particular, predefined hotword and that the first computing device received before the audio data associated with the particular, predefined hotword, wherein the loudness of the audio data associated with the particular, predefined hotword is based on the power of the audio data associated with the particular, predefined hotword and the power of the audio data that is not associated with the particular, predefined hotword and that the first computing device received before the audio data associated with the particular, predefined hotword.
  - 13. The system of claim 10, wherein the operations comprise:
    - determining a confidence score that reflects a likelihood that the audio data associated with the particular, predefined hotword corresponds to the particular, predefined hotword; and
      
      in response to determining that the utterance likely includes the particular, predefined hotword, transmitting, to the server, the confidence score.
  - 14. The system of claim 13, wherein determining a confidence score that reflects a likelihood that the audio data associated with the particular, predefined hotword corresponds to the particular, predefined hotword comprises:
    - determining audio features from the audio data associated with the particular, predefined hotword; and
      
      based on the audio features, determining, using a neural network, the confidence score.
  - 15. The system of claim 10, wherein the operations comprise:
    - in response to determining that the utterance likely includes the particular, predefined hotword, transmitting, to the server, data indicating a location of the first computing device.
  - 16. The system of claim 10, wherein the operations comprise:
    - in response to determining that the utterance likely includes the particular, predefined hotword, transmitting, to the server, data indicating an elapsed time since a previous use of the first computing device.
  - 17. The system of claim 10, wherein the operations comprise:
    - in response to determining that the utterance likely includes the particular, predefined hotword, transmitting, to the server, data indicating a previous action performed by the first computing device.
  - 18. The system of claim 10, wherein the operations comprise:
    - determining a group identifier that identifies the first computing device and the second computing device; and
      
      in response to determining that the utterance likely includes the particular, predefined hotword, transmitting, to the server, the group identifier that identifies the first computing device and the second computing device.
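
Claims 3 and 12 state only that the loudness is "based on" the power of the hotword audio and the power of the audio received before it. One plausible reading, shown here as an assumption rather than the patented formula, is a power ratio in decibels. The function names, the use of mean squared amplitude as "power", and the `floor` parameter are all hypothetical.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Mean squared amplitude of a block of audio samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(s * s for s in samples) / len(samples)


def relative_loudness_db(
    hotword_samples: Sequence[float],
    preceding_samples: Sequence[float],
    floor: float = 1e-12,
) -> float:
    """Hotword power relative to the power of the audio just before it, in dB.

    The floor avoids taking the log of zero or dividing by zero when
    either block is digital silence.
    """
    hotword_power = max(mean_power(hotword_samples), floor)
    background_power = max(mean_power(preceding_samples), floor)
    return 10.0 * math.log10(hotword_power / background_power)
```

Normalizing by the preceding audio makes the value less dependent on microphone gain, so readings from different devices are easier to compare, which is one reason a ratio is a reasonable guess here.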

19. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving, by a first computing device that is configured to respond to a particular, predefined hotword and from a second computing device that is in a vicinity of the first computing device, data indicating that the second computing device is configured to respond to the particular, predefined hotword;
  
  transmitting, to the second computing device and by the first computing device, data indicating that the first computing device is configured to respond to the particular, predefined hotword;
  
  receiving, by the first computing device, audio data that corresponds to an utterance;
  
  determining that the utterance likely includes a particular, predefined hotword;
  
  in response to determining that the utterance likely includes the particular, predefined hotword, transmitting, to a server, (i) data indicating that the first computing device likely received the particular, predefined hotword, and (ii) data identifying the first computing device;
  
  receiving, from the server, an instruction to suppress speech recognition processing on the audio data; and
  
  in response to receiving the instruction to suppress speech recognition processing on the audio data, suppressing, by the first computing device, processing of the audio data using the automated speech recognizer.
- - 20. The medium of claim 19, wherein the operations comprise:
    - determining a loudness of the audio data associated with the particular, predefined hotword; and
      
      in response to determining that the utterance likely includes the particular, predefined hotword, transmitting, to the server, the loudness of the audio data associated with the particular, predefined hotword.
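
The first two operations of each independent claim describe nearby devices telling each other that they respond to the same hotword, and claims 9 and 18 add a group identifier covering those devices. A minimal sketch of that exchange, assuming an in-memory message hand-off: the `Device` class, its message fields, the example hotword string used by callers, and the hash-based identifier are hypothetical; the claims do not specify a transport or an identifier scheme.

```python
import hashlib


class Device:
    """A device that tracks nearby peers configured for the same hotword."""

    def __init__(self, device_id: str, hotword: str) -> None:
        self.device_id = device_id
        self.hotword = hotword
        self.peers: set[str] = set()

    def announce(self) -> dict:
        # Data indicating this device is configured to respond to the hotword.
        return {"device_id": self.device_id, "hotword": self.hotword}

    def receive_announcement(self, message: dict) -> None:
        # Record only other devices that respond to the same hotword.
        same_hotword = message["hotword"] == self.hotword
        is_other_device = message["device_id"] != self.device_id
        if same_hotword and is_other_device:
            self.peers.add(message["device_id"])

    def group_id(self) -> str:
        # Assumed scheme: hash the sorted member IDs so that every member
        # derives the same identifier without further coordination.
        members = sorted(self.peers | {self.device_id})
        digest = hashlib.sha256("\n".join(members).encode("utf-8")).hexdigest()
        return digest[:16]
```

After two devices exchange `announce()` messages, both compute the same `group_id()`, which either can then include in its report to the server.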

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Melendo Casado, Diego; Gruenstein, Alexander H.; Foerster, Jakob N.
Primary Examiner(s): Vo, Huyen X
Application Number: US15/952,434
Publication Number: US 20180286406A1
Time in Patent Office: 347 Days
Field of Search: 704/1-10, 704/230-277, 709/203
US Class Current
CPC Class Codes:
G10L 15/06   Creation of reference templ...
G10L 15/16   using artificial neural net...
G10L 15/22   Procedures used during a sp...
G10L 15/30   Distributed recognition, e....
G10L 2015/088   Word spotting
G10L 2015/223   Execution procedure of a sp...
G10L 25/78   Detection of presence or ab...
H04L 67/10   in which an application is ...
H05K 999/99   dummy group
