Hotword detection on multiple devices

US 9,318,107 B1
Filed: 04/01/2015
Issued: 04/19/2016
Est. Priority Date: 10/09/2014
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include, based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
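The comparison step described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `should_begin_asr`, the use of floating-point scores, and the treatment of ties (a tie does not start recognition) are assumptions made for illustration only. The claims state what happens when the first value is greater or less than the second value and leave the equal case open.

```python
from typing import Iterable


def should_begin_asr(first_value: float, other_values: Iterable[float]) -> bool:
    """Decide whether this device should begin speech recognition.

    first_value:  hotword likelihood computed by this (first) device.
    other_values: hotword likelihoods reported by other devices for the
                  same utterance (the "second value" in the claims).

    Assumption: the device proceeds only if its own value is strictly
    greater than every value it received. With no values received,
    the device proceeds on its own score.
    """
    return all(first_value > value for value in other_values)


if __name__ == "__main__":
    # Hypothetical scores: this device heard the hotword most clearly.
    print(should_begin_asr(0.9, [0.4, 0.7]))
    # Hypothetical scores: another device reported a higher likelihood.
    print(should_begin_asr(0.5, [0.8]))
```

Requiring a strictly greater value means two devices with identical scores would both stay idle; a real system would likely need a tie-breaker, which this sketch does not attempt.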

272 Citations

22 Claims

1. A computer-implemented method comprising:
- receiving, by a first computing device, audio data that corresponds to an utterance;
  
  before beginning automated speech recognition processing on the audio data, processing the audio data using a classifier that classifies audio data as including a particular hotword or as not including the particular hotword;
  
  determining, based on the processing of the audio data using the classifier that classifies audio data as including a particular hotword or as not including the particular hotword, a first value that reflects a first likelihood that the utterance includes the particular hotword;
  
  receiving a second value that reflects a second likelihood that the utterance includes the particular hotword, as determined by a second computing device;
  
  comparing the first value that reflects the first likelihood that the utterance includes the particular hotword and the second value that reflects the second likelihood that the utterance includes the particular hotword; and
  
  based on comparing the first value that reflects the first likelihood that the utterance includes the particular hotword to the second value that reflects the second likelihood that the utterance includes the particular hotword, determining whether to begin performing automated speech recognition processing on the audio data.
  - 2. The method of claim 1, comprising:
    - determining that the first value satisfies a hotword score threshold; and
      
      based on determining that the first value satisfies the hotword score threshold, transmitting, to the second device, the first value that reflects the first likelihood that the utterance includes the particular hotword.
  - 3. The method of claim 1, comprising:
    - determining an activation state of the first computing device based on comparing the first value that reflects the first likelihood that the utterance includes the particular hotword to the second value that reflects the second likelihood that the utterance includes the particular hotword.
  - 4. The method of claim 3, wherein determining an activation state of the first computing device based on comparing the first value that reflects the first likelihood that the utterance includes the particular hotword to the second value that reflects the second likelihood that the utterance includes the particular hotword comprises:
    - determining that the activation state of the first computing device is an active state.
  - 5. The method of claim 1, comprising:
    - receiving, by the first computing device, additional audio data that corresponds to an additional utterance;
      
      before beginning automated speech recognition processing on the audio data, processing the audio data using a classifier that classifies audio data as including a particular hotword or as not including the particular hotword;
      
      determining, based on the processing of the audio data using the classifier that classifies audio data as including a particular hotword or as not including the particular hotword, a third value that reflects a third likelihood that the additional utterance includes the particular hotword;
      
      receiving a fourth value that reflects a fourth likelihood that the additional utterance includes the particular hotword, as determined by a third computing device;
      
      comparing the third value that reflects the third likelihood that the additional utterance includes the particular hotword and the fourth value that reflects the fourth likelihood that the additional utterance includes the particular hotword; and
      
      based on comparing the third value that reflects the third likelihood that the additional utterance includes the particular hotword and the fourth value that reflects the fourth likelihood that the additional utterance includes the particular hotword, determining whether to begin performing automated speech recognition processing on the additional audio data.
  - 6. The method of claim 1, wherein:
    - receiving a second value that reflects a second likelihood that the utterance includes the particular hotword, as determined by a second computing device comprises:
      
      receiving, from a server, through a local network, or through a short range radio communication channel, a second value that reflects the second likelihood that the utterance includes the particular hotword.
  - 7. The method of claim 1, comprising:
    - determining that the second computing device is configured to respond to utterances that include the particular hotword, wherein comparing the first value that reflects the first likelihood that the utterance includes the particular hotword and the second value that reflects the second likelihood that the utterance includes the particular hotword is performed in response to determining that the second computing device is configured to respond to utterances that include the particular hotword.
  - 8. The method of claim 1, wherein:
    - receiving a second value that reflects a second likelihood that the utterance includes the particular hotword, as determined by a second computing device comprises:
      
      receiving a second identifier of the second computing device.
  - 9. The method of claim 4, wherein determining whether to begin performing automated speech recognition processing on the audio data is further based on determining that a particular amount of time has elapsed since receiving the audio data that corresponds to the utterance.
  - 10. The method of claim 4, comprising:
    - transmitting, for a particular amount of time, the first value that reflects the first likelihood that the utterance includes the particular hotword based on determining that the activation state is an active state.
  - 11. The method of claim 1, comprising:
    - based on comparing the first value that reflects the first likelihood that the utterance includes the particular hotword to the second value that reflects the second likelihood that the utterance includes the particular hotword, determining that the first value that reflects the first likelihood that the utterance includes the particular hotword is greater than the second value that reflects the second likelihood that the utterance includes the particular hotword, wherein determining whether to perform automated speech recognition processing on the audio data comprises:
      
      based on determining that the first value that reflects the first likelihood that the utterance includes the particular hotword is greater than the second value that reflects the second likelihood that the utterance includes the particular hotword, determining to begin performing automated speech recognition processing on the audio data.
  - 12. The method of claim 1, comprising:
    - based on comparing the first value that reflects the first likelihood that the utterance includes the particular hotword to the second value that reflects the second likelihood that the utterance includes the particular hotword, determining that the first value that reflects the first likelihood that the utterance includes the particular hotword is less than the second value that reflects the second likelihood that the utterance includes the particular hotword, wherein determining whether to perform automated speech recognition processing on the audio data comprises:
      
      based on determining that the first value that reflects the first likelihood that the utterance includes the particular hotword is less than the second value that reflects the second likelihood that the utterance includes the particular hotword, determining not to begin performing automated speech recognition processing on the audio data.
  - 13. The method of claim 1, wherein processing the audio data using a classifier that classifies audio data as including a particular hotword or as not including the particular hotword comprises:
    - extracting filterbank energies or mel-frequency cepstral coefficients from the audio data.
  - 14. The method of claim 1, wherein processing the audio data using a classifier that classifies audio data as including a particular hotword or as not including the particular hotword comprises:
    - processing the audio data using a support vector machine or a neural network.
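Dependent claims 2 and 9 through 12 add a score threshold, a waiting period, and the greater-than / less-than outcomes. The following Python sketch shows one way these pieces could fit together. It is an illustration under stated assumptions, not the patented implementation: the class name `HotwordArbiter`, the threshold of 0.5, the 0.2 second wait, the reading of "satisfies" as greater-than-or-equal, and the handling of ties are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

HOTWORD_SCORE_THRESHOLD = 0.5  # hypothetical value, not from the patent
WAIT_SECONDS = 0.2             # hypothetical "particular amount of time"


@dataclass
class HotwordArbiter:
    """Tracks one utterance on the first device and decides whether to run ASR."""
    threshold: float = HOTWORD_SCORE_THRESHOLD
    wait_seconds: float = WAIT_SECONDS
    local_score: Optional[float] = None
    heard_at: Optional[float] = None
    peer_scores: Dict[str, float] = field(default_factory=dict)

    def on_local_score(self, score: float, now: float) -> bool:
        """Record the local classifier score.

        Returns True if the score satisfies the threshold and should be
        transmitted to other devices (claim 2). "Satisfies" is assumed
        to mean greater than or equal to.
        """
        self.local_score = score
        self.heard_at = now
        self.peer_scores.clear()
        return score >= self.threshold

    def on_peer_score(self, device_id: str, score: float) -> None:
        """Record a score received from another device for the same utterance."""
        self.peer_scores[device_id] = score

    def decide(self, now: float) -> Optional[bool]:
        """Return None while waiting, True to begin ASR, False to stay idle."""
        if self.local_score is None or self.heard_at is None:
            return None  # nothing heard yet
        if self.local_score < self.threshold:
            return False  # local classifier did not detect the hotword
        if now - self.heard_at < self.wait_seconds:
            return None  # still inside the waiting period (claim 9)
        # Claims 11 and 12: begin only if the local score is greater.
        return all(self.local_score > s for s in self.peer_scores.values())
```

Timestamps are passed in explicitly rather than read from a clock so the behavior is easy to test. A caller would typically supply `time.monotonic()` for `now`.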

15. A computing device comprising:
- one or more storage devices storing instructions that are operable, when executed by the computing device, to cause the computing device to perform operations comprising:
  
  receiving, by a first computing device, audio data that corresponds to an utterance;
  
  before beginning automated speech recognition processing on the audio data, processing the audio data using a classifier that classifies audio data as including a particular hotword or as not including the particular hotword;
  
  determining, based on the processing of the audio data using the classifier that classifies audio data as including a particular hotword or as not including the particular hotword, a first value that reflects a first likelihood that the utterance includes the particular hotword;
  
  receiving a second value that reflects a second likelihood that the utterance includes the particular hotword, as determined by a second computing device;
  
  comparing the first value that reflects the first likelihood that the utterance includes the particular hotword and the second value that reflects the second likelihood that the utterance includes the particular hotword; and
  
  based on comparing the first value that reflects the first likelihood that the utterance includes the particular hotword to the second value that reflects the second likelihood that the utterance includes the particular hotword, determining whether to begin performing automated speech recognition processing on the audio data.
  - 16. The device of claim 15, wherein the operations further comprise:
    - determining that the first value satisfies a hotword score threshold; and
      
      based on determining that the first value satisfies the hotword score threshold, transmitting, to the second device, the first value that reflects the first likelihood that the utterance includes the particular hotword.
  - 17. The device of claim 15, wherein the operations further comprise:
    - determining an activation state of the first computing device based on comparing the first value that reflects the first likelihood that the utterance includes the particular hotword to the second value that reflects the second likelihood that the utterance includes the particular hotword.
  - 18. The device of claim 17, wherein determining an activation state of the first computing device based on comparing the first value that reflects the first likelihood that the utterance includes the particular hotword to the second value that reflects the second likelihood that the utterance includes the particular hotword comprises:
    - determining that the activation state of the first computing device is an active state.
  - 19. The device of claim 15, wherein the operations further comprise:
    - receiving, by the first computing device, additional audio data that corresponds to an additional utterance;
      
      before beginning automated speech recognition processing on the audio data, processing the audio data using a classifier that classifies audio data as including a particular hotword or as not including the particular hotword;
      
      determining, based on the processing of the audio data using the classifier that classifies audio data as including a particular hotword or as not including the particular hotword, a third value that reflects a third likelihood that the additional utterance includes the particular hotword;
      
      receiving a fourth value that reflects a fourth likelihood that the additional utterance includes the particular hotword, as determined by a third computing device;
      
      comparing the third value that reflects the third likelihood that the additional utterance includes the particular hotword and the fourth value that reflects the fourth likelihood that the additional utterance includes the particular hotword; and
      
      based on comparing the third value that reflects the third likelihood that the additional utterance includes the particular hotword and the fourth value that reflects the fourth likelihood that the additional utterance includes the particular hotword, determining whether to begin performing automated speech recognition processing on the additional audio data.
  - 20. The device of claim 15, wherein:
    - receiving a second value that reflects a second likelihood that the utterance includes the particular hotword, as determined by a second computing device comprises:
      
      receiving, from a server, through a local network, or through a short range radio communication channel, a second value that reflects the second likelihood that the utterance includes the particular hotword.
  - 21. The device of claim 15, wherein the operations further comprise:
    - determining that the second computing device is configured to respond to utterances that include the particular hotword, wherein comparing the first value that reflects the first likelihood that the utterance includes the particular hotword and the second value that reflects the second likelihood that the utterance includes the particular hotword is performed in response to determining that the second computing device is configured to respond to utterances that include the particular hotword.
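Claims 8 and 20 recite receiving the second value, together with an identifier of the second computing device, from a server, over a local network, or over a short range radio channel. The claims do not specify a message format. The Python sketch below assumes a small JSON payload purely for illustration: the type `HotwordScoreMessage`, the field names `device_id` and `score`, and the restriction of scores to the range 0 to 1 are hypothetical choices, and the transport itself is left out.

```python
import json
from typing import NamedTuple


class HotwordScoreMessage(NamedTuple):
    """A score report exchanged between devices (hypothetical format)."""
    device_id: str  # identifier of the sending device (claim 8)
    score: float    # likelihood that the utterance includes the hotword


def encode(message: HotwordScoreMessage) -> bytes:
    """Serialize a score report to UTF-8 encoded JSON."""
    payload = {"device_id": message.device_id, "score": message.score}
    return json.dumps(payload).encode("utf-8")


def decode(payload: bytes) -> HotwordScoreMessage:
    """Parse a score report, rejecting scores outside the assumed 0..1 range."""
    obj = json.loads(payload.decode("utf-8"))
    score = float(obj["score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return HotwordScoreMessage(device_id=str(obj["device_id"]), score=score)


if __name__ == "__main__":
    wire = encode(HotwordScoreMessage(device_id="tablet-1", score=0.75))
    print(decode(wire))
```

Because the payload is plain bytes, the same encoding could be carried over any of the channels named in claim 20 without change.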

22. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving, by a first computing device, audio data that corresponds to an utterance;
  
  before beginning automated speech recognition processing on the audio data, processing the audio data using a classifier that classifies audio data as including a particular hotword or as not including the particular hotword;
  
  determining, based on the processing of the audio data using the classifier that classifies audio data as including a particular hotword or as not including the particular hotword, a first value that reflects a first likelihood that the utterance includes the particular hotword;
  
  receiving a second value that reflects a second likelihood that the utterance includes the particular hotword, as determined by a second computing device;
  
  comparing the first value that reflects the first likelihood that the utterance includes the particular hotword and the second value that reflects the second likelihood that the utterance includes the particular hotword; and
  
  based on comparing the first value that reflects the first likelihood that the utterance includes the particular hotword to the second value that reflects the second likelihood that the utterance includes the particular hotword, determining whether to begin performing automated speech recognition processing on the audio data.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Sharifi, Matthew
Primary Examiner(s): COLUCCI, MICHAEL C
Application Number: US14/675,932
Publication Number: US 20160104480A1
Time in Patent Office: 384 Days
Field of Search: 704/254, 704/275, 704/273, 704/270, 704/257, 704/256, 704/251, 704/250, 704/247, 704/246, 704/244, 704/243, 704/235, 704/233, 704/231, 704/228, 704/224, 704/2, 455/414.1, 434/157, 391/71.1, 379/88.03, 379/88.01, 367/124
US Class Current: 1/1
CPC Class Codes

G06F 3/167   Audio in a user interface, ...

G10L 15/01   Assessment or evaluation of...

G10L 15/08   Speech classification or se...

G10L 15/22   Procedures used during a sp...

G10L 15/285   Memory allocation or algori...

G10L 15/32   Multiple recognisers used i...

G10L 17/22   Interactive procedures; Man...

G10L 2015/088   Word spotting

G10L 2015/223   Execution procedure of a sp...
