Multi-layer keyword detection

US 10,079,015 B1
Filed: 12/06/2016
Issued: 09/18/2018
Est. Priority Date: 12/06/2016
Status: Active Grant



Abstract

A system and method for temporarily disabling keyword detection to avoid detection of machine-generated keywords. A local device may operate two keyword detectors. The first keyword detector operates on input audio data received by a microphone to capture keywords uttered by a user. In these instances, the keyword may be detected by the first detector and the audio data may be transmitted to a remote device for processing. The remote device may generate output audio data to be sent to the local device. The local device may process the output audio data to determine that it also includes the keyword. The device may then disable the first keyword detector while the output audio data is played back by an audio speaker of the local device. Thus the local device may avoid detection of a keyword originating from the output audio. The first keyword detector may be reactivated after a time interval during which the keyword might be detectable in the output audio.
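
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: audio is modelled as a list of word tokens, the keyword is an arbitrary example, and the names `KeywordDetector`, `LocalDevice`, `on_microphone_audio`, and `play_output` are hypothetical. The sketch only shows the ordering of steps: the second detector scans the output audio before playback, the first detector is disabled while that audio plays, and it is reactivated afterwards.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KeywordDetector:
    """Toy detector: 'audio' is a list of word tokens (an assumption)."""
    keyword: str
    enabled: bool = True

    def detect(self, audio: List[str]) -> bool:
        # A disabled detector reports nothing, whatever the audio contains.
        return self.enabled and self.keyword in audio


@dataclass
class LocalDevice:
    keyword: str
    sent_to_remote: List[List[str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.mic_detector = KeywordDetector(self.keyword)     # first detector
        self.output_detector = KeywordDetector(self.keyword)  # second detector

    def on_microphone_audio(self, audio: List[str]) -> bool:
        """Forward microphone audio to the remote device only on a detection."""
        if self.mic_detector.detect(audio):
            self.sent_to_remote.append(audio)
            return True
        return False

    def play_output(self, output_audio: List[str]) -> None:
        """Play remote output, disabling the first detector if needed."""
        if self.output_detector.detect(output_audio):
            self.mic_detector.enabled = False
        # Simulate the speaker output being picked up by the microphone.
        self.on_microphone_audio(output_audio)
        # Playback is over, so the first detector is reactivated.
        self.mic_detector.enabled = True


device = LocalDevice("computer")
device.on_microphone_audio(["computer", "what", "did", "I", "say"])
device.play_output(["you", "said", "computer"])
print(len(device.sent_to_remote))
```

In this toy model only the user's utterance is forwarded; the keyword in the played-back response is not, because the first detector is disabled during playback.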

170 Citations

20 Claims

1. A computer-implemented method comprising:
- during a first time period by a local device including a microphone and an audio speaker:
  
  operating a first wakeword detector configured to detect a wakeword in audio data,
  
  receiving from the microphone, first input audio data corresponding to a first utterance,
  
  determining, using the first wakeword detector, that the first input audio data includes a first representation of the wakeword, and
  
  sending the first input audio data to at least one remote device; and
  
  by the local device at a second time period after the first time period:
  
  receiving output audio data from the at least one remote device,
  
  determining, using a second wakeword detector, that the output audio data includes a second representation of the wakeword,
  
  disabling the first wakeword detector upon determining that the output audio includes the second representation of the wakeword, and
  
  emitting, using the audio speaker, output audio corresponding to the output audio data.
- Dependent claims (2, 3, 4, 5, 6):
- - 2. The computer-implemented method of claim 1, further comprising, by the local device during the second time period:
    - determining an estimated time period during which the second representation is included in the output audio; and
      
      enabling the first wakeword detector following the first time period.
  - 3. The computer-implemented method of claim 1, further comprising, by the local device during the second time period:
    - receiving, from the microphone, second input audio data corresponding to a second utterance;
      
      determining, using the first wakeword detector, that the second input audio data includes a third representation of the wakeword; and
      
      ignoring the third representation by refraining from sending the second input audio data to the at least one remote device.
  - 4. The computer-implemented method of claim 1, wherein:
    - determining, using the first wakeword detector, that the first input audio data includes the first representation comprises processing the first input audio data using the first wakeword detector operating a first trained model; and
      
      determining, using the second wakeword detector, that the output audio data includes the second representation comprises processing the first input audio data using the second wakeword detector operating the first trained model.
  - 5. The computer-implemented method of claim 1, wherein disabling the first wakeword detector comprises sending a command to the first wakeword detector to refrain from processing the audio data following the first representation.
  - 6. The computer-implemented method of claim 1, wherein disabling the first detector comprises sending a command to the first detector to not process the input audio data following the first representation.
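
Claims 2 and 3 refer to an estimated time period during which the wakeword is present in the output audio, and to ignoring microphone detections made while the first detector is suppressed. One way to picture the arithmetic is sketched below. The claims do not say how the estimate is computed; the sample offsets, the sample rate, the padding margin, and the function names `suppression_window` and `should_ignore` are all assumptions made for illustration.

```python
from typing import Tuple


def suppression_window(
    start_sample: int,
    end_sample: int,
    sample_rate_hz: int,
    playback_start_s: float = 0.0,
    margin_s: float = 0.25,
) -> Tuple[float, float]:
    """Return (disable_at, enable_at) in seconds on the playback clock."""
    if not 0 <= start_sample <= end_sample:
        raise ValueError("sample offsets must satisfy 0 <= start <= end")
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    # Convert the keyword's position in the output audio from samples to seconds.
    keyword_start = playback_start_s + start_sample / sample_rate_hz
    keyword_end = playback_start_s + end_sample / sample_rate_hz
    # Pad both sides to absorb speaker latency and echo (assumed margin).
    disable_at = max(playback_start_s, keyword_start - margin_s)
    return disable_at, keyword_end + margin_s


def should_ignore(detection_time_s: float, window: Tuple[float, float]) -> bool:
    """True if a microphone detection falls inside the suppression window."""
    disable_at, enable_at = window
    return disable_at <= detection_time_s <= enable_at


# Keyword occupies samples 16000..24000 of 16 kHz output audio.
window = suppression_window(16000, 24000, 16000)
print(window, should_ignore(1.2, window), should_ignore(2.0, window))
```

A detection for which `should_ignore` returns true would simply not be sent to the remote device, which is the behaviour claim 3 describes.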

7. A system comprising:
- at least one processor; and
  
  at least one memory including instructions that, when executed by the at least one processor, cause the system to:
  
  receive input audio data;
  
  determine, using a first detector, that the input audio data includes a first representation of a keyword;
  
  send the input audio data to at least one remote device;
  
  receive, by a first device, output audio data from the at least one remote device;
  
  determine, using a second detector, that the output audio data includes a second representation of the keyword;
  
  disable the first detector after determining the output audio data includes the second representation; and
  
  cause, by the first device, the output audio data to be emitted as output audio.
- Dependent claims (8, 9, 10, 11, 12, 13):
- - 8. The system of claim 7, wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the system to:
    - determine that the second representation will be emitted as output audio during a first time period; and
      
      enable the first detector following the first time period.
  - 9. The system of claim 8, wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the system to:
    - receive second input audio data after sending the input audio data but prior to a beginning of the first time period;
      
      determine, using the first detector, that the second input audio data includes a third representation of the keyword; and
      
      send the second input audio data to the at least one remote device.
  - 10. The system of claim 7, wherein the first device comprises a microphone and an audio speaker, wherein the first detector is connected to the microphone, and wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the system to:
    - cause the output audio data to be sent to the audio speaker.
  - 11. The system of claim 10, wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the system to:
    - disable the first detector prior to causing the output audio data to be sent to the audio speaker.
  - 12. The system of claim 7, wherein the input audio data is sent to a first remote device and the output audio data is received from a second remote device.
  - 13. The system of claim 7, wherein the at least one memory further includes instructions that, when executed by the at least one processor, further cause the system to:
    - cause an output of the first detector to be discarded.
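
Claims 8, 9, and 13 together imply a timeline: detections made before the keyword is emitted are still forwarded, detector output during emission is discarded, and the detector is enabled again afterwards. The following sketch shows one hypothetical way to express that as a gate placed after the first detector. The class `DetectionGate`, its methods, and the use of timestamps in seconds are assumptions, not details from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DetectionGate:
    """Forwards or discards first-detector hits based on a suppression window."""
    window: Optional[Tuple[float, float]] = None  # (start_s, end_s), hypothetical
    forwarded: List[str] = field(default_factory=list)
    discarded: List[str] = field(default_factory=list)

    def schedule(self, start_s: float, end_s: float) -> None:
        """Record when the keyword is expected to be emitted as output audio."""
        if end_s < start_s:
            raise ValueError("window must not end before it starts")
        self.window = (start_s, end_s)

    def on_detection(self, time_s: float, utterance_id: str) -> bool:
        """Return True if the detection is forwarded, False if discarded."""
        if self.window is not None and self.window[0] <= time_s <= self.window[1]:
            # Inside the window: treat the hit as machine-generated and drop it.
            self.discarded.append(utterance_id)
            return False
        self.forwarded.append(utterance_id)
        return True


gate = DetectionGate()
gate.schedule(5.0, 6.0)
gate.on_detection(4.0, "before-window")  # forwarded
gate.on_detection(5.5, "during-window")  # discarded
gate.on_detection(6.5, "after-window")   # forwarded
print(gate.forwarded, gate.discarded)
```

Discarding the detector's output, rather than stopping the detector itself, keeps the detector running continuously; whether a real device would prefer one approach over the other is not something the claims settle.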

14. A computer-implemented method comprising:
- receiving input audio data;
  
  determining, using a first detector, that the input audio data includes a first representation of a keyword;
  
  sending the input audio data to at least one remote device;
  
  receiving, by a first device, output audio data from the at least one remote device;
  
  determining, using a second detector, that the output audio data includes a second representation of the keyword;
  
  disabling the first detector after determining that the output audio data includes the second representation; and
  
  causing, by the first device, the output audio data to be emitted as output audio.
- Dependent claims (15, 16, 17, 18, 19, 20):
- - 15. The computer-implemented method of claim 14, further comprising:
    - determining that the second representation will be emitted as output audio during a first time period; and
      
      enabling the first detector following the first time period.
  - 16. The computer-implemented method of claim 15, further comprising:
    - receiving second input audio data after sending the input audio data but prior to a beginning of the first time period;
      
      determining, using the first detector, that the second input audio data includes a third representation of a keyword; and
      
      sending the second input audio data to the at least one remote device.
  - 17. The computer-implemented method of claim 14, wherein the input audio data is received from a microphone of the first device, and causing the output audio data to be emitted comprises causing the output audio data to be sent to an audio speaker of the first device.
  - 18. The computer-implemented method of claim 17, further comprising disabling the first detector prior to causing the output audio data to be sent to the audio speaker.
  - 19. The computer-implemented method of claim 14, wherein the input audio data is sent to a first remote device and the output audio data is received from a second remote device.
  - 20. The computer-implemented method of claim 14, wherein disabling the first detector comprises causing an output of the first detector to be discarded.


Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Lockhart, Christopher Wayne; Cole, Matthew Joseph; Liu, Xulei
Primary Examiner(s): Shah, Paras D
Assistant Examiner(s): Ogunbiyi, Oluwadamilola M
Application Number: US15/370,216
Time in Patent Office: 651 Days
Field of Search: 704251
US Class Current
CPC Class Codes

G10L 15/02   Feature extraction for spee...

G10L 15/08   Speech classification or se...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 15/30   Distributed recognition, e....

G10L 15/32   Multiple recognisers used i...

G10L 2015/025   Phonemes, fenemes or fenone...

G10L 2015/088   Word spotting

G10L 2015/223   Execution procedure of a sp...
