Multi-layer keyword detection
First Claim
1. A computer-implemented method comprising:
- during a first time period by a local device including a microphone and an audio speaker:
  operating a first wakeword detector configured to detect a wakeword in audio data,
  receiving from the microphone, first input audio data corresponding to a first utterance,
  determining, using the first wakeword detector, that the first input audio data includes a first representation of the wakeword, and
  sending the first input audio data to at least one remote device; and
- by the local device at a second time period after the first time period:
  receiving output audio data from the at least one remote device,
  determining, using a second wakeword detector, that the output audio data includes a second representation of the wakeword,
  disabling the first wakeword detector upon determining that the output audio includes the second representation of the wakeword, and
  emitting, using the audio speaker, output audio corresponding to the output audio data.
Abstract
A system and method for temporarily disabling keyword detection to avoid detection of machine-generated keywords. A local device may operate two keyword detectors. The first keyword detector operates on input audio data received by a microphone to capture keywords uttered by a user. In these instances, the keyword may be detected by the first detector and the audio data may be transmitted to a remote device for processing. The remote device may generate output audio data to be sent to the local device. The local device may process the output audio data to determine that it also includes the keyword. The device may then disable the first keyword detector while the output audio data is played back by an audio speaker of the local device. Thus the local device may avoid detection of a keyword originating from the output audio. The first keyword detector may be reactivated after a time interval during which the keyword might be detectable in the output audio.
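The abstract's last sentence describes reactivating the first keyword detector after the interval in which the keyword might be detectable in the output audio. A minimal sketch of one way such an interval could be computed is shown here. It assumes the second detector reports the keyword's start and end sample offsets within the output audio, and it pads the interval with a safety margin. The names `SuppressionWindow`, `suppression_window`, `first_detector_enabled`, and `margin_s` are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuppressionWindow:
    """Interval, in seconds from playback start, when the first detector is off."""
    start_s: float
    end_s: float

    def covers(self, playback_time_s: float) -> bool:
        return self.start_s <= playback_time_s < self.end_s


def suppression_window(keyword_start_sample: int,
                       keyword_end_sample: int,
                       sample_rate_hz: int,
                       margin_s: float = 0.25) -> SuppressionWindow:
    """Convert keyword sample offsets into a padded time window.

    The margin (an assumed parameter) allows for playback latency and
    room echo; the window start is clamped so it never precedes playback.
    """
    start_s = max(0.0, keyword_start_sample / sample_rate_hz - margin_s)
    end_s = keyword_end_sample / sample_rate_hz + margin_s
    return SuppressionWindow(start_s, end_s)


def first_detector_enabled(window: SuppressionWindow,
                           playback_time_s: float) -> bool:
    """The first detector runs whenever playback is outside the window."""
    return not window.covers(playback_time_s)


# Keyword occupies samples 16000..24000 of 16 kHz output audio.
window = suppression_window(16000, 24000, sample_rate_hz=16000)
```

A narrower window keeps the device responsive to the user for most of the playback; disabling the detector for the whole playback, as the abstract also describes, corresponds to a window that spans the full output audio.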
170 Citations
20 Claims
1. A computer-implemented method comprising:
- during a first time period by a local device including a microphone and an audio speaker:
  operating a first wakeword detector configured to detect a wakeword in audio data,
  receiving from the microphone, first input audio data corresponding to a first utterance,
  determining, using the first wakeword detector, that the first input audio data includes a first representation of the wakeword, and
  sending the first input audio data to at least one remote device; and
- by the local device at a second time period after the first time period:
  receiving output audio data from the at least one remote device,
  determining, using a second wakeword detector, that the output audio data includes a second representation of the wakeword,
  disabling the first wakeword detector upon determining that the output audio includes the second representation of the wakeword, and
  emitting, using the audio speaker, output audio corresponding to the output audio data.
Dependent claims: 2, 3, 4, 5, 6
7. A system comprising:
- at least one processor; and
- at least one memory including instructions that, when executed by the at least one processor, cause the system to:
  receive input audio data;
  determine, using a first detector, that the input audio data includes a first representation of a keyword;
  send the input audio data to at least one remote device;
  receive, by a first device, output audio data from the at least one remote device;
  determine, using a second detector, that the output audio data includes a second representation of the keyword;
  disable the first detector after determining the output audio data includes the second representation; and
  cause, by the first device, the output audio data to be emitted as output audio.
Dependent claims: 8, 9, 10, 11, 12, 13
14. A computer-implemented method comprising:
- receiving input audio data;
- determining, using a first detector, that the input audio data includes a first representation of a keyword;
- sending the input audio data to at least one remote device;
- receiving, by a first device, output audio data from the at least one remote device;
- determining, using a second detector, that the output audio data includes a second representation of the keyword;
- disabling the first detector after determining that the output audio data includes the second representation; and
- causing, by the first device, the output audio data to be emitted as output audio.
Dependent claims: 15, 16, 17, 18, 19, 20
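The sequence of steps recited in the independent claims can be illustrated with a toy simulation. This is only a sketch under stated assumptions: plain text stands in for audio data, the remote device is a hypothetical callable, and the first detector is re-enabled as soon as emission finishes, which is a simplification. The class and attribute names (`KeywordDetector`, `LocalDevice`, `self_triggers`) are invented for illustration and do not come from the patent.

```python
import re
from typing import Callable, List


class KeywordDetector:
    """Toy detector: a text string stands in for audio data (an assumption)."""

    def __init__(self, keyword: str) -> None:
        self.keyword = keyword.lower()
        self.enabled = True

    def detect(self, audio: str) -> bool:
        # A disabled detector never reports a detection.
        if not self.enabled:
            return False
        return self.keyword in re.findall(r"[a-z']+", audio.lower())


class LocalDevice:
    def __init__(self, keyword: str, remote: Callable[[str], str]) -> None:
        self.first_detector = KeywordDetector(keyword)   # microphone path
        self.second_detector = KeywordDetector(keyword)  # output-audio path
        self.remote = remote              # hypothetical remote device
        self.emitted: List[str] = []
        self.self_triggers = 0            # detections caused by own output

    def handle_input(self, input_audio: str) -> bool:
        # Steps 1-2: receive input audio and check it with the first detector.
        if not self.first_detector.detect(input_audio):
            return False
        # Steps 3-4: send to the remote device and receive output audio.
        output_audio = self.remote(input_audio)
        # Steps 5-6: disable the first detector if the output has the keyword.
        if self.second_detector.detect(output_audio):
            self.first_detector.enabled = False
        # Step 7: emit the output audio, then re-enable (simplified timing).
        self._emit(output_audio)
        self.first_detector.enabled = True
        return True

    def _emit(self, output_audio: str) -> None:
        self.emitted.append(output_audio)
        # Simulate the microphone picking up the speaker's own output.
        if self.first_detector.detect(output_audio):
            self.self_triggers += 1


device = LocalDevice("alexa", remote=lambda audio: "Alexa is a given name")
device.handle_input("alexa what does alexa mean")
```

In this simulation the emitted audio contains the keyword, yet `self_triggers` stays at zero because the first detector is disabled while the output is played.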
Specification