Networked devices, systems, and methods for intelligently deactivating wake-word engines

US 10,878,811 B2
Filed: 09/14/2018
Issued: 12/29/2020
Est. Priority Date: 09/14/2018
Status: Active Grant

Abstract

In one aspect, a playback device is configured to identify in an audio stream, via a second wake-word engine, a false wake word for a first wake-word engine that is configured to receive as input sound data based on sound detected by a microphone. The first and second wake-word engines are configured according to different sensitivity levels for false positives of a particular wake word. Based on identifying the false wake word, the playback device is configured to (i) deactivate the first wake-word engine and (ii) cause at least one network microphone device to deactivate a wake-word engine for a particular amount of time. While the first wake-word engine is deactivated, the playback device is configured to cause at least one speaker to output audio based on the audio stream. After a predetermined amount of time has elapsed, the playback device is configured to reactivate the first wake-word engine.
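The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes each engine reduces to a score threshold (a lower threshold standing in for a "more sensitive" engine), and the names `WakeWordEngine`, `PlaybackDevice`, `process_stream`, `tick`, the `sent` and `played` lists, and the hold durations are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class WakeWordEngine:
    """Toy engine: reports a detection when a score meets its threshold."""
    threshold: float  # assumption: a lower threshold means a more sensitive engine
    active: bool = True

    def detects(self, score: float) -> bool:
        return self.active and score >= self.threshold


@dataclass
class PlaybackDevice:
    mic_engine: WakeWordEngine      # "first" engine, fed by microphone sound data
    stream_engine: WakeWordEngine   # "second" engine, fed by the audio stream
    reactivate_at: Optional[float] = None
    # Hypothetical stand-ins for the network interface and the speaker.
    sent: List[Tuple[str, float]] = field(default_factory=list)
    played: List[str] = field(default_factory=list)

    def process_stream(self, chunk: str, score: float, now: float,
                       nmd_ids: List[str], local_hold_s: float = 3.0,
                       nmd_hold_s: float = 5.0) -> None:
        """Screen one chunk of the audio stream, then play it."""
        if self.stream_engine.detects(score):
            # (i) deactivate the microphone-facing engine
            self.mic_engine.active = False
            self.reactivate_at = now + local_hold_s
            # (ii) ask each network microphone device to stand down for a while
            for nmd_id in nmd_ids:
                self.sent.append((nmd_id, nmd_hold_s))
        # Playback proceeds whether or not the engine was deactivated.
        self.played.append(chunk)

    def tick(self, now: float) -> None:
        """Reactivate the microphone-facing engine once the hold has elapsed."""
        if self.reactivate_at is not None and now >= self.reactivate_at:
            self.mic_engine.active = True
            self.reactivate_at = None


device = PlaybackDevice(WakeWordEngine(threshold=0.8), WakeWordEngine(threshold=0.5))
device.process_stream("ad-clip", score=0.6, now=0.0, nmd_ids=["kitchen"])
device.tick(now=3.0)
```

In this sketch a score of 0.6 would not trip the stricter microphone-facing engine on its own, but the more sensitive stream-facing engine flags it before playback, which is the point of configuring the two engines differently.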

783 Citations

20 Claims

1. A playback device comprising:
- a network interface;
- one or more processors;
- at least one microphone;
- at least one speaker configured to output audio based on an audio stream;
- a first wake-word engine configured to receive as input sound data based on sound detected by the at least one microphone, wherein the first wake-word engine is configured according to a first sensitivity level for false positives of a particular wake word;
- a second wake-word engine configured to receive as input the audio stream, wherein the second wake-word engine is configured according to a second sensitivity level for false positives of the particular wake word that is more sensitive than the first sensitivity level;
- a tangible, non-transitory, computer-readable medium having instructions stored thereon that are executable by the one or more processors to cause the playback device to:
  - identify in the audio stream, via the second wake-word engine, a false wake word for the first wake-word engine; and
  - based on identifying the false wake word, (i) deactivate the first wake-word engine and (ii) cause, via the network interface, at least one network microphone device to deactivate a wake-word engine of the at least one network microphone device for a particular amount of time;
  - while the first wake-word engine is deactivated, cause the at least one speaker to output the audio based on the audio stream; and
  - after a predetermined amount of time has elapsed, reactivate the first wake-word engine.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The playback device of claim 1, wherein the playback device comprises a buffer configured to store information regarding characteristics of sound detected by the at least one microphone, wherein the characteristics comprise one or more of spectral or gain characteristics, and wherein the second sensitivity level is defined based at least on the stored information.
  - 3. The playback device of claim 1, wherein the first wake-word engine is configured to, while activated, trigger extraction of a first sound input received via the at least one microphone in response to identifying in the first sound input the particular wake word or the false wake word, and wherein the wake-word engine of the at least one network microphone device is configured to, while activated, trigger extraction of a second sound input received via a microphone of the at least one network microphone device in response to identifying in the second sound input the particular wake word or the false wake word.
  - 4. The playback device of claim 1, wherein the particular amount of time is a first amount of time and the predetermined amount of time is a second amount of time that differs from the first amount of time, and wherein the instructions that are executable by the one or more processors further cause the playback device to define the first amount of time based on the identifying of the false wake word.
  - 5. The playback device of claim 1, wherein the instructions that are executable by the one or more processors to cause the playback device to cause the at least one network microphone device to deactivate the wake-word engine of the at least one network microphone device comprise instructions that are executable by the one or more processors to cause the playback device to:
    - identify the at least one network microphone device for deactivation from a plurality of network microphone devices.
  - 6. The playback device of claim 1, further comprising a third wake-word engine configured to receive as input the sound data based on sound detected by the at least one microphone, wherein the particular wake word is a first particular wake word, and wherein:
    - the first wake-word engine is configured to identify the first particular wake word in a sound input received via the at least one microphone and trigger voice extraction in response to identifying the first particular wake word;
    - the third wake-word engine is configured to identify a second particular wake word in a sound input received via the at least one microphone and trigger voice extraction in response to identifying the second particular wake word; and
    - the instructions executable by the one or more processors further cause the playback device to deactivate both of the first and third wake-word engines based on identifying the false wake word.
  - 7. The playback device of claim 1, wherein the playback device is a first playback device, and wherein the instructions executable by the one or more processors further cause the first playback device to:
    - group the first playback device with at least a second playback device, wherein the grouped playback devices are configured to synchronously play back audio; and
    - based on identifying the false wake word, cause, via the network interface, each playback device grouped with the first playback device to deactivate a respective wake-word engine for a given amount of time.
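Claims 5 and 7 leave open how the target devices are chosen. One possible reading is sketched below, under loud assumptions: the claims do not state a selection criterion, so picking network microphone devices in the same room is purely illustrative, and the `Device` record, its fields, and `plan_deactivations` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Device:
    device_id: str
    room: str
    has_microphone: bool
    group: Optional[str] = None  # synchrony group name, if any (hypothetical field)


def plan_deactivations(source: Device, others: List[Device],
                       nmd_hold_s: float, group_hold_s: float) -> Dict[str, float]:
    """Map each device that should deactivate its engine to a hold time in seconds."""
    plan: Dict[str, float] = {}
    for dev in others:
        if not dev.has_microphone:
            continue  # no wake-word engine to deactivate
        if source.group is not None and dev.group == source.group:
            # Grouped devices play the same audio in synchrony (claim 7).
            plan[dev.device_id] = group_hold_s
        elif dev.room == source.room:
            # Assumed criterion for picking from a plurality of devices (claim 5).
            plan[dev.device_id] = nmd_hold_s
    return plan


source = Device("living-room-bar", "living room", True, group="party")
others = [
    Device("kitchen-speaker", "kitchen", True, group="party"),
    Device("living-room-nmd", "living room", True),
    Device("bedroom-nmd", "bedroom", True),
]
plan = plan_deactivations(source, others, nmd_hold_s=5.0, group_hold_s=4.0)
```

Keeping the two hold times as separate parameters mirrors the claim language, which distinguishes "a particular amount of time" for network microphone devices from "a given amount of time" for grouped playback devices.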

8. A tangible, non-transitory, computer-readable medium having instructions stored thereon that are executable by one or more processors to cause a playback device to:
- identify in an audio stream, via a second wake-word engine, a false wake word for a first wake-word engine that is configured to receive as input sound data based on sound detected by at least one microphone of the playback device, wherein the first wake-word engine is configured according to a first sensitivity level for false positives of a particular wake word, and wherein the second wake-word engine is configured according to a second sensitivity level for false positives of the particular wake word that is more sensitive than the first sensitivity level;
- based on identifying the false wake word, (i) deactivate the first wake-word engine and (ii) cause, via a network interface of the playback device, at least one network microphone device to deactivate a wake-word engine of the at least one network microphone device for a particular amount of time;
- while the first wake-word engine is deactivated, cause at least one speaker of the playback device to output audio based on the audio stream; and
- after a predetermined amount of time has elapsed, reactivate the first wake-word engine.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The computer-readable medium of claim 8, wherein the playback device comprises a buffer configured to store information regarding characteristics of sound detected by the at least one microphone, wherein the characteristics comprise one or more of spectral or gain characteristics, and wherein the second sensitivity level is defined based at least on the stored information.
  - 10. The computer-readable medium of claim 8, wherein the first wake-word engine is configured to, while activated, trigger extraction of a first sound input received via the at least one microphone in response to identifying in the first sound input the particular wake word or the false wake word, and wherein the wake-word engine of the at least one network microphone device is configured to, while activated, trigger extraction of a second sound input received via a microphone of the at least one network microphone device in response to identifying in the second sound input the particular wake word or the false wake word.
  - 11. The computer-readable medium of claim 8, wherein the particular amount of time is a first amount of time and the predetermined amount of time is a second amount of time that differs from the first amount of time, and wherein the instructions that are executable by the one or more processors further cause the playback device to define the first amount of time based on the identifying of the false wake word.
  - 12. The computer-readable medium of claim 8, wherein the instructions that are executable by the one or more processors to cause the playback device to cause the at least one network microphone device to deactivate the wake-word engine of the at least one network microphone device comprise instructions that are executable by the one or more processors to cause the playback device to:
    - identify the at least one network microphone device for deactivation from a plurality of network microphone devices.
  - 13. The computer-readable medium of claim 8, wherein the playback device further comprises a third wake-word engine configured to receive as input the sound data based on sound detected by the at least one microphone, wherein the particular wake word is a first particular wake word, and wherein:
    - the first wake-word engine is configured to identify the first particular wake word in a sound input received via the at least one microphone and trigger voice extraction in response to identifying the first particular wake word;
    - the third wake-word engine is configured to identify a second particular wake word in a sound input received via the at least one microphone and trigger voice extraction in response to identifying the second particular wake word; and
    - the instructions executable by the one or more processors further cause the playback device to deactivate both of the first and third wake-word engines based on identifying the false wake word.
  - 14. The computer-readable medium of claim 8, wherein the playback device is a first playback device, and wherein the instructions executable by the one or more processors further cause the first playback device to:
    - group the first playback device with at least a second playback device, wherein the grouped playback devices are configured to synchronously play back audio; and
    - based on identifying the false wake word, cause, via the network interface, each playback device grouped with the first playback device to deactivate a respective wake-word engine for a given amount of time.
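The buffer recited in claims 2, 9, and 16 stores spectral or gain characteristics and informs the second sensitivity level, but the claims do not say how. The sketch below is one hypothetical mapping, assuming sensitivity is a score threshold and that only gain (in dB) is buffered; the class `SoundCharacteristicsBuffer`, the function `stream_engine_threshold`, and every constant are illustrative assumptions.

```python
from collections import deque
from statistics import mean


class SoundCharacteristicsBuffer:
    """Rolling buffer of recent microphone gain readings (dB)."""

    def __init__(self, maxlen: int = 50) -> None:
        self._gains_db = deque(maxlen=maxlen)  # oldest readings fall off automatically

    def add(self, gain_db: float) -> None:
        self._gains_db.append(gain_db)

    def mean_gain_db(self) -> float:
        return mean(self._gains_db) if self._gains_db else 0.0


def stream_engine_threshold(buf: SoundCharacteristicsBuffer,
                            mic_threshold: float = 0.8,
                            base_margin: float = 0.2,
                            per_db: float = 0.01,
                            floor: float = 0.3) -> float:
    """Derive a threshold below the microphone engine's (assumed mapping).

    A higher buffered gain widens the margin, making the stream-facing
    engine more sensitive, but never below `floor`.
    """
    margin = base_margin + per_db * max(buf.mean_gain_db(), 0.0)
    return max(floor, mic_threshold - margin)


buf = SoundCharacteristicsBuffer(maxlen=3)
for gain in (10.0, 20.0):
    buf.add(gain)
threshold = stream_engine_threshold(buf)
```

With the defaults, the derived threshold always stays below the microphone-facing engine's threshold, which keeps the sketch consistent with the claimed relationship between the two sensitivity levels.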

15. A computer-implemented method comprising:
- identifying in an audio stream, via a second wake-word engine of a playback device, a false wake word for a first wake-word engine that is configured to receive as input sound data based on sound detected by at least one microphone of the playback device, wherein the first wake-word engine is configured according to a first sensitivity level for false positives of a particular wake word, and wherein the second wake-word engine is configured according to a second sensitivity level for false positives of the particular wake word that is more sensitive than the first sensitivity level;
- based on identifying the false wake word, (i) deactivating the first wake-word engine and (ii) causing, via a network interface of the playback device, at least one network microphone device to deactivate a wake-word engine of the at least one network microphone device for a particular amount of time;
- while the first wake-word engine is deactivated, causing at least one speaker of the playback device to output audio based on the audio stream; and
- after a predetermined amount of time has elapsed, reactivating the first wake-word engine.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The computer-implemented method of claim 15, wherein the playback device comprises a buffer configured to store information regarding characteristics of sound detected by the at least one microphone, wherein the characteristics comprise one or more of spectral or gain characteristics, and wherein the second sensitivity level is defined based at least on the stored information.
  - 17. The computer-implemented method of claim 15, wherein the particular amount of time is a first amount of time and the predetermined amount of time is a second amount of time that differs from the first amount of time, and wherein the method further comprises defining the first amount of time based on the identifying of the false wake word.
  - 18. The computer-implemented method of claim 15, wherein causing the at least one network microphone device to deactivate the wake-word engine of the at least one network microphone device comprises identifying the at least one network microphone device for deactivation from a plurality of network microphone devices.
  - 19. The computer-implemented method of claim 15, wherein the playback device further comprises a third wake-word engine configured to receive as input the sound data based on sound detected by the at least one microphone, wherein the particular wake word is a first particular wake word, and wherein:
    - the first wake-word engine is configured to identify the first particular wake word in a sound input received via the at least one microphone and trigger voice extraction in response to identifying the first particular wake word;
    - the third wake-word engine is configured to identify a second particular wake word in a sound input received via the at least one microphone and trigger voice extraction in response to identifying the second particular wake word; and
    - the method further comprises deactivating both of the first and third wake-word engines based on identifying the false wake word.
  - 20. The computer-implemented method of claim 15, wherein the playback device is a first playback device, and wherein the method further comprises:
    - grouping the first playback device with at least a second playback device, wherein the grouped playback devices are configured to synchronously play back audio; and
    - based on identifying the false wake word, causing, via the network interface, each playback device grouped with the first playback device to deactivate a respective wake-word engine for a given amount of time.

Current Assignee: Sonos, Inc.
Original Assignee: Sonos, Inc.
Inventors: Smith, Connor Kristopher; Sleith, Charles Conor; Soto, Kurt Thomas
Primary Examiner(s): Zhu, Richard Z

Application Number: US16/131,409
Publication Number: US 20200090646A1
Time in Patent Office: 837 Days
Field of Search: None
CPC Class Codes
G10L 15/04   Segmentation; Word boundary...
G10L 15/08   Speech classification or se...
G10L 15/083   Recognition networks G10L15...
G10L 15/22   Procedures used during a sp...
G10L 15/30   Distributed recognition, e....
G10L 15/32   Multiple recognisers used i...
G10L 2015/088   Word spotting
G10L 2015/223   Execution procedure of a sp...
G10L 25/78   Detection of presence or ab...
H04L 67/12   specially adapted for propr...
