Voice detection optimization based on selected voice assistant service

US 10,573,321 B1
Filed: 06/07/2019
Issued: 02/25/2020
Est. Priority Date: 09/25/2018
Status: Active Grant


Abstract

Systems and methods for optimizing voice detection via a network microphone device (NMD) based on a selected voice-assistant service (VAS) are disclosed herein. In one example, the NMD detects sound via individual microphones and selects a first VAS to communicate with the NMD. The NMD produces a first sound-data stream based on the detected sound using a spatial processor in a first configuration. Once the NMD determines that a second VAS is to be selected over the first VAS, the spatial processor assumes a second configuration for producing a second sound-data stream based on the detected sound. The second sound-data stream is then transmitted to one or more remote computing devices associated with the second VAS.
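
The configuration switch described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class names, the VAS labels `vas_a` and `vas_b`, the microphone indices, and the channel-averaging step are all assumptions chosen only to show one spatial processor producing different sound-data streams from the same detected sound after it is reconfigured for a second VAS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatialConfig:
    """Hypothetical spatial-processor configuration (field name is an assumption)."""
    mic_indices: tuple  # microphone channels that feed the sound-data stream


# Assumed per-VAS configurations; the labels and indices are illustrative only.
CONFIGS = {
    "vas_a": SpatialConfig(mic_indices=(0, 1, 2, 3, 4, 5)),
    "vas_b": SpatialConfig(mic_indices=(0, 2, 4)),
}


class SpatialProcessor:
    """Toy spatial processor that averages the channels its configuration selects."""

    def __init__(self, vas):
        self.configure(vas)

    def configure(self, vas):
        # Assume the configuration that belongs to the selected VAS.
        self.vas = vas
        self.config = CONFIGS[vas]

    def process(self, frame):
        # Produce one output sample from one multi-microphone input frame.
        channels = [frame[i] for i in self.config.mic_indices]
        return sum(channels) / len(channels)


# One frame of detected sound, one sample per microphone.
frame = [4.0, 0.0, 4.0, 0.0, 4.0, 0.0]

processor = SpatialProcessor("vas_a")
first_stream = processor.process(frame)   # first configuration

processor.configure("vas_b")              # second VAS selected over the first
second_stream = processor.process(frame)  # second configuration
```

The same input frame yields a different stream once the processor is reconfigured, which is the only behavior this sketch is meant to show.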

583 Citations

20 Claims

1. A playback device comprising:
- a plurality of microphones;
  
  a network interface;
  
  one or more processors; and
  
  tangible, non-transitory computer-readable media having stored therein instructions executable by the one or more processors to cause the playback device to perform a method comprising:
  
  capturing audio via a first set of microphones selected from the plurality of microphones;
  
  analyzing the audio captured via the first set of microphones using a first wake-word engine on the playback device to detect a first wake word;
  
  selecting a second wake-word engine on the playback device, wherein the second wake-word engine is different from the first wake-word engine;
  
  after selecting the second wake-word engine, capturing audio via a second set of microphones selected from the plurality of microphones, wherein the second set of microphones is different from the first set of microphones;
  
  analyzing the audio captured via the second set of microphones using the second wake-word engine to detect a second wake word;
  
  detecting a wake word via one of the first wake-word engine or the second wake-word engine, wherein the detected wake word comprises one of the first wake word or the second wake word; and
  
  transmitting, via the network interface, at least a voice utterance following the detected wake word to one or more remote servers corresponding to a particular voice assistant service associated with the detected wake word.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The playback device of claim 1, wherein the first wake-word engine is associated with a first voice assistant service and the second wake-word engine is associated with a second voice assistant service, wherein capturing audio via the first set of microphones comprises capturing audio detected by the first set of microphones using a first signal processing scheme configured for the first voice assistant service, and wherein capturing audio via the second set of microphones comprises capturing audio detected by the second set of microphones using a second signal processing scheme configured for the second voice assistant service.
  - 3. The playback device of claim 2, wherein capturing audio via the second set of microphones further comprises capturing the audio detected by the second set of microphones using the second signal processing scheme while concurrently capturing the audio detected by the first set of microphones using the first signal processing scheme.
  - 4. The playback device of claim 2, wherein the method further comprises:
    - in response to selecting the second wake-word engine, electing the second voice assistant service to process voice input over the first voice assistant service.
  - 5. The playback device of claim 1, wherein capturing audio via the second set of microphones comprises capturing audio via the second set of microphones while concurrently capturing the audio via the first set of microphones.
  - 6. The playback device of claim 1, wherein the second set of microphones comprises fewer microphones than the first set of microphones.
  - 7. The playback device of claim 1, wherein each microphone of the plurality of microphones is in one of the first set of microphones or the second set of microphones.

8. A tangible, non-transitory computer-readable medium having stored therein instructions executable by one or more processors to cause a playback device to perform a method comprising:
- capturing audio via a first set of microphones selected from a plurality of microphones of the playback device;
  
  analyzing the audio captured via the first set of microphones using a first wake-word engine on the playback device to detect a first wake word;
  
  selecting a second wake-word engine on the playback device, wherein the second wake-word engine is different from the first wake-word engine;
  
  after selecting the second wake-word engine, capturing audio via a second set of microphones selected from the plurality of microphones, wherein the second set of microphones is different from the first set of microphones;
  
  analyzing the audio captured via the second set of microphones using the second wake-word engine to detect a second wake word;
  
  detecting a wake word via one of the first wake-word engine or the second wake-word engine, wherein the detected wake word comprises one of the first wake word or the second wake word; and
  
  transmitting, via a network interface of the playback device, at least a voice utterance following the detected wake word to one or more remote servers corresponding to a particular voice assistant service associated with the detected wake word.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The tangible, non-transitory computer-readable medium of claim 8, wherein the first wake-word engine is associated with a first voice assistant service and the second wake-word engine is associated with a second voice assistant service, wherein capturing audio via the first set of microphones comprises capturing audio detected by the first set of microphones using a first signal processing scheme configured for the first voice assistant service, and wherein capturing audio via the second set of microphones comprises capturing audio detected by the second set of microphones using a second signal processing scheme configured for the second voice assistant service.
  - 10. The tangible, non-transitory computer-readable medium of claim 9, wherein capturing audio via the second set of microphones further comprises capturing the audio detected by the second set of microphones using the second signal processing scheme while concurrently capturing the audio detected by the first set of microphones using the first signal processing scheme.
  - 11. The tangible, non-transitory computer-readable medium of claim 9, wherein the method further comprises:
    - in response to enabling the second wake-word engine, electing the second voice assistant service to process voice input over the first voice assistant service.
  - 12. The tangible, non-transitory computer-readable medium of claim 8, wherein capturing audio via the second set of microphones comprises capturing audio via the second set of microphones while concurrently capturing the audio via the first set of microphones.
  - 13. The tangible, non-transitory computer-readable medium of claim 8, wherein the second set of microphones comprises fewer microphones than the first set of microphones.
  - 14. The tangible, non-transitory computer-readable medium of claim 8, wherein each microphone of the plurality of microphones is in one of the first set of microphones or the second set of microphones.

15. A method comprising:
- capturing audio via a first set of microphones selected from a plurality of microphones of a playback device;
  
  analyzing the audio captured via the first set of microphones using a first wake-word engine on the playback device to detect a first wake word;
  
  selecting a second wake-word engine on the playback device, wherein the second wake-word engine is different from the first wake-word engine;
  
  after selecting the second wake-word engine, capturing audio via a second set of microphones selected from the plurality of microphones, wherein the second set of microphones is different from the first set of microphones;
  
  analyzing the audio captured via the second set of microphones using the second wake-word engine to detect a second wake word;
  
  detecting a wake word via one of the first wake-word engine or the second wake-word engine, wherein the detected wake word comprises one of the first wake word or the second wake word; and
  
  transmitting, via a network interface of the playback device, at least a voice utterance following the detected wake word to one or more remote servers corresponding to a particular voice assistant service associated with the detected wake word.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The method of claim 15, wherein the first wake-word engine is associated with a first voice assistant service and the second wake-word engine is associated with a second voice assistant service, wherein capturing audio via the first set of microphones comprises capturing audio detected by the first set of microphones using a first signal processing scheme configured for the first voice assistant service, and wherein capturing audio via the second set of microphones comprises capturing audio detected by the second set of microphones using a second signal processing scheme configured for the second voice assistant service.
  - 17. The method of claim 16, wherein capturing audio via the second set of microphones further comprises capturing the audio detected by the second set of microphones using the second signal processing scheme while concurrently capturing the audio detected by the first set of microphones using the first signal processing scheme.
  - 18. The method of claim 16, wherein the method further comprises:
    - in response to selecting the second wake-word engine, electing the second voice assistant service to process voice input over the first voice assistant service.
  - 19. The method of claim 15, wherein capturing audio via the second set of microphones comprises capturing audio via the second set of microphones while concurrently capturing the audio via the first set of microphones.
  - 20. The method of claim 15, wherein the second set of microphones comprises fewer microphones than the first set of microphones.
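
The steps recited in claim 15 can be walked through with a toy sketch. It is an illustration under stated assumptions, not the claimed implementation: text transcripts stand in for captured audio, substring matching stands in for wake-word detection, the engines are tried one after the other, and every name here (`WakeWordEngine`, `capture`, `detect`, `handle`, the wake words, and the VAS labels) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WakeWordEngine:
    """Hypothetical engine: its wake word, its VAS, and its microphone set."""
    wake_word: str
    vas: str
    mic_set: frozenset


def capture(mic_inputs, mic_set):
    """Toy capture: keep only the channels of the chosen microphone set."""
    return {m: mic_inputs[m] for m in sorted(mic_set)}


def detect(engine, captured):
    """Return the utterance following the wake word, or None if it is absent."""
    for text in captured.values():
        _, found, tail = text.partition(engine.wake_word)
        if found:
            return tail.strip()
    return None


def handle(mic_inputs, engines, send):
    """Try each engine on its own microphone set and route the utterance."""
    for engine in engines:
        utterance = detect(engine, capture(mic_inputs, engine.mic_set))
        if utterance is not None:
            # Stand-in for transmitting to the remote servers of that VAS.
            send(engine.vas, utterance)
            return engine.vas
    return None


# Six microphones; the second set is different from, and smaller than, the first.
first = WakeWordEngine("hey alpha", "vas_alpha", frozenset({0, 1, 2, 3}))
second = WakeWordEngine("ok beta", "vas_beta", frozenset({4, 5}))

sent = []
mic_inputs = ["ok beta play jazz"] * 6
chosen = handle(mic_inputs, [first, second], lambda vas, u: sent.append((vas, u)))
```

In this run the first engine finds nothing on its microphone set, the second engine detects its wake word on the other set, and only the utterance after the wake word is passed to the sender for the associated service.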

Current Assignee: Sonos, Inc.
Original Assignee: Sonos, Inc.
Inventors: Smith, Connor Kristopher; Soto, Kurt Thomas; Sleith, Charles Conor
Primary Examiner(s): Le, Thuykhanh
Application Number: US16/434,426
Time in Patent Office: 263 Days
Field of Search: None
US Class Current
CPC Class Codes

G10L 15/08   Speech classification or se...

G10L 15/22   Procedures used during a sp...

G10L 15/30   Distributed recognition, e....

G10L 2015/088   Word spotting

G10L 2015/223   Execution procedure of a sp...

G10L 2021/02166   Microphone arrays; Beamforming

G10L 21/0208   Noise filtering

H04R 1/406   microphones

H04R 2227/001   Adaptation of signal proces...

H04R 2227/005   Audio distribution systems ...

H04R 2420/07   Applications of wireless lo...

H04R 27/00   Public address systems circ...

H04R 29/004   for microphones H04R29/007 ...

H04R 29/005   Microphone arrays

H04R 3/005   for combining the signals o...

H04S 7/305   Electronic adaptation of st...
