Voice detection optimization based on selected voice assistant service
First Claim
1. A playback device comprising:
a plurality of microphones;
a network interface;
one or more processors; and
tangible, non-transitory computer-readable media having stored therein instructions executable by the one or more processors to cause the playback device to perform a method comprising:
capturing audio via a first set of microphones selected from the plurality of microphones;
analyzing the audio captured via the first set of microphones using a first wake-word engine on the playback device to detect a first wake word;
selecting a second wake-word engine on the playback device, wherein the second wake-word engine is different from the first wake-word engine;
after selecting the second wake-word engine, capturing audio via a second set of microphones selected from the plurality of microphones, wherein the second set of microphones is different from the first set of microphones;
analyzing the audio captured via the second set of microphones using the second wake-word engine to detect a second wake word;
detecting a wake word via one of the first wake-word engine or the second wake-word engine, wherein the detected wake word comprises one of the first wake word or the second wake word; and
transmitting, via the network interface, at least a voice utterance following the detected wake word to one or more remote servers corresponding to a particular voice assistant service associated with the detected wake word.
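The sequence recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation. The names `WakeWordEngine`, `PlaybackDevice`, `capture`, and `send` are hypothetical, microphone audio is stood in for by text strings, and wake-word detection is reduced to a substring search so that the control flow (capture from a microphone subset, analyze with the selected engine, forward the utterance to the matching service) stays visible.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class WakeWordEngine:
    """Hypothetical engine: one wake word, one microphone subset, one VAS."""
    wake_word: str
    mic_indices: Tuple[int, ...]  # subset of the device's microphones to capture from
    vas_name: str                 # voice assistant service associated with the wake word

    def detect(self, transcript: str) -> Optional[str]:
        """Return the utterance following the wake word, or None if it is absent.

        Assumption: audio is represented as ASCII text; a real engine would
        operate on audio samples.
        """
        pos = transcript.lower().find(self.wake_word.lower())
        if pos < 0:
            return None
        return transcript[pos + len(self.wake_word):].strip(" ,")


def capture(mic_feeds: Sequence[str], indices: Sequence[int]) -> List[str]:
    """Capture audio only from the selected set of microphones."""
    return [mic_feeds[i] for i in indices]


class PlaybackDevice:
    def __init__(self, engine: WakeWordEngine, send: Callable[[str, str], None]):
        self.active = engine  # currently selected wake-word engine
        self.send = send      # stand-in for the network interface

    def select_engine(self, engine: WakeWordEngine) -> None:
        """Select a different engine; later captures use its microphone set."""
        self.active = engine

    def listen(self, mic_feeds: Sequence[str]) -> bool:
        """Capture, analyze, and on detection transmit the utterance."""
        for feed in capture(mic_feeds, self.active.mic_indices):
            utterance = self.active.detect(feed)
            if utterance is not None:
                self.send(self.active.vas_name, utterance)
                return True
        return False


if __name__ == "__main__":
    first = WakeWordEngine("hey alpha", (0, 1), "alpha-vas")
    second = WakeWordEngine("ok beta", (2, 3), "beta-vas")
    device = PlaybackDevice(first, lambda vas, utt: print(vas, "<-", utt))
    device.listen(["hey alpha, play jazz", "", "", ""])
    device.select_engine(second)
    device.listen(["", "", "", "ok beta what time is it"])
```

Tying the microphone set to the engine object is one design choice among several; it keeps the "select engine, then capture via a different set" ordering of the claim explicit in `select_engine`.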
Abstract
Systems and methods for optimizing voice detection via a network microphone device (NMD) based on a selected voice-assistant service (VAS) are disclosed herein. In one example, the NMD detects sound via individual microphones and selects a first VAS to communicate with the NMD. The NMD produces a first sound-data stream based on the detected sound using a spatial processor in a first configuration. Once the NMD determines that a second VAS is to be selected over the first VAS, the spatial processor assumes a second configuration for producing a second sound-data stream based on the detected sound. The second sound-data stream is then transmitted to one or more remote computing devices associated with the second VAS.
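As a rough illustration of the abstract, the sketch below models the spatial processor as a fixed weighted mix of microphone channels whose weights are swapped when a different VAS is selected. The abstract does not specify how the spatial processor works, so the weighted-sum model, the names `SpatialConfig`, `SpatialProcessor`, `VAS_CONFIGS`, and `stream_for`, and the weight values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class SpatialConfig:
    """Hypothetical configuration: one weight per microphone (0.0 excludes it)."""
    weights: Tuple[float, ...]


class SpatialProcessor:
    def __init__(self, config: SpatialConfig):
        self.config = config

    def configure(self, config: SpatialConfig) -> None:
        """Assume a new configuration, e.g. after a different VAS is selected."""
        self.config = config

    def process(self, frames: Sequence[Sequence[float]]) -> List[float]:
        """Mix per-microphone samples into one sound-data stream.

        frames[m][t] is sample t from microphone m.
        """
        if len(frames) != len(self.config.weights):
            raise ValueError("one weight per microphone is required")
        length = min(len(channel) for channel in frames)
        return [
            sum(w * channel[t] for w, channel in zip(self.config.weights, frames))
            for t in range(length)
        ]


# Assumed per-VAS configurations; the values are illustrative only.
VAS_CONFIGS: Dict[str, SpatialConfig] = {
    "vas_a": SpatialConfig((0.5, 0.5, 0.0)),
    "vas_b": SpatialConfig((0.0, 0.0, 1.0)),
}


def stream_for(vas: str, processor: SpatialProcessor,
               frames: Sequence[Sequence[float]]) -> List[float]:
    """Reconfigure the processor for the selected VAS and produce its stream."""
    processor.configure(VAS_CONFIGS[vas])
    return processor.process(frames)
```

Calling `stream_for("vas_a", ...)` and then `stream_for("vas_b", ...)` on the same detected sound yields two different sound-data streams, which mirrors the first and second configurations the abstract describes.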
583 Citations
20 Claims
1. A playback device (independent claim; full text appears under First Claim above).
Dependent claims: 2, 3, 4, 5, 6, 7
8. A tangible, non-transitory computer-readable medium having stored therein instructions executable by one or more processors to cause a playback device to perform a method comprising:
capturing audio via a first set of microphones selected from a plurality of microphones of the playback device;
analyzing the audio captured via the first set of microphones using a first wake-word engine on the playback device to detect a first wake word;
selecting a second wake-word engine on the playback device, wherein the second wake-word engine is different from the first wake-word engine;
after selecting the second wake-word engine, capturing audio via a second set of microphones selected from the plurality of microphones, wherein the second set of microphones is different from the first set of microphones;
analyzing the audio captured via the second set of microphones using the second wake-word engine to detect a second wake word;
detecting a wake word via one of the first wake-word engine or the second wake-word engine, wherein the detected wake word comprises one of the first wake word or the second wake word; and
transmitting, via a network interface of the playback device, at least a voice utterance following the detected wake word to one or more remote servers corresponding to a particular voice assistant service associated with the detected wake word.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A method comprising:
capturing audio via a first set of microphones selected from a plurality of microphones of a playback device;
analyzing the audio captured via the first set of microphones using a first wake-word engine on the playback device to detect a first wake word;
selecting a second wake-word engine on the playback device, wherein the second wake-word engine is different from the first wake-word engine;
after selecting the second wake-word engine, capturing audio via a second set of microphones selected from the plurality of microphones, wherein the second set of microphones is different from the first set of microphones;
analyzing the audio captured via the second set of microphones using the second wake-word engine to detect a second wake word;
detecting a wake word via one of the first wake-word engine or the second wake-word engine, wherein the detected wake word comprises one of the first wake word or the second wake word; and
transmitting, via a network interface of the playback device, at least a voice utterance following the detected wake word to one or more remote servers corresponding to a particular voice assistant service associated with the detected wake word.
Dependent claims: 16, 17, 18, 19, 20
Specification