Audio Playback Settings for Voice Interaction
First Claim
1. A playback device comprising:
a network interface;
one or more microphones;
an audio stage comprising an amplifier;
one or more speakers;
one or more processors;
a housing, the housing carrying at least the network interface, the one or more microphones, the audio stage, the one or more speakers, the one or more processors, and a computer-readable media having stored therein instructions executable by the one or more processors to cause the playback device to perform operations comprising;
while playing back first audio in a given environment at a given loudness via the audio stage and the one or more speakers;
(a) capturing, via the one or more microphones, a voice input;
(b) determining that the captured voice input includes audio data representing a wake word to invoke a voice assistant service;
(c) in response to determining that the captured voice input includes audio data representing the wake word to invoke the voice assistant service;
(i) sending, via the network interface to one or more servers of the voice assistant service, the voice input and (ii) determining a loudness of background noise in the given environment, wherein the background noise comprises ambient noise in the given environment;
(d) after determining the loudness of background noise, receiving, via the network interface from the one or more servers of the voice assistant service in response to the voice input, second audio data representing a spoken response to the voice input;
in response to receiving the second audio data representing the spoken response to the voice input, ducking the first audio in proportion to a difference between the given loudness of the first audio and the determined loudness of the background noise; and
playing back the ducked first audio concurrently with the second audio representing the spoken response to the voice input via the audio stage and the one or more speakers.
Abstract
Example techniques relate to voice interaction in an environment with a media playback system that is playing back audio content. In an example implementation, while playing back first audio in a given environment at a given loudness: a playback device (a) detects that an event is anticipated in the given environment, the event involving playback of second audio and (b) determines a loudness of background noise in the given environment, the background noise comprising ambient noise in the given environment. The playback device ducks the first audio in proportion to a difference between the given loudness of the first audio and the determined loudness of the background noise and plays back the ducked first audio concurrently with the second audio.
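The abstract states only that the first audio is ducked "in proportion to" the difference between its loudness and the background noise loudness; it gives no formula. A minimal sketch of one possible reading follows, assuming loudness values expressed in dB and a linear proportionality. The names `duck_amount_db` and `duck_gain`, the `ratio` constant, and the `max_duck_db` cap are all hypothetical and do not come from the patent.

```python
def duck_amount_db(playback_db: float, noise_db: float,
                   ratio: float = 0.5, max_duck_db: float = 30.0) -> float:
    """Attenuation in dB to apply to the first audio.

    Assumption: the attenuation grows linearly with the gap between the
    playback loudness and the background noise loudness, and is capped.
    """
    difference = max(0.0, playback_db - noise_db)  # no ducking if noise is louder
    return min(ratio * difference, max_duck_db)


def duck_gain(playback_db: float, noise_db: float,
              ratio: float = 0.5, max_duck_db: float = 30.0) -> float:
    """Linear amplitude gain corresponding to the dB attenuation."""
    attenuation = duck_amount_db(playback_db, noise_db, ratio, max_duck_db)
    return 10.0 ** (-attenuation / 20.0)


def apply_duck(samples, gain: float):
    """Scale a sequence of audio samples by the ducking gain."""
    return [s * gain for s in samples]
```

Under these assumed constants, music at 70 dB in a room with 40 dB of ambient noise would be attenuated by 15 dB, while music already quieter than the noise floor would not be ducked at all.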
26 Claims
1. A playback device comprising:
a network interface;
one or more microphones;
an audio stage comprising an amplifier;
one or more speakers;
one or more processors;
a housing, the housing carrying at least the network interface, the one or more microphones, the audio stage, the one or more speakers, the one or more processors, and a computer-readable media having stored therein instructions executable by the one or more processors to cause the playback device to perform operations comprising;
while playing back first audio in a given environment at a given loudness via the audio stage and the one or more speakers;
(a) capturing, via the one or more microphones, a voice input;
(b) determining that the captured voice input includes audio data representing a wake word to invoke a voice assistant service;
(c) in response to determining that the captured voice input includes audio data representing the wake word to invoke the voice assistant service;
(i) sending, via the network interface to one or more servers of the voice assistant service, the voice input and (ii) determining a loudness of background noise in the given environment, wherein the background noise comprises ambient noise in the given environment;
(d) after determining the loudness of background noise, receiving, via the network interface from the one or more servers of the voice assistant service in response to the voice input, second audio data representing a spoken response to the voice input;
in response to receiving the second audio data representing the spoken response to the voice input, ducking the first audio in proportion to a difference between the given loudness of the first audio and the determined loudness of the background noise; and
playing back the ducked first audio concurrently with the second audio representing the spoken response to the voice input via the audio stage and the one or more speakers.
Dependent claims: 2, 6, 7, 8, 9, 10, 11, 12, 19
3. (canceled)
4. (canceled)
5. (canceled)
13. A tangible, non-transitory computer-readable medium having stored therein instructions executable by one or more processors to cause a playback device to perform a method, the playback device comprising a housing carrying at least a network interface, one or more microphones, an audio stage, one or more speakers, and the one or more processors, and the method comprising:
while playing back first audio in a given environment at a given loudness via the audio stage and the one or more speakers;
(a) capturing, via the one or more microphones, a voice input;
(b) determining that the captured voice input includes audio data representing a wake word to invoke a voice assistant service;
(c) in response to determining that the captured voice input includes audio data representing the wake word to invoke the voice assistant service;
(i) sending, via the network interface to one or more servers of the voice assistant service, the voice input and (ii) determining a loudness of background noise in the given environment, wherein the background noise comprises ambient noise in the given environment;
(d) after determining the loudness of background noise, receiving, via the network interface from the one or more servers of the voice assistant service in response to the voice input, second audio data representing a spoken response to the voice input;
in response to receiving the second audio data representing the spoken response to the voice input, ducking the first audio in proportion to a difference between the given loudness of the first audio and the determined loudness of the background noise; and
playing back the ducked first audio concurrently with the second audio representing the spoken response to the voice input via the audio stage and the one or more speakers.
Dependent claims: 14, 18, 24, 25, 26
15. (canceled)
16. (canceled)
17. (canceled)
20. A method to be performed by a playback device comprising a housing carrying at least a network interface, one or more microphones, an audio stage, one or more speakers, the method comprising:
while playing back first audio in a given environment at a given loudness via the audio stage and the one or more speakers, the playback device;
(a) capturing, via the one or more microphones, a voice input;
(b) determining that the captured voice input includes audio data representing a wake word to invoke a voice assistant service;
(c) in response to determining that the captured voice input includes audio data representing the wake word to invoke the voice assistant service;
(i) sending, via the network interface to one or more servers of the voice assistant service, the voice input and (ii) determining a loudness of background noise in the given environment, wherein the background noise comprises ambient noise in the given environment;
(d) after determining the loudness of background noise, receiving, via the network interface from the one or more servers of the voice assistant service in response to the voice input, second audio data representing a spoken response to the voice input;
in response to receiving the second audio data representing the spoken response to the voice input, the playback device ducking the first audio in proportion to a difference between the given loudness of the first audio and the determined loudness of the background noise; and
the playback device playing back the ducked first audio concurrently with the second audio representing the spoken response to the voice input via the audio stage and the one or more speakers.
Dependent claims: 21, 22, 23
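The independent claims recite a specific ordering: the wake word is detected, the voice input is sent and the background noise loudness is determined, and only afterwards is the spoken response received and the first audio ducked. The sketch below illustrates that ordering only. The class `PlaybackDeviceSketch`, its callables, the `ratio` constant, and the dB-based linear ducking rule are hypothetical stand-ins, not taken from the claims.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PlaybackDeviceSketch:
    # Hypothetical stand-ins for the device's subsystems.
    detect_wake_word: Callable[[bytes], bool]
    send_to_assistant: Callable[[bytes], None]
    measure_noise_db: Callable[[], float]
    playback_db: float                  # "given loudness" of the first audio
    ratio: float = 0.5                  # assumed proportionality constant
    noise_db: Optional[float] = None    # set in step (c)(ii)
    log: List[str] = field(default_factory=list)

    def on_voice_input(self, voice_input: bytes) -> bool:
        """Steps (a) to (c): capture, wake-word check, send, measure noise."""
        self.log.append("capture")
        if not self.detect_wake_word(voice_input):
            return False
        self.send_to_assistant(voice_input)          # (c)(i)
        self.log.append("send")
        self.noise_db = self.measure_noise_db()      # (c)(ii)
        self.log.append("measure_noise")
        return True

    def on_response(self, second_audio: bytes) -> float:
        """Step (d) onward: duck, then play both streams concurrently.

        Returns the attenuation in dB applied to the first audio.
        """
        if self.noise_db is None:
            raise RuntimeError("noise loudness must be determined before the response")
        duck_db = self.ratio * max(0.0, self.playback_db - self.noise_db)
        self.log.append("duck")
        self.log.append("play_concurrently")
        return duck_db
```

Calling `on_response` before a successful `on_voice_input` raises an error, which mirrors the claim language "after determining the loudness of background noise".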
Specification