Linear Filtering for Noise-Suppressed Speech Detection Via Multiple Network Microphone Devices
First Claim
1. A first network microphone device comprising:
a plurality of microphones comprising a first microphone and a second microphone;
one or more processors;
a network interface; and
tangible, non-transitory, computer-readable media storing instructions executable by the one or more processors to cause the first network microphone device to perform operations comprising:
receiving an instruction to process one or more audio signals captured by a second network microphone device;
after receiving the instruction, (i) functionally disabling at least the first microphone, (ii) capturing a first audio signal via the second microphone, and (iii) receiving over the network interface a second audio signal captured via at least a third microphone of the second network microphone device, wherein the first audio signal comprises first noise content from a noise source and the second audio signal comprises second noise content from the noise source;
identifying the first noise content in the first audio signal;
using the identified first noise content to determine an estimated noise content captured by at least the second and third microphones;
using the estimated noise content to suppress the first noise content in the first audio signal and the second noise content in the second audio signal;
combining the suppressed first audio signal and the suppressed second audio signal into a third audio signal;
determining that the third audio signal includes a voice input comprising a wake word; and
in response to the determination, processing the voice input to identify a voice utterance different from the wake word.
Abstract
Systems and methods for suppressing noise and detecting voice input in a multi-channel audio signal captured by two or more network microphone devices include receiving an instruction to process one or more audio signals captured by a first network microphone device and after receiving the instruction (i) disabling at least a first microphone of a plurality of microphones of a second network microphone device, (ii) capturing a first audio signal via a second microphone of the plurality of microphones, (iii) receiving over a network interface of the second network microphone device a second audio signal captured via at least a third microphone of the first network microphone device, (iv) using estimated noise content to suppress first and second noise content in the first and second audio signals, (v) combining the suppressed first and second audio signals into a third audio signal, and (vi) determining that the third audio signal includes a voice input comprising a wake word.
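The processing order described in the abstract (estimate noise, suppress it in both signals, combine, then check for voice input) can be illustrated with a minimal Python sketch. This is not the patented method: it swaps in a simple broadband, Wiener-style per-frame gain for the multi-channel linear filtering, and an energy threshold for the wake-word detector. The frame size, the function names, the threshold, and the assumption that the first frames of the first signal contain only noise are all hypothetical choices made for illustration.

```python
import math
import random

FRAME = 160  # samples per frame (hypothetical: 10 ms at 16 kHz)


def frame_powers(signal, frame=FRAME):
    """Mean power of each full frame of the signal."""
    return [
        sum(x * x for x in signal[i:i + frame]) / frame
        for i in range(0, len(signal) - frame + 1, frame)
    ]


def estimate_noise_power(first_signal, noise_frames=5):
    """Estimate noise power from the leading frames of the first signal.

    Assumption of this sketch: those frames contain noise only.
    """
    powers = frame_powers(first_signal)[:noise_frames]
    return sum(powers) / len(powers)


def suppress(signal, noise_power, frame=FRAME):
    """Scale each full frame by a Wiener-style gain, max(0, 1 - N/P)."""
    out = []
    for i in range(0, len(signal) - frame + 1, frame):
        chunk = signal[i:i + frame]
        power = sum(x * x for x in chunk) / frame
        gain = max(0.0, 1.0 - noise_power / power) if power > 0 else 0.0
        out.extend(gain * x for x in chunk)
    return out


def combine(a, b):
    """Average two suppressed signals sample by sample."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def has_voice(signal, threshold):
    """Stand-in for a wake-word detector: any frame power above threshold."""
    return any(p > threshold for p in frame_powers(signal))


def process(first_signal, second_signal, threshold=0.1):
    """Run the steps in order; return the combined signal and a flag."""
    noise_power = estimate_noise_power(first_signal)
    third = combine(
        suppress(first_signal, noise_power),
        suppress(second_signal, noise_power),
    )
    return third, has_voice(third, threshold)


def make_signals(seed=0, n_frames=20, speech_start=10):
    """Synthetic test data: one noise source heard by both devices,
    plus a 440 Hz tone standing in for speech from frame speech_start on."""
    rng = random.Random(seed)
    first, second = [], []
    for t in range(n_frames * FRAME):
        noise = rng.gauss(0.0, 0.1)
        tone = math.sin(2 * math.pi * 440 * t / 16000)
        speech = tone if t >= speech_start * FRAME else 0.0
        first.append(speech + noise)
        second.append(0.8 * speech + noise)
    return first, second
```

Calling `process(*make_signals())` returns the combined signal together with a detection flag. Passing `speech_start=20` to `make_signals` produces noise-only input, which is useful for checking that the threshold does not fire on residual noise.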
20 Claims
1. A first network microphone device, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7
8. Tangible, non-transitory, computer-readable media storing instructions executable by one or more processors to cause a first network microphone device to perform operations comprising:
receiving an instruction to process one or more audio signals captured by a second network microphone device;
after receiving the instruction, (i) functionally disabling at least a first microphone of a plurality of microphones of the first network microphone device, (ii) capturing a first audio signal via a second microphone of the plurality of microphones, and (iii) receiving over a network interface of the first network microphone device a second audio signal captured via at least a third microphone of the second network microphone device, wherein the first audio signal comprises first noise content from a noise source and the second audio signal comprises second noise content from the noise source;
identifying the first noise content in the first audio signal;
using the identified first noise content to determine an estimated noise content captured by at least the second and third microphones;
using the estimated noise content to suppress the first noise content in the first audio signal and the second noise content in the second audio signal;
combining the suppressed first audio signal and the suppressed second audio signal into a third audio signal;
determining that the third audio signal includes a voice input comprising a wake word; and
in response to the determination, processing the voice input to identify a voice utterance different from the wake word.
Dependent claims: 9, 10, 12, 14
11. The tangible, non-transitory, computer-readable media of claim 11, wherein the first network microphone device captures the first audio signal at a first time and the second network microphone device captures the second audio signal at a second time different than the first time.
15. A method comprising:
receiving an instruction to process one or more audio signals captured by a first network microphone device;
after receiving the instruction, (i) functionally disabling at least a first microphone of a plurality of microphones of a second network microphone device, (ii) capturing a first audio signal via a second microphone of the plurality of microphones, and (iii) receiving over a network interface of the second network microphone device a second audio signal captured via at least a third microphone of the first network microphone device, wherein the first audio signal comprises first noise content from a noise source and the second audio signal comprises second noise content from the noise source;
identifying the first noise content in the first audio signal;
using the identified first noise content to determine an estimated noise content captured by at least the second and third microphones;
using the estimated noise content to suppress the first noise content in the first audio signal and the second noise content in the second audio signal;
combining the suppressed first audio signal and the suppressed second audio signal into a third audio signal;
determining that the third audio signal includes a voice input comprising a wake word; and
in response to the determination, processing the voice input to identify a voice utterance different from the wake word.
Dependent claims: 16, 17, 18, 19, 20
Specification