Linear Filtering for Noise-Suppressed Speech Detection Via Multiple Network Microphone Devices

US 20200105295A1
Filed: 09/29/2018
Published: 04/02/2020
Est. Priority Date: 09/29/2018
Status: Active Grant

Abstract

Systems and methods for suppressing noise and detecting voice input in a multi-channel audio signal captured by two or more network microphone devices include receiving an instruction to process one or more audio signals captured by a first network microphone device and after receiving the instruction (i) disabling at least a first microphone of a plurality of microphones of a second network microphone device, (ii) capturing a first audio signal via a second microphone of the plurality of microphones, (iii) receiving over a network interface of the second network microphone device a second audio signal captured via at least a third microphone of the first network microphone device, (iv) using estimated noise content to suppress first and second noise content in the first and second audio signals, (v) combining the suppressed first and second audio signals into a third audio signal, and (vi) determining that the third audio signal includes a voice input comprising a wake word.
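
Steps (iv) and (v) of the abstract can be illustrated with a minimal sketch. It assumes each audio signal has already been converted to complex frequency-domain bins, and it swaps in a simple per-bin Wiener-style gain for whatever linear filter the specification actually describes. The function names, the `floor` parameter, and the averaging used to combine channels are all hypothetical choices made for illustration, not taken from the patent.

```python
def wiener_gain(signal_power, noise_power, floor=0.1):
    """Per-bin gain max(floor, 1 - noise/signal); floor is an assumed safeguard."""
    if signal_power <= 0.0:
        return floor
    return max(floor, 1.0 - noise_power / signal_power)


def suppress(bins, noise_power, floor=0.1):
    """Scale each complex bin by the gain derived from the shared noise estimate."""
    return [wiener_gain(abs(x) ** 2, n, floor) * x
            for x, n in zip(bins, noise_power)]


def combine(first, second):
    """Average the two suppressed signals bin by bin into a third signal."""
    return [(a + b) / 2 for a, b in zip(first, second)]


# Hypothetical data: one frame of two bins per device, one shared noise estimate.
estimated_noise = [1.0, 1.0]
first_suppressed = suppress([2 + 0j, 1 + 0j], estimated_noise)
second_suppressed = suppress([2 + 0j, 2 + 0j], estimated_noise)
third_signal = combine(first_suppressed, second_suppressed)
```

A wake-word detector would then run on `third_signal`; that stage is outside the scope of this sketch.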

20 Claims

1. A first network microphone device comprising:
- a plurality of microphones comprising a first microphone and a second microphone;
  
  one or more processors;
  
  a network interface; and
  
  tangible, non-transitory, computer-readable media storing instructions executable by the one or more processors to cause the first network microphone device to perform operations comprising:
  
  receiving an instruction to process one or more audio signals captured by a second network microphone device;
  
  after receiving the instruction, (i) functionally disabling at least the first microphone, (ii) capturing a first audio signal via the second microphone, and (iii) receiving over the network interface a second audio signal captured via at least a third microphone of the second network microphone device, wherein the first audio signal comprises first noise content from a noise source and the second audio signal comprises second noise content from the noise source;
  
  identifying the first noise content in the first audio signal;
  
  using the identified first noise content to determine an estimated noise content captured by at least the second and third microphones;
  
  using the estimated noise content to suppress the first noise content in the first audio signal and the second noise content in the second audio signal;
  
  combining the suppressed first audio signal and the suppressed second audio signal into a third audio signal;
  
  determining that the third audio signal includes a voice input comprising a wake word; and
  
  in response to the determination, processing the voice input to identify a voice utterance different from the wake word.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The first network microphone device of claim 1, the operations further comprising:
    - determining a probability that the first audio signal comprises speech content, wherein the steps of (i) identifying the first noise content in the first audio signal and (ii) using the identified first noise content to determine an estimated noise content captured by at least the second and third microphones are carried out based on the determined probability being below a threshold probability.
  - 3. The first network microphone device of claim 1, the operations further comprising:
    - receiving an instruction to cease processing of audio signals captured by the second network microphone device; and
      
      after receiving the instruction to cease processing of audio signals captured by the second network microphone device, (i) enabling at least the first microphone of the first network microphone device, (ii) capturing fourth audio content via the first microphone, (iii) capturing fifth audio content via the second microphone of the first network microphone device, and (iv) using the fourth and fifth audio signals to identify potential voice input in sound detected by the plurality of microphones.
  - 4. The first network microphone device of claim 1, wherein the first network microphone device captures the first audio signal at a first time and the second network microphone device captures the second audio signal at a second time different than the first time.
  - 5. The first network microphone device of claim 1, the operations further comprising applying an offset time to at least one of the first audio signal and the second audio signal before combining the suppressed first audio signal and the suppressed second audio signal into the third audio signal.
  - 6. The first network microphone device of claim 1, the operations further comprising offsetting at least one of the first audio signal and the second audio signal based on a time differential between a device clock of the first network microphone device and a device clock of the second network microphone device.
  - 7. The first network microphone device of claim 1, wherein processing the voice input comprises transmitting at least a portion of the voice input to a remote computing device for voice processing to identify a voice utterance different from the wake word.
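
The gating described in claim 2, where noise identification and estimation proceed only when the speech probability is below a threshold, can be sketched as follows. This is an illustrative assumption rather than the patented method: the recursive-averaging update, the smoothing factor `alpha`, the default threshold, and the function name are all hypothetical, and the speech probability is taken as an input rather than computed.

```python
def update_noise_estimate(noise_power, frame_power, speech_prob,
                          threshold=0.5, alpha=0.9):
    """Update a per-bin noise power estimate only for frames unlikely to be speech.

    noise_power and frame_power are lists of per-bin power values.
    alpha is an assumed smoothing factor for recursive averaging.
    """
    if speech_prob >= threshold:
        # Frame probably contains speech: keep the previous estimate untouched.
        return list(noise_power)
    # Frame is probably noise only: blend it into the running estimate.
    return [alpha * n + (1.0 - alpha) * p
            for n, p in zip(noise_power, frame_power)]


# Hypothetical values: a noise-only frame updates the estimate, a speech frame does not.
updated = update_noise_estimate([1.0], [3.0], speech_prob=0.2, alpha=0.5)
frozen = update_noise_estimate([1.0], [3.0], speech_prob=0.8, alpha=0.5)
```

Freezing the estimate during likely speech keeps the talker's own energy from being absorbed into the noise model and then suppressed.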

8. Tangible, non-transitory, computer-readable media storing instructions executable by one or more processors to cause a first network microphone device to perform operations comprising:
- receiving an instruction to process one or more audio signals captured by a second network microphone device;
  
  after receiving the instruction, (i) functionally disabling at least a first microphone of a plurality of microphones of the first network microphone device, (ii) capturing a first audio signal via a second microphone of the plurality of microphones, and (iii) receiving over a network interface of the first network microphone device a second audio signal captured via at least a third microphone of the second network microphone device, wherein the first audio signal comprises first noise content from a noise source and the second audio signal comprises second noise content from the noise source;
  
  identifying the first noise content in the first audio signal;
  
  using the identified first noise content to determine an estimated noise content captured by at least the second and third microphones;
  
  using the estimated noise content to suppress the first noise content in the first audio signal and the second noise content in the second audio signal;
  
  combining the suppressed first audio signal and the suppressed second audio signal into a third audio signal;
  
  determining that the third audio signal includes a voice input comprising a wake word; and
  
  in response to the determination, processing the voice input to identify a voice utterance different from the wake word.
- Dependent claims (9, 10, 12, 14):
  - 9. The tangible, non-transitory, computer-readable media of claim 8, the operations further comprising:
    - determining a probability that the first audio signal comprises speech content, wherein the steps of (i) identifying the first noise content in the first audio signal and (ii) using the identified first noise content to determine an estimated noise content captured by at least the second and third microphones are carried out based on the determined probability being below a threshold probability.
  - 10. The tangible, non-transitory, computer-readable media of claim 8, further comprising:
    - receiving an instruction to cease processing of audio signals captured by the second network microphone device; and
      
      after receiving the instruction to cease processing of audio signals captured by the second network microphone device, (i) enabling at least the first microphone of the first network microphone device, (ii) capturing fourth audio content via the first microphone, (iii) capturing fifth audio content via the second microphone of the first network microphone device, and (iv) using the fourth and fifth audio signals to identify potential voice input in sound detected by the plurality of microphones.
  - 12. The tangible, non-transitory, computer-readable media of claim 8, further comprising applying an offset time to at least one of the first audio signal and the second audio signal before combining the suppressed first audio signal and the suppressed second audio signal into the third audio signal.
  - 14. The tangible, non-transitory, computer-readable media of claim 8, wherein processing the voice input comprises transmitting at least a portion of the voice input to a remote computing device for voice processing to identify a voice utterance different from the wake word.

11. The tangible, non-transitory, computer-readable media of claim 11, wherein the first network microphone device captures the first audio signal at a first time and the second network microphone device captures the second audio signal at a second time different than the first time.
- Dependent claims (13):
  - 13. The tangible, non-transitory, computer-readable media of claim 11, further comprising offsetting at least one of the first audio signal and the second audio signal based on a time differential between a device clock of the first network microphone device and a device clock of the second network microphone device.

15. A method comprising:
- receiving an instruction to process one or more audio signals captured by a first network microphone device;
  
  after receiving the instruction, (i) functionally disabling at least a first microphone of a plurality of microphones of a second network microphone device, (ii) capturing a first audio signal via a second microphone of the plurality of microphones, and (iii) receiving over a network interface of the second network microphone device a second audio signal captured via at least a third microphone of the first network microphone device, wherein the first audio signal comprises first noise content from a noise source and the second audio signal comprises second noise content from the noise source;
  
  identifying the first noise content in the first audio signal;
  
  using the identified first noise content to determine an estimated noise content captured by at least the second and third microphones;
  
  using the estimated noise content to suppress the first noise content in the first audio signal and the second noise content in the second audio signal;
  
  combining the suppressed first audio signal and the suppressed second audio signal into a third audio signal;
  
  determining that the third audio signal includes a voice input comprising a wake word; and
  
  in response to the determination, processing the voice input to identify a voice utterance different from the wake word.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The method of claim 15, further comprising:
    - determining a probability that the first audio signal comprises speech content, wherein the steps of (i) identifying the first noise content in the first audio signal and (ii) using the identified first noise content to determine an estimated noise content captured by at least the second and third microphones are carried out based on the determined probability being below a threshold probability.
  - 17. The method of claim 15, further comprising:
    - after receiving the instruction to cease processing of audio signals captured by the first network microphone device, (i) enabling at least the first microphone of the second network microphone device, (ii) capturing fourth audio content via the first microphone, (iii) capturing fifth audio content via the second microphone of the second network microphone device, and (iv) using the fourth and fifth audio signals to identify potential voice input in sound detected by the plurality of microphones.
  - 18. The method of claim 15, wherein the first network microphone device captures the first audio signal at a first time and the second network microphone device captures the second audio signal at a second time different than the first time.
  - 19. The method of claim 15, further comprising applying an offset time to at least one of the first audio signal and the second audio signal before combining the suppressed first audio signal and the suppressed second audio signal into the third audio signal.
  - 20. The method of claim 15, further comprising offsetting at least one of the first audio signal and the second audio signal based on a time differential between a device clock of the first network microphone device and a device clock of the second network microphone device.
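
The offsetting recited in claims 19 and 20 can be pictured as a sample shift derived from the time differential between the two device clocks. The sketch below is one possible reading, not the claimed implementation: it assumes time-domain sample lists, a known sample rate, and a sign convention in which a positive `clock_offset_s` means the second signal began recording that many seconds earlier than the first. All names are hypothetical.

```python
def align_signals(first, second, clock_offset_s, sample_rate_hz):
    """Trim leading samples so both signals start at the same instant.

    Assumed convention: clock_offset_s > 0 means the second signal started
    recording earlier than the first by that many seconds.
    """
    shift = round(clock_offset_s * sample_rate_hz)
    if shift > 0:
        second = second[shift:]   # drop the second signal's early samples
    elif shift < 0:
        first = first[-shift:]    # drop the first signal's early samples
    n = min(len(first), len(second))
    return first[:n], second[:n]


def combine_aligned(first, second):
    """Average the aligned signals sample by sample."""
    return [(a + b) / 2 for a, b in zip(first, second)]


# Hypothetical example: the second device started 2 samples early at 1 kHz.
a, b = align_signals([1, 2, 3, 4], [0, 0, 1, 2, 3, 4], 0.002, 1000)
mixed = combine_aligned(a, b)
```

Without this alignment, averaging the two signals would smear the speech content instead of reinforcing it.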

Current Assignee: Sonos, Inc.
Original Assignee: Sonos, Inc.
Inventors: Sereshki, Saeed Bagheri; Giacobello, Daniele
Granted Patent: US 10,692,518 B2

CPC Class Codes

G06F 3/167   Audio in a user interface, ...

G10L 15/08   Speech classification or se...

G10L 15/22   Procedures used during a sp...

G10L 2015/088   Word spotting

G10L 2021/02166   Microphone arrays; Beamforming

G10L 21/0208   Noise filtering

G10L 21/0216   characterised by the method...

G10L 21/0232   Processing in the frequency...

G10L 25/84   for discriminating voice fr...

H04R 1/406   microphones

H04R 2201/405   Non-uniform arrays of trans...

H04R 2203/12   Beamforming aspects for ste...

H04R 3/005   for combining the signals o...

H04R 5/04   Circuit arrangements, e.g. ...
