Three-dimensional beam forming with a microphone array

US 10,051,366 B1
Filed: 09/28/2017
Issued: 08/14/2018
Est. Priority Date: 09/28/2017
Status: Active Grant

Abstract

Systems and methods for three-dimensional beamforming disclosed herein include, among other features, (i) generating a set of received-sound beams by applying a plurality of sets of beamforming coefficients stored in a tangible memory of the network device to sound received via a microphone array of the network device, wherein each received-sound beam corresponds to a separate direction relative to the microphone array, (ii) identifying a subset of the received-sound beams comprising speech content, (iii) for each received-sound beam in the subset of the received-sound beams comprising speech content, determining whether the speech content comprises a wake word, (iv) selecting one final received-sound beam from the received-sound beams in the subset of the received-sound beams determined to comprise a wake word, and (v) causing the selected one final received-sound beam to be processed to identify a voice command.
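
Steps (i) through (v) can be read as a small pipeline. The Python sketch below is illustrative only and is not taken from the patent: the single-weight-per-microphone beamformer, the `speech_score` and `wake_word_score` callables, and both thresholds are assumptions introduced here to show the order of operations.

```python
from typing import Callable, List, Optional, Sequence

Signal = List[float]


def form_beams(mic_signals: Sequence[Signal],
               coefficient_sets: Sequence[Sequence[float]]) -> List[Signal]:
    """Step (i): build one beam per stored coefficient set.

    Assumption: each set holds one scalar weight per microphone, so a beam
    is a weighted sum of the microphone signals at each sample.
    """
    n_samples = len(mic_signals[0])
    return [
        [sum(w * mic[t] for w, mic in zip(weights, mic_signals))
         for t in range(n_samples)]
        for weights in coefficient_sets
    ]


def select_final_beam(beams: Sequence[Signal],
                      speech_score: Callable[[Signal], float],
                      wake_word_score: Callable[[Signal], float],
                      speech_threshold: float = 0.5,
                      wake_threshold: float = 0.5) -> Optional[int]:
    """Steps (ii) to (iv): return the index of the final beam, or None."""
    # (ii) keep only beams that appear to contain speech
    speech_beams = [i for i, b in enumerate(beams)
                    if speech_score(b) >= speech_threshold]
    # (iii) test each speech beam for the wake word
    scored = [(wake_word_score(beams[i]), i) for i in speech_beams]
    with_wake_word = [(s, i) for s, i in scored if s >= wake_threshold]
    if not with_wake_word:
        return None
    # (iv) pick the single beam with the highest wake-word score
    return max(with_wake_word, key=lambda pair: pair[0])[1]


# Toy input: two microphones, three hypothetical coefficient sets.
mic_signals = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
coefficient_sets = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
beams = form_beams(mic_signals, coefficient_sets)

# Hypothetical stand-ins for real speech and wake-word detectors.
mean_energy = lambda b: sum(x * x for x in b) / len(b)
toy_wake_score = lambda b: 0.9 if b[0] > 0 else 0.1

final_index = select_final_beam(beams, mean_energy, toy_wake_score)
# Step (v) would hand beams[final_index] to a voice-command recognizer.
```

In this sketch step (v) is left as a comment, since the abstract only requires that the selected beam be processed to identify a voice command and does not say where that processing happens.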

20 Claims

1. A network device comprising:
- one or more processors;
  
  a microphone array; and
  
  tangible, non-transitory computer-readable media comprising instructions encoded therein, wherein the instructions, when executed by the one or more processors, cause the network device to perform a method comprising:
  
  generating a set of received-sound beams by applying a plurality of sets of beamforming coefficients stored in the tangible, non-transitory computer-readable memory to sound received via the microphone array, wherein each received-sound beam corresponds to a separate direction relative to the microphone array;
  
  identifying a subset of the received-sound beams comprising speech content;
  
  for each received-sound beam in the subset of the received-sound beams comprising speech content, determining whether the speech content comprises a wake word;
  
  selecting one final received-sound beam from the received-sound beams in the subset of the received-sound beams determined to comprise a wake word; and
  
  causing the selected one final received-sound beam to be processed to identify a voice command.
  - 2. The network device of claim 1, wherein generating the set of received-sound beams by applying the plurality of sets of beamforming coefficients stored in the tangible, non-transitory computer-readable memory to sound received via the microphone array comprises:
    - generating a plurality of sound signals via the microphone array, wherein each microphone of the microphone array generates a separate sound signal;
      
      providing the plurality of sound signals to each beamformer of a set of beamformers, wherein each beamformer in the set of beamformers corresponds to one of the separate directions relative to the microphone array; and
      
      at each beamformer, applying the beamformer's coefficients to each sound signal of the plurality of sound signals to generate a received-sound beam corresponding to the separate direction relative to the microphone array corresponding to the beamformer.
  - 3. The network device of claim 1, wherein generating the set of received-sound beams by applying the plurality of sets of beamforming coefficients stored in the tangible, non-transitory computer-readable memory to sound received via the microphone array comprises generating between about 15 to 20 separate received-sound beams.
  - 4. The network device of claim 1, wherein at least one set of the plurality of sets of beamforming coefficients stored in the tangible, non-transitory computer-readable memory comprises a static set of beamforming coefficients generated by a procedure comprising:
    - for a reference microphone array positioned in an anechoic chamber, generating a first set of beamforming coefficients based on a first plurality of measurements of a corresponding first plurality of sounds, wherein each sound of the corresponding first plurality of sounds originates from a separate direction in the anechoic chamber relative to the reference microphone array positioned in the anechoic chamber.
  - 5. The network device of claim 4, wherein the procedure for generating at least one set of the plurality of sets of beamforming coefficients stored in the tangible, non-transitory computer-readable memory further comprises, while the network device is positioned outside of the anechoic chamber:
    - generating a second set of beamforming coefficients based on a second plurality of measurements of a corresponding second plurality of sounds, wherein each sound of the corresponding second plurality of sounds originates from a separate direction outside of the anechoic chamber relative to the network device positioned outside of the anechoic chamber; and
      
      generating the at least one set of the plurality of sets of beamforming coefficients based on the first set of beamforming coefficients and the second set of beamforming coefficients.
  - 6. The network device of claim 5, wherein generating the at least one set of the plurality of sets of beamforming coefficients based on the first set of beamforming coefficients and the second set of beamforming coefficients comprises:
    - determining an offset value between at least one beamforming coefficient of the first set of beamforming coefficients and at least one beamforming coefficient of the second set of beamforming coefficients; and
      
      adjusting at least one beamforming coefficient of the first set of beamforming coefficients based at least in part on the determined offset.
  - 7. The network device of claim 1, wherein identifying the subset of the received-sound beams comprising speech content comprises:
    - calculating a correlation metric for each received-sound beam relative to every other received-sound beam of the set of received-sound beams;
      
      storing each received-sound beam's correlation metric relative to every other received-sound beam in a cross-correlation matrix; and
      
      selecting for further processing, a subset of received-sound beams having a high likelihood of speech content and a low correlation with one another based on a combination of the received-sound beam's beamforming coefficients and the cross-correlation matrix.
  - 8. The network device of claim 7, wherein selecting for further processing, the subset of received-sound beams having a high likelihood of speech content and a low correlation with one another based on the combination of the received-sound beam's beamforming coefficients and the cross-correlation matrix comprises:
    - ranking each received-sound beam based on its likelihood of comprising speech content; and
      
      based on the ranking, selecting a top subset of two or more least-correlated received-sound beams for further processing.
  - 9. The network device of claim 8, wherein selecting the top subset of two or more least-correlated received-sound beams for further processing comprises selecting between about 3 to 5 received-sound beams for further processing.
  - 10. The network device of claim 1, wherein determining whether the speech content comprises a wake word comprises:
    - ranking a top subset of two or more least-correlated received-sound beams based on, for each received-sound beam in the top subset of two or more least-correlated received-sound beams, a likelihood that speech content of the received-sound beam comprises the wake word, wherein the selected one final received-sound beam is the received-sound beam having speech content with the highest likelihood of comprising the wake word.
  - 11. The network device of claim 1, wherein the method further comprises:
    - mixing an inverse of a received-sound beam orthogonal to the selected one final received-sound beam with the selected one final received-sound beam.
  - 12. The network device of claim 1, wherein the microphone array comprises a planar microphone array comprising two or more microphones arranged on a substantially two-dimensional plane.
  - 13. The network device of claim 1, wherein each set of the plurality of sets of beamforming coefficients stored in the tangible, non-transitory computer-readable memory comprises a static set of beamforming coefficients.
  - 14. The network device of claim 1, wherein for each received-sound beam in the subset of the received-sound beams comprising speech content, determining whether the speech content comprises a wake word comprises transmitting a speech sample of the received-sound beam to a separate computing system for voice analysis.
  - 15. The network device of claim 1, wherein causing the selected one final received-sound beam to be processed to identify a voice command comprises transmitting a speech sample of the selected one final received-sound beam to a separate computing system for voice analysis.
  - 16. The network device of claim 1, wherein the method further comprises:
    - mixing an inverse of a received-sound beam orthogonal to the selected one final received-sound beam with the selected one final received-sound beam.
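
Two of the dependent claims describe simple arithmetic that a short example may make concrete: the offset-based coefficient adjustment of claim 6 and the inverse-beam mixing of claims 11 and 16. The Python sketch below is one possible reading, not the patented implementation. The function names, the `blend` and `gain` parameters, and the choice of an element-wise difference as the "offset" are all assumptions made for illustration.

```python
from typing import List, Sequence


def adjust_coefficients(first_set: Sequence[float],
                        second_set: Sequence[float],
                        blend: float = 1.0) -> List[float]:
    """Claim 6 sketch: shift chamber-derived coefficients toward in-room ones.

    Assumption: the offset is the element-wise difference between the second
    (outside the anechoic chamber) and first (inside the chamber) sets, and
    `blend` (hypothetical) controls how much of that offset is applied.
    """
    offsets = [b - a for a, b in zip(first_set, second_set)]
    return [a + blend * off for a, off in zip(first_set, offsets)]


def mix_inverse(final_beam: Sequence[float],
                orthogonal_beam: Sequence[float],
                gain: float = 1.0) -> List[float]:
    """Claims 11 and 16 sketch: add the inverted orthogonal beam to the final beam.

    Assumption: "inverse" means sign inversion, so mixing is equivalent to
    subtracting a scaled copy of the orthogonal beam sample by sample.
    """
    inverse = [-x for x in orthogonal_beam]
    return [f + gain * inv for f, inv in zip(final_beam, inverse)]


# Toy values chosen to be exactly representable in floating point.
adjusted = adjust_coefficients([1.0, 0.5], [1.5, 0.25], blend=0.5)
mixed = mix_inverse([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

With these toy inputs, half of each offset is applied to the first coefficient set, and the orthogonal beam is subtracted from the final beam at unit gain.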

17. Tangible, non-transitory computer-readable media comprising instructions encoded therein, wherein the instructions, when executed by one or more processors, cause a network device to perform a method comprising:
- generating a set of received-sound beams by applying a plurality of sets of beamforming coefficients stored in a tangible memory of the network device to sound received via a microphone array of the network device, wherein each received-sound beam corresponds to a separate direction relative to the microphone array;
  
  identifying a subset of the received-sound beams comprising speech content;
  
  for each received-sound beam in the subset of the received-sound beams comprising speech content, determining whether the speech content comprises a wake word;
  
  selecting one final received-sound beam from the received-sound beams in the subset of the received-sound beams determined to comprise a wake word; and
  
  causing the selected one final received-sound beam to be processed to identify a voice command.
  - 18. The tangible, non-transitory computer-readable media of claim 17, wherein generating the set of received-sound beams by applying the plurality of sets of beamforming coefficients stored in the tangible, non-transitory computer-readable memory to sound received via the microphone array comprises:
    - generating a plurality of sound signals via the microphone array, wherein each microphone of the microphone array generates a separate sound signal;
      
      providing the plurality of sound signals to each beamformer of a set of beamformers, wherein each beamformer in the set of beamformers corresponds to one of the separate directions relative to the microphone array; and
      
      at each beamformer, applying the beamformer's coefficients to each sound signal of the plurality of sound signals to generate a received-sound beam corresponding to the separate direction relative to the microphone array corresponding to the beamformer.
  - 19. The tangible, non-transitory computer-readable media of claim 17, wherein identifying the subset of the received-sound beams comprising speech content comprises:
    - calculating a correlation metric for each received-sound beam relative to every other received-sound beam of the set of received-sound beams;
      
      storing each received-sound beam's correlation metric relative to every other received-sound beam in a cross-correlation matrix; and
      
      selecting for further processing, a subset of received-sound beams having a high likelihood of speech content and a low correlation with one another based on a combination of the received-sound beam's beamforming coefficients and the cross-correlation matrix.
  - 20. The tangible, non-transitory computer-readable media of claim 19, wherein selecting for further processing, the subset of received-sound beams having a high likelihood of speech content and a low correlation with one another based on the combination of the received-sound beam's beamforming coefficients and the cross-correlation matrix comprises:
    - ranking each received-sound beam based on its likelihood of comprising speech content; and
      
      based on the ranking, selecting a top subset of two or more least-correlated received-sound beams for further processing.
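
The beam-selection steps recited in claims 7 to 9 and 19 to 20 (correlate every beam against every other, store the results in a matrix, rank by speech likelihood, keep a few mutually uncorrelated beams) can be sketched as follows. This is a hedged illustration, not the claimed method: Pearson correlation, the greedy selection loop, and the `k` and `max_corr` parameters are assumptions, and the sketch takes precomputed speech-likelihood scores as input rather than deriving anything from the beamforming coefficients.

```python
import math
from typing import List, Sequence


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length beams (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def cross_correlation_matrix(beams: Sequence[Sequence[float]]) -> List[List[float]]:
    """Correlation metric of each beam relative to every other beam."""
    return [[correlation(a, b) for b in beams] for a in beams]


def select_least_correlated(speech_scores: Sequence[float],
                            matrix: Sequence[Sequence[float]],
                            k: int = 3,
                            max_corr: float = 0.8) -> List[int]:
    """Rank beams by speech likelihood, then greedily keep up to k beams
    whose absolute correlation with every already-kept beam is below max_corr.

    k defaults to 3 to fall inside the "about 3 to 5" range of claim 9;
    max_corr is a hypothetical tuning parameter.
    """
    ranked = sorted(range(len(speech_scores)),
                    key=lambda i: speech_scores[i], reverse=True)
    chosen: List[int] = []
    for i in ranked:
        if all(abs(matrix[i][j]) < max_corr for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return chosen


# Toy beams: beam 1 is a scaled copy of beam 0; beam 2 is uncorrelated with both.
toy_beams = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [1.0, -1.0, -1.0, 1.0]]
toy_matrix = cross_correlation_matrix(toy_beams)
picked = select_least_correlated([0.9, 0.8, 0.4], toy_matrix, k=2)
```

In the toy data, beam 1 is skipped despite its high speech score because it duplicates beam 0, so the lower-ranked but uncorrelated beam 2 is kept instead.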

Current Assignee: Sonos, Inc.
Original Assignee: Sonos, Inc.
Inventors: Buoni, Matthew; Kadri, Romi; Oishi, Tetsuro
Primary Examiner(s): Nguyen, Khai N

Application Number: US15/719,454
Time in Patent Office: 320 Days
Field of Search: 381 92
US Class Current
CPC Class Codes

G10L 15/08   Speech classification or se...

G10L 15/22   Procedures used during a sp...

G10L 15/28   Constructional details of s...

G10L 2015/088   Word spotting

G10L 2015/223   Execution procedure of a sp...

G10L 2021/02166   Microphone arrays; Beamforming

G10L 25/78   Detection of presence or ab...

H04R 1/406   microphones

H04R 2227/003   Digital PA systems using, e...

H04R 2227/005   Audio distribution systems ...

H04R 2430/21   Direction finding using dif...

H04R 29/005   Microphone arrays

H04R 3/005   for combining the signals o...
