Directional capture of audio based on voice-activity detection

US 10,510,362 B2
Filed: 03/31/2017
Issued: 12/17/2019
Est. Priority Date: 03/31/2017
Status: Active Grant

1 Assignment

0 Petitions

Abstract

The technology described in this document can be embodied in a computer-implemented method that includes receiving information representing audio captured by a microphone array, wherein the information includes multiple datasets each representing audio signals captured in accordance with a sensitivity pattern along a corresponding direction with respect to the microphone array. The method also includes computing, using one or more processing devices for each of the multiple datasets, one or more quantities indicative of human voice activity captured from the corresponding direction, and generating, based at least on the one or more quantities computed for a plurality of the multiple datasets, a directional audio signal representing audio captured from a particular direction.
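
The abstract does not say how the per-direction datasets are produced, and claims 4 and 13 allow the first beamformer to be either fixed or dynamic. Purely as an illustration, the Python sketch below builds a bank of fixed delay-and-sum beams for a uniform linear array, one directional signal per look direction. The array geometry, the far-field assumption, the 343 m/s speed of sound, the rounding of delays to whole samples, and the function names `steering_delays`, `delay_and_sum` and `fixed_beams` are all assumptions made for this example, not details taken from the patent.

```python
import math
from typing import Dict, List, Sequence

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air at room temperature


def steering_delays(num_mics: int, spacing_m: float, angle_deg: float, fs: int) -> List[int]:
    """Per-microphone compensation delays (whole samples) for a uniform linear array.

    A far-field wave from angle_deg (0 = broadside) arrives at microphone m
    roughly m * spacing * sin(angle) / c seconds later than at microphone 0
    (earlier if that value is negative). Delaying each channel by
    (latest arrival - its own arrival) lines all channels up.
    """
    tau = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND_M_S
    arrival = [round(m * tau * fs) for m in range(num_mics)]
    latest = max(arrival)
    return [latest - a for a in arrival]


def delay_and_sum(channels: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Delay each channel by its integer delay and average the channels."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for i in range(d, n):
            out[i] += ch[i - d]
    return [v / len(channels) for v in out]


def fixed_beams(channels: Sequence[Sequence[float]], spacing_m: float, fs: int,
                angles_deg: Sequence[float]) -> Dict[float, List[float]]:
    """A bank of fixed beams: one directional signal per look direction."""
    return {
        a: delay_and_sum(channels, steering_delays(len(channels), spacing_m, a, fs))
        for a in angles_deg
    }
```

Rounding to whole samples keeps the sketch short; a practical implementation would more likely apply fractional delays, for example as phase shifts in the frequency domain.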

21 Claims

1. A method comprising:
- receiving information representing audio captured by a microphone array;
  
  responsive to receiving the information, generating, by a first beamformer, a first set of multiple directional audio signals each corresponding to a specific emphasized direction with respect to the microphone array;
  
  computing, using one or more processing devices for each of the multiple directional audio signals, one or more quantities indicative of human voice activity captured from the corresponding direction;
  
  determining, based on the one or more quantities, that an amount of human voice activity captured from a first direction is more than an amount of human voice activity captured from a second direction, whereas an amount of acoustic energy captured from the first direction is less than an amount of acoustic energy captured from the second direction; and
  
  generating, responsive to determining that the amount of human voice activity captured from the first direction is more than the amount of human voice activity captured from the second direction, an additional directional audio signal distinct from the first set of multiple directional audio signals, the additional directional audio signal being generated by a second beamformer that emphasizes capture of human voice activity from the first direction as compared to audio captured from the second direction, wherein the second beamformer is a dynamic beamformer that operates, at least in part, based on an input signal received from the first beamformer.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The method of claim 1, wherein the first beamformer is configured to process signals captured by the microphone array.
  - 3. The method of claim 2, wherein each of the multiple directional audio signals corresponds to a beam generated by the first beamformer.
  - 4. The method of claim 2, wherein the first beamformer is one of:
    - a fixed beamformer or a dynamic beamformer.
  - 5. The method of claim 1, wherein the one or more quantities indicative of human voice activity comprise a likelihood score of human voice activity in the directional audio signal for the corresponding emphasized direction.
  - 6. The method of claim 1, wherein the one or more quantities indicative of human voice activity comprise a signal-to-noise ratio (SNR).
  - 7. The method of claim 6, wherein the SNR is computed as a ratio of a first quantity representing a voice signal and a second quantity representing non-voice signals.
  - 8. The method of claim 1, wherein the one or more quantities indicative of human voice activity represents a likelihood score of the presence of a keyword in the directional audio signal for the corresponding emphasized direction.
  - 9. The method of claim 1, wherein the amount of human voice activity captured from the first direction is an amount of human voice activity corresponding to a particular speaker captured from the first direction, and wherein the amount of human voice activity captured from the second direction is an amount of human voice activity corresponding to the particular speaker captured from the second direction.
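
Claims 1, 6 and 7 turn on comparing a voice-activity quantity (here an SNR, voice over non-voice) across directions, and acting when the direction with more voice activity is not the direction with more acoustic energy. A minimal Python sketch of that comparison follows. The SNR estimator is an assumption: it treats a low quantile of 10 ms frame energies (160 samples at an assumed 16 kHz) as the non-voice level and the average excess above that level as the voice level, a crude stand-in for a real voice-activity detector. `frame_energies`, `voice_snr` and `compare_directions` are hypothetical names.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def frame_energies(signal: Sequence[float], frame_len: int) -> List[float]:
    """Mean-square energy of consecutive, non-overlapping frames."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def voice_snr(signal: Sequence[float], frame_len: int = 160,
              noise_quantile: float = 0.2, eps: float = 1e-12) -> float:
    """Crude voice-to-non-voice ratio for one directional signal (assumed estimator)."""
    energies = sorted(frame_energies(signal, frame_len))
    # Non-voice level: a low quantile of frame energies, dominated by quiet frames.
    noise = energies[int(noise_quantile * (len(energies) - 1))] + eps
    # Voice level: average energy rising above the non-voice level.
    voice = sum(max(e - noise, 0.0) for e in energies) / len(energies)
    return voice / noise


def compare_directions(
        beams: Dict[float, Sequence[float]]) -> Tuple[float, float, Optional[float]]:
    """Return (most-voice direction, most-energy direction, direction to steer to).

    The third element is the voice direction only when it differs from the
    loudest direction, the situation claim 1 describes; otherwise None.
    """
    voice_dir = max(beams, key=lambda d: voice_snr(beams[d]))
    loud_dir = max(beams, key=lambda d: sum(x * x for x in beams[d]))
    steer_to = voice_dir if voice_dir != loud_dir else None
    return voice_dir, loud_dir, steer_to
```

With this estimator, a beam containing short speech-like bursts over a quiet background scores a far higher SNR than a beam dominated by steady broadband noise, even when the noisy beam carries more total energy.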

10. An apparatus comprising:
- a microphone array;
  
  one or more acoustic transducers configured to generate audio signals; and
  
  an audio processing engine including memory and one or more processing devices configured to:
  
  receive information representing the audio captured by the microphone array,
  
  responsive to receiving the information, generate, by a first beamformer, a first set of multiple directional audio signals each corresponding to a specific emphasized direction with respect to the microphone array,
  
  compute, for each of the multiple directional audio signals, one or more quantities indicative of human voice activity captured from the corresponding direction,
  
  determine, based on the one or more quantities, that an amount of human voice activity captured from a first direction is more than an amount of human voice activity captured from a second direction, whereas an amount of acoustic energy captured from the first direction is less than an amount of acoustic energy captured from the second direction, and
  
  generate, responsive to determining that the amount of human voice activity captured from the first direction is more than the amount of human voice activity captured from the second direction, an additional directional audio signal distinct from the first set of multiple directional audio signals, the additional directional audio signal being generated by a second beamformer that emphasizes capture of human voice activity from the first direction as compared to audio captured from the second direction, wherein the second beamformer is a dynamic beamformer that operates, at least in part, based on an input signal received from the first beamformer.
- Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18)
  - 11. The apparatus of claim 10, wherein the first beamformer is configured to process signals captured by the microphone array.
  - 12. The apparatus of claim 11, wherein each of the multiple directional audio signals corresponds to a beam generated by the first beamformer.
  - 13. The apparatus of claim 11, wherein the first beamformer is one of:
    - a fixed beamformer or a dynamic beamformer.
  - 14. The apparatus of claim 10, wherein the one or more quantities indicative of human voice activity comprise a likelihood score of human voice activity in the directional audio signal for the corresponding emphasized direction.
  - 15. The apparatus of claim 10, wherein the one or more quantities indicative of human voice activity comprise a signal-to-noise ratio (SNR).
  - 16. The apparatus of claim 15, wherein the SNR is computed as a ratio of a first quantity representing a voice signal and a second quantity representing non-voice signals.
  - 17. The apparatus of claim 10, wherein the one or more quantities indicative of human voice activity represents a likelihood score of the presence of a keyword in the directional audio signal for the corresponding emphasized direction.
  - 18. The apparatus of claim 10, wherein the amount of human voice activity captured from the first direction is an amount of human voice activity corresponding to a particular speaker captured from the first direction, and wherein the amount of human voice activity captured from the second direction is an amount of human voice activity corresponding to the particular speaker captured from the second direction.
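
Claims 1, 10 and 19 require the second beamformer to be dynamic and to operate at least in part on a signal received from the first beamformer, but they do not name an adaptive algorithm. One common way to build such a stage, used here only as an illustrative substitute, is an adaptive interference canceller in the style of the sidelobe-cancelling branch of a generalized sidelobe canceller: a normalized LMS (NLMS) filter predicts the noise in the fixed beam aimed at the talker from a fixed beam aimed at the interferer, then subtracts that prediction. The tap count, the step size and the name `nlms_canceller` are assumptions.

```python
from typing import List, Sequence


def nlms_canceller(primary: Sequence[float], reference: Sequence[float],
                   taps: int = 16, mu: float = 0.1, eps: float = 1e-8) -> List[float]:
    """Remove the part of `primary` that is linearly predictable from `reference`.

    primary:   fixed beam steered at the talker (a first-beamformer output)
    reference: fixed beam steered at the interferer (another first-beamformer output)
    Returns the error signal, which serves as the enhanced talker estimate.
    """
    w = [0.0] * taps    # adaptive FIR weights, all zero at start
    buf = [0.0] * taps  # most recent reference samples, newest first
    out: List[float] = []
    for p, r in zip(primary, reference):
        buf = [r] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))  # predicted noise in the primary beam
        e = p - y                                    # enhanced output sample
        norm = eps + sum(x * x for x in buf)         # input power for step normalization
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, buf)]
        out.append(e)
    return out
```

Because a reference beam in a real room also picks up some of the talker, a stage like this can cancel part of the desired speech; practical designs typically freeze or slow adaptation while voice activity is detected in the primary beam.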

19. One or more machine-readable storage devices having encoded thereon computer readable instructions for causing one or more processing devices to perform operations comprising:
- receiving information representing audio captured by a microphone array;
  
  responsive to receiving the information, generating, by a first beamformer, a first set of multiple directional audio signals each corresponding to a specific emphasized direction with respect to the microphone array;
  
  computing, for each of the multiple directional audio signals, one or more quantities indicative of human voice activity captured from the corresponding direction;
  
  determining, based on the one or more quantities, that an amount of human voice activity captured from a first direction is more than an amount of human voice activity captured from a second direction, whereas an amount of acoustic energy captured from the first direction is less than an amount of acoustic energy captured from the second direction; and
  
  generating, responsive to determining that the amount of human voice activity captured from the first direction is more than the amount of human voice activity captured from the second direction, an additional directional audio signal distinct from the first set of multiple directional audio signals, the additional directional audio signal being generated by a second beamformer that emphasizes capture of human voice activity from the first direction as compared to audio captured from the second direction, wherein the second beamformer is a dynamic beamformer that operates, at least in part, based on an input signal received from the first beamformer.
- Dependent Claims (20, 21)
  - 20. The one or more machine-readable storage devices of claim 19, wherein the amount of human voice activity captured from the first direction is an amount of human voice activity corresponding to a particular speaker captured from the first direction, and wherein the amount of human voice activity captured from the second direction is an amount of human voice activity corresponding to the particular speaker captured from the second direction.
  - 21. The one or more machine-readable storage devices of claim 19, wherein the one or more quantities indicative of human voice activity represents a likelihood score of the presence of a keyword in the directional audio signal for the corresponding emphasized direction.
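
Claims 8, 17 and 21 use a keyword likelihood score as the per-direction voice-activity quantity, without saying how scores are combined over time. The Python sketch below assumes each beam is fed to some keyword spotter that emits a per-frame likelihood in [0, 1] (the spotter itself is not shown and is hypothetical), smooths each direction's scores with a one-pole low-pass filter, and returns the direction whose smoothed score peaks highest above a threshold. `keyword_direction`, `alpha` and `threshold` are hypothetical names and values.

```python
from typing import Dict, List, Optional


def keyword_direction(frame_scores: Dict[float, List[float]],
                      alpha: float = 0.3, threshold: float = 0.5) -> Optional[float]:
    """Pick the direction whose smoothed keyword likelihood peaks highest.

    frame_scores maps a beam direction (degrees) to per-frame keyword
    likelihoods in [0, 1]. Returns None if no direction's smoothed score
    exceeds the threshold.
    """
    best_dir: Optional[float] = None
    best_peak = threshold
    for direction, scores in frame_scores.items():
        smoothed, peak = 0.0, 0.0
        for p in scores:
            smoothed = alpha * p + (1 - alpha) * smoothed  # one-pole low-pass
            peak = max(peak, smoothed)
        if peak > best_peak:
            best_dir, best_peak = direction, peak
    return best_dir
```

Smoothing trades a few frames of latency for robustness: with the defaults, a single-frame spike to 1.0 reaches only 0.3 after smoothing, so it does not cross the 0.5 threshold on its own.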

Current Assignee
Bose Corporation
Original Assignee
Bose Corporation
Inventors
Hicks, Matthew Ryan, Crist, David Rolland, Moghimi, Amir Reza
Primary Examiner(s)
Azad, Abul K

Application Number
US15/475,191
Publication Number
US 20180286433A1
Time in Patent Office
991 Days
Field of Search
None
CPC Class Codes

G10L 2015/088   Word spotting

G10L 2021/02166   Microphone arrays; Beamforming

G10L 21/0232   Processing in the frequency...

G10L 25/78   Detection of presence or ab...

G10L 25/84   for discriminating voice fr...

H04R 1/406   microphones

H04R 2203/12   Beamforming aspects for ste...

H04R 2430/23   Direction finding using a s...

H04R 3/005   for combining the signals o...