DIRECTIONAL CAPTURE OF AUDIO BASED ON VOICE-ACTIVITY DETECTION

US 20180286433A1
Filed: 03/31/2017
Published: 10/04/2018
Est. Priority Date: 03/31/2017
Status: Active Grant

1 Assignment

0 Petitions

Abstract

The technology described in this document can be embodied in a computer-implemented method that includes receiving information representing audio captured by a microphone array, wherein the information includes multiple datasets each representing audio signals captured in accordance with a sensitivity pattern along a corresponding direction with respect to the microphone array. The method also includes computing, using one or more processing devices for each of the multiple datasets, one or more quantities indicative of human voice activity captured from the corresponding direction, and generating, based at least on the one or more quantities computed for a plurality of the multiple datasets, a directional audio signal representing audio captured from a particular direction.
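
The idea of choosing a direction by voice activity rather than by loudness can be illustrated with a minimal sketch. It assumes each dataset is a plain list of samples and that a voice-activity score between 0 and 1 has already been produced by some detector; the `BeamDataset` container, the `voice_score` field, and the function names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BeamDataset:
    """Audio captured along one direction (hypothetical container)."""
    direction_deg: float
    samples: Sequence[float]
    voice_score: float  # assumed output of some voice-activity detector, 0..1


def acoustic_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of the samples (0.0 for an empty sequence)."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def pick_direction(beams: Sequence[BeamDataset]) -> BeamDataset:
    """Return the beam with the highest voice-activity score, ignoring energy."""
    if not beams:
        raise ValueError("at least one beam dataset is required")
    return max(beams, key=lambda b: b.voice_score)


# A quiet talker at 30 degrees versus a louder non-voice source at 210 degrees.
talker = BeamDataset(30.0, [0.1, -0.1, 0.1, -0.1], voice_score=0.9)
noise = BeamDataset(210.0, [0.8, -0.8, 0.8, -0.8], voice_score=0.2)
chosen = pick_direction([talker, noise])
```

Here the talker's beam is chosen even though its acoustic energy is lower than that of the noise beam, which is the situation the independent claims describe.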

29 Citations

21 Claims

1. A method comprising:
- receiving information representing audio captured by a microphone array, wherein the information comprises multiple datasets each representing audio signals captured in accordance with a sensitivity pattern along a corresponding direction with respect to the microphone array;
  
  computing, using one or more processing devices for each of the multiple datasets, one or more quantities indicative of human voice activity captured from the corresponding direction;
  
  determining, based on the one or more quantities, that an amount of human voice activity captured from a first direction is more than an amount of human voice activity captured from a second direction, whereas an amount of acoustic energy captured from the first direction is less than an amount of acoustic energy captured from the second direction; and
  
  generating, responsive to determining that the amount of human voice activity captured from the first direction is more than the amount of human voice activity captured from the second direction, a directional audio signal in which audio captured from the first direction is emphasized as compared to audio captured from the second direction.

2. The method of claim 1, wherein the information representing the audio captured by the microphone array is received from a beamformer configured to process signals captured using the microphone array.

3. The method of claim 2, wherein each of the multiple datasets corresponds to a beam generated using the beamformer.

4. The method of claim 2, wherein the beamformer is one of: a fixed beamformer or a dynamic beamformer.

5. The method of claim 1, wherein the one or more quantities indicative of human voice activity comprise a likelihood score of human voice activity in the audio signal represented in the dataset for the corresponding direction.

6. The method of claim 1, wherein the one or more quantities indicative of human voice activity comprise a signal-to-noise ratio (SNR).

7. The method of claim 6, wherein the SNR is computed as a ratio of a first quantity representing a voice signal and a second quantity representing non-voice signals.

8. The method of claim 1, wherein the one or more quantities indicative of human voice activity represents a likelihood score of the presence of a keyword in the audio signal represented in the dataset for the corresponding direction.

9. The method of claim 1, wherein generating the directional audio signal comprises selecting one of the multiple datasets.

10. The method of claim 1, wherein generating the directional audio signal comprises causing a dynamic beamformer to capture audio in accordance with a sensitivity pattern generated for the particular direction.
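
The SNR of claims 6 and 7 is a ratio of a voice quantity to a non-voice quantity. A minimal sketch of one way to compute such a ratio follows. It assumes the audio for one direction has been split into non-empty frames and that a detector has already flagged each frame as voice or non-voice; the claims do not specify how the two quantities are obtained, so the use of mean frame energy, the decibel scale, and the names `frame_energy` and `voice_snr_db` are assumptions of this example.

```python
import math
from typing import Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one non-empty frame."""
    return sum(s * s for s in frame) / len(frame)


def voice_snr_db(
    frames: Sequence[Sequence[float]],
    is_voice: Sequence[bool],
    floor: float = 1e-12,
) -> float:
    """Ratio of mean voice-frame energy to mean non-voice-frame energy, in dB.

    Returns -inf when no frame is flagged as voice. `floor` guards against
    division by zero and log of zero.
    """
    voice = [frame_energy(f) for f, v in zip(frames, is_voice) if v]
    other = [frame_energy(f) for f, v in zip(frames, is_voice) if not v]
    if not voice:
        return float("-inf")
    voice_q = sum(voice) / len(voice)
    other_q = sum(other) / len(other) if other else floor
    return 10.0 * math.log10(max(voice_q, floor) / max(other_q, floor))


# One voice frame with energy 0.25 and one non-voice frame with energy 0.0025.
snr = voice_snr_db([[0.5, -0.5], [0.05, -0.05]], [True, False])
```

In this example the energy ratio is about 100, so `snr` is about 20 dB. Computing this value per direction gives one quantity per dataset that can then be compared across directions.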

11. An apparatus comprising:
- a microphone array;
  
  one or more acoustic transducers configured to generate audio signals; and
  
  an audio processing engine including memory and one or more processing devices configured to:
  
    receive information representing the audio captured by the microphone array, wherein the information comprises multiple datasets each representing audio signals captured in accordance with a sensitivity pattern along a corresponding direction with respect to the microphone array,
  
    compute, for each of the multiple datasets, one or more quantities indicative of human voice activity captured from the corresponding direction, and
  
    determine, based on the one or more quantities, that an amount of human voice activity captured from a first direction is more than an amount of human voice activity captured from a second direction, whereas an amount of acoustic energy captured from the first direction is less than an amount of acoustic energy captured from the second direction, and
  
    generate, responsive to determining that the amount of human voice activity captured from the first direction is more than the amount of human voice activity captured from the second direction, a directional audio signal in which audio captured from the first direction is emphasized as compared to audio captured from the second direction.

12. The apparatus of claim 11, further comprising a beamformer configured to generate the information by processing signals captured using the microphone array.

13. The apparatus of claim 12, wherein each of the multiple datasets corresponds to a beam generated using the beamformer.

14. The apparatus of claim 12, wherein the beamformer is one of: a fixed beamformer or a dynamic beamformer.

15. The apparatus of claim 11, wherein the one or more quantities indicative of human voice activity comprise a likelihood score of human voice activity in the audio signal represented in the dataset for the corresponding direction.

16. The apparatus of claim 11, wherein the one or more quantities indicative of human voice activity comprise a signal-to-noise ratio (SNR).

17. The apparatus of claim 16, wherein the SNR is computed as a ratio of a first quantity representing a voice signal and a second quantity representing non-voice signals.

18. The apparatus of claim 11, wherein the one or more quantities indicative of human voice activity represents a likelihood score of the presence of a keyword in the audio signal represented in the dataset for the corresponding direction.

19. The apparatus of claim 11, wherein generating the directional audio signal comprises selecting one of the multiple datasets.

20. The apparatus of claim 11, wherein generating the directional audio signal comprises causing a dynamic beamformer to capture audio in accordance with a sensitivity pattern generated for the particular direction.
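
Claims 12 to 14 refer to a beamformer that produces one dataset per beam. The sketch below shows a delay-and-sum beamformer, a common textbook form of fixed beamformer, used here only as an illustration; the patent record does not say which beamforming technique is used. The linear array geometry, whole-sample lags, the angle convention, and all function names are assumptions of this example.

```python
import math
from typing import Dict, List, Sequence

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at room temperature


def arrival_lags(mic_x_m: Sequence[float], angle_deg: float, rate_hz: float) -> List[int]:
    """Per-microphone arrival lag in whole samples for a far-field source.

    Convention (an assumption of this sketch): the angle is measured from
    broadside of a linear array, and a source at a positive angle reaches
    microphones with larger x first. Lags are shifted so the earliest is 0.
    """
    s = math.sin(math.radians(angle_deg))
    lags = [round(-x * s / SPEED_OF_SOUND_M_S * rate_hz) for x in mic_x_m]
    earliest = min(lags)
    return [lag - earliest for lag in lags]


def delay_and_sum(channels: Sequence[Sequence[float]], lags: Sequence[int]) -> List[float]:
    """Advance each channel by its lag and average; samples past the end count as zero."""
    n = len(channels[0])
    out = []
    for i in range(n):
        total = 0.0
        for ch, lag in zip(channels, lags):
            j = i + lag
            if j < n:
                total += ch[j]
        out.append(total / len(channels))
    return out


def fixed_beams(channels, mic_x_m, angles_deg, rate_hz) -> Dict[float, List[float]]:
    """One output dataset per steering angle."""
    return {
        a: delay_and_sum(channels, arrival_lags(mic_x_m, a, rate_hz))
        for a in angles_deg
    }


# Two microphones 0.343 m apart sampled at 1 kHz. An impulse from +90 degrees
# reaches the second microphone one sample before the first.
mics = [0.0, 0.343]
channels = [[0.0, 0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0]]
beams = fixed_beams(channels, mics, [90.0, -90.0], rate_hz=1000.0)
```

The beam steered toward +90 degrees aligns the two impulses and peaks at 1.0, while the beam steered toward -90 degrees leaves them misaligned and peaks at 0.5. Each entry of `beams` plays the role of one dataset captured along a corresponding direction.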

21. One or more machine-readable storage devices having encoded thereon computer readable instructions for causing one or more processing devices to perform operations comprising:
- receiving information representing audio captured by a microphone array, wherein the information comprises multiple datasets each representing audio signals captured in accordance with a sensitivity pattern along a corresponding direction with respect to the microphone array;
  
  computing, for each of the multiple datasets, one or more quantities indicative of human voice activity captured from the corresponding direction;
  
  determining, based on the one or more quantities, that an amount of human voice activity captured from a first direction is more than an amount of human voice activity captured from a second direction, whereas an amount of acoustic energy captured from the first direction is less than an amount of acoustic energy captured from the second direction; and
  
  generating, responsive to determining that the amount of human voice activity captured from the first direction is more than the amount of human voice activity captured from the second direction, a directional audio signal in which audio captured from the first direction is emphasized as compared to audio captured from the second direction.
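
The independent claims require that audio from the first direction be emphasized relative to the second, which is broader than selecting a single dataset. One simple way to realize emphasis is a weighted mix, sketched below. The weighting scheme, the `emphasis` parameter, and the function name `directional_signal` are assumptions of this example and are not taken from the patent.

```python
from typing import List, Optional, Sequence


def directional_signal(
    first: Sequence[float],
    second: Sequence[float],
    voice_first: float,
    voice_second: float,
    emphasis: float = 0.8,
) -> Optional[List[float]]:
    """Mix two direction datasets, weighting the first more heavily.

    The mix is produced only when the first direction shows more voice
    activity than the second; otherwise None is returned. An emphasis of
    1.0 reduces to selecting the first dataset outright.
    """
    if not 0.5 < emphasis <= 1.0:
        raise ValueError("emphasis must be in (0.5, 1.0]")
    if len(first) != len(second):
        raise ValueError("datasets must have the same length")
    if voice_first <= voice_second:
        return None
    return [emphasis * a + (1.0 - emphasis) * b for a, b in zip(first, second)]


mixed = directional_signal([1.0, -1.0], [0.0, 0.0], 0.9, 0.2, emphasis=0.75)
skipped = directional_signal([1.0, -1.0], [0.0, 0.0], 0.1, 0.2)
```

With these inputs `mixed` is `[0.75, -0.75]` and `skipped` is `None`, because in the second call the first direction does not show more voice activity than the second.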

Current Assignee: Bose Corporation
Original Assignee: Bose Corporation
Inventors: Hicks, Matthew Ryan; Crist, David Rolland; Moghimi, Amir Reza

Granted Patent: US 10,510,362 B2
CPC Class Codes

G10L 2015/088   Word spotting

G10L 2021/02166   Microphone arrays; Beamforming

G10L 21/0232   Processing in the frequency...

G10L 25/78   Detection of presence or ab...

G10L 25/84   for discriminating voice fr...

H04R 1/406   microphones

H04R 2203/12   Beamforming aspects for ste...

H04R 2430/23   Direction finding using a s...

H04R 3/005   for combining the signals o...
