Directional capture of audio based on voice-activity detection
First Claim
1. A method comprising:
receiving information representing audio captured by a microphone array, responsive to receiving the information, generating by a first beamformer, a first set of multiple directional audio signals each corresponding to a specific emphasized direction with respect to the microphone array;
computing, using one or more processing devices for each of the multiple directional audio signals, one or more quantities indicative of human voice activity captured from the corresponding direction;
determining, based on the one or more quantities, that an amount of human voice activity captured from a first direction is more than an amount of human voice activity captured from a second direction, whereas an amount of acoustic energy captured from the first direction is less than an amount of acoustic energy captured from the second direction; and
generating, responsive to determining that the amount of human voice activity captured from the first direction is more than the amount of human voice activity captured from the second direction, an additional directional audio signal distinct from the first set of multiple directional audio signals, the additional directional audio signal being generated by a second beamformer that emphasizes capture of human voice activity from the first direction as compared to audio captured from the second direction, wherein the second beamformer is a dynamic beamformer that operates, at least in part, based on an input signal received from the first beamformer.
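The first step of the claim, in which a first beamformer turns the microphone-array input into a set of directional signals, can be illustrated with a minimal sketch. The claim does not say how the first beamformer is implemented; a fixed delay-and-sum beamformer over a linear array is assumed here purely for illustration. The function names, the array geometry, the sample rate, and the set of look directions are all hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delays(mic_positions, angle_deg, sample_rate):
    """Integer sample delays that align a far-field plane wave from angle_deg.

    mic_positions are distances in meters along the axis of a linear array;
    the angle is measured from that axis (assumed geometry).
    """
    c = math.cos(math.radians(angle_deg))
    raw = [p * c / SPEED_OF_SOUND * sample_rate for p in mic_positions]
    base = min(raw)  # shift so that every delay is non-negative
    return [round(r - base) for r in raw]


def delay_and_sum(channels, delays):
    """Delay each channel by its steering delay and average the results."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for i in range(d, n):
            out[i] += ch[i - d]
    return [v / len(channels) for v in out]


def first_beamformer(channels, mic_positions, angles_deg, sample_rate):
    """Return one directional signal per emphasized direction."""
    return {
        a: delay_and_sum(channels, steering_delays(mic_positions, a, sample_rate))
        for a in angles_deg
    }


# Toy input: a single click arriving along the array axis (0 degrees), so the
# microphone k positions further along the axis hears it k samples earlier.
mic_positions = [0.0, 0.0343, 0.0686, 0.1029]
sample_rate = 10000
channels = []
for k in range(4):
    ch = [0.0] * 32
    ch[10 - k] = 1.0
    channels.append(ch)

beams = first_beamformer(channels, mic_positions, (0, 90, 180), sample_rate)
```

In this toy case the beam steered toward 0 degrees adds the four clicks coherently, while the beams steered toward 90 and 180 degrees smear them across several samples. Each entry of `beams` plays the role of one directional audio signal in the claim.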
Abstract
The technology described in this document can be embodied in a computer-implemented method that includes receiving information representing audio captured by a microphone array, wherein the information includes multiple datasets each representing audio signals captured in accordance with a sensitivity pattern along a corresponding direction with respect to the microphone array. The method also includes computing, using one or more processing devices for each of the multiple datasets, one or more quantities indicative of human voice activity captured from the corresponding direction, and generating, based at least on the one or more quantities computed for a plurality of the multiple datasets, a directional audio signal representing audio captured from a particular direction.
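The selection step described above, choosing a direction by voice activity rather than by raw acoustic energy, can be sketched as follows. This is a minimal illustration, not the patented method: the voice-activity quantity used here (relative fluctuation of short-term frame energy) is a hypothetical stand-in for whatever quantities an implementation would actually compute, and the function names, frame length, and test signals are assumptions.

```python
import math


def frame_energies(signal, frame_len):
    """Mean-square energy of consecutive non-overlapping frames."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def voice_score(signal, frame_len=160):
    """Hypothetical voice-activity proxy: std/mean of the frame energies.

    Speech energy rises and falls at a syllabic rate, whereas steady noise
    keeps a nearly constant frame energy, so this ratio is higher for speech.
    """
    e = frame_energies(signal, frame_len)
    mean = sum(e) / len(e)
    if mean == 0.0:
        return 0.0
    var = sum((v - mean) ** 2 for v in e) / len(e)
    return math.sqrt(var) / mean


def total_energy(signal):
    """Mean-square energy of the whole signal."""
    return sum(x * x for x in signal) / len(signal)


def pick_direction(directional_signals):
    """Return the direction whose signal has the highest voice score."""
    return max(directional_signals, key=lambda d: voice_score(directional_signals[d]))


# Toy directional signals at an assumed 16 kHz sample rate:
#   "90": loud steady 1 kHz tone (stands in for a noise source)
#   "0":  quiet 200 Hz tone with a 4 Hz envelope (stands in for speech)
fs = 16000
beams = {
    "90": [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)],
    "0": [
        0.3 * 0.5 * (1 + math.sin(2 * math.pi * 4 * n / fs))
        * math.sin(2 * math.pi * 200 * n / fs)
        for n in range(fs)
    ],
}

chosen = pick_direction(beams)
```

Here the "90" signal carries far more energy, yet `pick_direction` selects "0" because its frame energy fluctuates the way speech does. An energy-only selector would have picked the louder noise source instead.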
21 Claims
1. A method comprising:
receiving information representing audio captured by a microphone array, responsive to receiving the information, generating by a first beamformer, a first set of multiple directional audio signals each corresponding to a specific emphasized direction with respect to the microphone array;
computing, using one or more processing devices for each of the multiple directional audio signals, one or more quantities indicative of human voice activity captured from the corresponding direction;
determining, based on the one or more quantities, that an amount of human voice activity captured from a first direction is more than an amount of human voice activity captured from a second direction, whereas an amount of acoustic energy captured from the first direction is less than an amount of acoustic energy captured from the second direction; and
generating, responsive to determining that the amount of human voice activity captured from the first direction is more than the amount of human voice activity captured from the second direction, an additional directional audio signal distinct from the first set of multiple directional audio signals, the additional directional audio signal being generated by a second beamformer that emphasizes capture of human voice activity from the first direction as compared to audio captured from the second direction, wherein the second beamformer is a dynamic beamformer that operates, at least in part, based on an input signal received from the first beamformer.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. An apparatus comprising:
a microphone array;
one or more acoustic transducers configured to generate audio signals; and
an audio processing engine including memory and one or more processing devices configured to:
receive information representing the audio captured by the microphone array, responsive to receiving the information, generate by a first beamformer, a first set of multiple directional audio signals each corresponding to a specific emphasized direction with respect to the microphone array,
compute, for each of the multiple directional audio signals, one or more quantities indicative of human voice activity captured from the corresponding direction,
determine, based on the one or more quantities, that an amount of human voice activity captured from a first direction is more than an amount of human voice activity captured from a second direction, whereas an amount of acoustic energy captured from the first direction is less than an amount of acoustic energy captured from the second direction, and
generate, responsive to determining that the amount of human voice activity captured from the first direction is more than the amount of human voice activity captured from the second direction, an additional directional audio signal distinct from the first set of multiple directional audio signals, the additional directional audio signal being generated by a second beamformer that emphasizes capture of human voice activity from the first direction as compared to audio captured from the second direction, wherein the second beamformer is a dynamic beamformer that operates, at least in part, based on an input signal received from the first beamformer.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. One or more machine-readable storage devices having encoded thereon computer readable instructions for causing one or more processing devices to perform operations comprising:
receiving information representing audio captured by a microphone array, responsive to receiving the information, generating by a first beamformer, a first set of multiple directional audio signals each corresponding to a specific emphasized direction with respect to the microphone array;
computing, for each of the multiple directional audio signals, one or more quantities indicative of human voice activity captured from the corresponding direction;
determining, based on the one or more quantities, that an amount of human voice activity captured from a first direction is more than an amount of human voice activity captured from a second direction, whereas an amount of acoustic energy captured from the first direction is less than an amount of acoustic energy captured from the second direction; and
generating, responsive to determining that the amount of human voice activity captured from the first direction is more than the amount of human voice activity captured from the second direction, an additional directional audio signal distinct from the first set of multiple directional audio signals, the additional directional audio signal being generated by a second beamformer that emphasizes capture of human voice activity from the first direction as compared to audio captured from the second direction, wherein the second beamformer is a dynamic beamformer that operates, at least in part, based on an input signal received from the first beamformer.
Dependent claims: 20, 21
Specification