Multichannel voice detection in adverse environments
Abstract
A multichannel source activity detection system, e.g., a voice activity detection (VAD) system, and method that exploits spatial localization of a target audio source is provided. The method includes the steps of receiving a mixed sound signal by at least two microphones; Fast Fourier transforming each received mixed sound signal into the frequency domain; filtering the transformed signals to output a signal corresponding to a spatial signature of a source; summing an absolute value squared of the filtered signal over a predetermined range of frequencies; and comparing the sum to a threshold to determine if a voice is present. Additionally, the filtering step includes multiplying the transformed signals by an inverse of a noise spectral power matrix, a vector of channel transfer function ratios, and a source signal spectral power.
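As an illustration only, the steps in the abstract can be sketched in Python for a single frame of audio. The sketch is not taken from the specification: the function names (`fft`, `filtered_energy`, `voice_present`), the argument layout, and the order in which the three factors are combined (source power, times the conjugated ratio vector, times the inverse noise matrix, times the microphone spectra) are assumptions made for the example.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def filtered_energy(frames, ratios, noise_inv, source_power, bins):
    """Sum |Z(k)|^2 over `bins` for one frame per microphone.

    Assumed filter form (hypothetical, for illustration):
        Z(k) = Rs(k) * K(k)^H * Rn(k)^-1 * X(k)
    frames       : list of equal-length sample lists, one per microphone
    ratios       : ratios[k][i], channel transfer function ratio K_i(k)
    noise_inv    : noise_inv[k][i][j], inverse noise spectral power matrix
    source_power : source_power[k], source signal spectral power Rs(k)
    """
    spectra = [fft(f) for f in frames]  # one spectrum per microphone
    m = len(spectra)
    total = 0.0
    for k in bins:
        x = [s[k] for s in spectra]
        # y = Rn(k)^-1 * X(k)
        y = [sum(noise_inv[k][i][j] * x[j] for j in range(m)) for i in range(m)]
        # z = Rs(k) * K(k)^H * y
        z = source_power[k] * sum(ratios[k][i].conjugate() * y[i] for i in range(m))
        total += abs(z) ** 2
    return total


def voice_present(energy, threshold):
    """A voice is declared present when the sum reaches the threshold."""
    return energy >= threshold
```

For two microphones that both pick up a pure cosine in bin 4 of a 64-sample frame, with an identity inverse noise matrix, unit ratios, and unit source power, summing over bins 1 to 8 gives an energy of about 4096, while an all-zero frame gives 0, so any threshold between the two separates them.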
22 Claims
1. A method for determining if a voice is present in a mixed sound signal, the method comprising the steps of:
receiving the mixed sound signal by at least two microphones;
Fast Fourier transforming each received mixed sound signal into the frequency domain;
filtering the transformed signals to output a signal corresponding to a spatial signature of a source;
summing an absolute value squared of the filtered signal over a predetermined range of frequencies; and
comparing the sum to a threshold to determine if a voice is present, wherein if the sum is greater than or equal to the threshold, a voice is present, and if the sum is less than the threshold, a voice is not present.
6. A method for determining if a voice is present in a mixed sound signal, the method comprising the steps of:
receiving the mixed sound signal by at least two microphones;
Fast Fourier transforming each received mixed sound signal into the frequency domain;
filtering the transformed signals to output signals corresponding to a spatial signature for each of a predetermined number of users;
summing separately for each of the users an absolute value squared of the filtered signals over a predetermined range of frequencies;
determining a maximum of the sums; and
comparing the maximum sum to a threshold to determine if a voice is present, wherein if the sum is greater than or equal to the threshold, a voice is present, and if the sum is less than the threshold, a voice is not present.
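The per-user variant of claim 6 can be sketched in the same spirit, starting from spectra that have already been transformed. This is a hedged illustration, not the patented filter: each user's spatial signature is reduced here to a hypothetical weight vector `weights[k]` per frequency bin, and the names `user_energy` and `detect_any_user` are invented for the example.

```python
def user_energy(spectra, weights, bins):
    """Sum over `bins` of |w(k)^H X(k)|^2 for one user's weight vectors.

    spectra : spectra[i][k], spectrum of microphone i at bin k
    weights : weights[k][i], assumed per-user filter weight for microphone i
    """
    total = 0.0
    for k in bins:
        z = sum(w.conjugate() * s[k] for w, s in zip(weights[k], spectra))
        total += abs(z) ** 2
    return total


def detect_any_user(spectra, user_weights, bins, threshold):
    """Return (voice_present, index_of_strongest_user, max_energy)."""
    energies = [user_energy(spectra, w, bins) for w in user_weights]
    best = max(range(len(energies)), key=energies.__getitem__)
    return energies[best] >= threshold, best, energies[best]


# Two microphones, four bins; the second microphone is phase-shifted by 90 degrees.
spectra = [[0, 2, 0, 0], [0, 2j, 0, 0]]
user_weights = [
    [[1 + 0j, 1 + 0j]] * 4,  # user 0: both microphones in phase
    [[1 + 0j, 1j]] * 4,      # user 1: matches the 90 degree shift
]
print(detect_any_user(spectra, user_weights, range(1, 3), 10.0))
# prints (True, 1, 16.0)
```

Only the largest per-user sum is compared with the threshold, so a single decision covers all of the predetermined users, and the index of the maximum indicates which signature matched best.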
12. A voice activity detector for determining if a voice is present in a mixed sound signal comprising:
at least two microphones for receiving the mixed sound signal;
a Fast Fourier transformer for transforming each received mixed sound signal into the frequency domain;
a filter for filtering the transformed signals to output a signal corresponding to a spatial signature for each of the transformed signals;
a first summer for summing an absolute value squared of the filtered signals over a predetermined range of frequencies; and
a comparator for comparing the sum to a threshold to determine if a voice is present, wherein if the sum is greater than or equal to the threshold, a voice is present, and if the sum is less than the threshold, a voice is not present.
16. A voice activity detector for determining if a voice is present in a mixed sound signal comprising:
at least two microphones for receiving the mixed sound signal;
a Fast Fourier transformer for transforming each received mixed sound signal into the frequency domain;
at least one filter for filtering the transformed signals to output a signal corresponding to a spatial signature for each of a predetermined number of users;
at least one first summer for summing separately for each of the users an absolute value squared of the filtered signals over a predetermined range of frequencies;
a processor for determining a maximum of the sums; and
a comparator for comparing the maximum sum to a threshold to determine if a voice is present, wherein if the sum is greater than or equal to the threshold, a voice is present, and if the sum is less than the threshold, a voice is not present.
22. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for determining if a voice is present in a mixed sound signal, the method steps comprising:
receiving the mixed sound signal by at least two microphones;
Fast Fourier transforming each received mixed sound signal into the frequency domain;
filtering the transformed signals to output a signal corresponding to a spatial signature of a source;
summing an absolute value squared of the filtered signal over a predetermined range of frequencies; and
comparing the sum to a threshold to determine if a voice is present, wherein if the sum is greater than or equal to the threshold, a voice is present, and if the sum is less than the threshold, a voice is not present.
Specification