System and method for performing voice activity detection
First Claim
1. A method for Voice Activity Detection (VAD), the method comprising the steps of:
producing a microphone signal including speech and ambient noise by a microphone;
providing the microphone signal to noise suppression processing;
processing the microphone signal in the noise suppression processing to generate an ambient noise estimate, comprising:
buffering and windowing the microphone signal to create time domain frames;
transforming the time domain frames into frequency domain frame; and
computing noise estimates of each of the frequency domain frames;
providing the ambient noise estimate to threshold calculation;
calculating a threshold as the square root and mean of the noise estimates;
providing the threshold to the VAD logic;
computing a noise suppressed signal from the microphone signal and the ambient noise estimate by the noise suppression processing, comprising:
computing trim values from the noise estimates; and
scaling the frequency domain frame by the trim values;
comparing the noise suppressed signal to the threshold in the VAD logic;
setting a VAD signal to 0 when the noise suppressed signal is below the threshold; and
setting a VAD signal to 1 when the noise suppressed signal is above the threshold.
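The buffering, windowing, and transform steps recited in this claim can be sketched as follows. This is a minimal illustration only, not the patented implementation: the Hann window, the frame size, the hop size, and the function names `hann`, `frames`, and `dft` are all assumptions, since the claim does not specify them.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n (window choice is an assumption)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def frames(signal, size, hop):
    """Buffer the signal into overlapping frames and apply the window."""
    w = hann(size)
    return [[signal[s + i] * w[i] for i in range(size)]
            for s in range(0, len(signal) - size + 1, hop)]

def dft(frame):
    """Naive O(n^2) DFT; a practical system would use an FFT instead."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

# Hypothetical input: a sinusoid with two cycles per 8-sample frame.
signal = [math.sin(2 * math.pi * 2 * t / 8) for t in range(16)]
spectra = [dft(f) for f in frames(signal, size=8, hop=4)]
```

Each entry of `spectra` is one frequency domain frame; the per-frame noise estimates and trim values would then be computed from these frames.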
Abstract
A Voice Activity Detection (VAD) algorithm provides a simple binary signal indicating the presence or absence of speech in a microphone signal. The VAD algorithm includes a first step of noise suppression which both estimates and removes (i.e., filters) ambient noise from the microphone signal to create a filtered signal. The magnitude of the filtered signal is then compared to a threshold in order to produce a VAD output signal. The threshold is dynamic and may be derived either from the filtered signal itself, or from a noise spectrum estimate calculated by the noise suppression step.
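The final comparison described in the abstract can be sketched in a few lines. The sketch assumes that the "magnitude of the filtered signal" is reduced to one number per frame by averaging absolute sample values; the abstract does not say how the magnitude is computed, and the name `vad_flag` and the sample values are hypothetical.

```python
def vad_flag(filtered_frame, threshold):
    """Return 1 when the frame's mean absolute value exceeds the threshold, else 0.

    Using the mean absolute value as the magnitude is an assumption.
    """
    magnitude = sum(abs(x) for x in filtered_frame) / len(filtered_frame)
    return 1 if magnitude > threshold else 0

# Hypothetical filtered frames: one near silence, one with speech-like levels.
quiet = [0.01, -0.02, 0.015, -0.01]
speech = [0.4, -0.5, 0.45, -0.3]
print([vad_flag(f, threshold=0.1) for f in (quiet, speech)])  # prints [0, 1]
```

In the described algorithm the threshold is dynamic rather than a constant, so a real caller would recompute `threshold` for each frame.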
9 Claims
3. A method for Voice Activity Detection (VAD), the method comprising the steps of:
producing a microphone signal including speech and ambient noise by a microphone;
providing the microphone signal to noise suppression processing;
processing the microphone signal in the noise suppression processing to generate an ambient noise estimate, comprising:
buffering and windowing the microphone signal to create time domain frames;
sequentially transforming the time domain frames into microphone signal frequency domain frames; and
computing frequency domain noise estimate frames of each of the microphone signal frequency domain frames;
providing the ambient noise estimate to threshold calculation;
calculating a threshold;
providing the threshold to the VAD logic;
computing a noise suppressed signal from the microphone signal and the ambient noise estimate by the noise suppression processing, comprising:
calculating trims from the frequency domain noise estimate frames; and
scaling the microphone signal frequency domain frames using the trims to obtain the noise suppressed signal;
calculating a signal energy estimate as the square of the magnitude of the noise suppressed signal;
comparing the signal energy estimate to a threshold in the VAD logic;
setting a VAD signal to 0 when the signal energy estimate is below the threshold; and
setting a VAD signal to 1 when the signal energy estimate is above the threshold.
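As a rough illustration of the trim, scaling, and energy steps in this claim, the sketch below uses a spectral-subtraction-style gain with a floor. The claim does not say how the trims are derived from the noise estimate, so the gain formula, the `floor` parameter, the summation of squared magnitudes across bins, and all function names are assumptions.

```python
def trims(power_spectrum, noise_estimate, floor=0.1):
    """Per-bin gains in [floor, 1], spectral-subtraction style (an assumption)."""
    gains = []
    for p, n in zip(power_spectrum, noise_estimate):
        g = 1.0 - n / p if p > 0 else floor
        gains.append(max(floor, g))
    return gains

def suppress(spectrum, noise_estimate):
    """Scale each frequency bin of the frame by its trim value."""
    power = [abs(c) ** 2 for c in spectrum]
    return [c * g for c, g in zip(spectrum, trims(power, noise_estimate))]

def energy(spectrum):
    """Signal energy estimate: squared magnitude, summed over bins (assumption)."""
    return sum(abs(c) ** 2 for c in spectrum)

def vad(spectrum, noise_estimate, threshold):
    """Return 1 when the noise suppressed energy exceeds the threshold, else 0."""
    return 1 if energy(suppress(spectrum, noise_estimate)) > threshold else 0

# Hypothetical two-bin frames with a noise power estimate of 0.01 per bin.
noise = [0.01, 0.01]
print(vad([2 + 0j, 0.1 + 0j], noise, threshold=1.0))    # prints 1
print(vad([0.1 + 0j, 0.1 + 0j], noise, threshold=1.0))  # prints 0
```

Bins whose power is at or below the noise estimate fall to the floor gain, so a noise-only frame contributes very little energy to the comparison.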
7. A method for Voice Activity Detection (VAD), the method comprising the steps of:
producing a microphone signal including speech and ambient noise by a microphone;
buffering and windowing the microphone signal to create time domain frames;
sequentially transforming the time domain frames into microphone signal frequency domain frames;
computing frequency domain ambient noise estimate frames of each of the microphone signal frequency domain frames;
providing the frequency domain ambient noise estimate to threshold calculation;
calculating a threshold by taking the square root of each bin of the frequency domain noise estimate frames and then taking the mean of the square roots of each bin;
providing the threshold to the VAD logic;
calculating trims from the frequency domain noise estimate frames;
scaling the microphone signal frequency domain frames using the trims to obtain a frequency domain noise suppressed signal;
inverse transforming the frequency domain noise suppressed signal into a time domain noise suppressed signal;
buffering and windowing the time domain noise suppressed signal;
providing the buffered and windowed time domain noise suppressed signal to the VAD logic;
comparing the buffered and windowed time domain noise suppressed signal to a threshold in the VAD logic;
setting a VAD signal to 0 when the windowed time domain noise suppressed signal is below the threshold; and
setting a VAD signal to 1 when the windowed time domain noise suppressed signal is above the threshold.
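The threshold calculation recited in this claim (square root of each bin, then the mean of those square roots) can be written directly. The function name `vad_threshold` and the example bin values are hypothetical, and treating the noise estimate bins as non-negative power values is an assumption.

```python
import math

def vad_threshold(noise_estimate_frame):
    """Mean of the per-bin square roots of one noise estimate frame."""
    roots = [math.sqrt(b) for b in noise_estimate_frame]
    return sum(roots) / len(roots)

# Hypothetical noise estimate frame with four bins.
threshold = vad_threshold([0.04, 0.09, 0.16, 0.25])  # approximately 0.35
```

The order of operations matters: taking the square root of the mean of the same bins would give roughly 0.367 rather than 0.35.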
8. A method for Voice Activity Detection (VAD), the method comprising the steps of:
producing a microphone signal including speech and ambient noise by a microphone;
buffering and windowing the microphone signal to create time domain frames;
sequentially transforming the time domain frames into microphone signal frequency domain frames;
computing frequency domain noise estimate frames of each of the microphone signal frequency domain frames;
calculating trims from the frequency domain noise estimate frames;
scaling the microphone signal frequency domain frames using the trims to obtain a frequency domain noise suppressed signal;
inverse transforming the frequency domain noise suppressed signal into a time domain noise suppressed signal;
buffering and windowing the time domain noise suppressed signal;
squaring the buffered and windowed time domain noise suppressed signal to generate a signal energy estimate;
determining a threshold using prior signal energy estimate;
comparing the present signal energy estimate to the threshold in the VAD logic;
setting a VAD signal to 0 when the signal energy estimate is below the threshold; and
setting a VAD signal to 1 when the signal energy estimate is above the threshold.
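One possible reading of "determining a threshold using prior signal energy estimate" is a running average of earlier frame energies multiplied by a fixed ratio. The sketch below takes that reading purely as an assumption: the claim does not state the rule, and the class name `EnergyVad`, the `ratio`, `smoothing`, and `initial_level` parameters, and the choice to update the average only on non-speech frames are all hypothetical.

```python
class EnergyVad:
    """Energy-based VAD whose threshold comes from prior energy estimates."""

    def __init__(self, ratio=2.0, smoothing=0.9, initial_level=1e-6):
        self.ratio = ratio          # threshold = ratio * running level (assumption)
        self.smoothing = smoothing  # weight given to the previous running level
        self.level = initial_level  # running average of prior non-speech energies

    def step(self, frame):
        """Return the VAD flag (0 or 1) for one noise suppressed time domain frame."""
        energy = sum(x * x for x in frame) / len(frame)
        threshold = self.ratio * self.level  # built from prior frames only
        flag = 1 if energy > threshold else 0
        if flag == 0:
            # Track the background level on frames judged to be non-speech.
            self.level = self.smoothing * self.level + (1 - self.smoothing) * energy
        return flag

# Hypothetical frames: quiet, loud, loud, quiet.
detector = EnergyVad(ratio=2.0, smoothing=0.5, initial_level=0.01)
stream = [[0.1, -0.1], [1.0, -1.0], [1.0, -1.0], [0.1, -0.1]]
print([detector.step(f) for f in stream])  # prints [0, 1, 1, 0]
```

Freezing the running level during detected speech keeps sustained speech from raising its own threshold, at the cost of adapting slowly if the background noise jumps above the ratio.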
Specification