System and Method for Performing Voice Activity Detection
Abstract
A Voice Activity Detection (VAD) algorithm provides a simple binary signal indicating the presence or absence of speech in a microphone signal. The VAD algorithm includes a first step of noise suppression which both estimates and removes (i.e., filters) ambient noise from the microphone signal to create a filtered signal. The magnitude of the filtered signal is then compared to a threshold in order to produce a VAD output signal. The threshold is dynamic and may be derived either from the filtered signal itself, or from a noise spectrum estimate calculated by the noise suppression step.
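The comparison described in the abstract can be illustrated with a minimal sketch. It assumes the filtered (noise suppressed) signal arrives as a frame of samples and that "magnitude" means the mean absolute value of that frame; the abstract does not fix either choice, and the name `vad_decision` is hypothetical.

```python
def vad_decision(filtered_frame, threshold):
    """Return 1 (speech) if the frame magnitude exceeds the threshold, else 0."""
    # Assumption: magnitude is the mean absolute sample value of the frame.
    magnitude = sum(abs(x) for x in filtered_frame) / len(filtered_frame)
    # A magnitude exactly equal to the threshold is treated as non-speech here;
    # the source only states the "below" and "above" cases.
    return 1 if magnitude > threshold else 0


quiet = [0.01, -0.02, 0.015, -0.01]   # residual noise after suppression
loud = [0.4, -0.5, 0.45, -0.35]       # speech-like frame
print(vad_decision(quiet, 0.1), vad_decision(loud, 0.1))  # 0 1
```

In a real system the threshold would not be a constant but would track the signal or the noise estimate, as the abstract notes.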
12 Claims
1. A method for Voice Activity Detection (VAD), the method comprising the steps of:
producing a microphone signal including speech and ambient noise by a microphone;
providing the microphone signal to noise suppression processing;
processing the microphone signal in the noise suppression processing to generate an ambient noise estimate;
providing the ambient noise estimate to threshold calculation;
calculating a threshold;
providing the threshold to the VAD logic;
computing a noise suppressed signal from the microphone signal and the ambient noise estimate by the noise suppression processing;
comparing the noise suppressed signal to the threshold in the VAD logic;
setting a VAD signal to 0 when the noise suppressed signal is below the threshold; and
setting a VAD signal to 1 when the noise suppressed signal is above the threshold.
10. The method of claim 9, wherein the threshold is determined by:
during the most recent non-speech frames, setting the threshold to the scaled result of a first order smoother applied to the prior signal energy estimates.
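A threshold of the kind recited in claim 10 might be sketched as follows. The smoother form `y = alpha * y + (1 - alpha) * x`, the values of `alpha` and `scale`, and the class name `SmoothedThreshold` are illustrative assumptions, not taken from the claims. The warm-up during which the output is held at 0 mirrors the threshold initialization period mentioned elsewhere in the claims; its length here is arbitrary.

```python
class SmoothedThreshold:
    """VAD with a threshold derived from smoothed prior energy estimates."""

    def __init__(self, alpha=0.9, scale=2.0, init_frames=10):
        self.alpha = alpha              # smoother coefficient (assumed value)
        self.scale = scale              # threshold scale factor (assumed value)
        self.init_frames = init_frames  # warm-up length (assumed value)
        self.smoothed = None            # state of the first-order smoother
        self.frames_seen = 0

    def _update(self, energy):
        # First-order (one-pole) smoother.
        self.smoothed = self.alpha * self.smoothed + (1 - self.alpha) * energy

    def step(self, energy):
        """Consume one signal energy estimate and return the VAD value."""
        self.frames_seen += 1
        if self.smoothed is None:
            self.smoothed = energy      # seed the smoother with the first frame
            return 0
        if self.frames_seen <= self.init_frames:
            self._update(energy)        # warm-up: adapt, hold VAD at 0
            return 0
        # The threshold depends only on prior estimates, not the current one.
        threshold = self.scale * self.smoothed
        vad = 1 if energy > threshold else 0
        if vad == 0:
            self._update(energy)        # adapt only on non-speech frames
        return vad


detector = SmoothedThreshold(alpha=0.5, scale=2.0, init_frames=3)
print([detector.step(e) for e in [1.0, 1.0, 1.0, 1.0, 10.0, 1.0]])
# [0, 0, 0, 0, 1, 0]
```

Freezing the smoother while speech is detected keeps loud speech from inflating the threshold, which is one plausible reading of "during the most recent non-speech frames".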
10-1. A method for Voice Activity Detection (VAD), the method comprising the steps of:
producing a microphone signal including speech and ambient noise by a microphone;
buffering and windowing the microphone signal to create time domain frames;
sequentially transforming the time domain frames into microphone signal frequency domain frames;
computing frequency domain noise estimate frames of each of the microphone signal frequency domain frames;
calculating trims from the frequency domain noise estimate frames;
scaling the microphone signal frequency domain frames using the trims to obtain a frequency domain noise suppressed signal;
inverse transforming the frequency domain noise suppressed signal into a time domain noise suppressed signal;
buffering and windowing the time domain noise suppressed signal;
squaring the buffered and windowed time domain noise suppressed signal to generate a signal energy estimate;
determining a threshold using prior signal energy estimates;
comparing the present signal energy estimate to the threshold in the VAD logic;
setting a VAD signal to 0 when the signal energy estimate is below the threshold; and
setting a VAD signal to 1 when the signal energy estimate is above the threshold.
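The buffering, windowing, and squaring steps that yield the signal energy estimate could look roughly like this. The Hann window, the frame length and hop size, and the choice to average the squared samples into one value per frame are all assumptions; the claim names none of them. The transform and trim steps are omitted, so `signal` stands in for the time domain noise suppressed signal.

```python
import math


def hann(n):
    """Symmetric Hann window of length n (assumed window shape)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_energies(signal, frame_len=8, hop=4):
    """Buffer the signal into overlapping windowed frames and return
    one energy estimate (mean of squared samples) per frame."""
    window = hann(frame_len)
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


# Eight silent samples followed by eight samples of constant amplitude.
print(frame_energies([0.0] * 8 + [1.0] * 8))
```

Each returned value would then be compared with a threshold derived from earlier values to set the VAD signal.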
11-2. The method of claim 10, wherein the VAD signal is held to “0” during a threshold initialization period.
12. A method for Voice Activity Detection (VAD), the method comprising the steps of:
producing a microphone signal including speech and ambient noise by a microphone;
buffering and windowing the microphone signal to create time domain frames;
sequentially transforming the time domain frames into microphone signal frequency domain frames;
computing frequency domain ambient noise estimate frames of each of the microphone signal frequency domain frames;
providing the frequency domain ambient noise estimate to threshold calculation;
calculating a threshold by taking the square root of each bin of the frequency domain noise estimate frames and then taking the mean of the square roots of the bins;
providing the threshold to the VAD logic;
calculating trims from the frequency domain noise estimate frames;
scaling the microphone signal frequency domain frames using the trims to obtain a frequency domain noise suppressed signal;
inverse transforming the frequency domain noise suppressed signal into a time domain noise suppressed signal;
buffering and windowing the time domain noise suppressed signal;
providing the buffered and windowed time domain noise suppressed signal to the VAD logic;
comparing the buffered and windowed time domain noise suppressed signal to the threshold in the VAD logic;
setting a VAD signal to 0 when the windowed time domain noise suppressed signal is below the threshold; and
setting a VAD signal to 1 when the windowed time domain noise suppressed signal is above the threshold.
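The threshold calculation recited in claim 12 reduces to a few lines. The sketch assumes each bin of the noise estimate frame holds a non-negative real value (for example a power estimate); the function name `threshold_from_noise` is hypothetical, and any further scaling of the result is left out because the claim does not mention one.

```python
import math


def threshold_from_noise(noise_bins):
    """Mean of the square roots of the bins of one noise estimate frame."""
    # Assumption: bins are non-negative reals, so math.sqrt is well defined.
    roots = [math.sqrt(b) for b in noise_bins]
    return sum(roots) / len(roots)


noise_frame = [4.0, 9.0, 16.0, 1.0]       # illustrative per-bin noise values
print(threshold_from_noise(noise_frame))  # 2.5
```

Because the threshold comes from the noise estimate and not from the suppressed signal, it can follow changes in ambient noise even while speech is present.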
Specification