Voice Activity Detection Method and Method Used for Voice Activity Detection and Apparatus Thereof
Abstract
The present document relates to a voice activity detection (VAD) method, methods used for voice activity detection, and apparatus thereof. The VAD method includes: obtaining sub-band signals and spectrum amplitudes of a current frame; computing values of an energy feature and a spectral centroid feature of the current frame according to the sub-band signals; computing a signal to noise ratio (SNR) parameter of the current frame according to a background noise energy estimated from a previous frame, an energy of SNR sub-bands and an energy feature of the current frame; and computing a VAD decision result according to a tonality signal flag, the signal to noise ratio parameter, the spectral centroid feature, and a frame energy feature. The methods and apparatus of the present document can improve the accuracy of detecting non-stationary noise (such as office noise) and music.
22 Claims
1. A voice activity detection (VAD) method, wherein, the method comprises:
obtaining sub-band signals and spectrum amplitudes of a current frame;
computing values of an energy feature, a spectral centroid feature and a time-domain stability feature of the current frame by using the sub-band signals;
computing values of a spectral flatness feature and a tonality feature according to the spectrum amplitudes;
computing a signal to noise ratio (SNR) parameter of the current frame with a background energy estimated from a previous frame, the energy feature and the energy of SNR sub-bands of the current frame;
computing a tonality signal flag of the current frame with the energy feature, the spectral centroid feature, the time-domain stability feature, the spectral flatness feature and the tonality feature of the current frame;
computing a VAD decision result with the tonality signal flag, the signal to noise ratio parameter, the spectral centroid feature, and the energy feature.
Dependent claims: 2, 3, 4, 5
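As a non-limiting illustration of the feature computation steps recited in claim 1, the following sketch computes three of the named features from one frame. The claim text here does not give formulas, so the sketch assumes common textbook definitions: frame energy as the sum of sub-band energies, spectral centroid as the energy-weighted mean sub-band index, and spectral flatness as the ratio of geometric mean to arithmetic mean of the spectrum amplitudes. The function names and the `eps` guard are hypothetical and are not taken from the patent.

```python
import math


def frame_energy(subband_energies):
    """Frame energy feature, assumed here to be the sum of sub-band energies."""
    return sum(subband_energies)


def spectral_centroid(subband_energies):
    """Energy-weighted mean sub-band index (1-based); 0.0 for a silent frame."""
    total = sum(subband_energies)
    if total <= 0.0:
        return 0.0
    weighted = sum((k + 1) * e for k, e in enumerate(subband_energies))
    return weighted / total


def spectral_flatness(amplitudes, eps=1e-12):
    """Geometric mean over arithmetic mean of the spectrum amplitudes.

    Values near 1 indicate a flat, noise-like spectrum; values near 0
    indicate a peaky, tonal spectrum. eps (an assumption) avoids log(0).
    """
    n = len(amplitudes)
    log_mean = sum(math.log(a + eps) for a in amplitudes) / n
    arith_mean = sum(amplitudes) / n + eps
    return math.exp(log_mean) / arith_mean


if __name__ == "__main__":
    flat = [1.0] * 8
    peaky = [10.0, 0.01, 0.01, 0.01]
    print(spectral_flatness(flat), spectral_flatness(peaky))
    print(spectral_centroid([0.0, 0.0, 4.0, 0.0]))
```

Under these assumed definitions, a flat spectrum yields a flatness close to 1 and a spectrum dominated by a single peak yields a value close to 0, which is the kind of contrast a tonality signal flag could draw on.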
6. A voice activity detection (VAD) apparatus, wherein, the apparatus comprises:
a filter bank, used to obtain sub-band signals of a current frame;
a spectrum amplitude computation unit, used to obtain spectrum amplitudes of the current frame;
a feature acquisition unit, used to compute values of an energy feature, a spectral centroid feature and a time-domain stability feature of the current frame according to the sub-band signals, and to compute values of a spectral flatness feature and a tonality feature according to the spectrum amplitudes;
a flag computation unit, used to compute a tonality signal flag of the current frame according to the energy feature, the spectral centroid feature, the time-domain stability feature, the spectral flatness feature and the tonality feature of the current frame;
a signal to noise ratio computation unit, used to compute a signal to noise ratio parameter of the current frame according to a background noise energy estimated from a previous frame, an energy of SNR sub-bands and the energy feature of the current frame;
a VAD decision unit, used to compute a VAD decision result according to the tonality signal flag, the signal to noise ratio parameter, the spectral centroid feature and the energy feature.
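By way of a hedged example, the signal to noise ratio computation unit and the VAD decision unit of claim 6 might be sketched as follows. The claim does not state how the inputs are combined, so everything below is an assumption made for illustration: the log2-domain SNR, the clamping of negative sub-band SNRs to zero, the per-band noise estimate `noise_subband_energies`, and all threshold values (`snr_threshold`, `centroid_split`, `energy_floor`) are hypothetical.

```python
import math


def snr_parameters(snr_subband_energies, noise_subband_energies,
                   frame_energy, background_energy, eps=1e-12):
    """Return (summed sub-band SNR, full-band SNR), both in log2 units.

    background_energy stands for the background noise energy estimated
    from the previous frame; noise_subband_energies is an assumed
    per-band counterpart of that estimate.
    """
    subband_snr = 0.0
    for e, n in zip(snr_subband_energies, noise_subband_energies):
        # Assumption: bands below the noise floor contribute nothing.
        subband_snr += max(0.0, math.log2((e + eps) / (n + eps)))
    fullband_snr = math.log2((frame_energy + eps) / (background_energy + eps))
    return subband_snr, fullband_snr


def vad_decision(snr, tonality_flag, centroid, frame_energy,
                 snr_threshold=2.0, centroid_split=3.0, energy_floor=1e-3):
    """Combine the four claimed inputs into a boolean decision (hypothetical rule)."""
    if frame_energy < energy_floor:
        return False          # too quiet to be active sound
    if tonality_flag:
        return True           # tonal frames (e.g. music) are kept as active
    # Assumption: demand a higher SNR when energy sits in the upper sub-bands.
    threshold = snr_threshold + (0.5 if centroid > centroid_split else 0.0)
    return snr > threshold


if __name__ == "__main__":
    sub, full = snr_parameters([4.0, 1.0], [1.0, 2.0], 8.0, 2.0)
    print(sub, full, vad_decision(sub + full, False, 2.0, 8.0))
```

The sketch keeps the two units as separate functions so that the SNR parameter can be inspected or smoothed before it reaches the decision stage.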
7. A background noise detection method, wherein, the method comprises:
obtaining sub-band signals and spectrum amplitudes of a current frame;
computing values of an energy feature, a spectral centroid feature and a time-domain stability feature of the current frame by using the sub-band signals;
computing values of a spectral flatness feature and a tonality feature according to the spectrum amplitudes;
detecting background noise by using the energy feature, the spectral centroid feature, the time-domain stability feature, the spectral flatness feature, and the tonality feature of the current frame, and judging whether the current frame is background noise or not.
Dependent claims: 8, 9
10. A tonality signal detection method, wherein, the method comprises:
obtaining sub-band signals and spectrum amplitudes of a current frame;
computing values of an energy feature, a spectral centroid feature and a time-domain stability feature of the current frame by using the sub-band signals;
computing values of a spectral flatness feature and a tonality feature according to the spectrum amplitudes;
with the tonality feature, the time-domain stability feature, the spectral flatness feature, and the spectral centroid feature of the current frame, determining whether the current frame is a tonal signal.
Dependent claims: 11, 12, 13, 14
15. A method for updating a number of current active speech hangover frames in a VAD decision, wherein, the method comprises:
obtaining sub-band signals and spectrum amplitudes of a current frame;
obtaining a long-time SNR lt_snr and an average SNR of all sub-bands SNR2_lt_ave by using the sub-band signals;
updating a number of hangover frames for active sound according to a decision result, the long-time SNR lt_snr, and the average SNR of all sub-bands SNR2_lt_ave for several previous frames, and the VAD decision for the current frame.
Dependent claims: 16, 17, 18
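The hangover update of claim 15 can be illustrated with the following minimal sketch. The names `lt_snr` and `snr2_lt_ave` follow the claim (SNR2_lt_ave); the update rule itself, the hangover lengths (1, 3, 5), the SNR break points, and the `min_speech_frames` gate are all hypothetical values chosen only to show the general idea that cleaner conditions may call for a shorter hangover.

```python
def update_hangover(num_hangover, vad_active, continuous_speech_num,
                    lt_snr, snr2_lt_ave, min_speech_frames=8):
    """Return the updated number of active-sound hangover frames.

    Hypothetical rule: after a sufficiently long active run, reload the
    hangover counter with a length that shrinks as the long-time SNR and
    the average sub-band SNR grow; on inactive frames, count it down.
    """
    if vad_active:
        if continuous_speech_num >= min_speech_frames:
            if lt_snr > 4.0 and snr2_lt_ave > 2.5:
                return 1      # clean signal: short hangover
            if lt_snr > 2.0:
                return 3      # moderate noise
            return 5          # noisy signal: long hangover
        return num_hangover   # run too short to justify a reload
    return max(0, num_hangover - 1)


def final_decision(vad_active, num_hangover):
    """A frame is reported active while hangover frames remain."""
    return vad_active or num_hangover > 0


if __name__ == "__main__":
    hangover = 0
    run = 0
    # Ten active frames followed by five inactive ones, at a low SNR.
    for active in [True] * 10 + [False] * 5:
        run = run + 1 if active else 0
        reported = final_decision(active, hangover)
        hangover = update_hangover(hangover, active, run, 1.0, 1.0)
        print(active, reported, hangover)
```

In the usage loop the reported decision is taken before the counter is updated, so the inactive frames that immediately follow the active run are still reported as active until the counter reaches zero.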
19. A method for adjusting a signal to noise ratio threshold in a VAD decision, wherein, the method for adjusting comprises:
obtaining sub-band signals and spectrum amplitudes of a current frame;
computing a spectral centroid feature of the current frame with the sub-band signals;
computing a long-time SNR through a ratio of an average energy of long-time active frames to an average energy of long-time background noise for a previous frame;
adjusting an SNR threshold for making the VAD decision with the spectral centroid feature, the long-time SNR, a number of previous continuous active frames continuous_speech_num, and a number of previous continuous noise frames continuous_noise_num.
Dependent claims: 20, 21, 22
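The threshold adjustment of claim 19 may be sketched as below. The claim states only that the long-time SNR is computed through a ratio of average energies and that the threshold is adjusted with four inputs; taking the base-10 logarithm of that ratio, and every offset and break point in `adjust_snr_threshold`, are assumptions made for this illustration. The parameter names `continuous_speech_num` and `continuous_noise_num` follow the claim; the function names are hypothetical.

```python
import math


def long_time_snr(avg_active_energy, avg_noise_energy, eps=1e-12):
    """Long-time SNR as log10 of the active-to-noise average energy ratio.

    The logarithm is an assumption; the claim only requires a ratio.
    """
    return math.log10((avg_active_energy + eps) / (avg_noise_energy + eps))


def adjust_snr_threshold(base_threshold, centroid, lt_snr,
                         continuous_speech_num, continuous_noise_num):
    """Return an adjusted SNR threshold (all offsets are hypothetical)."""
    threshold = base_threshold
    threshold += 0.1 * lt_snr          # be stricter when the signal is clean
    if centroid > 3.0:
        threshold += 0.2               # energy concentrated in upper sub-bands
    if continuous_speech_num > 10:
        threshold -= 0.2               # long active run: favour staying active
    if continuous_noise_num > 10:
        threshold += 0.2               # long noise run: favour staying inactive
    return threshold


if __name__ == "__main__":
    lt_snr = long_time_snr(100.0, 1.0)
    print(lt_snr, adjust_snr_threshold(2.0, 4.0, lt_snr, 0, 20))
```

Biasing the threshold by the lengths of the preceding active and noise runs is one simple way to add hysteresis, so that a single borderline frame is less likely to flip the decision.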
Specification