Voice activity decision base on zero crossing rate and spectral sub-band energy
First Claim
1. A voice activity detection method, comprising:
obtaining a time domain parameter and a frequency domain parameter from a current audio frame to be detected;
obtaining a first distance between the time domain parameter and a long-term sliding mean of the time domain parameter in a history background noise frame;
obtaining a second distance between the frequency domain parameter and a long-term sliding mean of the frequency domain parameter in the history background noise frame; and
judging whether the current audio frame is a foreground voice frame or a background noise frame according to the first distance, the second distance, and a set of decision inequalities based on the first distance and the second distance, wherein at least one coefficient in the set of decision inequalities is a variable determined according to a voice activity detection operation mode or features of an input signal, wherein the frequency domain parameter indicates spectral sub-band energy, and wherein the second distance between the frequency domain parameter and the long-term sliding mean of the frequency domain parameter in the history background noise frame is a signal-to-noise ratio of the audio frame, wherein obtaining the signal-to-noise ratio of the audio frame comprises:
obtaining a signal-to-noise ratio of each sub-band according to a ratio of the spectral sub-band energy to the long-term sliding mean of the spectral sub-band energy in the history background noise frame;
performing linear processing or nonlinear processing on the signal-to-noise ratio of each sub-band; and
summing the signal-to-noise ratio of each sub-band after the processing to obtain the signal-to-noise ratio of the audio frame, wherein performing the nonlinear processing on the signal-to-noise ratio of each sub-band comprises determining the signal-to-noise ratio of each sub-band after the nonlinear processing according to
Abstract
A voice activity detection method and apparatus, and an electronic device are provided. The method includes: obtaining a time domain parameter and a frequency domain parameter from an audio frame; obtaining a first distance between the time domain parameter and a long-term sliding mean of the time domain parameter in a history background noise frame, and obtaining a second distance between the frequency domain parameter and a long-term sliding mean of the frequency domain parameter in the history background noise frame; and judging whether the audio frame is a foreground voice frame or a background noise frame according to the first distance, the second distance and a set of decision inequalities based on the first distance and the second distance. The above technical solutions enable the judgment criterion to have an adaptive adjustment capability, thus improving the performance of the voice activity detection.
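The relationship between a frame parameter and its long-term sliding mean over history background noise frames can be sketched as follows. This is a minimal illustration only: the abstract does not say how the sliding mean is computed, so the exponentially weighted update, the `alpha` smoothing factor, the signed difference used as the distance, and the names `SlidingMean` and `process_frame` are all assumptions.

```python
class SlidingMean:
    """Long-term mean of a parameter over background noise frames.

    Assumption: an exponentially weighted average stands in for the
    "long-term sliding mean"; the source does not specify the form.
    """

    def __init__(self, initial: float, alpha: float = 0.95):
        self.value = initial
        self.alpha = alpha  # hypothetical smoothing factor; nearer 1 = longer memory

    def update(self, sample: float) -> float:
        """Blend a new noise-frame sample into the mean and return the result."""
        self.value = self.alpha * self.value + (1.0 - self.alpha) * sample
        return self.value


def process_frame(param: float, mean: SlidingMean, is_noise: bool) -> float:
    """Return the distance of `param` from the noise mean.

    The mean is adapted only when the frame was judged to be background
    noise, so voice frames do not contaminate the noise statistics.
    Assumption: the distance is a signed difference.
    """
    distance = param - mean.value
    if is_noise:
        mean.update(param)
    return distance
```

Updating the mean only on frames judged to be noise is one way to keep the reference tied to the "history background noise frame"; the same helper could track both the time domain and the frequency domain parameter.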
19 Citations
10 Claims
1. A voice activity detection method, comprising:
obtaining a time domain parameter and a frequency domain parameter from a current audio frame to be detected;
obtaining a first distance between the time domain parameter and a long-term sliding mean of the time domain parameter in a history background noise frame;
obtaining a second distance between the frequency domain parameter and a long-term sliding mean of the frequency domain parameter in the history background noise frame; and
judging whether the current audio frame is a foreground voice frame or a background noise frame according to the first distance, the second distance, and a set of decision inequalities based on the first distance and the second distance, wherein at least one coefficient in the set of decision inequalities is a variable determined according to a voice activity detection operation mode or features of an input signal, wherein the frequency domain parameter indicates spectral sub-band energy, and wherein the second distance between the frequency domain parameter and the long-term sliding mean of the frequency domain parameter in the history background noise frame is a signal-to-noise ratio of the audio frame, wherein obtaining the signal-to-noise ratio of the audio frame comprises:
obtaining a signal-to-noise ratio of each sub-band according to a ratio of the spectral sub-band energy to the long-term sliding mean of the spectral sub-band energy in the history background noise frame;
performing linear processing or nonlinear processing on the signal-to-noise ratio of each sub-band; and
summing the signal-to-noise ratio of each sub-band after the processing to obtain the signal-to-noise ratio of the audio frame, wherein performing the nonlinear processing on the signal-to-noise ratio of each sub-band comprises determining the signal-to-noise ratio of each sub-band after the nonlinear processing according to
Dependent claims: 2
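The three sub-steps of claim 1 (per-sub-band ratio, per-sub-band processing, summation) might be sketched as below. This is an illustration under stated assumptions: the per-band value is taken as a plain energy ratio rather than a decibel value, the function name `frame_snr`, the `process` hook and the `floor` guard are hypothetical, and the nonlinear formula is not reproduced in the claim text above, so it is left as a caller-supplied function rather than guessed.

```python
from typing import Callable, Sequence


def frame_snr(
    band_energy: Sequence[float],
    noise_mean: Sequence[float],
    process: Callable[[float], float] = lambda snr: snr,
    floor: float = 1e-12,
) -> float:
    """Sum the processed per-sub-band SNRs into one frame-level SNR.

    band_energy: spectral sub-band energies of the current frame.
    noise_mean:  long-term sliding means of those energies over noise frames.
    process:     linear or nonlinear mapping applied to each sub-band SNR
                 (identity by default; the claim's nonlinear form is not
                 available here, so none is built in).
    floor:       hypothetical guard against division by zero.
    """
    if len(band_energy) != len(noise_mean):
        raise ValueError("band_energy and noise_mean must have equal length")
    total = 0.0
    for energy, mean in zip(band_energy, noise_mean):
        snr = energy / max(mean, floor)  # ratio of energy to its noise mean
        total += process(snr)
    return total
```

For instance, `frame_snr([4.0, 9.0], [2.0, 3.0])` sums the ratios 2.0 and 3.0, and passing `process=lambda s: 0.5 * s` applies a simple linear scaling to each sub-band before the sum.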
3. A voice activity detection method, comprising:
obtaining a time domain parameter and a frequency domain parameter from a current audio frame to be detected;
obtaining a first distance between the time domain parameter and a long-term sliding mean of the time domain parameter in a history background noise frame;
obtaining a second distance between the frequency domain parameter and a long-term sliding mean of the frequency domain parameter in the history background noise frame; and
judging whether the current audio frame is a foreground voice frame or a background noise frame according to the first distance, the second distance, and a set of decision inequalities based on the first distance and the second distance, wherein at least one coefficient in the set of decision inequalities is a variable determined according to a voice activity detection operation mode or features of an input signal, wherein the set of decision inequalities comprises MSSNR ≧ a·DZCR + b and MSSNR ≧ (−c)·DZCR + d, and wherein a and c are coefficients, b and d are constants, MSSNR is obtained according to the first distance, and DZCR is obtained according to the second distance.
Dependent claims: 4, 5, 6, 7, 8
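The two decision inequalities, with coefficients selected by operation mode, could be evaluated roughly as follows. This is a hedged sketch: the mode names and every numeric value in `MODE_COEFFS` are invented for illustration, and because the claim lists the inequalities without stating here how their results are combined into a voice/noise decision, the combination rule is exposed as the `combine` parameter rather than fixed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class Coeffs:
    a: float  # slope coefficient of the first inequality
    b: float  # constant of the first inequality
    c: float  # slope coefficient of the second inequality (used as -c)
    d: float  # constant of the second inequality


# Hypothetical mode table: names and values are placeholders, not from the source.
MODE_COEFFS = {
    "quality": Coeffs(a=0.5, b=1.0, c=0.5, d=1.0),
    "bandwidth_saving": Coeffs(a=1.0, b=2.0, c=1.0, d=2.0),
}


def inequalities(mssnr: float, dzcr: float, k: Coeffs) -> Tuple[bool, bool]:
    """Evaluate MSSNR >= a*DZCR + b and MSSNR >= (-c)*DZCR + d."""
    return (
        mssnr >= k.a * dzcr + k.b,
        mssnr >= (-k.c) * dzcr + k.d,
    )


def is_voice(
    mssnr: float,
    dzcr: float,
    k: Coeffs,
    combine: Callable[[Iterable[bool]], bool] = any,
) -> bool:
    """Combine the inequality results; `any` vs `all` is an assumption to tune."""
    return combine(inequalities(mssnr, dzcr, k))
```

Keeping the coefficients in a per-mode table is one way to make "at least one coefficient" a variable of the operation mode; a table keyed on input-signal features would serve the other alternative named in the claim.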
9. A voice activity detection method, comprising:
obtaining a time domain parameter and a frequency domain parameter from a current audio frame to be detected;
obtaining a first distance between the time domain parameter and a long-term sliding mean of the time domain parameter in a history background noise frame;
obtaining a second distance between the frequency domain parameter and a long-term sliding mean of the frequency domain parameter in the history background noise frame; and
judging whether the current audio frame is a foreground voice frame or a background noise frame according to the first distance, the second distance, and a set of decision inequalities based on the first distance and the second distance, wherein at least one coefficient in the set of decision inequalities is a variable determined according to a voice activity detection operation mode or features of an input signal, wherein the frequency domain parameter indicates spectral sub-band energy, and wherein the second distance between the frequency domain parameter and the long-term sliding mean of the frequency domain parameter in the history background noise frame is a signal-to-noise ratio of the audio frame, wherein the set of decision inequalities comprises MSSNR ≧ a·DZCR + b and MSSNR ≧ (−c)·DZCR + d, and wherein a and c are coefficients, b and d are constants, MSSNR is a corrected distance between the spectral sub-band energy and the long-term sliding mean of the spectral sub-band energy in the history background noise frame, and DZCR is a distance between the zero-crossing rate and the long-term sliding mean of the zero-crossing rate in the history background noise frame.
Dependent claims: 10
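Claim 9 defines DZCR as a distance between the frame's zero-crossing rate and its long-term noise mean. A minimal sketch of that quantity follows, assuming the zero-crossing rate is the fraction of adjacent sample pairs whose signs differ and that the distance is a signed difference; neither definition is spelled out in the text above, and the function names are hypothetical.

```python
from typing import Sequence


def zero_crossing_rate(samples: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs with differing sign.

    Assumption: zero is treated as non-negative, and the count is
    normalised by the number of adjacent pairs.
    """
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1
        for prev, cur in zip(samples, samples[1:])
        if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(samples) - 1)


def dzcr(samples: Sequence[float], noise_mean_zcr: float) -> float:
    """Signed distance between the frame ZCR and the noise-frame mean ZCR."""
    return zero_crossing_rate(samples) - noise_mean_zcr
```

A signed difference keeps the sign of DZCR available to the two inequalities, whose slopes a and −c act on opposite sides of zero; an absolute difference would be another possible reading of "distance".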
Specification