Voice activity decision base on zero crossing rate and spectral sub-band energy

US 8,296,133 B2
Filed: 11/30/2011
Issued: 10/23/2012
Est. Priority Date: 10/15/2009
Status: Active Grant


Abstract

A voice activity detection method and apparatus, and an electronic device are provided. The method includes: obtaining a time domain parameter and a frequency domain parameter from an audio frame; obtaining a first distance between the time domain parameter and a long-term sliding mean of the time domain parameter in a history background noise frame, and obtaining a second distance between the frequency domain parameter and a long-term sliding mean of the frequency domain parameter in the history background noise frame; and judging whether the audio frame is a foreground voice frame or a background noise frame according to the first distance, the second distance and a set of decision inequalities based on the first distance and the second distance. The above technical solutions enable the judgment criterion to have an adaptive adjustment capability, thus improving the performance of the voice activity detection.
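As an illustration of the time domain step, the sketch below assumes (as the title suggests) that the time domain parameter is the zero crossing rate, and that the "distance" is a signed difference from the noise-frame sliding mean. The function names and the signed-difference choice are assumptions for illustration, not taken from the patent text.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for prev, cur in zip(frame, frame[1:]) if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(frame) - 1)


def first_distance(zcr, zcr_noise_mean):
    """Assumed form of the first distance: signed difference between the
    frame's ZCR and the long-term sliding mean over past noise frames."""
    return zcr - zcr_noise_mean


# Example: an alternating frame crosses zero between every pair of samples.
frame = [1.0, -1.0, 1.0, -1.0]
dzcr = first_distance(zero_crossing_rate(frame), zcr_noise_mean=0.2)
```

A signed difference keeps the information about whether the frame has more or fewer crossings than the background noise; an absolute difference would be an equally plausible reading of "distance".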

10 Claims

1. A voice activity detection method, comprising:
- obtaining a time domain parameter and a frequency domain parameter from a current audio frame to be detected;
  
  obtaining a first distance between the time domain parameter and a long-term sliding mean of the time domain parameter in a history background noise frame;
  
  obtaining a second distance between the frequency domain parameter and a long-term sliding mean of the frequency domain parameter in the history background noise frame; and
  
  judging whether the current audio frame is a foreground voice frame or a background noise frame according to the first distance, the second distance, and a set of decision inequalities based on the first distance and the second distance, wherein at least one coefficient in the set of decision inequalities is a variable determined according to a voice activity detection operation mode or features of an input signal, wherein the frequency domain parameter indicates spectral sub-band energy, and wherein the second distance between the frequency domain parameter and the long-term sliding mean of the frequency domain parameter in the history background noise frame is a signal-to-noise ratio of the audio frame, wherein obtaining the signal-to-noise ratio of the audio frame comprises:
  
  obtaining a signal-to-noise ratio of each sub-band according to a ratio of the spectral sub-band energy to the long-term sliding mean of the spectral sub-band energy in the history background noise frame;
  
  performing linear processing or nonlinear processing on the signal-to-noise ratio of each sub-band; and
  
  summing the signal-to-noise ratio of each sub-band after the processing to obtain the signal-to-noise ratio of the audio frame, wherein performing the nonlinear processing on the signal-to-noise ratio of each sub-band comprises determining the signal-to-noise ratio of each sub-band after the nonlinear processing according to
- - 2. The method according to claim 1, wherein performing the linear processing on the signal-to-noise ratio of each sub-band comprises performing linear processing on the signal-to-noise ratio of each sub-band, and wherein performing the nonlinear processing on the signal-to-noise ratio of each sub-band comprises performing either the same nonlinear processing or different nonlinear processing on the signal-to-noise ratio of each sub-band.
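The frame-level signal-to-noise ratio of claim 1 can be sketched roughly as follows. The claim only states that each sub-band SNR comes from the ratio of the sub-band energy to its long-term noise mean, is then processed linearly or nonlinearly, and is summed. Expressing the ratio in decibels, the `floor` guard, and the `clip_nonnegative` function are assumptions made here; in particular, the nonlinear formula is not reproduced in the claim text above, so `clip_nonnegative` is only a hypothetical stand-in.

```python
import math


def frame_snr(band_energy, noise_mean_energy, process=None, floor=1e-12):
    """Sum of processed per-band SNRs (dB scale is an assumption)."""
    if len(band_energy) != len(noise_mean_energy):
        raise ValueError("band count mismatch")
    if process is None:
        # Identity: the simplest case of linear processing.
        process = lambda snr_db, band_index: snr_db
    total = 0.0
    for i, (energy, noise) in enumerate(zip(band_energy, noise_mean_energy)):
        # Per-band SNR from the ratio of energy to the noise sliding mean.
        snr_db = 10.0 * math.log10(max(energy, floor) / max(noise, floor))
        total += process(snr_db, i)
    return total


def clip_nonnegative(snr_db, band_index):
    """Hypothetical nonlinear processing: ignore bands below the noise mean."""
    return max(snr_db, 0.0)


plain = frame_snr([10.0, 100.0], [1.0, 1.0])
clipped = frame_snr([0.1, 100.0], [1.0, 1.0], process=clip_nonnegative)
```

Passing the band index to `process` leaves room for claim 2's option of applying different processing to different sub-bands.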

3. A voice activity detection method, comprising:
- obtaining a time domain parameter and a frequency domain parameter from a current audio frame to be detected;
  
  obtaining a first distance between the time domain parameter and a long-term sliding mean of the time domain parameter in a history background noise frame;
  
  obtaining a second distance between the frequency domain parameter and a long-term sliding mean of the frequency domain parameter in the history background noise frame; and
  
  judging whether the current audio frame is a foreground voice frame or a background noise frame according to the first distance, the second distance, and a set of decision inequalities based on the first distance and the second distance, wherein at least one coefficient in the set of decision inequalities is a variable determined according to a voice activity detection operation mode or features of an input signal, wherein the set of decision inequalities comprises $\mathrm{MSSNR} \geq a \cdot \mathrm{DZCR} + b$ and $\mathrm{MSSNR} \geq (-c) \cdot \mathrm{DZCR} + d$, and wherein a and c are coefficients, b and d are constants, MSSNR is obtained according to the first distance, and DZCR is obtained according to the second distance.
- - 4. The method according to claim 3, wherein if the audio frame is judged to be the background noise frame, then the long-term sliding mean of the time domain parameter in the history background noise frame is updated according to the time domain parameter of the audio frame and the long-term sliding mean of the frequency domain parameter in the history background noise frame is updated according to the frequency domain parameter of the audio frame.
  - 5. The method according to claim 3, wherein the time domain parameter is a zero-crossing rate, and wherein the first distance between the time domain parameter and the long-term sliding mean of the time domain parameter in the history background noise frame is a Differential Zero-Crossing Rate (DZCR).
  - 6. The method according to claim 5, wherein if the audio frame is judged to be the background noise frame, then the long-term sliding mean of the zero-crossing rate in the history background noise frame is updated to $\alpha \cdot \overline{\mathrm{ZCR}} + (1 - \alpha) \cdot \mathrm{ZCR}$, and wherein $\alpha$ is an update speed control parameter, $\overline{\mathrm{ZCR}}$ is a current value of the long-term sliding mean of the zero-crossing rate in the history background noise frame, and $\mathrm{ZCR}$ is a zero-crossing rate of the audio frame.
  - 7. The method according to claim 3, wherein judging whether the current audio frame is the foreground voice frame or the background noise frame according to the first distance, the second distance, and the set of decision inequalities based on the first distance and the second distance comprises:
    - judging that the current audio frame is the foreground voice frame if the first distance and the second distance satisfy any one decision inequality in the set of decision inequalities; and
      
      judging that the audio frame is the background noise frame if the first distance and the second distance satisfy no decision inequality in the set of decision inequalities.
  - 8. The method according to claim 3, wherein determining the variable according to the voice activity detection operation mode or the features of the input signal comprises determining the variable according to one or more of:
    - the voice activity detection operation point, the signal long-term signal-to-noise ratio, the background noise fluctuation degree, and the background noise level, and wherein the voice activity detection operation mode comprises a voice activity detection operation point, and the features of the input signal comprise one or more of:
      
      a signal long-term signal-to-noise ratio, a background noise fluctuation degree, and a background noise level.
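Claims 3 to 7 can be read together as a small decision-and-update loop, sketched below under stated assumptions. Following claims 5 and 9, DZCR is treated here as the zero-crossing-rate distance and MSSNR as the sub-band SNR measure. The function names, the signed form of DZCR, and the coefficient values are illustrative assumptions; the claims do not give numeric values, and claim 8 says at least one coefficient would in practice vary with the operation mode or the input signal.

```python
def is_voice_frame(mssnr, dzcr, a, b, c, d):
    """Claim 7 reading: voice if ANY decision inequality holds."""
    return mssnr >= a * dzcr + b or mssnr >= (-c) * dzcr + d


def update_zcr_mean(zcr_mean, zcr, alpha):
    """Claim 6 reading: alpha weights the old mean, (1 - alpha) the new frame."""
    return alpha * zcr_mean + (1.0 - alpha) * zcr


def vad_step(mssnr, zcr, zcr_mean, coeffs, alpha=0.9):
    """Classify one frame; adapt the noise ZCR mean only on noise frames."""
    dzcr = zcr - zcr_mean  # assumed signed distance
    voice = is_voice_frame(mssnr, dzcr, *coeffs)
    if not voice:
        zcr_mean = update_zcr_mean(zcr_mean, zcr, alpha)
    return voice, zcr_mean


# Illustrative coefficients (a, b, c, d); not values from the patent.
coeffs = (10.0, 5.0, 10.0, 5.0)
noise_result = vad_step(mssnr=1.0, zcr=0.4, zcr_mean=0.2, coeffs=coeffs)
voice_result = vad_step(mssnr=20.0, zcr=0.4, zcr_mean=0.2, coeffs=coeffs)
```

With positive a and c, the two lines form a threshold that drops as the zero-crossing rate moves away from the noise mean in either direction, so a frame whose ZCR is unlike the background needs less SNR to be judged as voice.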

9. A voice activity detection method, comprising:
- obtaining a time domain parameter and a frequency domain parameter from a current audio frame to be detected;
  
  obtaining a first distance between the time domain parameter and a long-term sliding mean of the time domain parameter in a history background noise frame;
  
  obtaining a second distance between the frequency domain parameter and a long-term sliding mean of the frequency domain parameter in the history background noise frame; and
  
  judging whether the current audio frame is a foreground voice frame or a background noise frame according to the first distance, the second distance, and a set of decision inequalities based on the first distance and the second distance, wherein at least one coefficient in the set of decision inequalities is a variable determined according to a voice activity detection operation mode or features of an input signal, wherein the frequency domain parameter indicates spectral sub-band energy, and wherein the second distance between the frequency domain parameter and the long-term sliding mean of the frequency domain parameter in the history background noise frame is a signal-to-noise ratio of the audio frame, wherein the set of decision inequalities comprises $\mathrm{MSSNR} \geq a \cdot \mathrm{DZCR} + b$ and $\mathrm{MSSNR} \geq (-c) \cdot \mathrm{DZCR} + d$, and wherein a and c are coefficients, b and d are constants, MSSNR is a corrected distance between the spectral sub-band energy and the long-term sliding mean of the spectral sub-band energy in the history background noise frame, and DZCR is a distance between the zero-crossing rate and the long-term sliding mean of the zero-crossing rate in the history background noise frame.
- - 10. The method according to claim 9, wherein if the audio frame is judged to be the background noise frame, then the long-term sliding mean of the spectral sub-band energy in the history background noise frame is updated to $\beta \cdot \overline{E_i} + (1 - \beta) \cdot E_i$, and wherein $i = 0, \ldots, N$, $N$ is the number of sub-bands minus one, $\beta$ is an update speed control parameter, $\overline{E_i}$ is a current value of the long-term sliding mean of the spectral sub-band energy in the history background noise frame, and $E_i$ is the spectral sub-band energy of the audio frame.

Current Assignee: Top Quality Telephony LLC
Original Assignee: Huawei Technologies Co., Ltd. (Huawei Investment & Holding Co., Ltd.)
Inventors: Wang, Zhe
Primary Examiner(s): PULLIAS, JESSE SCOTT
Application Number: US13/307,683
Publication Number: US 20120065966A1
Time in Patent Office: 328 Days
Field of Search: 704/210, 704/213, 704/248, 704E15005- 6
US Class Current: 704/215
CPC Class Codes:
G10L 25/09 the extracted parameters be...
G10L 25/78 Detection of presence or ab...
