Method and apparatus for detecting voice activity in a speech signal
Abstract
A method and apparatus for generating frame voicing decisions for an incoming speech signal having periods of active voice and non-active voice for a speech encoder in a speech communications system. A predetermined set of parameters is extracted from the incoming speech signal, including a pitch gain and a pitch lag. A frame voicing decision is made for each frame of the incoming speech signal according to values calculated from the extracted parameters. The predetermined set of parameters further includes a frame full band energy, and a set of spectral parameters called Line Spectral Frequencies (LSF).
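The pitch-based quantities named in claim 1 can be illustrated with a minimal sketch. It assumes a 4-subframe window, an exponential moving average as the long-term average, and two decision thresholds; none of these values or names (`PitchStats`, `window`, `alpha`, `max_lag_std`, `min_mean_gain`) come from the patent text shown here, and the decision rule is a hypothetical stand-in rather than the claimed logic.

```python
import statistics
from collections import deque


class PitchStats:
    """Tracks pitch-lag spread and a long-term pitch-gain average."""

    def __init__(self, window=4, alpha=0.9):
        # window: number of consecutive subframes to keep (assumed value)
        # alpha: smoothing factor of the long-term average (assumed value)
        self.lags = deque(maxlen=window)
        self.alpha = alpha
        self.mean_gain = 0.0

    def update(self, pitch_lag, pitch_gain):
        """Feed the pitch lag and pitch gain of one subframe."""
        self.lags.append(pitch_lag)
        self.mean_gain = (self.alpha * self.mean_gain
                          + (1 - self.alpha) * pitch_gain)

    def lag_std(self):
        """Population standard deviation of the lags in the window."""
        return statistics.pstdev(self.lags) if len(self.lags) > 1 else 0.0

    def is_voiced(self, max_lag_std=1.5, min_mean_gain=0.5):
        # Hypothetical rule: a stable pitch lag together with a high
        # average pitch gain is treated as active voice.
        window_full = len(self.lags) == self.lags.maxlen
        return (window_full
                and self.lag_std() < max_lag_std
                and self.mean_gain > min_mean_gain)
```

Voiced speech tends to keep a nearly constant pitch lag with a high pitch gain, while background noise produces erratic lags and low gain, which is why the sketch combines the two measures.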
13 Claims
-
1. In a speech communication system, a method for generating a frame voicing decision, the steps of the method comprising:
-
extracting a set of parameters, including pitch gain and pitch lag, from an incoming speech signal, for each frame;
calculating a standard deviation of the pitch lag from the extracted parameters over a consecutive number of subframes;
calculating a long term average of the pitch gain from the extracted parameters; and
making a frame voicing decision according to the results of said calculation step.
-
calculating a short-term average of energy E, $\bar{E}_s$;
calculating a short-term average of LSF, $\overline{\mathrm{LSF}}_s$;
calculating an average energy $\bar{E}$; and
calculating an average LSF value, $\overline{\mathrm{LSF}}_n$.
-
-
4. The method according to claim 3, further comprising the steps of:
-
calculating a spectral difference SD1 using a normalized Itakura-Saito measure;
calculating a spectral difference SD2 using a mean square error method;
calculating a spectral difference SD3 using a mean square error method; and
calculating a long-term mean of SD2.
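As a rough illustration of the mean-square-error spectral differences and the long-term mean in claim 4, the sketch below compares two LSF vectors and updates a running mean. The claims shown here do not say which pairs of vectors SD2 and SD3 compare, nor how the long-term mean is formed, so the function names, the exponential moving average, and `alpha` are assumptions. The normalized Itakura-Saito measure for SD1 is not sketched.

```python
def lsf_mse(lsf_a, lsf_b):
    """Mean square error between two equal-length LSF vectors."""
    if not lsf_a or len(lsf_a) != len(lsf_b):
        raise ValueError("LSF vectors must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(lsf_a, lsf_b)) / len(lsf_a)


def update_long_term_mean(previous_mean, new_value, alpha=0.9):
    """One step of an exponential moving average (alpha is an assumed value)."""
    return alpha * previous_mean + (1 - alpha) * new_value
```

A frame whose LSF vector sits far from a running background estimate yields a large difference, which is the kind of value a decision stage could threshold.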
-
-
5. The method according to claim 4, wherein the frame voicing decision is made based on the calculated values.
-
6. The method according to claim 5, further comprising the step of smoothing the frame voicing decision.
-
7. The method according to claim 6, further comprising the step of performing an initialization for a predetermined number of initial frames, such that the voicing decision is set to active voice or non-active voice.
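The smoothing and initialization steps of claims 6 and 7 could look like the sketch below. It uses a hangover scheme, a common smoothing technique in voice activity detection that the text shown here does not confirm as the patented method. The function name and the parameters `init_frames`, `init_value`, and `hangover` are hypothetical.

```python
def smooth_decisions(raw, init_frames=3, init_value=True, hangover=2):
    """Return smoothed per-frame decisions (True = active voice)."""
    smoothed = []
    remaining = 0  # hangover frames still available
    for i, decision in enumerate(raw):
        if i < init_frames:
            # Initialization: force the decision for the first frames.
            out = init_value
            remaining = hangover if init_value else 0
        elif decision:
            out = True
            remaining = hangover
        elif remaining > 0:
            # Hold "active" briefly after speech ends to avoid clipping.
            out = True
            remaining -= 1
        else:
            out = False
        smoothed.append(out)
    return smoothed
```

With `hangover=2`, a single inactive frame between two active frames stays marked active, so short dips in the raw decision do not chop a word in half.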
-
8. A Voice Activity Detector (VAD) for making a voicing decision on an incoming speech signal frame, the VAD comprising:
-
an extractor for extracting a set of parameters, including pitch gain and pitch lag, from the incoming speech signal for each frame;
a calculator unit for calculating a standard deviation of the pitch lag from the extracted parameters over a consecutive number of subframes and a long term mean pitch gain from the extracted parameters; and
a decision unit for making a frame voicing decision according to the results from the calculator unit.
-
a short-term average of energy E, $\bar{E}_s$;
a short-term average of LSF, $\overline{\mathrm{LSF}}_s$;
an average energy $\bar{E}$; and
an average LSF value, $\overline{\mathrm{LSF}}_N$.
-
-
11. The VAD according to claim 10, wherein the calculator unit further calculates:
-
a spectral difference SD1 using a normalized Itakura-Saito measure;
a spectral difference SD2 using a mean square error method;
a spectral difference SD3 using a mean square error method; and
a long-term mean of SD2.
-
-
12. The VAD according to claim 11, wherein the decision unit makes a frame voicing decision according to the values calculated by the calculator unit.
-
13. The VAD according to claim 12, wherein the voicing decision is smoothed.
Specification