Method and device for the detection of vocal signals
First Claim
1. A method for the detection of a vocal signal in a signal that includes noise, said method comprising the steps of:
cutting up the signal into frames;
sampling each frame to obtain a digital signal comprising a determined number n of samples;
pre-emphasizing the digital signal to obtain a pre-emphasized digital signal;
filtering the pre-emphasized digital signal by means of a high-pass digital filter to obtain a filtered digital signal;
measuring, in each frame, a maximum energy of the samples of the pre-emphasized signal and a maximum energy of the samples of the filtered digital signal;
determining an energy ratio R between the maximum energy of the samples of the filtered digital signal and the maximum energy of the samples of the pre-emphasized digital signal;
computing, between two limits, the long-term mean values of the energy of the samples of the filtered signal and of the energy ratio;
computing, on the basis of the long-term mean values, four threshold values, two of them being maximum values forming two lower limits of the speech state for the filtered signal and the energy ratio respectively, and two of them being minimum values forming two upper limits of the noise state for the filtered signal and the energy ratio respectively, for comparison with the maximum energy of the filtered signal and the energy ratio;
deciding on the presence of the vocal signal in the signal that includes noise when one of the maximum energy of the filtered digital signal, or the energy ratio R, is greater than its respective maximum threshold value; and
deciding on the absence of a vocal signal in the signal that includes noise when one of the maximum energy of the filtered digital signal, or the energy ratio R, is smaller than its respective minimum threshold value.
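The per-frame measurements recited above (pre-emphasis, high-pass filtering, maximum energies, and the ratio R) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the function names, the first-order filter forms, the coefficients `alpha` and `beta`, and the reading of "maximum energy" as the largest squared sample in a frame are all hypothetical choices that the claim does not specify. The sketch also starts from an already sampled signal and resets the filter state at each frame for simplicity.

```python
from typing import List, Sequence, Tuple


def split_frames(signal: Sequence[float], n: int) -> List[List[float]]:
    """Cut a sampled signal into consecutive frames of n samples (tail dropped)."""
    return [list(signal[i:i + n]) for i in range(0, len(signal) - n + 1, n)]


def pre_emphasize(frame: Sequence[float], alpha: float = 0.95) -> List[float]:
    """Assumed pre-emphasis: y[k] = x[k] - alpha * x[k-1], with x[-1] = 0."""
    out, prev = [], 0.0
    for s in frame:
        out.append(s - alpha * prev)
        prev = s
    return out


def high_pass(frame: Sequence[float], beta: float = 0.9) -> List[float]:
    """Assumed first-order high-pass: y[k] = beta * (y[k-1] + x[k] - x[k-1])."""
    out, prev_x, prev_y = [], 0.0, 0.0
    for s in frame:
        y = beta * (prev_y + s - prev_x)
        out.append(y)
        prev_x, prev_y = s, y
    return out


def max_energy(frame: Sequence[float]) -> float:
    """Largest squared sample in the frame (one reading of 'maximum energy')."""
    return max((s * s for s in frame), default=0.0)


def frame_features(frame: Sequence[float]) -> Tuple[float, float]:
    """Return (maximum energy of the filtered signal, energy ratio R)."""
    pre = pre_emphasize(frame)
    filt = high_pass(pre)
    e_pre, e_filt = max_energy(pre), max_energy(filt)
    ratio = e_filt / e_pre if e_pre > 0 else 0.0  # guard against silent frames
    return e_filt, ratio
```

A caller would typically run `frame_features` on each frame returned by `split_frames` and pass the two values on to the threshold comparison.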
Abstract
The disclosed method comprises the steps of: cutting up the signal into frames; sampling each frame to obtain a digital signal comprising a determined number n of samples; pre-emphasizing the digital signal; filtering the pre-emphasized digital signal by means of a high-pass digital filter to obtain a filtered digital signal; and measuring, in each frame, the maximum energy of the pre-emphasized signal and the maximum energy of the filtered digital signal, to obtain an energy ratio R between the maximum energy of the filtered digital signal and the maximum energy of the pre-emphasized digital signal. The method also comprises computing, between two limits, the long-term mean values of the maximum energy of the filtered signal and of the energy ratio, and computing, on the basis of these mean values, four threshold values. Two of them are maximum values, forming the lower limits of the speech state for the filtered signal and the energy ratio respectively; the other two are minimum values, forming the upper limits of the noise state for the filtered signal and the energy ratio respectively. The maximum energy of the filtered signal and the energy ratio are compared with these thresholds, and the vocal signal is judged present in the noisy signal when the maximum energy of the filtered digital signal, or the energy ratio, exceeds its respective maximum threshold value.
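The long-term means, the four thresholds, and the presence/absence decision described above might be sketched as below. Everything numeric here is an assumption made for illustration: the exponential smoothing used for the long-term mean, the multipliers `k_max` and `k_min` that derive the thresholds from that mean, and the rule that the previous state is kept when neither condition fires are hypothetical and are not stated in the text. The order in which the speech and noise tests are applied when they conflict is likewise an arbitrary choice.

```python
from dataclasses import dataclass


def clamp(value: float, lo: float, hi: float) -> float:
    """Keep value between the two limits lo and hi."""
    return max(lo, min(hi, value))


@dataclass
class Tracker:
    """Long-term mean of one quantity (filtered energy or ratio R) plus its thresholds."""
    mean: float
    lo_limit: float       # lower bound on the long-term mean
    hi_limit: float       # upper bound on the long-term mean
    smooth: float = 0.9   # assumed smoothing factor
    k_max: float = 3.0    # assumed multiplier for the maximum threshold
    k_min: float = 1.5    # assumed multiplier for the minimum threshold

    def update(self, value: float) -> None:
        """Exponentially smooth the mean, then clamp it between the two limits."""
        raw = self.smooth * self.mean + (1.0 - self.smooth) * value
        self.mean = clamp(raw, self.lo_limit, self.hi_limit)

    @property
    def max_threshold(self) -> float:
        """Lower limit of the speech state."""
        return self.k_max * self.mean

    @property
    def min_threshold(self) -> float:
        """Upper limit of the noise state."""
        return self.k_min * self.mean


def decide(e_filt: float, ratio: float, energy: Tracker, rat: Tracker,
           speech: bool) -> bool:
    """Return True for speech, False for noise; keep the prior state otherwise."""
    if e_filt > energy.max_threshold or ratio > rat.max_threshold:
        return True
    if e_filt < energy.min_threshold or ratio < rat.min_threshold:
        return False
    return speech
```

Because the minimum thresholds sit below the maximum ones, values falling between them leave the decision unchanged, which gives the detector a simple hysteresis.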
11 Claims
9. A device for detection of a vocal signal in a signal that includes noise, comprising:
first means to compute, in each frame, a ratio between a maximum energy of the pre-emphasized signal and a maximum energy of the filtered digital signal;
second means to compute long-term mean values of the maximum energy of the filtered signal and of an energy ratio between maximum energies of said filtered digital signal and said pre-emphasized signal;
third means, coupled to the second means, to compute maximum and minimum adaptive threshold values for the filtered digital signal and the energy ratio based on said long-term mean values; and
decision means coupled to the third means to decide on the presence of a vocal signal in the digital signal by comparing said maximum energies with said threshold values.
Specification