Frequency-domain post-filtering voice-activity detector
Abstract
A voice-activity detector (VAD 104) takes (214) a currently-received set and a previously-received set of samples of a time-domain (voice) signal, converts (216) them into a frequency-domain representation of the signal, filters out (218) negative and low (noise) frequencies, weights (220) the energies of frequency bins (ranges) of the remaining frequencies proportionately to their frequencies, and computes (220) the total power of the ranges. It first initializes (226) by determining (304, 306) if power peaks of any of the ranges exceed a first threshold (ceiling 228); if not, it lowers (302) the ceiling and continues initializing, and if so, it ends initializing (308), indicates (334) that voice has been detected, sets (330) the ceiling to the highest peak, and stores (332) the total power as a “smoothed” power. If initialization has ended, it determines (320, 322) if power peaks of any of the ranges exceed a second threshold that is a fraction of the ceiling; if so, it indicates (334) that voice has been detected, sets (330) the ceiling to the highest peak that exceeds the ceiling, and computes (332) a new “smoothed” power as a function of the total power and the current “smoothed” power. If initialization has ended and energy peaks of none of the ranges exceed the second threshold, it determines (340, 342) if a ratio of the total power and the smoothed power exceeds a third threshold; if so, it indicates (344) that voice has been detected, and if not, it indicates (346) that voice has not been detected.
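The front end described in the abstract (joining the previous and current sample sets, converting to the frequency domain, discarding negative and low frequencies, weighting by frequency, and summing the total power) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `weighted_bin_powers`, the `low_cutoff_bin` parameter, the use of one DFT bin per "range", and the choice of the bin index as the weight are all hypothetical and are not taken from the patent.

```python
import cmath
import math


def weighted_bin_powers(prev_set, cur_set, low_cutoff_bin=2):
    """Return (weighted bin powers, total power) for two consecutive sample sets.

    Assumptions (not from the source): each DFT bin is treated as one
    "range", the weight is simply the bin index, and low_cutoff_bin marks
    the first bin that is kept.
    """
    samples = list(prev_set) + list(cur_set)  # previous set followed by current set
    n = len(samples)
    powers = []
    # Bins 0..n/2 hold the non-negative frequencies of a real signal, so
    # starting at low_cutoff_bin drops both negative and low frequencies.
    for k in range(low_cutoff_bin, n // 2 + 1):
        # Naive DFT of bin k (a practical implementation would use an FFT).
        x_k = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        # Weight the bin energy in proportion to its frequency.
        powers.append(k * abs(x_k) ** 2)
    return powers, sum(powers)
```

With a pure tone, nearly all of the weighted power lands in a single kept bin, while a constant (DC) input contributes essentially nothing because bin 0 is discarded.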
18 Claims
1. A method comprising:
receiving a signal representing information;
transforming the signal to enhance energy peaks of the signal;
determining if energy peaks of any frequencies other than low frequencies of the transformed signal exceed a first threshold;
in response to determining that the energy peaks of any of the frequencies other than the low frequencies exceed the first threshold, indicating detection of receipt of the information;
determining if a total energy content of the frequencies other than the low frequencies exceeds a second threshold; and
in response to determining that the total energy content exceeds the second threshold, indicating detection of receipt of the information.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18
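As a reading aid, the two tests recited in claim 1 can be pictured as a peak test and a total-energy test, either of which indicates detection. The sketch below is a hypothetical illustration, not the claimed method itself: the function name, parameter names, and the `False` result when neither test passes are assumptions, since claim 1 does not say what happens in that case.

```python
def detect_information(bin_energies, peak_threshold, total_threshold):
    """Return True if either test of the sketch indicates detection.

    bin_energies: energies of the transformed signal at the frequencies
    other than the low frequencies (assumed to be precomputed).
    """
    # First test: does any energy peak exceed the first threshold?
    if any(e > peak_threshold for e in bin_energies):
        return True
    # Second test: does the total energy content exceed the second threshold?
    if sum(bin_energies) > total_threshold:
        return True
    # Assumption: no detection is indicated when both tests fail.
    return False
```

The second test lets broadband content with no single dominant peak still register as detected.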
15. A method comprising:
receiving a sequence of sets each comprising a plurality of time-domain samples of a signal carrying information;
in response to receiving one of the sets, converting the one set and a previously-received one of the sets to a frequency-domain representation of the signal;
in response to the converting, discarding negative-frequency and low-frequency frequency-domain representation of the signal and dividing remaining said frequency-domain representation of the signal into a plurality of frequency ranges;
weighting energies of the ranges directly in relation to frequencies of said ranges;
determining a total energy content of the remaining frequency-domain representation;
in response to a training mode of operation, determining if energy peaks of any of the ranges exceed a first threshold;
in response to determining that the energy peaks of none of the ranges exceed the first threshold, lowering the first threshold;
in response to the training mode and to determining that the energy peaks of any of the ranges exceed the first threshold, ending the training mode, setting a smoothed power to the total energy content, and indicating detection of the information;
in response to determining that the energy peaks of any of the ranges exceed the first threshold, setting the first threshold to a high one of the energy peaks, determining the smoothed power as a function of the smoothed power and the total energy content, and indicating detection of the information;
in response to ending of the training mode, determining if the energy peaks of any of the ranges exceed a second threshold, the second threshold being a fraction of the first threshold;
in response to determining that the energy peaks of none of the ranges exceed the second threshold, determining if a ratio of the determined total power and the smoothed power exceeds a third threshold;
in response to determining that the ratio exceeds the third threshold, indicating detection of the information; and
in response to determining that the ratio does not exceed the third threshold, indicating a lack of detection of the information.
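The training and post-training decisions recited in claim 15 can be sketched as a small state machine. This is a hedged illustration only: the class name `VadState`, all default parameter values, the multiplicative lowering of the first threshold, the exponential form of the smoothing, and the "not detected" result while training continues are assumptions that do not come from the claim.

```python
class VadState:
    """Hypothetical sketch of the threshold logic; not the patented implementation."""

    def __init__(self, ceiling=1000.0, decay=0.9, fraction=0.5,
                 ratio_threshold=2.0, alpha=0.9):
        self.training = True        # training (initialization) mode
        self.ceiling = ceiling      # first threshold
        self.decay = decay          # assumed factor for lowering the ceiling
        self.fraction = fraction    # second threshold = fraction * ceiling
        self.ratio_threshold = ratio_threshold  # third threshold
        self.alpha = alpha          # assumed smoothing coefficient
        self.smoothed = 0.0         # smoothed power

    def step(self, range_energies, total_power):
        """Process one set of weighted range energies; return True on detection."""
        peak = max(range_energies)
        if self.training:
            if peak <= self.ceiling:
                self.ceiling *= self.decay   # lower the first threshold
                return False                 # assumption: nothing detected yet
            self.training = False            # end the training mode
            self.ceiling = peak              # set threshold to the high peak
            self.smoothed = total_power      # seed the smoothed power
            return True
        if peak > self.fraction * self.ceiling:
            self.ceiling = max(self.ceiling, peak)  # raise only if exceeded
            # Assumed exponential smoothing of the total power.
            self.smoothed = (self.alpha * self.smoothed
                             + (1.0 - self.alpha) * total_power)
            return True
        # No peak exceeds the second threshold: fall back to the ratio test.
        return self.smoothed > 0 and total_power / self.smoothed > self.ratio_threshold
```

A caller would feed `step` the per-range energies and total power computed for each newly received set of samples, one call per set.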
Specification