Voice activity detection and pitch estimation
Abstract
Implementations include systems, methods and/or devices operable to detect voice activity in an audible signal by detecting glottal pulses. The dominant frequency of a series of glottal pulses is perceived as the intonation pattern or melody of natural speech, which is also referred to as the pitch. However, spoken communication typically occurs in the presence of noise and/or other interference. As a result, in some portions of the frequency spectrum associated with human speech, the undulation of voiced speech is masked by the noise and/or other interference. In some implementations, detection of voice activity is facilitated by dividing the frequency spectrum associated with human speech into multiple sub-bands in order to identify glottal pulses that dominate the noise and/or other interference in particular sub-bands. Additionally and/or alternatively, in some implementations the analysis is extended to provide a pitch estimate of the detected voice activity.
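The link the abstract draws between the spacing of glottal pulses and pitch can be illustrated with a minimal sketch, assuming candidate glottal-pulse times (in samples) have already been found in each sub-band. The function name `estimate_pitch`, the assumed 80 to 400 Hz voice range and the use of the median spacing are illustrative assumptions, not details taken from this patent: spacings outside the voice range are discarded, which drops isolated noise transients, and the pitch is taken as the sample rate divided by the median remaining spacing.

```python
from statistics import median

def estimate_pitch(pulse_times_by_band, sample_rate, f_min=80.0, f_max=400.0):
    """Hypothetical pitch estimate (Hz) from candidate glottal-pulse times, or None.

    pulse_times_by_band: one sorted list of pulse times (in samples) per sub-band.
    f_min, f_max: assumed pitch range of the human voice.
    """
    min_gap = sample_rate / f_max   # shortest plausible glottal period, in samples
    max_gap = sample_rate / f_min   # longest plausible glottal period, in samples
    gaps = []
    for times in pulse_times_by_band:
        # Spacing between successive candidate pulses in this sub-band.
        for earlier, later in zip(times, times[1:]):
            gap = later - earlier
            if min_gap <= gap <= max_gap:
                gaps.append(gap)
    if not gaps:
        return None  # no regularly spaced pulses inside the voice range
    # The dominant glottal period is heard as pitch; the median is a robust stand-in.
    return sample_rate / median(gaps)
```

For example, pulses every 64 samples at 8 kHz give 8000 / 64 = 125 Hz. Pooling spacings across sub-bands means that a band in which the pulses are masked contributes little beyond whatever spacings happen to survive the range check.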
35 Citations
15 Claims
1. A method of detecting voice activity in an audible signal, the method comprising:
converting an audible signal into a corresponding plurality of time-frequency units, wherein the time dimension of each time-frequency unit includes at least one of a plurality of sequential intervals, and wherein the frequency dimension of each time-frequency unit includes at least one of a plurality of sub-bands, wherein converting the audible signal into the corresponding plurality of time-frequency units includes applying a signal decomposition to the audible signal;
low pass filtering each of the time-frequency units to obtain a respective frequency domain envelope for each of the plurality of sequential intervals;
identifying at least one pulse pair in the plurality of time-frequency units characterized by regularly spaced transients over multiple time intervals on a sub-band basis, wherein the presence of a pulse pair is indicative of voiced speech, and wherein the regularly spaced transients correspond to glottal pulses with a frequency range associated with human voice; and
providing a voice activity signal indicator based at least in part on the presence of a pulse pair in order to further the operation of an auditory processing system.
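The converting and low pass filtering steps of claim 1 can be sketched as follows. The claim does not say which signal decomposition is used, so this hypothetical sketch assumes a bank of band-pass biquads (coefficients from the RBJ Audio EQ Cookbook) followed by full-wave rectification and a one-pole low-pass filter; it computes each sub-band's envelope in the time domain and then cuts it into fixed-length sequential intervals. The function names, the Q of 4, the 1 kHz smoothing cutoff and the interval length are all assumptions, chosen so that pulses at typical pitch rates remain visible as transients in the envelope.

```python
import math

# Hypothetical sketch of claim 1's decomposition and envelope steps; the
# filterbank design and every parameter below are assumptions.

def bandpass(signal, fs, f0, q):
    """RBJ-cookbook biquad band-pass (0 dB peak gain) centred on f0 Hz."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    # Normalised coefficients; b1 is zero for this band-pass form.
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x   # shift input history
        y2, y1 = y1, y   # shift output history
        out.append(y)
    return out


def envelope(band_signal, fs, cutoff_hz):
    """Full-wave rectify, then smooth with a one-pole low-pass filter."""
    a = 1 - math.exp(-2 * math.pi * cutoff_hz / fs)
    env, state = [], 0.0
    for v in band_signal:
        state += a * (abs(v) - state)
        env.append(state)
    return env


def time_frequency_units(signal, fs, centres_hz, frame_len, q=4.0, cutoff_hz=1000.0):
    """Return units[band][interval], the envelope samples of each time-frequency unit."""
    units = []
    for f0 in centres_hz:
        env = envelope(bandpass(signal, fs, f0, q), fs, cutoff_hz)
        # Sequential, non-overlapping intervals; a trailing partial interval is dropped.
        units.append([env[i:i + frame_len]
                      for i in range(0, len(env) - frame_len + 1, frame_len)])
    return units
```

With fs = 8000 and 160-sample intervals, each time-frequency unit covers 20 ms of one sub-band, and an impulse train with a 64-sample period (125 Hz) appears as a sharp rise in each band's envelope shortly after every pulse.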
11. A voice activity detector comprising:
a conversion module, including a processing unit, configured to convert an audible signal into a corresponding plurality of time-frequency units, wherein the time dimension of each time-frequency unit includes at least one of a plurality of sequential intervals, and wherein the frequency dimension of each time-frequency unit includes at least one of a plurality of sub-bands, wherein converting the audible signal into the corresponding plurality of time-frequency units includes applying a signal decomposition to the audible signal;
a low pass filtering module configured to low pass filter each of the time-frequency units to obtain a respective frequency domain envelope for each of the plurality of sequential intervals;
a peak detection module configured to identify one or more pulses as candidate glottal pulses in the envelope of the frequency-domain signal for each interval;
an accumulation module configured to sum one or more pulse pairs having a given separation over sequential intervals on a sub-band basis;
a pulse pair detection module configured to identify at least one pulse pair in the accumulation of one or more pulses, wherein the at least one pulse pair is characterized by regularly spaced transients corresponding to glottal pulses with a frequency range associated with human voice; and
an indicator module for providing a voice activity signal indicator based at least in part on the presence of a pulse pair in order to further the operation of an auditory processing system.
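The peak detection, accumulation and pulse pair detection modules of claim 11 can be sketched as below, taking the per-interval envelopes of each sub-band as input (`units[band][interval]`). This is a hypothetical illustration, not the patented implementation: it assumes candidate glottal pulses are local envelope maxima above half of the interval's maximum, pairs only consecutive pulses, sums their separations over all intervals of a band in a `collections.Counter`, and accepts the most frequent separation inside an assumed 80 to 400 Hz pitch range once at least `min_count` pairs (with a one-sample tolerance) have accumulated. All names and thresholds are illustrative.

```python
import math
from collections import Counter

# Hypothetical sketch of claim 11's peak detection, accumulation and pulse
# pair detection modules; thresholds and the pairing rule are assumptions.

def candidate_pulses(frame, rel_threshold=0.5):
    """Indices of local envelope maxima above rel_threshold * the interval maximum."""
    if not frame:
        return []
    floor = rel_threshold * max(frame)
    return [i for i in range(1, len(frame) - 1)
            if frame[i] > frame[i - 1] and frame[i] >= frame[i + 1] and frame[i] > floor]


def accumulate_pairs(frames, rel_threshold=0.5):
    """Sum separations (in samples) of consecutive candidate pulses over one band's intervals."""
    counts = Counter()
    for frame in frames:
        pulses = candidate_pulses(frame, rel_threshold)
        counts.update(b - a for a, b in zip(pulses, pulses[1:]))
    return counts


def detect_voice(units, fs, f_min=80.0, f_max=400.0, min_count=3, min_bands=1):
    """Return (voice_active, {band: separation}) for units[band][interval]."""
    min_sep = math.ceil(fs / f_max)   # shortest glottal period considered, in samples
    max_sep = int(fs / f_min)         # longest glottal period considered, in samples
    detected = {}
    for band, frames in enumerate(units):
        counts = accumulate_pairs(frames)
        # Most frequent separation in the voice range, scored with a +/-1 sample tolerance.
        best = max(range(min_sep, max_sep + 1), key=lambda s: counts[s])
        score = counts[best - 1] + counts[best] + counts[best + 1]
        if score >= min_count:
            detected[band] = best
    return len(detected) >= min_bands, detected
```

A detected separation s also implies a pitch of fs / s in that band (64 samples at 8 kHz corresponds to 125 Hz). Raising `min_bands` above 1 trades sensitivity for robustness against a single band in which noise happens to produce regular transients.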
14. A voice activity detector comprising:
means for converting an audible signal into a corresponding plurality of time-frequency units, wherein the time dimension of each time-frequency unit includes at least one of a plurality of sequential intervals, and wherein the frequency dimension of each time-frequency unit includes at least one of a plurality of sub-bands, wherein converting the audible signal into the corresponding plurality of time-frequency units includes applying a signal decomposition to the audible signal;
means for low pass filtering each of the time-frequency units to obtain a respective frequency domain envelope for each of the plurality of sequential intervals;
means for identifying one or more pulses as candidate glottal pulses in the envelope of the frequency-domain signal for each interval;
means for accumulating one or more pulse pairs having a given separation over sequential intervals on a sub-band basis;
means for identifying at least one pulse pair in the accumulation of one or more pulses, wherein the at least one pulse pair is characterized by regularly spaced transients corresponding to glottal pulses with a frequency range associated with human voice; and
means for providing a voice activity signal indicator based at least in part on the presence of a pulse pair in order to further the operation of an auditory processing system.
15. A voice activity detector comprising:
a processor;
a memory including instructions that, when executed by the processor, cause the voice activity detector to:
convert an audible signal into a corresponding plurality of time-frequency units, wherein the time dimension of each time-frequency unit includes at least one of a plurality of sequential intervals, and wherein the frequency dimension of each time-frequency unit includes at least one of a plurality of sub-bands, wherein converting the audible signal into the corresponding plurality of time-frequency units includes applying a signal decomposition to the audible signal;
low pass filter each of the time-frequency units to obtain a respective frequency domain envelope for each of the plurality of sequential intervals;
identify one or more pulses as candidate glottal pulses in the envelope of the frequency-domain signal for each interval;
accumulate one or more pulse pairs having a given separation over sequential intervals on a sub-band basis;
identify at least one pulse pair in the accumulation of one or more pulses, wherein the at least one pulse pair is characterized by regularly spaced transients corresponding to glottal pulses with a frequency range associated with human voice; and
provide a voice activity signal indicator based at least in part on the presence of a pulse pair in order to further the operation of an auditory processing system.
Specification