Voice Activity Detection and Pitch Estimation
Abstract
Implementations include systems, methods and/or devices operable to detect voice activity in an audible signal by detecting glottal pulses. The dominant frequency of a series of glottal pulses is perceived as the intonation pattern or melody of natural speech, and is also referred to as the pitch. However, spoken communication typically occurs in the presence of noise and/or other interference, which masks the undulation of voiced speech in some portions of the frequency spectrum associated with human speech. In some implementations, voice activity detection is facilitated by dividing that frequency spectrum into multiple sub-bands in order to identify glottal pulses that dominate the noise and/or other interference in particular sub-bands. Additionally and/or alternatively, in some implementations the analysis is extended to provide a pitch estimate of the detected voice activity.
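The sub-band decomposition described in the abstract can be illustrated with a minimal sketch. It assumes a bank of second-order band-pass filters (biquads using the W3C Audio EQ Cookbook formulas), an envelope obtained by full-wave rectification followed by one-pole smoothing, and fixed 32 ms intervals. The center frequencies, Q, smoothing cutoff and interval length are hypothetical choices, not values taken from this document; each list `units[band][interval]` plays the role of one time-frequency unit.

```python
import math


def bandpass_coeffs(fs, f0, q):
    """Band-pass biquad (constant 0 dB peak gain), normalized by a0."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def biquad(x, b, a):
    """Direct-form I filtering of sequence x."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def envelope(x, fs, cutoff_hz=500.0):
    """Full-wave rectification followed by one-pole low-pass smoothing
    (hypothetical envelope follower)."""
    k = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    env, state = [], 0.0
    for xn in x:
        state = (1.0 - k) * abs(xn) + k * state
        env.append(state)
    return env


def to_tf_units(x, fs, centers_hz, q=2.0, interval_len=None):
    """Split x into sub-band envelopes cut into sequential intervals.

    Returns units[band][interval] -> list of envelope samples.
    The default 32 ms interval is an assumption for illustration.
    """
    interval_len = interval_len or (int(fs) * 32) // 1000
    n_intervals = len(x) // interval_len
    units = []
    for fc in centers_hz:
        b, a = bandpass_coeffs(fs, fc, q)
        env = envelope(biquad(x, b, a), fs)
        units.append([env[i * interval_len:(i + 1) * interval_len]
                      for i in range(n_intervals)])
    return units
```

For example, a 100 Hz impulse train sampled at 8 kHz for 4000 samples, split into bands centered at 500, 1000 and 2000 Hz, yields 15 intervals of 256 envelope samples per band. A filter bank is only one way to obtain sub-bands; a short-time Fourier transform grouped into bands would serve the same role.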
20 Claims
1. A method of detecting voice activity in an audible signal, the method comprising:
converting an audible signal into a corresponding plurality of time-frequency units, wherein the time dimension of each time-frequency unit includes at least one of a plurality of sequential intervals, and wherein the frequency dimension of each time-frequency unit includes at least one of a plurality of sub-bands;
identifying at least one pulse pair in the plurality of time-frequency units having a relatively consistent spacing over multiple time intervals on a sub-band basis, wherein the presence of a pulse pair is indicative of voiced speech; and
providing a voice activity signal indicator based at least in part on the presence of a pulse pair.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
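Claim 1 does not specify how "relatively consistent spacing" is judged. One hypothetical reading, sketched below, starts from candidate pulse positions already found in each interval of each sub-band, forms pairs from consecutive pulses, and treats a sub-band as voiced when some spacing recurs (within a tolerance) in at least `min_intervals` intervals. The tolerance, the interval count, the `min_bands` rule and the median-based pitch estimate are assumptions for illustration, not the claimed method itself.

```python
from statistics import median


def consistent_spacing(peaks_per_interval, tol=2, min_intervals=3):
    """Return (spacing, hits) for the most-recurring pulse-pair spacing in
    one sub-band, or None if no spacing recurs in enough intervals."""
    # Spacings between consecutive candidate pulses, one list per interval.
    spacings = [[q - p for p, q in zip(peaks, peaks[1:])]
                for peaks in peaks_per_interval]
    best = None
    for cand in sorted({s for ss in spacings for s in ss}):
        # Count intervals holding a pair whose spacing is within tol of cand.
        hits = sum(1 for ss in spacings if any(abs(s - cand) <= tol for s in ss))
        if hits >= min_intervals and (best is None or hits > best[1]):
            best = (cand, hits)
    return best


def voice_activity(bands, fs, min_bands=1, **kw):
    """Voice activity indicator plus a rough pitch estimate in Hz.

    bands[b][i] is the sorted list of candidate pulse positions (in samples)
    found in interval i of sub-band b.
    """
    voiced = [r for r in (consistent_spacing(b, **kw) for b in bands)
              if r is not None]
    if len(voiced) < min_bands:
        return False, None
    period = median(spacing for spacing, _ in voiced)
    return True, fs / period


bands = [
    [[10, 90, 170, 250]] * 4,          # steady 80-sample spacing
    [[5, 40], [12, 200], [30], []],    # irregular peaks
]
print(voice_activity(bands, 8000))     # (True, 100.0)
```

Requiring agreement across several intervals is what separates a periodic glottal source from isolated noise peaks; an 80-sample spacing at 8 kHz corresponds to a 100 Hz pitch.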
14. A voice activity detector comprising:
a conversion module configured to convert an audible signal into a corresponding plurality of time-frequency units, wherein the time dimension of each time-frequency unit includes at least one of a plurality of sequential intervals, and wherein the frequency dimension of each time-frequency unit includes at least one of a plurality of sub-bands;
a peak detection module configured to identify one or more pulses as candidate glottal pulses in the envelope of the frequency-domain signal for each interval;
an accumulation module configured to sum one or more pulse pairs having a given separation over sequential intervals on a sub-band basis; and
a pulse pair detection module configured to identify at least one pulse pair in the accumulation of one or more pulses.
(Dependent claims: 15, 16, 17, 18)
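The module structure of claim 14 can likewise be sketched for a single sub-band. This hypothetical version treats each interval as a list of envelope samples, uses a relative-threshold local-maximum picker as the peak detection module, sums every pulse pair whose separation falls in an assumed pitch-period range into a histogram as the accumulation module, and reports the most-populated separation as the detected pulse pair. The 0.5 relative threshold, the 50 to 500 Hz lag bounds and the minimum count are illustrative assumptions.

```python
def detect_peaks(env, rel_threshold=0.5):
    """Peak detection: local maxima above rel_threshold * max(env)."""
    if not env:
        return []
    floor = rel_threshold * max(env)
    return [i for i in range(1, len(env) - 1)
            if env[i] > floor and env[i] >= env[i - 1] and env[i] > env[i + 1]]


def accumulate_pairs(intervals, min_lag, max_lag):
    """Accumulation: acc[lag] counts pulse pairs with that separation,
    summed over the sequential intervals of one sub-band."""
    acc = [0] * (max_lag + 1)
    for env in intervals:
        peaks = detect_peaks(env)
        for i, p in enumerate(peaks):
            for q in peaks[i + 1:]:
                lag = q - p
                if lag > max_lag:
                    break  # peaks are sorted, so later lags are larger
                if lag >= min_lag:
                    acc[lag] += 1
    return acc


def detect_pulse_pair(acc, min_count):
    """Pulse pair detection: strongest separation, if it is frequent enough."""
    best_lag = max(range(len(acc)), key=lambda k: acc[k])
    return best_lag if acc[best_lag] >= min_count else None


fs = 8000
interval = [0.0] * 256
for p in (10, 90, 170, 250):   # idealized glottal pulses, 80 samples apart
    interval[p] = 1.0
acc = accumulate_pairs([interval] * 4, min_lag=fs // 500, max_lag=fs // 50)
lag = detect_pulse_pair(acc, min_count=6)
print(lag, fs / lag)           # 80 100.0
```

Because every pair is counted, multiples of the pitch period (here 160 samples) also accumulate; with clean pulses the fundamental separation still dominates, but missed pulses could favor a multiple, so a practical detector may need an additional check.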
19. A voice activity detector comprising:
means for converting an audible signal into a corresponding plurality of time-frequency units, wherein the time dimension of each time-frequency unit includes at least one of a plurality of sequential intervals, and wherein the frequency dimension of each time-frequency unit includes at least one of a plurality of sub-bands;
means for identifying one or more pulses as candidate glottal pulses in the envelope of the frequency-domain signal for each interval;
means for accumulating one or more pulse pairs having a given separation over sequential intervals on a sub-band basis; and
means for identifying at least one pulse pair in the accumulation of one or more pulses.
20. A voice activity detector comprising:
a processor; and
a memory including instructions that, when executed by the processor, cause the voice activity detector to:
convert an audible signal into a corresponding plurality of time-frequency units, wherein the time dimension of each time-frequency unit includes at least one of a plurality of sequential intervals, and wherein the frequency dimension of each time-frequency unit includes at least one of a plurality of sub-bands;
identify one or more pulses as candidate glottal pulses in the envelope of the frequency-domain signal for each interval;
accumulate one or more pulse pairs having a given separation over sequential intervals on a sub-band basis; and
identify at least one pulse pair in the accumulation of one or more pulses.
Specification