Voice activity detection and pitch estimation

US 9,384,759 B2
Filed: 08/20/2012
Issued: 07/05/2016
Est. Priority Date: 03/05/2012
Status: Active Grant

Abstract

Implementations include systems, methods and/or devices operable to detect voice activity in an audible signal by detecting glottal pulses. The dominant frequency of a series of glottal pulses is perceived as the intonation pattern or melody of natural speech, which is also referred to as the pitch. However, spoken communication typically occurs in the presence of noise and/or other interference. In turn, the undulation of voiced speech is masked in some portions of the frequency spectrum associated with human speech by the noise and/or other interference. In some implementations, detection of voice activity is facilitated by dividing the frequency spectrum associated with human speech into multiple sub-bands in order to identify glottal pulses that dominate the noise and/or other interference in particular sub-bands. Additionally and/or alternatively, in some implementations the analysis is furthered to provide a pitch estimate of the detected voice activity.
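
As a rough illustration of the sub-band division described in the abstract, the following Python sketch splits a signal into fixed-length intervals and sums the spectral energy in each contiguous sub-band, giving one value per time-frequency unit. The function names, the frame length, the band count, and the naive DFT are all assumptions made for illustration; the patent text here does not prescribe them (claim 6 names a Fast Fourier Transform as one possible decomposition).

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the one-sided DFT of a real frame (naive O(N^2) form)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def time_frequency_units(signal, frame_len, num_bands):
    """Return units[i][b]: spectral energy in interval i and sub-band b.

    Intervals are non-overlapping frames of frame_len samples. Sub-bands
    are equal-width groups of DFT bins; leftover top bins are ignored.
    """
    units = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        mags = dft_magnitudes(signal[start:start + frame_len])
        bins_per_band = len(mags) // num_bands
        units.append([
            sum(m * m for m in mags[b * bins_per_band:(b + 1) * bins_per_band])
            for b in range(num_bands)
        ])
    return units

# Hypothetical test tone sitting exactly on DFT bin 20 of a 64-sample frame.
tone = [math.sin(2 * math.pi * 20 * t / 64) for t in range(256)]
units = time_frequency_units(tone, frame_len=64, num_bands=4)
```

With these assumed settings each sub-band spans eight bins, so the tone's energy lands in the third sub-band (index 2) of every interval. A production system would more likely use an FFT library and overlapping, windowed frames.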

35 Citations

15 Claims

1. A method of detecting voice activity in an audible signal, the method comprising:
- converting an audible signal into a corresponding plurality of time-frequency units, wherein the time dimension of each time-frequency unit includes at least one of a plurality of sequential intervals, and wherein the frequency dimension of each time-frequency unit includes at least one of a plurality of sub-bands, wherein converting the audible signal into the corresponding plurality of time-frequency units includes applying a signal decomposition to the audible signal;
  
  low pass filtering each of the time-frequency units to obtain a respective frequency domain envelope for each of the plurality of sequential intervals;
  
  identifying at least one pulse pair in the plurality of time-frequency units characterized by regularly spaced transients over multiple time intervals on a sub-band basis, wherein the presence of a pulse pair is indicative of voiced speech, and wherein the regularly spaced transients correspond to glottal pulses with a frequency range associated with human voice; and
  
  providing a voice activity signal indicator based at least in part on the presence of a pulse pair in order to further the operation of an auditory processing system.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The method of claim 1, further comprising receiving the audible signal from a single audio sensor device.
  - 3. The method of claim 1, further comprising receiving the audible signal from a plurality of audio sensors.
  - 4. The method of claim 1, wherein the plurality of sub-bands is contiguously distributed throughout the frequency spectrum associated with human speech.
  - 5. The method of claim 1, further comprising at least one of amplitude and frequency filtering the audible signal prior to converting the audible signal into the corresponding plurality of time-frequency units.
  - 6. The method of claim 1, wherein the signal decomposition includes a Fast Fourier Transform.
  - 7. The method of claim 1, wherein each of the plurality of sequential intervals has the same duration.
  - 8. The method of claim 1, wherein identifying at least one pulse pair comprises:
    - identifying one or more pulses as candidate glottal pulses in the envelope of the frequency-domain signal for each interval;
      
      accumulating the one or more pulse pairs having a given separation over sequential intervals on a sub-band basis;
      
      smoothing the accumulation of one or more pulses; and
      
      identifying at least one pulse pair in the smoothed accumulation of one or more pulses.
  - 9. The method of claim 8, further comprising determining a value indicative of a dominant voice period by:
    - disambiguating the smoothed accumulation of one or more pulses;
      
      filtering the normalized smoothed accumulation of one or more pulses;
      
      identifying the highest amplitude pulse after filtering, wherein the highest amplitude pulse is indicative of the dominant voice period.
  - 10. The method of claim 9, wherein normalizing comprises performing a zero-mean.
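
The pulse-pair steps recited in claims 1 and 8 can be sketched as follows. This is one possible reading, not the patented implementation: it assumes each sub-band is already available as an envelope sampled over time, treats local maxima above a threshold as candidate glottal pulses, and builds a histogram of pulse-pair separations per sub-band. All function names, the threshold, the separation limits, and the three-tap smoothing kernel are hypothetical.

```python
def find_pulses(envelope, threshold):
    """Indices of local maxima above threshold (candidate glottal pulses)."""
    return [
        i for i in range(1, len(envelope) - 1)
        if envelope[i] > threshold
        and envelope[i] > envelope[i - 1]
        and envelope[i] >= envelope[i + 1]
    ]

def accumulate_pairs(envelope, threshold, min_sep, max_sep):
    """Histogram of pulse-pair separations for a single sub-band."""
    hist = [0.0] * (max_sep + 1)
    pulses = find_pulses(envelope, threshold)
    for a, first in enumerate(pulses):
        for second in pulses[a + 1:]:
            sep = second - first
            if sep > max_sep:
                break  # pulses are sorted, so later pairs are farther apart
            if sep >= min_sep:
                hist[sep] += 1.0
    return hist

def smooth(values):
    """Three-tap triangular smoothing (1/4, 1/2, 1/4) with zero padding."""
    padded = [0.0] + list(values) + [0.0]
    return [
        0.25 * padded[i] + 0.5 * padded[i + 1] + 0.25 * padded[i + 2]
        for i in range(len(values))
    ]

def detect_voice(band_envelopes, threshold, min_sep, max_sep, min_score):
    """Return (voice_active, separation) taken from the strongest sub-band."""
    best_score, best_sep = 0.0, None
    for envelope in band_envelopes:
        hist = smooth(accumulate_pairs(envelope, threshold, min_sep, max_sep))
        sep = max(range(min_sep, max_sep + 1), key=hist.__getitem__)
        if hist[sep] > best_score:
            best_score, best_sep = hist[sep], sep
    return best_score >= min_score, best_sep

# Hypothetical input: one sub-band with a pulse every 10 samples, one empty.
voiced_band = [1.0 if i % 10 == 5 else 0.0 for i in range(60)]
masked_band = [0.0] * 60
result = detect_voice([voiced_band, masked_band], 0.5, 5, 25, 2.0)
```

Restricting `min_sep` and `max_sep` to separations that correspond to human pitch is what ties the regularly spaced transients to "a frequency range associated with human voice"; the specific limits used above are placeholders.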

11. A voice activity detector comprising:
- a conversion module, including a processing unit, configured to convert an audible signal into a corresponding plurality of time-frequency units, wherein the time dimension of each time-frequency unit includes at least one of a plurality of sequential intervals, and wherein the frequency dimension of each time-frequency unit includes at least one of a plurality of sub-bands, wherein converting the audible signal into the corresponding plurality of time-frequency units includes applying a signal decomposition to the audible signal;
  
  a low pass filtering module configured to low pass filter each of the time-frequency units to obtain a respective frequency domain envelope for each of the plurality of sequential intervals;
  
  a peak detection module configured to identify one or more pulses as candidate glottal pulses in the envelope of the frequency-domain signal for each interval;
  
  an accumulation module configured to sum one or more pulse pairs having a given separation over sequential intervals on a sub-band basis;
  
  a pulse pair detection module configured to identify at least one pulse pair in the accumulation of one or more pulses, wherein the at least one pulse pair is characterized by regularly spaced transients corresponding to glottal pulses with a frequency range associated with human voice; and
  
  an indicator module for providing a voice activity signal indicator based at least in part on the presence of a pulse pair in order to further the operation of an auditory processing system.
- Dependent Claims (12, 13)
  - 12. The voice activity detector of claim 11, further comprising:
    - a disambiguation filter configured to disambiguate between a signal component indicative of pitch and a signal component indicative of an integer or fractional multiple of the pitch;
      
      a low pass filter configured to filter the output of the disambiguation filter; and
      
      a pulse identification module configured to identify the highest amplitude pulse after low pass filtering, wherein the highest amplitude pulse is indicative of a dominant voice period in the audible signal.
  - 13. The voice activity detector of claim 11, wherein the signal decomposition includes a Fast Fourier Transform.
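
Claims 9, 10 and 12 describe a chain of disambiguation, zero-mean normalization, low pass filtering, and selection of the highest-amplitude pulse. The sketch below shows one way such a chain could look when applied to a histogram of pulse-pair separations. The disambiguation rule used here (subtracting a weighted copy of the histogram value at half the separation) is a stand-in chosen for illustration, not the filter disclosed in the patent, and it only addresses integer multiples of two. The names, the weight, and the sample rate are likewise assumptions.

```python
def disambiguate(hist, weight=0.5):
    """Penalize a separation that is twice a shorter, supported separation."""
    out = list(hist)
    for sep in range(2, len(hist), 2):
        out[sep] = hist[sep] - weight * hist[sep // 2]
    return out

def zero_mean(values):
    """Subtract the mean so the values are centered on zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def low_pass(values):
    """Three-tap triangular low pass filter with zero padding."""
    padded = [0.0] + list(values) + [0.0]
    return [
        0.25 * padded[i] + 0.5 * padded[i + 1] + 0.25 * padded[i + 2]
        for i in range(len(values))
    ]

def dominant_period(hist, sample_rate_hz):
    """Return (separation_in_samples, pitch_in_hz) of the strongest peak."""
    filtered = low_pass(zero_mean(disambiguate(hist)))
    sep = max(range(1, len(filtered)), key=filtered.__getitem__)
    return sep, sample_rate_hz / sep

# Hypothetical histogram in which the double period (100 samples) has
# collected more pulse pairs than the true period (50 samples).
hist = [0.0] * 128
hist[50] = 4.0
hist[100] = 5.0
period, pitch_hz = dominant_period(hist, sample_rate_hz=8000)
```

In this example a plain maximum over the raw histogram would pick the 100-sample separation, while the chain above settles on 50 samples, which corresponds to 160 Hz at the assumed 8 kHz rate.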

14. A voice activity detector comprising:
- means for converting an audible signal into a corresponding plurality of time-frequency units, wherein the time dimension of each time-frequency unit includes at least one of a plurality of sequential intervals, and wherein the frequency dimension of each time-frequency unit includes at least one of a plurality of sub-bands, wherein converting the audible signal into the corresponding plurality of time-frequency units includes applying a signal decomposition to the audible signal;
  
  means for low pass filtering each of the time-frequency units to obtain a respective frequency domain envelope for each of the plurality of sequential intervals;
  
  means for identifying one or more pulses as candidate glottal pulses in the envelope of the frequency-domain signal for each interval;
  
  means for accumulating one or more pulse pairs having a given separation over sequential intervals on a sub-band basis;
  
  means for identifying at least one pulse pair in the accumulation of one or more pulses, wherein the at least one pulse pair is characterized by regularly spaced transients corresponding to glottal pulses with a frequency range associated with human voice; and
  
  means for providing a voice activity signal indicator based at least in part on the presence of a pulse pair in order to further the operation of an auditory processing system.
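
The low pass filtering element that recurs in each independent claim can be illustrated with a simple envelope follower. The claim language leaves open which axis the filter runs along; this sketch assumes smoothing along time within a single sub-band, using full-wave rectification followed by a one-pole low pass filter. The function name, cutoff, and sample rate are hypothetical.

```python
import math

def envelope(samples, cutoff_hz, sample_rate_hz):
    """Rectify the samples, then smooth them with a one-pole low pass filter."""
    # Smoothing coefficient for a one-pole filter with the given cutoff.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    state = 0.0
    out = []
    for x in samples:
        state += alpha * (abs(x) - state)
        out.append(state)
    return out

# Hypothetical sub-band signal: a 1 kHz carrier sampled at 8 kHz.
carrier = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(800)]
env = envelope(carrier, cutoff_hz=50, sample_rate_hz=8000)
```

A low cutoff removes the carrier and leaves the slower amplitude variation, which is where regularly spaced transients such as glottal pulses would be expected to appear. The cutoff has to stay above the highest pitch of interest, otherwise the pulses themselves are smoothed away; 50 Hz is used here only to make the steady-state behavior easy to see.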

15. A voice activity detector comprising:
- a processor;
  
  a memory including instructions that, when executed by the processor, cause the voice activity detector to:
  
  convert an audible signal into a corresponding plurality of time-frequency units, wherein the time dimension of each time-frequency unit includes at least one of a plurality of sequential intervals, and wherein the frequency dimension of each time-frequency unit includes at least one of a plurality of sub-bands, wherein converting the audible signal into the corresponding plurality of time-frequency units includes applying a signal decomposition to the audible signal;
  
  low pass filter each of the time-frequency units to obtain a respective frequency domain envelope for each of the plurality of sequential intervals;
  
  identify one or more pulses as candidate glottal pulses in the envelope of the frequency-domain signal for each interval;
  
  accumulate one or more pulse pairs having a given separation over sequential intervals on a sub-band basis;
  
  identify at least one pulse pair in the accumulation of one or more pulses, wherein the at least one pulse pair is characterized by regularly spaced transients corresponding to glottal pulses with a frequency range associated with human voice; and
  
  provide a voice activity signal indicator based at least in part on the presence of a pulse pair in order to further the operation of an auditory processing system.

Current Assignee: Malaspina Labs (Barbados), Inc.
Original Assignee: Malaspina Labs (Barbados), Inc.
Inventors: Zakarauskas, Pierre; Escott, Alexander; Chu, Clarence S. H.; Stevenson, Shawn E.
Primary Examiner(s): ORTIZ SANCHEZ, MICHAEL
Application Number: US13/590,022
Publication Number: US 20130231932A1
Time in Patent Office: 1,415 Days
Field of Search: 704/208, 704/207, 704/233, 704/234
US Class Current: 1/1
CPC Class Codes:
G10L 25/18   the extracted parameters be...
G10L 25/78   Detection of presence or ab...
G10L 25/90   Pitch determination of spee...
G10L 25/93   Discriminating between voic...
