Voice Activity Detection and Pitch Estimation

US 20130231932A1
Filed: 08/20/2012
Published: 09/05/2013
Est. Priority Date: 03/05/2012
Status: Active Grant

Abstract

Implementations include systems, methods and/or devices operable to detect voice activity in an audible signal by detecting glottal pulses. The dominant frequency of a series of glottal pulses is perceived as the intonation pattern or melody of natural speech, which is also referred to as the pitch. However, spoken communication typically occurs in the presence of noise and/or other interference. In turn, the undulation of voiced speech is masked in some portions of the frequency spectrum associated with human speech by the noise and/or other interference. In some implementations, detection of voice activity is facilitated by dividing the frequency spectrum associated with human speech into multiple sub-bands in order to identify glottal pulses that dominate the noise and/or other interference in particular sub-bands. Additionally and/or alternatively, in some implementations the analysis is extended to provide a pitch estimate of the detected voice activity.
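As an illustration of the sub-band idea, the following minimal sketch converts a sampled signal into time-frequency units: one energy value per sequential interval and per sub-band. It is not taken from the patent. The function name `time_frequency_units`, the equal-width band layout, and the naive DFT (used here only to keep the example free of dependencies, where an FFT would normally be used) are all assumptions made for the example.

```python
import cmath
import math


def time_frequency_units(signal, frame_len, num_bands):
    """Split `signal` into non-overlapping frames and return, for each frame,
    the spectral energy in each of `num_bands` equal-width sub-bands.

    Hypothetical helper for illustration; the patent does not fix a layout.
    """
    half = frame_len // 2              # bins below the Nyquist frequency
    bins_per_band = half // num_bands  # equal-width bands (an assumption)
    units = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        # Naive O(n^2) DFT; an FFT would replace this in practice.
        spectrum = [
            sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(frame))
            for k in range(half)
        ]
        units.append([
            sum(abs(c) ** 2
                for c in spectrum[b * bins_per_band:(b + 1) * bins_per_band])
            for b in range(num_bands)
        ])
    return units


# A pure tone centred on DFT bin 10 of a 64-sample frame. With 4 bands of
# 8 bins each, bin 10 falls in the band at index 1.
tone = [math.sin(2 * math.pi * 10 * n / 64) for n in range(256)]
units = time_frequency_units(tone, frame_len=64, num_bands=4)
```

Each entry of `units` is one sequential interval, and each value inside it is one sub-band, so a sub-band in which speech dominates the noise can be examined independently of the others.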

20 Claims

1. A method of detecting voice activity in an audible signal, the method comprising:
- converting an audible signal into a corresponding plurality of time-frequency units, wherein the time dimension of each time-frequency unit includes at least one of a plurality of sequential intervals, and wherein the frequency dimension of each time-frequency unit includes at least one of a plurality of sub-bands;
  
  identifying at least one pulse pair in the plurality of time-frequency units having a relatively consistent spacing over multiple time intervals on a sub-band basis, wherein the presence of a pulse pair is indicative of voiced speech; and
  
  providing a voice activity signal indicator based at least in part on the presence of a pulse pair.
  - 2. The method of claim 1, further comprising receiving the audible signal from a single audio sensor device.
  - 3. The method of claim 1, further comprising receiving the audible signal from a plurality of audio sensors.
  - 4. The method of claim 1, wherein the plurality of sub-bands is contiguously distributed throughout the frequency spectrum associated with human speech.
  - 5. The method of claim 1, further comprising at least one of amplitude and frequency filtering the audible signal prior to converting the audible signal into the corresponding plurality of time-frequency units.
  - 6. The method of claim 1, wherein converting the audible signal into the corresponding plurality of time-frequency units includes applying a signal decomposition to the audible signal.
  - 7. The method of claim 6, wherein the signal decomposition includes a Fast Fourier Transform.
  - 8. The method of claim 1, further comprising low pass filtering each of the time-frequency units to obtain a respective frequency domain envelope for each of the plurality of sequential intervals.
  - 9. The method of claim 8, wherein each of the plurality of sequential intervals has substantially the same duration.
  - 10. The method of claim 8, wherein identifying at least one pulse pair comprises:
    - identifying one or more pulses as candidate glottal pulses in the envelope of the frequency-domain signal for each interval;
      
      accumulating the one or more pulse pairs having a given separation over sequential intervals on a sub-band basis;
      
      smoothing the accumulation of one or more pulses; and
      
      identifying at least one pulse pair in the smoothed accumulation of one or more pulses.
  - 11. The method of claim 10, further comprising determining a value indicative of a dominant voice period by:
    - disambiguating the smoothed accumulation of one or more pulses;
      
      filtering the normalized smoothed accumulation of one or more pulses;
      
      identifying the highest amplitude pulse after filtering, wherein the highest amplitude pulse is indicative of the dominant voice period.
  - 12. The method of claim 11, wherein normalizing comprises performing a zero-mean.
  - 13. The method of claim 1, wherein the voice activity signal indicator is provided to another component of an auditory processing system.
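The steps recited in claims 10 to 12 (candidate pulses, accumulation of pulse-pair separations, smoothing, and selection of the strongest separation) can be sketched roughly as follows. This is a simplified reading, not the patented implementation: the function names, the local-maximum peak picker, the 1-2-1 smoothing kernel, and the `min_share` concentration test are all assumptions chosen for the example, and the input is taken to be one envelope per interval and sub-band.

```python
def find_pulses(envelope, threshold):
    """Indices of local maxima above `threshold` (candidate glottal pulses)."""
    return [
        i for i in range(1, len(envelope) - 1)
        if envelope[i] > threshold
        and envelope[i] > envelope[i - 1]
        and envelope[i] >= envelope[i + 1]
    ]


def accumulate_separations(envelopes, threshold, max_lag):
    """Count adjacent pulse pairs by separation, summed over all envelopes."""
    hist = [0.0] * (max_lag + 1)
    for envelope in envelopes:
        pulses = find_pulses(envelope, threshold)
        for first, second in zip(pulses, pulses[1:]):
            if second - first <= max_lag:
                hist[second - first] += 1.0
    return hist


def smooth(hist):
    """1-2-1 weighted moving average, treating out-of-range bins as zero."""
    padded = [0.0] + hist + [0.0]
    return [(padded[i] + 2 * padded[i + 1] + padded[i + 2]) / 4
            for i in range(len(hist))]


def detect_voice(envelopes, threshold, max_lag, min_share=0.3):
    """Return (voiced, period_in_samples).

    `min_share` is a hypothetical tuning value: the strongest separation
    must hold at least this share of the smoothed accumulation.
    """
    smoothed = smooth(accumulate_separations(envelopes, threshold, max_lag))
    total = sum(smoothed)
    if total == 0:
        return False, None
    period = max(range(len(smoothed)), key=smoothed.__getitem__)
    if smoothed[period] / total < min_share:
        return False, None
    return True, period


# Two sub-bands carrying a pulse every 8 samples, plus one flat sub-band.
pulsed = [1.0 if n % 8 == 4 else 0.0 for n in range(64)]
flat = [0.1] * 64
bands = [pulsed, pulsed, flat]
```

Calling `detect_voice(bands, threshold=0.5, max_lag=20)` on this synthetic input reports voiced activity with a period of 8 samples, while the flat envelope alone yields no pulse pairs and is reported as unvoiced.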

14. A voice activity detector comprising:
- a conversion module configured to convert an audible signal into a corresponding plurality of time-frequency units, wherein the time dimension of each time-frequency unit includes at least one of a plurality of sequential intervals, and wherein the frequency dimension of each time-frequency unit includes at least one of a plurality of sub-bands;
  
  a peak detection module configured to identify one or more pulses as candidate glottal pulses in the envelope of the frequency-domain signal for each interval;
  
  an accumulation module configured to sum one or more pulse pairs having a given separation over sequential intervals on a sub-band basis; and
  
  a pulse pair detection module configured to identify at least one pulse pair in the accumulation of one or more pulses.
  - 15. The voice activity detector of claim 14, further comprising:
    - a disambiguation filter configured to disambiguate between a signal component indicative of pitch and a signal component indicative of an integer or fractional multiple of the pitch;
      
      a low pass filter configured to filter the output of the disambiguation filter; and
      
      a pulse identification module configured to identify the highest amplitude pulse after low pass filtering, wherein the highest amplitude pulse is indicative of a dominant voice period in the audible signal.
  - 16. The voice activity detector of claim 14, wherein the conversion module utilizes signal decomposition to convert the audible signal into the corresponding plurality of time-frequency units.
  - 17. The voice activity detector of claim 16, wherein the signal decomposition includes a Fast Fourier Transform.
  - 18. The voice activity detector of claim 14, further comprising a low pass filter stage operable to produce a respective frequency domain envelope for each of the plurality of sequential intervals.
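Claim 15 recites a disambiguation filter that separates the pitch component from components at integer or fractional multiples of it, but the claim does not say how the filter works. One common heuristic for this kind of ambiguity, shown here purely as an assumption and not as the patent's filter, is to prefer a shorter period when it is an integer fraction of the strongest one and scores nearly as well. The names `resolve_period` and `period_to_pitch_hz`, and the `ratio` and `max_divisor` defaults, are hypothetical.

```python
def resolve_period(scores, ratio=0.8, max_divisor=4):
    """Pick a period (in samples) from per-lag scores.

    If the strongest lag is an integer multiple of a shorter lag whose score
    is at least `ratio` times the strongest score, return the shorter lag.
    Returns None when no lag has a positive score.
    """
    best = max(range(1, len(scores)), key=scores.__getitem__)
    if scores[best] <= 0:
        return None
    # Try the largest divisor first so the shortest plausible period wins.
    for divisor in range(max_divisor, 1, -1):
        if best % divisor == 0 and scores[best // divisor] >= ratio * scores[best]:
            return best // divisor
    return best


def period_to_pitch_hz(period_samples, sample_rate_hz):
    """Convert a period in samples to a pitch estimate in hertz."""
    return sample_rate_hz / period_samples


# Scores with a true period at lag 8 and a slightly stronger peak at lag 16.
scores = [0.0] * 33
scores[8] = 9.0
scores[16] = 10.0
```

With these scores, `resolve_period(scores)` returns 8 rather than 16, and `period_to_pitch_hz(80, 8000)` gives 100.0 Hz for an 80-sample period at an 8 kHz sample rate.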

19. A voice activity detector comprising:
- means for converting an audible signal into a corresponding plurality of time-frequency units, wherein the time dimension of each time-frequency unit includes at least one of a plurality of sequential intervals, and wherein the frequency dimension of each time-frequency unit includes at least one of a plurality of sub-bands;
  
  means for identifying one or more pulses as candidate glottal pulses in the envelope of the frequency-domain signal for each interval;
  
  means for accumulating one or more pulse pairs having a given separation over sequential intervals on a sub-band basis; and
  
  means for identifying at least one pulse pair in the accumulation of one or more pulses.

20. A voice activity detector comprising:
- a processor;
  
  a memory including instructions that, when executed by the processor, cause the voice activity detector to:
  
  convert an audible signal into a corresponding plurality of time-frequency units, wherein the time dimension of each time-frequency unit includes at least one of a plurality of sequential intervals, and wherein the frequency dimension of each time-frequency unit includes at least one of a plurality of sub-bands;
  
  identify one or more pulses as candidate glottal pulses in the envelope of the frequency-domain signal for each interval;
  
  accumulate one or more pulse pairs having a given separation over sequential intervals on a sub-band basis; and
  
  identify at least one pulse pair in the accumulation of one or more pulses.

Current Assignee: Alexander Escott, Clarence S.H. Chu, Pierre Zakarauskas, Shawn E. Stevenson
Original Assignee: Alexander Escott, Clarence S.H. Chu, Pierre Zakarauskas, Shawn E. Stevenson
Inventors: Pierre Zakarauskas, Alexander Escott, Clarence S.H. Chu, Shawn E. Stevenson
Granted Patent: US 9,384,759 B2
US Class Current: 704/236
CPC Class Codes:
G10L 25/18   the extracted parameters be...
G10L 25/78   Detection of presence or ab...
G10L 25/90   Pitch determination of spee...
G10L 25/93   Discriminating between voic...
