Acoustic activity detection apparatus and method
Abstract
Streaming audio is received. The streaming audio includes a frame having a plurality of samples. An energy estimate is obtained for the plurality of samples. The energy estimate is compared to at least one threshold. In addition, a band pass estimate of the signal is determined. An energy estimate is obtained for the band-passed plurality of samples. The two energy estimates are each compared to at least one threshold. Based upon the comparison operation, a determination is made as to whether speech is detected.
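The core operation described in the abstract (an energy estimate per frame, compared to a threshold) can be illustrated with a minimal sketch. The source does not say how the energy estimate is computed, so the mean-squared amplitude used here is an assumption, as are the function names `frame_energy` and `classify_frame` and the fixed threshold value.

```python
from typing import Sequence


def frame_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (an assumed energy estimate)."""
    if not samples:
        raise ValueError("frame must contain at least one sample")
    return sum(s * s for s in samples) / len(samples)


def classify_frame(samples: Sequence[float], threshold: float) -> str:
    """Label a frame 'speech' if its energy exceeds the threshold, else 'noise'."""
    return "speech" if frame_energy(samples) > threshold else "noise"


# Hypothetical frames: a quiet one and a loud one.
quiet = [0.01, -0.02, 0.015, -0.01]
loud = [0.5, -0.6, 0.55, -0.45]
print(classify_frame(quiet, threshold=0.01))
print(classify_frame(loud, threshold=0.01))
```

A fixed threshold is used only to keep the sketch short; the claims describe a threshold that adapts to the measured noise.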
13 Claims
1. A method of detecting human speech, the method comprising:
receiving streaming audio at a transmitter, the streaming audio comprising a sequence of frames, each having a plurality of samples;
obtaining an energy estimate for each frame of the sequence of frames;
for each of the plurality of frames, comparing the energy estimate to at least one threshold;
based upon the comparing, determining whether speech or noise is detected;
updating the at least one threshold when noise is detected and not updating the at least one threshold when speech is detected, the at least one threshold being determined at least in part by determined statistics from the noise, that are independent of the type of noise;
when speech is detected, sending an interrupt signal to a voice trigger module and a first control signal to the transmitter, the first control signal being effective to cause the transmission of the streaming audio from the transmitter to the voice trigger module, and the interrupt being effective to wake up the voice trigger module from a low power sleep state;
when noise is detected, sending a second control signal to the transmitter, the second control signal being effective to stop the transmission of the streaming audio from the transmitter to the voice trigger module, and disabling the interrupt signal to the voice trigger module, the disablement of the interrupt signal being effective to allow the voice trigger module to return to the low power sleep state.
Dependent claims: 2, 3
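The threshold-update and signalling steps of claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the claim does not specify which noise statistics are used, so an exponentially weighted mean and variance of the noise energy stand in for them, and the class `AdaptiveEnergyDetector`, the `Decision` record, and the parameters `alpha`, `ratio`, and `k` are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Decision:
    speech: bool     # True if the frame was classified as speech
    interrupt: bool  # True = interrupt asserted (voice trigger module woken)
    transmit: bool   # True = first control signal, False = second control signal


class AdaptiveEnergyDetector:
    """Energy detector whose threshold is updated only on noise frames."""

    def __init__(self, initial_noise_energy: float, alpha: float = 0.1,
                 ratio: float = 2.0, k: float = 2.0) -> None:
        self.mean = initial_noise_energy  # running mean of noise energy
        self.var = 0.0                    # running variance of noise energy
        self.alpha = alpha                # smoothing factor (assumed value)
        self.ratio = ratio                # margin above the noise mean (assumed)
        self.k = k                        # weight on the noise std dev (assumed)

    @property
    def threshold(self) -> float:
        return self.ratio * self.mean + self.k * math.sqrt(self.var)

    def process(self, samples: Sequence[float]) -> Decision:
        energy = sum(s * s for s in samples) / len(samples)
        speech = energy > self.threshold
        if not speech:
            # Noise frame: update the statistics, and so the threshold.
            delta = energy - self.mean
            self.mean += self.alpha * delta
            self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta * delta)
        # Speech frame: statistics are left untouched, so the threshold is frozen.
        return Decision(speech=speech, interrupt=speech, transmit=speech)


det = AdaptiveEnergyDetector(initial_noise_energy=1e-4)
noise_frame = [0.012, -0.012] * 80
speech_frame = [0.3, -0.3] * 80
print(det.process(noise_frame))
print(det.process(speech_frame))
```

Freezing the statistics during speech keeps loud speech from inflating the noise estimate, which would otherwise raise the threshold until the detector stopped triggering.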
4. A method, the method comprising:
receiving streaming audio at a transmitter, the streaming audio comprising a sequence of frames, each with a plurality of samples;
obtaining a first energy estimate for the frame and a second energy estimate for the band passed signal, the band passed signal having a frequency of approximately 2 kHz to 5 kHz such that the band passed signal captures the sibilant and fricative characteristics of the speech, the band passed signal being from the same frame of the plurality of samples;
in a first path, comparing the first energy estimate of the full band signal to at least one first threshold and based upon the comparing, determining whether speech or noise is detected;
in a second path performed in parallel with the first path, comparing the second energy estimate from the band passed signal to at least one second threshold and based upon the comparing, determining whether speech or noise is detected;
in each of the first path and the second path, updating the at least one threshold when noise is detected and not updating the at least one threshold when speech is detected, the at least one threshold being determined at least in part by determined statistics from the noise that are independent of the type of noise;
in each of the first path and the second path, when speech is detected, sending an interrupt signal to a voice trigger module and a first control signal to the transmitter, the first control signal being effective to cause the transmission of the streaming audio from the transmitter to the voice trigger module, and the interrupt being effective to wake up the voice trigger module from a low power sleep state;
in each of the first path and the second path, when noise is detected, sending a second control signal to the transmitter, the second control signal being effective to stop the transmission of the streaming audio from the transmitter to the voice trigger module, and disabling the interrupt signal to the voice trigger module, the disablement of the interrupt signal being effective to allow the voice trigger module to return to the low power sleep state.
Dependent claims: 5, 6
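The two parallel paths of claim 4 can be sketched as below. The claim does not name a filter design or a sample rate, so this sketch assumes a single second-order (biquad) band-pass section built from the widely used Audio EQ Cookbook formulas, a 16 kHz sample rate, and fixed per-path thresholds. The names `bandpass_2k_5k`, `energy`, and `detect_two_paths` are hypothetical, and combining the two path decisions with a logical OR is one reading of "in each of the first path and the second path".

```python
import math
from typing import List, Sequence, Tuple


def bandpass_2k_5k(samples: Sequence[float], sample_rate: float = 16000.0) -> List[float]:
    """Biquad band-pass with passband edges near 2 kHz and 5 kHz (approximate)."""
    f_lo, f_hi = 2000.0, 5000.0
    f0 = math.sqrt(f_lo * f_hi)          # geometric center frequency
    q = f0 / (f_hi - f_lo)               # quality factor from the bandwidth
    w0 = 2.0 * math.pi * f0 / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0     # b1 is zero for this filter type
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0              # state is reset per call in this sketch
    out = []
    for x in samples:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude (an assumed energy estimate)."""
    return sum(s * s for s in samples) / len(samples)


def detect_two_paths(frame: Sequence[float], full_threshold: float,
                     band_threshold: float) -> Tuple[bool, bool, bool]:
    """Return (full-band decision, band-pass decision, wake signal)."""
    full_speech = energy(frame) > full_threshold
    band_speech = energy(bandpass_2k_5k(frame)) > band_threshold
    return full_speech, band_speech, full_speech or band_speech


def tone(freq_hz: float, n: int = 1600, sample_rate: float = 16000.0) -> List[float]:
    """Unit-amplitude sine wave used as a test signal."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


# A 3 kHz tone lies inside the band; a 200 Hz tone is strongly attenuated.
print(detect_two_paths(tone(3000.0), full_threshold=0.1, band_threshold=0.1))
print(detect_two_paths(tone(200.0), full_threshold=0.1, band_threshold=0.1))
```

A streaming implementation would carry the filter state across frames rather than resetting it, and each path would adapt its own threshold on noise frames as the claim recites.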
7. An apparatus configured to distinguish speech activity from background noise, the apparatus comprising:
an analog sub-system that converts sound energy into an analog electrical signal;
a conversion module coupled to the analog system that converts the analog signal into a digital signal;
a transmitter module;
a digital sub-system coupled to the conversion module, the digital sub-system including an acoustic activity detection (AAD) module,
the AAD module configured to receive the digital signal, the digital signal comprising a sequence of frames, each having a plurality of samples,
the AAD module configured to obtain an energy estimate for the plurality of samples and compare the energy estimate to at least one threshold,
and the AAD module configured to based upon the comparison, determine whether speech or noise is detected, and when speech is detected transmit an interrupt to a voice trigger module and a first control signal to the transmitter module, the first control signal being effective to cause the transmission of the streaming audio from the transmitter module to the voice trigger module, and the interrupt being effective to wake up the voice trigger module from a low power sleep state,
the AAD module being configured to when noise is detected, send a second control signal to the transmitter module, the second control signal being effective to stop the transmission of the streaming audio from the transmitter module to the voice trigger module, and to disable the interrupt signal being sent to the voice trigger module, the disablement of the interrupt signal being effective to allow the voice trigger module to return to the low power sleep state,
the AAD module being configured to update the at least one threshold when noise is detected and not update the at least one threshold when speech is detected, the at least one threshold being determined at least in part by determined statistics from the noise independent of the type of noise.
Dependent claims: 8, 9, 10, 11, 12, 13
Specification