Acoustic activity detection apparatus and method

US 9,076,447 B2
Filed: 10/23/2014
Issued: 07/07/2015
Est. Priority Date: 10/18/2013
Status: Active Grant

Abstract

Streaming audio is received. The streaming audio includes a frame having a plurality of samples. An energy estimate is obtained for the plurality of samples. The energy estimate is compared to at least one threshold. In addition, a band pass estimate of the signal is determined. An energy estimate is obtained for the band-passed plurality of samples. The two energy estimates are compared to at least one threshold each. Based upon the comparison operation, a determination is made as to whether speech is detected.

13 Claims

1. A method of detecting human speech, the method comprising:
  receiving streaming audio at a transmitter, the streaming audio comprising a sequence of frames, each having a plurality of samples;
  
  obtaining an energy estimate for each frame of the sequence of frames;
  
  for each of the plurality of frames, comparing the energy estimate to at least one threshold;
  
  based upon the comparing, determining whether speech or noise is detected;
  
  updating the at least one threshold when noise is detected and not updating the at least one threshold when speech is detected, the at least one threshold being determined at least in part by determined statistics from the noise, that are independent of the type of noise;
  
  when speech is detected, sending an interrupt signal to a voice trigger module and a first control signal to the transmitter, the first control signal being effective to cause the transmission of the streaming audio from the transmitter to the voice trigger module, and the interrupt being effective to wake up the voice trigger module from a low power sleep state;
  
  when noise is detected, sending a second control signal to the transmitter, the second control signal being effective to stop the transmission of the streaming audio from the transmitter to the voice trigger module, and disabling the interrupt signal to the voice trigger module, the disablement of the interrupt signal being effective to allow the voice trigger module to return to the low power sleep state.

2. The method of claim 1, further comprising determining whether a speech hangover has occurred.

3. The method of claim 2, wherein determining whether a speech hangover occurred utilizes a non-linear process, wherein the hangover is controlled by a first input and a second input, the first input being from an energy comparison with a noise threshold, and the second input being from an energy comparison with a non-linear function of the combination of the noise and speech threshold.
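
The detection loop recited in claims 1 and 2 can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the dB energy measure, the threshold rule (noise mean plus `k` standard deviations), the smoothing factor `alpha`, and the fixed-length hangover counter. In particular, the counter is a plain linear hangover and does not implement the non-linear process of claim 3.

```python
import math

def frame_energy_db(frame):
    """Mean-square energy of one frame in dB (small floor avoids log(0))."""
    mean_square = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(mean_square + 1e-12)

class EnergyDetector:
    """Hypothetical detector: threshold = noise mean + k * noise std dev."""

    def __init__(self, k=3.0, alpha=0.05, init_mean_db=-60.0,
                 init_var=4.0, hangover_frames=5):
        self.k = k                      # assumed margin above the noise floor
        self.alpha = alpha              # assumed smoothing factor
        self.mean = init_mean_db        # running mean of noise energy (dB)
        self.var = init_var             # running variance of noise energy
        self.hangover_frames = hangover_frames
        self.hang = 0                   # frames of hangover remaining

    @property
    def threshold(self):
        return self.mean + self.k * math.sqrt(self.var)

    def process(self, frame):
        """Return True for speech, False for noise."""
        energy = frame_energy_db(frame)
        if energy > self.threshold:
            self.hang = self.hangover_frames
            return True                 # speech: noise statistics stay frozen
        if self.hang > 0:
            self.hang -= 1
            return True                 # hangover: still reported as speech
        # Noise: update the running statistics, and with them the threshold.
        delta = energy - self.mean
        self.mean += self.alpha * delta
        self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta * delta)
        return False
```

In a system along the lines of claim 1, a `True` result would correspond to raising the interrupt and enabling transmission, and a `False` result to stopping transmission and releasing the interrupt; those signals are not modeled in this sketch.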

4. A method, the method comprising:
  receiving streaming audio at a transmitter, the streaming audio comprising a sequence of frames, each with a plurality of samples;
  
  obtaining a first energy estimate for the frame and a second energy estimate for the band passed signal, the band passed signal having a frequency of approximately 2 kHz to 5 kHz such that the band passed signal captures the sibilant and fricative characteristics of the speech, the band passed signal being from the same frame of the plurality of samples;
  
  in a first path, comparing the first energy estimate of the full band signal to at least one first threshold and based upon the comparing, determining whether speech or noise is detected;
  
  in a second path performed in parallel with the first path, comparing the second energy estimate from the band passed signal to at least one second threshold and based upon the comparing, determining whether speech or noise is detected;
  
  in each of the first path and the second path, updating the at least one threshold when noise is detected and not updating the at least one threshold when speech is detected, the at least one threshold being determined at least in part by determined statistics from the noise that are independent of the type of noise;
  
  in each of the first path and the second path, when speech is detected, sending an interrupt signal to a voice trigger module and a first control signal to the transmitter, the first control signal being effective to cause the transmission of the streaming audio from the transmitter to the voice trigger module, and the interrupt being effective to wake up the voice trigger module from a low power sleep state;
  
  in each of the first path and the second path, when noise is detected, sending a second control signal to the transmitter, the second control signal being effective to stop the transmission of the streaming audio from the transmitter to the voice trigger module, and disabling the interrupt signal to the voice trigger module, the disablement of the interrupt signal being effective to allow the voice trigger module to return to the low power sleep state.

5. The method of claim 4, further comprising determining whether a speech hangover has occurred.

6. The method of claim 5, wherein determining whether a speech hangover occurred utilizes a non-linear process, wherein the hangover is controlled by a first input and a second input, the first input being from an energy comparison with a noise threshold, and the second input being from an energy comparison with a non-linear function of the combination of the noise and speech threshold.
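
The two parallel paths of claim 4 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the 16 kHz sample rate, the single second-order band-pass section (coefficients from the widely used Audio EQ Cookbook form), the fixed thresholds, and the OR combination of the two path decisions are all choices made here. The claim itself only specifies a band of approximately 2 kHz to 5 kHz and a comparison in each path.

```python
import math

def bandpass_coeffs(fs=16000.0, f_lo=2000.0, f_hi=5000.0):
    """Biquad band-pass covering roughly f_lo..f_hi (assumed filter design)."""
    f0 = math.sqrt(f_lo * f_hi)          # geometric center frequency
    q = f0 / (f_hi - f_lo)               # quality factor from the bandwidth
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)                   # feed-forward taps
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)   # feedback taps
    return b, a

def bandpass(frame, b, a):
    """Filter one frame with the biquad, starting from zero state."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in frame:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def energy(frame):
    """Mean-square energy of a frame."""
    return sum(s * s for s in frame) / len(frame)

def dual_path_decision(frame, full_threshold, band_threshold, b, a):
    """Speech if either the full-band or the band-passed path exceeds its threshold."""
    full_speech = energy(frame) > full_threshold
    band_speech = energy(bandpass(frame, b, a)) > band_threshold
    return full_speech or band_speech
```

With these assumed settings, a weak 3 kHz component can trip the band-passed path even when the full-band energy stays under its own threshold, which is the kind of low-energy sibilant or fricative the second path is meant to capture.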

7. An apparatus configured to distinguish speech activity from background noise, the apparatus comprising:
  an analog sub-system that converts sound energy into an analog electrical signal;
  
  a conversion module coupled to the analog system that converts the analog signal into a digital signal;
  
  a transmitter module;
  
  a digital sub-system coupled to the conversion module, the digital sub-system including an acoustic activity detection (AAD) module, the AAD module configured to receive the digital signal, the digital signal comprising a sequence of frames, each having a plurality of samples, the AAD module configured to obtain an energy estimate for the plurality of samples and compare the energy estimate to at least one threshold, and the AAD module configured to based upon the comparison, determine whether speech or noise is detected, and when speech is detected transmit an interrupt to a voice trigger module and a first control signal to the transmitter module, the first control signal being effective to cause the transmission of the streaming audio from the transmitter module to the voice trigger module, and the interrupt being effective to wake up the voice trigger module from a low power sleep state, the AAD module being configured to when noise is detected, send a second control signal to the transmitter module, the second control signal being effective to stop the transmission of the streaming audio from the transmitter module to the voice trigger module, and to disable the interrupt signal being sent to the voice trigger module, the disablement of the interrupt signal being effective to allow the voice trigger module to return to the low power sleep state, the AAD module being configured to update the at least one threshold when noise is detected and not update the at least one threshold when speech is detected, the at least one threshold being determined at least in part by determined statistics from the noise independent of the type of noise.

8. The apparatus of claim 7, wherein the analog sub-system includes a micro-electro-mechanical system (MEMS) transducer element.

9. The apparatus of claim 7, wherein the AAD module is further configured to determine whether a speech hangover has occurred.

10. The apparatus of claim 7, wherein the conversion module comprises a sigma-delta modulator that is configured to convert the analog signal into a single bit stream pulse density modulated (PDM) format.

11. The apparatus of claim 10, wherein the digital subsystem comprises a decimator module that converts the single bit stream pulse density modulated (PDM) format into a pulse code modulated (PCM) format.

12. The apparatus of claim 11, wherein the pulse code modulated (PCM) audio from the decimator module is stored continuously in a circular buffer and in parallel, also provided to the AAD module for processing.

13. The apparatus of claim 11, wherein the AAD module enables the transmission of the digital signal by a transmitter module upon the detection of speech, and wherein the transmitter module comprises an interpolator and digital sigma-delta modulator, that converts the pulse code modulated (PCM) format back to a single bit stream pulse density modulated (PDM) format.
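
The data path of claims 11 and 12 (PDM to PCM decimation, then storage in a circular buffer with a parallel feed to the AAD module) can be sketched in Python. The names `decimate_pdm` and `PcmTap`, the decimation factor of 64, and the boxcar averaging are hypothetical and chosen only for brevity; practical decimators typically use multi-stage filters, and the patent record above does not specify the design.

```python
from collections import deque

def decimate_pdm(bits, factor=64):
    """Naive boxcar decimator: each block of `factor` 1-bit samples
    becomes one PCM value in [-1.0, 1.0]. Incomplete trailing blocks
    are dropped."""
    pcm = []
    for i in range(0, len(bits) - factor + 1, factor):
        ones = sum(bits[i:i + factor])
        pcm.append(2.0 * ones / factor - 1.0)
    return pcm

class PcmTap:
    """Stores PCM continuously in a circular buffer and, in parallel,
    hands each frame to a detector callback (standing in for the AAD module)."""

    def __init__(self, capacity, detector):
        self.buffer = deque(maxlen=capacity)  # oldest samples fall off the front
        self.detector = detector

    def push_frame(self, frame):
        self.buffer.extend(frame)             # always buffered, speech or not
        return self.detector(frame)           # detector sees the same frame
```

Because the buffer is filled regardless of the detector result, audio from just before a detection remains available when transmission is enabled.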

Current Assignee: Knowles Electronics LLC (Knowles Corporation)
Original Assignee: Knowles Electronics LLC (Knowles Corporation)
Inventors: Nandy, Dibyendu; Li, Yang; Thomsen, Henrick; Furst, Claus
Primary Examiner(s): Harper, Vincent P
Application Number: US14/522,129
Publication Number: US 20150112673A1
Time in Patent Office: 257 Days
Field of Search: 704/233
US Class Current: 1/1
CPC Class Codes:
G06F 1/32   Means for saving power
G10L 15/20   Speech recognition techniqu...
G10L 15/28   Constructional details of s...
G10L 19/002   Dynamic bit allocation for ...
G10L 2015/088   Word spotting
G10L 25/84   for discriminating voice fr...
