Multi-Microphone Voice Activity Detector

US 20110106533A1
Filed: 06/25/2009
Published: 05/05/2011
Est. Priority Date: 06/30/2008
Status: Active Grant

Abstract

A dual microphone voice activity detector system is presented. A voice activity detector system estimates the signal level and noise level at each microphone. A level differential between the two microphones of nearby sounds such as the signal is greater than the level differential of more distant sounds such as the noise. Thus, the voice activity detector detects the presence of nearby sounds.
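The level-differential argument in the abstract can be illustrated with a small calculation. The sketch below assumes free-field spherical spreading (amplitude proportional to 1/r), which the abstract does not state, and uses a hypothetical geometry: 2 cm microphone spacing, a talker 5 cm from the nearer microphone, and a noise source 2 m away.

```python
import math


def level_difference_db(dist_near_m: float, mic_spacing_m: float) -> float:
    """Level drop in dB from the nearer to the farther microphone.

    Assumes amplitude falls off as 1/r and that the second microphone
    sits mic_spacing_m farther from the source than the first.
    """
    return 20.0 * math.log10((dist_near_m + mic_spacing_m) / dist_near_m)


# Hypothetical geometry, chosen only for illustration.
talker_db = level_difference_db(0.05, 0.02)  # nearby target source
noise_db = level_difference_db(2.0, 0.02)    # distant disturbance source
print(f"talker: {talker_db:.2f} dB, distant noise: {noise_db:.2f} dB")
```

Under these assumptions the nearby talker produces a differential of roughly 2.9 dB between the microphones, while the distant source produces less than 0.1 dB.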

23 Claims

1. A method of performing voice activity detection, comprising:
- receiving a first signal from a first microphone, the first signal including a first target component and a first disturbance component;
  
  receiving a second signal from a second microphone displaced from the first microphone by a distance, the second signal including a second target component and a second disturbance component, wherein the first target component differs from the second target component in accordance with the distance, and wherein the first disturbance component differs from the second disturbance component in accordance with the distance;
  
  estimating a first signal level based on the first signal;
  
  estimating a second signal level based on the second signal;
  
  estimating a first noise level based on the first signal;
  
  estimating a second noise level based on the second signal;
  
  calculating a first ratio based on the first signal level and the first noise level;
  
  calculating a second ratio based on the second signal level and the second noise level; and
  
  calculating a current voice activity decision, wherein the current voice activity decision signifies that no voice activity is detected if a difference between the first ratio and the second ratio is smaller than a pre-selected threshold, wherein the threshold is $(1 - p)\,\xi_{\min}$, wherein $p$ is a propagation decay factor and wherein $\xi_{\min}$ is a pre-selected minimum SNR threshold for voice presence at the microphone closer to the target sound, and wherein the current voice activity decision signifies that voice activity is detected if the difference is larger than or equal to the pre-selected threshold.
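The decision rule of claim 1 can be sketched as a single comparison. This is a minimal illustration, not the patented implementation: it assumes the two ratios are linear signal-to-noise ratios with the first microphone closer to the talker, and the function and argument names are hypothetical. One reading of the threshold is that a voice contributing an SNR of xi_min at the nearer microphone contributes about p times that at the farther one, so the difference is at least (1 - p) * xi_min.

```python
def voice_activity(snr_near: float, snr_far: float,
                   p: float, xi_min: float) -> bool:
    """Return True when the SNR difference reaches (1 - p) * xi_min.

    snr_near, snr_far : first and second ratios (assumed linear SNRs)
    p                 : propagation decay factor
    xi_min            : minimum SNR for voice presence at the nearer microphone
    """
    threshold = (1.0 - p) * xi_min
    # Voice is detected when the difference is larger than or equal to the
    # threshold; otherwise no voice activity is reported.
    return (snr_near - snr_far) >= threshold


# Hypothetical values: p = 0.5 and xi_min = 4.0 give a threshold of 2.0.
print(voice_activity(10.0, 4.0, p=0.5, xi_min=4.0))  # difference 6.0
print(voice_activity(3.0, 2.9, p=0.5, xi_min=4.0))   # difference 0.1
```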

2. A method of performing voice activity detection, comprising:
- receiving a first signal from a first microphone, the first signal including a first target component and a first disturbance component;
  
  receiving a second signal from a second microphone displaced from the first microphone by a distance, the second signal including a second target component and a second disturbance component, wherein the first target component differs from the second target component in accordance with the distance, and wherein the first disturbance component differs from the second disturbance component in accordance with the distance;
  
  performing band pass filtering on the first signal prior to estimating the first signal level;
  
  performing band pass filtering on the second signal prior to estimating the second signal level, wherein a band pass frequency ranges between 400 and 1000 Hertz;
  
  estimating a first signal level based on the first signal;
  
  estimating a second signal level based on the second signal;
  
  estimating a first noise level based on the first signal;
  
  estimating a second noise level based on the second signal;
  
  calculating a first ratio based on the first signal level and the first noise level;
  
  calculating a second ratio based on the second signal level and the second noise level; and
  
  calculating a current voice activity decision based on a difference between the first ratio and the second ratio.
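The band pass step of claim 2 could look like the following sketch. The claim fixes only the 400 to 1000 Hertz range; the filter type (a single second-order section using the widely published "Audio EQ Cookbook" band-pass formulas), the 16 kHz sample rate, and all names are assumptions made for illustration.

```python
import math


def bandpass_400_1000(samples, fs=16000.0):
    """Filter samples with one band-pass biquad (Direct Form I)."""
    f_lo, f_hi = 400.0, 1000.0
    f0 = math.sqrt(f_lo * f_hi)        # geometric centre, about 632 Hz
    q = f0 / (f_hi - f_lo)             # Q derived from the 600 Hz bandwidth
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0   # b1 is zero for this filter type
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def mean_power(samples):
    """Average power, usable as the input to a signal level estimate."""
    return sum(s * s for s in samples) / len(samples)


def tone(freq, fs=16000.0, n=4000):
    """Unit-amplitude test sine, used here only to exercise the filter."""
    return [math.sin(2.0 * math.pi * freq * i / fs) for i in range(n)]


for freq in (100.0, 700.0, 4000.0):
    print(freq, round(mean_power(bandpass_400_1000(tone(freq))), 4))
```

A tone inside the band should retain most of its power, while the 100 Hz and 4000 Hz tones should be strongly attenuated before any level estimation takes place.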

3. A method of performing voice activity detection, comprising:
- receiving a first signal from a first microphone, the first signal including a first target component and a first disturbance component;
  
  receiving a second signal from a second microphone displaced from the first microphone by a distance, the second signal including a second target component and a second disturbance component, wherein the first target component differs from the second target component in accordance with the distance, and wherein the first disturbance component differs from the second disturbance component in accordance with the distance;
  
  estimating a first signal level based on the first signal;
  
  estimating a second signal level based on the second signal;
  
  estimating a first noise level based on the first signal;
  
  estimating a second noise level based on the second signal;
  
  calculating a first ratio based on the first signal level and the first noise level;
  
  calculating a second ratio based on the second signal level and the second noise level;
  
  detecting a wind noise based on a third ratio between the first ratio and the second ratio; and
  
  calculating a current voice activity decision based on the wind noise and on a difference between the first ratio and the second ratio.
- Dependent claims (4, 5, 6, 7, 8, 9):
  - 4. The method of claim 3, wherein the distance between the first microphone and the second microphone is at least an order of magnitude less than a second distance between the first microphone and a disturbance source of the disturbance component.
  - 5. The method of claim 3, wherein the distance between the first microphone and the second microphone is within an order of magnitude of a second distance between the first microphone and a target source of the target component, and wherein the distance between the first microphone and the second microphone is at least an order of magnitude less than a third distance between the first microphone and a disturbance source of the disturbance component.
  - 6. The method of claim 3, wherein the first microphone is a first distance away from a target source of the target component and a second distance away from a disturbance source of the disturbance component, and wherein the first distance is more than an order of magnitude less than the second distance.
  - 7. The method of claim 3, wherein estimating the first signal level comprises estimating the first signal level by performing a recursive averaging operation on a power level of the first signal.
  - 8. The method of claim 3, wherein estimating the first noise level comprises estimating the first noise level by performing, as indicated by a previous voice activity decision, a recursive averaging operation on a power level of the first signal.
  - 9. The method of claim 3, wherein:
    - estimating the first signal level comprises estimating the first signal level by performing a recursive averaging operation on a power level of the first signal using a first time constant; and
      
      estimating the first noise level comprises estimating the first noise level by performing, as indicated by a previous voice activity decision, a recursive averaging operation on a power level of the first signal using a second time constant, wherein the first time constant is greater than the second time constant.
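The recursive averaging of claims 7 to 9 can be sketched for one microphone as follows. Several points are assumptions: the update weights stand in for the claimed "time constants" (the claims do not define how the two relate), the default values are hypothetical, and "as indicated by a previous voice activity decision" is read here as freezing the noise estimate while voice was detected.

```python
def update_levels(power, signal_level, noise_level, prev_vad,
                  weight_signal=0.1, weight_noise=0.01):
    """One recursive-averaging step for a single microphone.

    power         : instantaneous power of the current sample or frame
    prev_vad      : previous voice activity decision (True = voice detected)
    weight_signal : hypothetical weight given to new power in the signal level
    weight_noise  : hypothetical weight given to new power in the noise level
    """
    signal_level += weight_signal * (power - signal_level)
    if not prev_vad:
        # Assumption: the noise estimate adapts only when the previous
        # decision reported no voice, so speech does not leak into it.
        noise_level += weight_noise * (power - noise_level)
    return signal_level, noise_level


# Both estimates start at zero; one unit-power input arrives.
print(update_levels(1.0, 0.0, 0.0, prev_vad=False))
print(update_levels(1.0, 0.0, 0.0, prev_vad=True))
```

With the larger weight, the signal level follows changes in input power quickly, while the noise level drifts slowly and holds its value during detected speech.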

10. An apparatus including a circuit that performs voice activity detection, the apparatus comprising:
- a first microphone that is configured for receiving a first signal including a first target component and a first disturbance component;
  
  a second microphone, displaced from the first microphone by a distance, that is configured for receiving a second signal including a second target component and a second disturbance component, wherein the first target component differs from the second target component in accordance with the distance, and wherein the first disturbance component differs from the second disturbance component in accordance with the distance;
  
  a signal level estimator that is configured for estimating a first signal level based on the first signal and that is configured for estimating a second signal level based on the second signal;
  
  a noise level estimator that is configured for estimating a first noise level based on the first signal and that is configured for estimating a second noise level based on the second signal;
  
  a first divider that is configured for calculating a first ratio based on the first signal level and the first noise level;
  
  a second divider that is configured for calculating a second ratio based on the second signal level and the second noise level; and
  
  a voice activity detector that is configured for calculating a current voice activity decision, wherein the current voice activity decision signifies that no voice activity is detected if a difference between the first ratio and the second ratio is smaller than a pre-selected threshold, wherein the threshold is $(1 - p)\,\xi_{\min}$, wherein $p$ is a propagation decay factor and wherein $\xi_{\min}$ is a pre-selected minimum SNR threshold for voice presence at the microphone closer to the target sound, and wherein the current voice activity decision signifies that voice activity is detected if the difference is larger than or equal to the pre-selected threshold.

11. An apparatus including a circuit that performs voice activity detection, the apparatus comprising:
- a first microphone that is configured for receiving a first signal including a first target component and a first disturbance component;
  
  a second microphone, displaced from the first microphone by a distance, that is configured for receiving a second signal including a second target component and a second disturbance component, wherein the first target component differs from the second target component in accordance with the distance, and wherein the first disturbance component differs from the second disturbance component in accordance with the distance;
  
  a signal level estimator that is configured for estimating a first signal level based on the first signal and that is configured for estimating a second signal level based on the second signal;
  
  a band pass filter, coupled between the first microphone and the signal level estimator, and coupled between the second microphone and the signal level estimator, that is configured for performing band pass filtering on the first signal and on the second signal, wherein a band pass frequency ranges between 400 and 1000 Hertz;
  
  a noise level estimator that is configured for estimating a first noise level based on the first signal and that is configured for estimating a second noise level based on the second signal;
  
  a first divider that is configured for calculating a first ratio based on the first signal level and the first noise level;
  
  a second divider that is configured for calculating a second ratio based on the second signal level and the second noise level; and
  
  a voice activity detector that is configured for calculating a current voice activity decision based on a difference between the first ratio and the second ratio.

12. An apparatus including a circuit that performs voice activity detection, the apparatus comprising:
- a first microphone that is configured for receiving a first signal including a first target component and a first disturbance component;
  
  a second microphone, displaced from the first microphone by a distance, that is configured for receiving a second signal including a second target component and a second disturbance component, wherein the first target component differs from the second target component in accordance with the distance, and wherein the first disturbance component differs from the second disturbance component in accordance with the distance;
  
  a signal level estimator that is configured for estimating a first signal level based on the first signal and that is configured for estimating a second signal level based on the second signal;
  
  a noise level estimator that is configured for estimating a first noise level based on the first signal and that is configured for estimating a second noise level based on the second signal;
  
  a first divider that is configured for calculating a first ratio based on the first signal level and the first noise level;
  
  a second divider that is configured for calculating a second ratio based on the second signal level and the second noise level; and
  
  a voice activity detector that is configured for calculating a current voice activity decision based on a difference between the first ratio and the second ratio, wherein the voice activity detector is further configured for detecting a wind noise based on a third ratio between the first ratio and the second ratio, and wherein the voice activity detector is configured for calculating the current voice activity decision based on the wind noise and on the difference between the first ratio and the second ratio.
- Dependent claims (13, 14, 15, 16, 17, 18, 19, 20):
  - 13. The apparatus of claim 12, wherein the distance between the first microphone and the second microphone is at least an order of magnitude less than a second distance between the first microphone and a disturbance source of the disturbance component.
  - 14. The apparatus of claim 12, wherein the distance between the first microphone and the second microphone is within an order of magnitude of a second distance between the first microphone and a target source of the target component, and wherein the distance between the first microphone and the second microphone is at least an order of magnitude less than a third distance between the first microphone and a disturbance source of the disturbance component.
  - 15. The apparatus of claim 12, wherein the first microphone is a first distance away from a target source of the target component and a second distance away from a disturbance source of the disturbance component, and wherein the first distance is more than an order of magnitude less than the second distance.
  - 16. The apparatus of claim 12, wherein the signal level estimator is configured for estimating the first signal level by performing a recursive averaging operation on a power level of the first signal.
  - 17. The apparatus of claim 12, further comprising:
    - a delay element, coupled between the noise level estimator and the voice activity detector, that is configured for storing a previous voice activity decision;
      
      wherein the noise level estimator is configured for estimating the first noise level by performing, as indicated by the previous voice activity decision, a recursive averaging operation on a power level of the first signal.
  - 18. The apparatus of claim 12, further comprising:
    - a delay element, coupled between the noise level estimator and the voice activity detector, that is configured for storing a previous voice activity decision;
      
      wherein the signal level estimator is configured for estimating the first signal level by performing a recursive averaging operation on a power level of the first signal, and wherein the noise level estimator is configured for estimating the first noise level by performing, as indicated by the previous voice activity decision, a recursive averaging operation on a power level of the first signal.
  - 19. The apparatus of claim 12, wherein:
    - the signal level estimator is configured for estimating the first signal level by performing a recursive averaging operation on a power level of the first signal using a first time constant; and
      
      the noise level estimator is configured for estimating the first noise level by performing, as indicated by a previous voice activity decision, a recursive averaging operation on a power level of the first signal using a second time constant, wherein the first time constant is greater than the second time constant.
  - 20. The apparatus of claim 12, wherein:
    - the signal level estimator comprises a first signal level estimator coupled between the first microphone and the first divider, and a second signal level estimator coupled between the second microphone and the second divider; and
      
      the noise level estimator comprises a first noise level estimator coupled between the first microphone and the first divider, and a second noise level estimator coupled between the second microphone and the second divider.
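The block structure of claims 12 and 17 to 20 (per-microphone estimators, two dividers, a detector, and a delay element feeding the previous decision back to the noise estimators) might be wired together as in the sketch below. All class names, weights, and the pluggable `decide` callable are hypothetical; `decide` stands in for the voice activity detector block, which in claim 12 would also account for wind noise.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LevelEstimator:
    """Recursive average of power; `weight` is a hypothetical update weight."""
    weight: float
    level: float = 1e-6  # small non-zero start so the first division is defined

    def update(self, power: float, enabled: bool = True) -> float:
        if enabled:
            self.level += self.weight * (power - self.level)
        return self.level


@dataclass
class DualMicVad:
    """Connects estimators, dividers, detector, and delay element."""
    decide: Callable[[float, float], bool]  # stand-in for the detector block
    sig1: LevelEstimator = field(default_factory=lambda: LevelEstimator(0.1))
    sig2: LevelEstimator = field(default_factory=lambda: LevelEstimator(0.1))
    noise1: LevelEstimator = field(default_factory=lambda: LevelEstimator(0.01))
    noise2: LevelEstimator = field(default_factory=lambda: LevelEstimator(0.01))
    prev_vad: bool = False  # delay element: holds the previous decision

    def step(self, x1: float, x2: float) -> bool:
        adapt_noise = not self.prev_vad       # gate noise updates on last decision
        p1, p2 = x1 * x1, x2 * x2
        ratio1 = self.sig1.update(p1) / self.noise1.update(p1, adapt_noise)  # first divider
        ratio2 = self.sig2.update(p2) / self.noise2.update(p2, adapt_noise)  # second divider
        self.prev_vad = self.decide(ratio1, ratio2)
        return self.prev_vad


# Hypothetical run: identical background on both inputs, then a source that
# is louder at the first microphone than at the second.
vad = DualMicVad(decide=lambda r1, r2: (r1 - r2) >= 2.0)
quiet = [vad.step(0.1, 0.1) for _ in range(500)]
voiced = [vad.step(1.0, 0.5) for _ in range(50)]
print(any(quiet), voiced[-1])
```

Keeping the previous decision in a single field makes the feedback path explicit: each call to `step` reads the old decision before overwriting it, which is the role the delay element plays in claim 17.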

21. An apparatus for performing voice activity detection, comprising:
- a first microphone that is configured for receiving a first signal including a first target component and a first disturbance component;
  
  a second microphone, displaced from the first microphone by a distance, that is configured for receiving a second signal including a second target component and a second disturbance component, wherein the first target component differs from the second target component in accordance with the distance, and wherein the first disturbance component differs from the second disturbance component in accordance with the distance;
  
  means for estimating a first signal level based on the first signal, for estimating a second signal level based on the second signal, for estimating a first noise level based on the first signal, and for estimating a second noise level based on the second signal;
  
  means for calculating a first ratio based on the first signal level and the first noise level, and for calculating a second ratio based on the second signal level and the second noise level; and
  
  means for detecting a wind noise based on a third ratio between the first ratio and the second ratio, and for calculating a current voice activity decision based on the wind noise and on a difference between the first ratio and the second ratio.

22. A tangible computer-readable storage medium that comprises instructions or a computer program for performing voice activity detection, the instructions or computer program controlling a processor to execute processing, the processing comprising:
- receiving a first signal from a first microphone, the first signal including a first target component and a first disturbance component;
  
  receiving a second signal from a second microphone displaced from the first microphone by a distance, the second signal including a second target component and a second disturbance component, wherein the first target component differs from the second target component in accordance with the distance, and wherein the first disturbance component differs from the second disturbance component in accordance with the distance;
  
  estimating a first signal level based on the first signal;
  
  estimating a second signal level based on the second signal;
  
  estimating a first noise level based on the first signal;
  
  estimating a second noise level based on the second signal;
  
  calculating a first ratio based on the first signal level and the first noise level;
  
  calculating a second ratio based on the second signal level and the second noise level;
  
  detecting a wind noise based on a third ratio between the first ratio and the second ratio; and
  
  calculating a current voice activity decision based on the wind noise and on a difference between the first ratio and the second ratio.
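The wind-aware processing of claim 22 might be sketched as below. The claim says only that wind noise is detected from a third ratio between the first and second ratios; the specific test (flagging wind when that ratio leaves a plausible band), the band limits, and the choice to report no voice during wind are all assumptions for illustration. The intuition is that wind turbulence is largely local to each microphone, so it can skew the two ratios far more than acoustic propagation would.

```python
def detect_wind(ratio1: float, ratio2: float,
                low: float = 0.1, high: float = 10.0) -> bool:
    """Flag wind when ratio1 / ratio2 falls outside [low, high].

    low and high are hypothetical limits on how far apart the two
    signal-to-noise ratios may be for an ordinary acoustic source.
    """
    third_ratio = ratio1 / max(ratio2, 1e-12)  # guard against division by zero
    return third_ratio < low or third_ratio > high


def vad_with_wind(ratio1: float, ratio2: float, threshold: float) -> bool:
    """Combine the wind flag with the ratio-difference test."""
    if detect_wind(ratio1, ratio2):
        return False  # assumption: suppress detection while wind is flagged
    return (ratio1 - ratio2) >= threshold


print(vad_with_wind(8.0, 4.0, threshold=2.0))    # plausible near-field voice
print(vad_with_wind(100.0, 1.0, threshold=2.0))  # implausibly lopsided ratios
```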

23. A method of performing voice activity detection, comprising:
- receiving a plurality of signals from a plurality of microphones, wherein the plurality of signals include respectively a plurality of target components and a plurality of disturbance components, wherein the plurality of microphones are respectively displaced from one another according to a plurality of distances, wherein the plurality of target components differ respectively therebetween according to the plurality of distances, and wherein the plurality of disturbance components differ respectively therebetween according to the plurality of distances;
  
  estimating a plurality of signal levels based respectively on the plurality of signals;
  
  estimating a plurality of noise levels based respectively on the plurality of signals;
  
  calculating a plurality of ratios based on the plurality of signal levels, respectively, and the plurality of noise levels, respectively;
  
  detecting a wind noise based on a wind noise ratio between the plurality of ratios;
  
  adjusting the plurality of ratios according to a plurality of constants, respectively; and
  
  calculating a current voice activity decision based on the wind noise and on a sum of the plurality of ratios having been adjusted.
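Claim 23 generalizes the scheme to any number of microphones. The sketch below is one possible reading, not the claimed implementation: the wind noise ratio is taken as the largest ratio divided by the smallest, the decision compares the weighted sum against a threshold, and the names and limits are hypothetical. With constants (1, -1), the weighted sum reduces to the two-microphone difference of the earlier claims.

```python
from typing import Sequence


def multi_mic_vad(ratios: Sequence[float], constants: Sequence[float],
                  threshold: float, wind_limit: float = 10.0) -> bool:
    """Decide voice activity from per-microphone ratios.

    ratios     : one signal-to-noise ratio per microphone
    constants  : one adjustment constant per microphone
    wind_limit : hypothetical bound on max(ratios) / min(ratios)
    """
    if len(ratios) != len(constants):
        raise ValueError("need one constant per ratio")
    wind_ratio = max(ratios) / max(min(ratios), 1e-12)
    if wind_ratio > wind_limit:
        return False  # assumption: report no voice while wind is flagged
    adjusted_sum = sum(c * r for c, r in zip(constants, ratios))
    return adjusted_sum >= threshold


print(multi_mic_vad([8.0, 4.0], [1.0, -1.0], threshold=2.0))
print(multi_mic_vad([8.0, 4.0, 3.0], [1.0, -0.5, -0.5], threshold=2.0))
print(multi_mic_vad([100.0, 1.0], [1.0, -1.0], threshold=2.0))
```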

Current Assignee: Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Original Assignee: Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Inventors: Yu, Rongshan
Granted Patent: US 8,554,556 B2
US Class Current: 704/233
CPC Class Codes: G10L 25/78 Detection of presence or ab...
