Multi-Microphone Voice Activity Detector
Abstract
A dual-microphone voice activity detector system is presented. The system estimates the signal level and the noise level at each microphone. The level differential between the two microphones is greater for nearby sounds, such as the target signal, than for more distant sounds, such as the noise. The voice activity detector uses this difference to detect the presence of nearby sounds.
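The abstract's point about level differentials can be illustrated with a small calculation. This is only a sketch: it assumes free-field 1/r amplitude decay and a source in line with both microphones, and the distances (5 cm talker, 2 m noise source, 2 cm microphone spacing) are hypothetical values chosen for illustration, not figures from the patent.

```python
import math

def level_difference_db(near_mic_distance_m, mic_spacing_m):
    """Level drop in dB from the closer microphone to the farther one.

    Assumes free-field 1/r amplitude decay and a source in line with
    both microphones (an illustrative assumption, not from the patent).
    """
    far_mic_distance_m = near_mic_distance_m + mic_spacing_m
    return 20.0 * math.log10(far_mic_distance_m / near_mic_distance_m)

# Hypothetical geometry: microphones 2 cm apart.
talker_db = level_difference_db(0.05, 0.02)  # talker 5 cm from the closer mic
noise_db = level_difference_db(2.0, 0.02)    # noise source 2 m away

print(f"talker: {talker_db:.2f} dB, distant noise: {noise_db:.2f} dB")
```

Under these assumptions the nearby talker produces a differential of roughly 3 dB, while the distant source produces less than 0.1 dB, which is the asymmetry the detector relies on.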
23 Claims
1. A method of performing voice activity detection, comprising:
receiving a first signal from a first microphone, the first signal including a first target component and a first disturbance component;
receiving a second signal from a second microphone displaced from the first microphone by a distance, the second signal including a second target component and a second disturbance component, wherein the first target component differs from the second target component in accordance with the distance, and wherein the first disturbance component differs from the second disturbance component in accordance with the distance;
estimating a first signal level based on the first signal;
estimating a second signal level based on the second signal;
estimating a first noise level based on the first signal;
estimating a second noise level based on the second signal;
calculating a first ratio based on the first signal level and the first noise level;
calculating a second ratio based on the second signal level and the second noise level; and
calculating a current voice activity decision, wherein the current voice activity decision signifies that no voice activity is detected if a difference between the first ratio and the second ratio is smaller than a pre-selected threshold, wherein the threshold is (1 − p)ξ_min, wherein p is a propagation decay factor and wherein ξ_min is a pre-selected minimum SNR threshold for voice presence at the microphone closer to the target sound, and wherein the current voice activity decision signifies that voice activity is detected if the difference is larger than or equal to the pre-selected threshold.
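The decision rule of claim 1 can be sketched in a few lines. The function names and the numeric values below are hypothetical. One way to read the threshold, offered here as an interpretation rather than the patent's stated derivation: if the target's SNR is ξ at the closer microphone and pξ at the farther one, the difference is (1 − p)ξ, which reaches (1 − p)ξ_min exactly when ξ reaches ξ_min.

```python
def snr_ratio(signal_level, noise_level):
    """Ratio of an estimated signal level to an estimated noise level."""
    return signal_level / noise_level

def voice_activity_decision(ratio_near, ratio_far, p, xi_min):
    """Return True (voice detected) when the ratio difference is at least
    (1 - p) * xi_min, and False (no voice) when it is smaller."""
    threshold = (1.0 - p) * xi_min
    return (ratio_near - ratio_far) >= threshold

# Hypothetical parameters: decay factor p = 0.5, minimum SNR xi_min = 4.
P, XI_MIN = 0.5, 4.0  # threshold = 2.0

# Nearby talker: much stronger at the closer microphone.
talker = voice_activity_decision(snr_ratio(10.0, 1.0), snr_ratio(5.0, 1.0), P, XI_MIN)

# Distant sound: nearly the same ratio at both microphones.
distant = voice_activity_decision(snr_ratio(1.2, 1.0), snr_ratio(1.1, 1.0), P, XI_MIN)

print(talker, distant)
```

With these values the talker case gives a difference of 5, which is at or above the threshold of 2, while the distant case gives a difference of about 0.1, which is below it.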
2. A method of performing voice activity detection, comprising:
receiving a first signal from a first microphone, the first signal including a first target component and a first disturbance component;
receiving a second signal from a second microphone displaced from the first microphone by a distance, the second signal including a second target component and a second disturbance component, wherein the first target component differs from the second target component in accordance with the distance, and wherein the first disturbance component differs from the second disturbance component in accordance with the distance;
performing band pass filtering on the first signal prior to estimating the first signal level;
performing band pass filtering on the second signal prior to estimating the second signal level, wherein a band pass frequency ranges between 400 and 1000 Hertz;
estimating a first signal level based on the first signal;
estimating a second signal level based on the second signal;
estimating a first noise level based on the first signal;
estimating a second noise level based on the second signal;
calculating a first ratio based on the first signal level and the first noise level;
calculating a second ratio based on the second signal level and the second noise level; and
calculating a current voice activity decision based on a difference between the first ratio and the second ratio.
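The band pass step of claim 2 could be realized in many ways; the claim fixes only the 400 to 1000 Hertz range. The sketch below assumes a single second-order (biquad) band-pass section using the widely published Audio EQ Cookbook coefficient form, and a hypothetical 16 kHz sample rate. A single biquad is a gentle filter, so its edges sit only approximately at 400 and 1000 Hz.

```python
import math

def bandpass_coefficients(low_hz, high_hz, sample_rate_hz):
    """Biquad band-pass with unity gain at the geometric center frequency."""
    center_hz = math.sqrt(low_hz * high_hz)
    q = center_hz / (high_hz - low_hz)
    w0 = 2.0 * math.pi * center_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)                   # feed-forward taps
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)   # feedback taps
    return b, a

def bandpass(samples, b, a):
    """Apply the biquad (direct form I) to a list of samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def tone(freq_hz, sample_rate_hz, count):
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz) for n in range(count)]

SAMPLE_RATE = 16000  # hypothetical sample rate
b, a = bandpass_coefficients(400.0, 1000.0, SAMPLE_RATE)
in_band = rms(bandpass(tone(632.0, SAMPLE_RATE, SAMPLE_RATE), b, a))
rumble = rms(bandpass(tone(50.0, SAMPLE_RATE, SAMPLE_RATE), b, a))
print(f"in-band RMS {in_band:.3f}, 50 Hz RMS {rumble:.3f}")
```

The filtered signal, rather than the raw microphone signal, would then feed the signal level estimate for each microphone.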
3. A method of performing voice activity detection, comprising:
receiving a first signal from a first microphone, the first signal including a first target component and a first disturbance component;
receiving a second signal from a second microphone displaced from the first microphone by a distance, the second signal including a second target component and a second disturbance component, wherein the first target component differs from the second target component in accordance with the distance, and wherein the first disturbance component differs from the second disturbance component in accordance with the distance;
estimating a first signal level based on the first signal;
estimating a second signal level based on the second signal;
estimating a first noise level based on the first signal;
estimating a second noise level based on the second signal;
calculating a first ratio based on the first signal level and the first noise level;
calculating a second ratio based on the second signal level and the second noise level;
detecting a wind noise based on a third ratio between the first ratio and the second ratio; and
calculating a current voice activity decision based on the wind noise and on a difference between the first ratio and the second ratio.
(Dependent claims: 4, 5, 6, 7, 8, 9)
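Claim 3 states that wind noise is detected from a third ratio between the first and second ratios, but the claim text here does not say how that ratio is tested or how the wind flag enters the decision. The sketch below is therefore built on assumptions: wind is flagged when the third ratio leaves a hypothetical band [1/wind_bound, wind_bound], on the reasoning that an imbalance larger than acoustic propagation could explain suggests turbulence local to one microphone, and a wind flag simply suppresses the voice decision.

```python
def detect_wind(ratio_1, ratio_2, wind_bound):
    """Flag wind when the ratio between the two SNR ratios is implausibly
    lopsided. The band test and wind_bound are assumptions for illustration."""
    third_ratio = ratio_1 / max(ratio_2, 1e-12)
    return third_ratio > wind_bound or third_ratio < 1.0 / wind_bound

def decide_with_wind(ratio_1, ratio_2, threshold, wind_bound):
    """Voice is declared only if no wind is flagged and the ratio
    difference reaches the threshold (assumed combination rule)."""
    if detect_wind(ratio_1, ratio_2, wind_bound):
        return False
    return (ratio_1 - ratio_2) >= threshold

# Hypothetical values: threshold 2.0, wind bound 4.0.
speech = decide_with_wind(10.0, 5.0, 2.0, 4.0)    # third ratio 2: plausible speech
gust = decide_with_wind(30.0, 2.0, 2.0, 4.0)      # third ratio 15: treated as wind
background = decide_with_wind(1.2, 1.1, 2.0, 4.0) # small difference: no voice
print(speech, gust, background)
```

Note that under this assumed rule the bound must exceed the ratio that genuine near-field speech produces, otherwise speech itself would be flagged as wind.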
10. An apparatus including a circuit that performs voice activity detection, the apparatus comprising:
a first microphone that is configured for receiving a first signal including a first target component and a first disturbance component;
a second microphone, displaced from the first microphone by a distance, that is configured for receiving a second signal including a second target component and a second disturbance component, wherein the first target component differs from the second target component in accordance with the distance, and wherein the first disturbance component differs from the second disturbance component in accordance with the distance;
a signal level estimator that is configured for estimating a first signal level based on the first signal and that is configured for estimating a second signal level based on the second signal;
a noise level estimator that is configured for estimating a first noise level based on the first signal and that is configured for estimating a second noise level based on the second signal;
a first divider that is configured for calculating a first ratio based on the first signal level and the first noise level;
a second divider that is configured for calculating a second ratio based on the second signal level and the second noise level; and
a voice activity detector that is configured for calculating a current voice activity decision, wherein the current voice activity decision signifies that no voice activity is detected if a difference between the first ratio and the second ratio is smaller than a pre-selected threshold, wherein the threshold is (1 − p)ξ_min, wherein p is a propagation decay factor and wherein ξ_min is a pre-selected minimum SNR threshold for voice presence at the microphone closer to the target sound, and wherein the current voice activity decision signifies that voice activity is detected if the difference is larger than or equal to the pre-selected threshold.
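The elements of claim 10 (signal level estimator, noise level estimator, two dividers, detector) can be mirrored in software. The claim does not specify the estimators, so this sketch assumes exponentially smoothed frame power for the signal level and a slowly rising minimum tracker for the noise level; the class names, smoothing constants, and test frames are all hypothetical.

```python
class LevelTracker:
    """Per-microphone signal level, noise level, and their ratio."""

    def __init__(self, smoothing=0.5, noise_rise=1.01):
        self.smoothing = smoothing    # weight on the previous signal level
        self.noise_rise = noise_rise  # per-frame growth allowed for the noise floor
        self.signal_level = None
        self.noise_level = None

    def update(self, frame):
        power = sum(s * s for s in frame) / len(frame)
        if self.signal_level is None:
            # First frame initializes both estimates.
            self.signal_level = power
            self.noise_level = power
        else:
            # Signal level: exponentially smoothed frame power.
            self.signal_level = (self.smoothing * self.signal_level
                                 + (1.0 - self.smoothing) * power)
            # Noise level: rises slowly, but never above the signal level.
            self.noise_level = min(self.noise_level * self.noise_rise,
                                   self.signal_level)
        # Divider: ratio of signal level to noise level.
        return self.signal_level / max(self.noise_level, 1e-12)

class DualMicVad:
    """Compares the ratio difference against (1 - p) * xi_min."""

    def __init__(self, p, xi_min):
        self.threshold = (1.0 - p) * xi_min
        self.near = LevelTracker()
        self.far = LevelTracker()

    def process(self, near_frame, far_frame):
        ratio_near = self.near.update(near_frame)
        ratio_far = self.far.update(far_frame)
        return (ratio_near - ratio_far) >= self.threshold

def frame(amplitude, length=160):
    """Synthetic test frame: alternating +/- amplitude."""
    return [amplitude if i % 2 == 0 else -amplitude for i in range(length)]

vad = DualMicVad(p=0.25, xi_min=4.0)
quiet = [vad.process(frame(0.01), frame(0.01)) for _ in range(20)]
speech = vad.process(frame(0.5), frame(0.25))
print(any(quiet), speech)
```

In the usage lines, equal low-level frames at both microphones stand in for distant noise, and a frame that is twice as strong at the closer microphone stands in for a nearby talker.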
11. An apparatus including a circuit that performs voice activity detection, the apparatus comprising:
a first microphone that is configured for receiving a first signal including a first target component and a first disturbance component;
a second microphone, displaced from the first microphone by a distance, that is configured for receiving a second signal including a second target component and a second disturbance component, wherein the first target component differs from the second target component in accordance with the distance, and wherein the first disturbance component differs from the second disturbance component in accordance with the distance;
a signal level estimator that is configured for estimating a first signal level based on the first signal and that is configured for estimating a second signal level based on the second signal;
a band pass filter, coupled between the first microphone and the signal level estimator, and coupled between the second microphone and the signal level estimator, that is configured for performing band pass filtering on the first signal and on the second signal, wherein a band pass frequency ranges between 400 and 1000 Hertz;
a noise level estimator that is configured for estimating a first noise level based on the first signal and that is configured for estimating a second noise level based on the second signal;
a first divider that is configured for calculating a first ratio based on the first signal level and the first noise level;
a second divider that is configured for calculating a second ratio based on the second signal level and the second noise level; and
a voice activity detector that is configured for calculating a current voice activity decision based on a difference between the first ratio and the second ratio.
12. An apparatus including a circuit that performs voice activity detection, the apparatus comprising:
a first microphone that is configured for receiving a first signal including a first target component and a first disturbance component;
a second microphone, displaced from the first microphone by a distance, that is configured for receiving a second signal including a second target component and a second disturbance component, wherein the first target component differs from the second target component in accordance with the distance, and wherein the first disturbance component differs from the second disturbance component in accordance with the distance;
a signal level estimator that is configured for estimating a first signal level based on the first signal and that is configured for estimating a second signal level based on the second signal;
a noise level estimator that is configured for estimating a first noise level based on the first signal and that is configured for estimating a second noise level based on the second signal;
a first divider that is configured for calculating a first ratio based on the first signal level and the first noise level;
a second divider that is configured for calculating a second ratio based on the second signal level and the second noise level; and
a voice activity detector that is configured for calculating a current voice activity decision based on a difference between the first ratio and the second ratio, wherein the voice activity detector is further configured for detecting a wind noise based on a third ratio between the first ratio and the second ratio, and wherein the voice activity detector is configured for calculating the current voice activity decision based on the wind noise and on the difference between the first ratio and the second ratio.
(Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20)
21. An apparatus for performing voice activity detection, comprising:
a first microphone that is configured for receiving a first signal including a first target component and a first disturbance component;
a second microphone, displaced from the first microphone by a distance, that is configured for receiving a second signal including a second target component and a second disturbance component, wherein the first target component differs from the second target component in accordance with the distance, and wherein the first disturbance component differs from the second disturbance component in accordance with the distance;
means for estimating a first signal level based on the first signal, for estimating a second signal level based on the second signal, for estimating a first noise level based on the first signal, and for estimating a second noise level based on the second signal;
means for calculating a first ratio based on the first signal level and the first noise level, and for calculating a second ratio based on the second signal level and the second noise level; and
means for detecting a wind noise based on a third ratio between the first ratio and the second ratio, and for calculating a current voice activity decision based on the wind noise and on a difference between the first ratio and the second ratio.
22. A tangible computer-readable storage medium that comprises instructions or a computer program for performing voice activity detection, the instructions or computer program controlling a processor to execute processing, the processing comprising:
receiving a first signal from a first microphone, the first signal including a first target component and a first disturbance component;
receiving a second signal from a second microphone displaced from the first microphone by a distance, the second signal including a second target component and a second disturbance component, wherein the first target component differs from the second target component in accordance with the distance, and wherein the first disturbance component differs from the second disturbance component in accordance with the distance;
estimating a first signal level based on the first signal;
estimating a second signal level based on the second signal;
estimating a first noise level based on the first signal;
estimating a second noise level based on the second signal;
calculating a first ratio based on the first signal level and the first noise level;
calculating a second ratio based on the second signal level and the second noise level;
detecting a wind noise based on a third ratio between the first ratio and the second ratio; and
calculating a current voice activity decision based on the wind noise and on a difference between the first ratio and the second ratio.
23. A method of performing voice activity detection, comprising:
receiving a plurality of signals from a plurality of microphones, wherein the plurality of signals include respectively a plurality of target components and a plurality of disturbance components, wherein the plurality of microphones are respectively displaced from one another according to a plurality of distances, wherein the plurality of target components differ respectively therebetween according to the plurality of distances, and wherein the plurality of disturbance components differ respectively therebetween according to the plurality of distances;
estimating a plurality of signal levels based respectively on the plurality of signals;
estimating a plurality of noise levels based respectively on the plurality of signals;
calculating a plurality of ratios based on the plurality of signal levels, respectively, and the plurality of noise levels, respectively;
detecting a wind noise based on a wind noise ratio between the plurality of ratios;
adjusting the plurality of ratios according to a plurality of constants, respectively; and
calculating a current voice activity decision based on the wind noise and on a sum of the plurality of ratios having been adjusted.
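Claim 23 generalizes the two-microphone rule to a plurality of microphones by adjusting each ratio with a constant and summing. The sketch below rests on several assumptions not stated in the claim: the adjustment is a multiplication, the wind noise ratio is the largest ratio divided by the smallest, and a wind flag suppresses the decision. The constants and bounds are hypothetical.

```python
def multi_mic_decision(ratios, constants, threshold, wind_bound):
    """Decide voice activity from per-microphone SNR ratios.

    Assumptions for illustration: ratios are adjusted by multiplying with
    per-microphone constants, wind is flagged when the spread between the
    largest and smallest ratio exceeds wind_bound, and wind vetoes voice.
    """
    if len(ratios) != len(constants):
        raise ValueError("one constant is needed per ratio")
    adjusted = [c * r for c, r in zip(constants, ratios)]
    wind_ratio = max(ratios) / max(min(ratios), 1e-12)
    wind_detected = wind_ratio > wind_bound
    return (not wind_detected) and sum(adjusted) >= threshold

# With constants (1, -1) the sum reduces to the two-microphone difference.
two_mic = multi_mic_decision([10.0, 5.0], [1.0, -1.0], 2.0, 4.0)

# Three microphones, hypothetical constants weighting the closest one.
talker = multi_mic_decision([10.0, 5.0, 4.0], [1.0, -0.5, -0.5], 2.0, 4.0)
gust = multi_mic_decision([30.0, 2.0, 2.0], [1.0, -0.5, -0.5], 2.0, 4.0)
background = multi_mic_decision([1.2, 1.1, 1.0], [1.0, -0.5, -0.5], 2.0, 4.0)
print(two_mic, talker, gust, background)
```

Choosing constants that sum to zero, as in the three-microphone example, makes a sound that is equally strong at every microphone contribute nothing to the sum, which matches the intent of the two-microphone difference.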
Specification