Threshold adaptation in two-channel noise estimation and voice activity detection
Abstract
A method for adapting a threshold used in multi-channel audio voice activity detection. Strengths of primary and secondary sound pick up channels are computed. A separation, being a measure of difference between the strengths of the primary and secondary channels, is also computed. An analysis of the peaks in separation is performed, e.g. using a leaky peak capture function that captures a peak in the separation and then decays over time, or using a sliding window min-max detector. A threshold that is to be used in a voice activity detection (VAD) process is adjusted, in accordance with the analysis of the peaks. Other embodiments are also described and claimed.
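The two peak analyses named in the abstract can be sketched as follows. This is a minimal illustration, not the patent's implementation: the linear per-frame decay, the `decay` and `window` parameters, and the function names are all assumptions made for the example.

```python
from collections import deque


def leaky_peak_capture(separation, decay=0.5):
    """Jump up to each new peak; otherwise leak down by `decay` per frame.

    The linear decay is an assumption; the source only says the function
    captures a peak and then decays over time.
    """
    captured = []
    current = None
    for s in separation:
        if current is None or s > current:
            current = s                        # new peak captured
        else:
            current = max(s, current - decay)  # leak, never below the input
        captured.append(current)
    return captured


def sliding_min_max(separation, window=4):
    """Return (min, max) of the last `window` separation values per frame."""
    recent = deque(maxlen=window)
    result = []
    for s in separation:
        recent.append(s)
        result.append((min(recent), max(recent)))
    return result


# Hypothetical separation values (e.g. in dB), one per time frame.
separation_db = [2.0, 10.0, 4.0, 3.0, 12.0]
peaks = leaky_peak_capture(separation_db, decay=1.0)
ranges = sliding_min_max(separation_db, window=3)
```

With these inputs the captured value rises immediately to each new peak and then falls by one unit per frame until the next peak arrives, while the sliding window reports the extremes of the most recent three frames.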
22 Claims
1. A method for adapting a threshold used in multi-channel audio noise estimation, comprising:
computing strength of a primary sound pick up channel;
computing strength of a secondary sound pick up channel;
computing separation versus time, being a measure of difference between the strengths of the primary and secondary channels, wherein the separation is computed on a per frequency bin and on a per time frame basis as a sequence of discrete-time vectors, each vector having one or more frequency bins and corresponding to a respective time frame of digital audio;
analyzing a plurality of peaks in the separation versus time, wherein analyzing a plurality of peaks comprises computing a leaky peak capture function of the separation by updating a current value of the function to a new value in accordance with the separation being greater than a previous value of the function, wherein the leaky peak capture function captures a peak in the separation and then decays over time; and
adjusting a threshold that is to be used in an audio noise estimation process in accordance with the leaky peak capture function of the separation, wherein the threshold is an audio signal strength value.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
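The per frequency bin, per time frame computation in claim 1 can be sketched as below. Everything beyond the claim language is an assumption made for illustration: strength is taken as power in dB, separation as the dB difference, and the threshold as the captured peak minus a fixed margin, clamped to a range. The names `strength_db`, `separation_vector`, `update_peaks`, `adapt_threshold`, and `track`, and all default values, are hypothetical.

```python
import math


def strength_db(power, floor=1e-12):
    """Strength of one bin as power in dB (assumed strength measure)."""
    return 10.0 * math.log10(max(power, floor))


def separation_vector(primary_power, secondary_power):
    """Per-bin separation for one frame: primary minus secondary, in dB."""
    return [strength_db(p) - strength_db(s)
            for p, s in zip(primary_power, secondary_power)]


def update_peaks(peaks, separation, decay_db=0.5):
    """One frame of leaky peak capture, applied independently to each bin."""
    return [s if s > pk else max(s, pk - decay_db)
            for pk, s in zip(peaks, separation)]


def adapt_threshold(peaks, margin_db=6.0, lo_db=3.0, hi_db=15.0):
    """Assumed mapping: threshold sits margin_db below the peak, clamped."""
    return [min(hi_db, max(lo_db, pk - margin_db)) for pk in peaks]


def track(frames, decay_db=0.5):
    """Yield (separation, peaks, thresholds) vectors for each time frame."""
    peaks = None
    for primary_power, secondary_power in frames:
        sep = separation_vector(primary_power, secondary_power)
        peaks = sep if peaks is None else update_peaks(peaks, sep, decay_db)
        yield sep, peaks, adapt_threshold(peaks)


# Two hypothetical frames, two frequency bins each: (primary, secondary) power.
frames = [([100.0, 1.0], [1.0, 1.0]),
          ([1.0, 10.0], [1.0, 1.0])]
history = list(track(frames))
```

Each element of `history` holds one discrete-time vector per quantity, so each bin keeps its own captured peak and its own threshold from frame to frame.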
10. A method for adapting a threshold used in multi-channel audio voice activity detection, comprising:
computing strength of a primary sound pick up channel;
computing strength of a secondary sound pick up channel;
computing separation versus time, being a measure of difference between the strengths of the primary and secondary channels, wherein the separation is computed on a per frequency bin and on a per time frame basis as a sequence of discrete-time vectors, each vector having one or more frequency bins and corresponding to a respective time frame of digital audio;
analyzing a plurality of peaks in the separation versus time, wherein analyzing a plurality of peaks comprises computing a leaky peak capture function of the separation by updating a current value of the function to a new value in accordance with the separation being greater than a previous value of the function, wherein the leaky peak capture function captures a peak in the separation and then decays over time; and
adjusting a threshold that is to be used in a voice activity detection (VAD) process in accordance with the leaky peak capture function of the separation, wherein the threshold is an audio signal strength value.
Dependent claims: 11, 12, 13, 14, 15, 16
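One way an adapted threshold might feed a VAD decision is sketched below. The claim does not state how the VAD process uses the threshold, so the comparison of separation against the threshold is an assumption, as are the class name `AdaptiveVad`, the margin-below-peak mapping, and every default value.

```python
class AdaptiveVad:
    """Streaming sketch: adapt a threshold from a leaky peak, then decide."""

    def __init__(self, decay_db=0.5, margin_db=6.0, min_threshold_db=3.0):
        self.decay_db = decay_db                  # assumed per-frame leak
        self.margin_db = margin_db                # assumed offset below the peak
        self.min_threshold_db = min_threshold_db  # assumed lower bound
        self.peak_db = None

    def step(self, separation_db):
        """Process one frame; return (speech_detected, threshold_db)."""
        if self.peak_db is None or separation_db > self.peak_db:
            self.peak_db = separation_db          # capture a new peak
        else:
            self.peak_db = max(separation_db, self.peak_db - self.decay_db)
        threshold_db = max(self.min_threshold_db,
                           self.peak_db - self.margin_db)
        # Assumed decision rule: speech when separation exceeds the threshold.
        return separation_db > threshold_db, threshold_db


# Hypothetical per-frame separation values in dB.
vad = AdaptiveVad(decay_db=1.0)
decisions = [vad.step(s) for s in [1.0, 12.0, 11.0, 2.0, 9.0]]
```

In this sketch the threshold rises when a strong peak in separation is captured and relaxes as the captured peak decays, so a later, weaker separation can still register as speech once the threshold has come down.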
17. An audio device comprising:
a first microphone positioned near a user's mouth;
a second microphone positioned far from the user's mouth; and
audio signal processing circuitry coupled to the first and second microphones, the circuitry to compute separation, being a measure of how much a strength of a signal produced by the first microphone is different than the strength of a signal produced by the second microphone, wherein the separation is a sequence of discrete-time vectors, each vector having one or more frequency bins and corresponding to a respective time-frame of digital audio, and analyze a plurality of peaks in the separation, wherein analyzing a plurality of peaks comprises computing a leaky peak capture function of the separation by updating a current value of the function to a new value in accordance with the separation being greater than a previous value of the function, wherein the leaky peak capture function captures a peak in the separation and then decays over time, wherein the circuitry is to adjust a voice activity detection (VAD) threshold in accordance with the leaky peak capture function of the separation, wherein the VAD threshold is an audio signal strength value.
Dependent claims: 18, 19, 20, 21, 22
Specification