NEURAL NETWORK VOICE ACTIVITY DETECTION EMPLOYING RUNNING RANGE NORMALIZATION
First Claim
1. A method of obtaining normalized voice activity detection features from an audio signal comprising the steps of:
at a computing system, dividing an audio signal into a sequence of time frames;
computing one or more voice activity detection feature of the audio signal for each of the time frames;
computing running estimates of minimum and maximum values of the one or more voice activity detection feature of the audio signal for each of the time frames;
computing input ranges of the one or more voice activity detection feature by comparing the running estimates of the minimum and maximum values of the one or more voice activity detection feature of the audio signal for each of the time frames; and
mapping the one or more voice activity detection feature of the audio signal for each of the time frames from the input ranges to one or more desired target range to obtain one or more normalized voice activity detection feature.
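The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the patented implementation: the choice of log-energy as the VAD feature, the non-overlapping framing, the use of "extremes seen so far" as the running estimates, the midpoint fallback for a zero-width range, and every function name here are hypothetical.

```python
import math


def frame_signal(samples, frame_len):
    """Divide the signal into consecutive non-overlapping frames (tail is dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def log_energy(frame, eps=1e-10):
    """Example VAD feature (an assumption): log of the mean squared amplitude."""
    return math.log(sum(x * x for x in frame) / len(frame) + eps)


def normalize_features(features, target=(-1.0, 1.0), eps=1e-10):
    """Map each per-frame feature from its running input range to the target range."""
    lo_t, hi_t = target
    run_min = run_max = None
    normalized = []
    for f in features:
        # Simplest possible running estimates (an assumption): extremes seen so far.
        run_min = f if run_min is None else min(run_min, f)
        run_max = f if run_max is None else max(run_max, f)
        # Input range obtained by comparing the two running estimates.
        in_range = run_max - run_min
        if in_range < eps:
            # Degenerate range (e.g. first frame): fall back to the target midpoint.
            normalized.append((lo_t + hi_t) / 2)
        else:
            normalized.append(lo_t + (f - run_min) * (hi_t - lo_t) / in_range)
    return normalized


# Usage: silence, then a loud segment, then a quieter segment.
samples = [0.0] * 4 + [1.0] * 4 + [0.5] * 4
frames = frame_signal(samples, frame_len=4)
features = [log_energy(fr) for fr in frames]
normalized = normalize_features(features)
```

Because the running extremes in this sketch never relax, a single outlier frame widens the input range permanently; smoothed estimates are one way to avoid that.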
Abstract
A “running range normalization” method includes computing running estimates of the range of values of features useful for voice activity detection (VAD) and normalizing the features by mapping them to a desired range. Running range normalization includes computation of running estimates of the minimum and maximum values of VAD features and normalizing the feature values by mapping the original range to a desired range. Smoothing coefficients are optionally selected to directionally bias a rate of change of at least one of the running estimates of the minimum and maximum values. The normalized VAD feature parameters are used to train a machine learning algorithm to detect voice activity, and the trained machine learning algorithm is used to isolate or enhance the speech component of the audio data.
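One way to read "smoothing coefficients selected to directionally bias a rate of change" is an asymmetric exponential smoother: the minimum estimate drops quickly but rises slowly, and the maximum estimate rises quickly but decays slowly. The sketch below assumes that reading. The class name `RunningRange`, the coefficient values `fast` and `slow`, the update rule itself, and the clipping of the output to the target range are all hypothetical choices, not details taken from the abstract.

```python
class RunningRange:
    """Running min/max estimates with direction-dependent smoothing (a sketch)."""

    def __init__(self, fast=0.5, slow=0.01):
        # Assumed coefficients, with 0 <= slow <= fast <= 1.
        self.fast = fast
        self.slow = slow
        self.min_est = None
        self.max_est = None

    def update(self, value):
        """Move both estimates toward the new feature value."""
        if self.min_est is None:
            self.min_est = self.max_est = value
            return
        # Minimum: adapt fast when the value falls below it, slowly otherwise.
        a_min = self.fast if value < self.min_est else self.slow
        self.min_est += a_min * (value - self.min_est)
        # Maximum: adapt fast when the value exceeds it, slowly otherwise.
        a_max = self.fast if value > self.max_est else self.slow
        self.max_est += a_max * (value - self.max_est)

    def normalize(self, value, target=(0.0, 1.0), eps=1e-10):
        """Map value from [min_est, max_est] to the target range, with clipping."""
        lo_t, hi_t = target
        span = self.max_est - self.min_est
        if span < eps:
            return (lo_t + hi_t) / 2
        y = lo_t + (value - self.min_est) * (hi_t - lo_t) / span
        # Smoothed estimates can lag the true extremes, so clip (an assumption).
        return min(hi_t, max(lo_t, y))


# Usage: slow=0.0 freezes an estimate unless the value moves past it.
rr = RunningRange(fast=0.5, slow=0.0)
for v in (0.0, 10.0, 10.0):
    rr.update(v)
```

With a small nonzero `slow`, the estimates drift back toward recent feature values, so the range can recover after a transient spike instead of staying stretched.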
22 Claims
1. A method of obtaining normalized voice activity detection features from an audio signal comprising the steps of:
at a computing system, dividing an audio signal into a sequence of time frames;
computing one or more voice activity detection feature of the audio signal for each of the time frames;
computing running estimates of minimum and maximum values of the one or more voice activity detection feature of the audio signal for each of the time frames;
computing input ranges of the one or more voice activity detection feature by comparing the running estimates of the minimum and maximum values of the one or more voice activity detection feature of the audio signal for each of the time frames; and
mapping the one or more voice activity detection feature of the audio signal for each of the time frames from the input ranges to one or more desired target range to obtain one or more normalized voice activity detection feature.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
18. A method of normalizing voice activity detection features comprising the steps of:
segmenting an audio signal into a sequence of time frames;
computing running minimum and maximum value estimates for voice activity detection features;
computing input ranges by comparing the running minimum and maximum value estimates; and
normalizing the voice activity detection features by mapping the voice activity detection features from the input ranges to one or more desired target ranges.
Dependent claims: 19, 20, 21
22. A computer-readable medium storing a computer program for performing a method for identifying voice data within an audio signal, the computer-readable medium comprising:
computer storage media; and
computer-executable instructions stored on the computer storage media, which computer-executable instructions, when executed by a computing system, are configured to cause the computing system to:
compute a plurality of voice activity detection features;
compute running estimates of minimum and maximum values of the voice activity detection features;
compute input ranges of the voice activity detection features by comparing the running estimates of the minimum and maximum values; and
map the voice activity detection features from the input ranges to one or more desired target ranges to obtain normalized voice activity detection features.
Specification