Voice activity detection (VAD) for a coded speech bitstream without decoding
First Claim
1. A system for voice activity detection (VAD) within a digitally encoded bitstream, the system comprising:
a parameter extraction module implemented using one or more hardware processors and configured to extract parameters from a sequence of coded frames from a digitally encoded bitstream containing speech, the parameters extracted being parameters of a codec used in encoding the sequence of coded frames;
a VAD classifier selection module configured to:
determine a bit rate of the digitally encoded bitstream; and
select a given VAD classifier from among a plurality of VAD classifiers based on the determined bit rate, the given VAD classifier having been trained for the determined bit rate of the digitally encoded bitstream with a training file corresponding to the determined bit rate; and
the given VAD classifier implemented using the one or more hardware processors and configured to operate exclusively in a bitstream domain with input of the digitally encoded bitstream to output a VAD decision indicative of whether or not speech is present in one or more of the coded frames, the VAD decision determined through evaluation of the one or more of the coded frames based on bitstream coding parameter classification features and the parameters extracted.
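The claim leaves open how the bit rate is determined and how a classifier is matched to it. The following is a minimal Python sketch under stated assumptions: every coded frame spans a constant 20 ms (as in AMR-NB, where a 244-bit frame corresponds to the 12.2 kbit/s mode), and pre-trained classifiers are held in a dictionary keyed by nominal bit rate. The names `estimate_bit_rate`, `select_classifier`, and the 100 bit/s tolerance are hypothetical choices, not taken from the patent.

```python
from typing import Callable, Dict, Sequence

# A classifier maps one frame's feature vector to True (speech) or False.
Classifier = Callable[[Sequence[float]], bool]

# Assumption: every coded frame spans 20 ms of audio.
FRAME_DURATION_S = 0.020


def estimate_bit_rate(bits_per_frame: Sequence[int],
                      frame_duration_s: float = FRAME_DURATION_S) -> float:
    """Average bit rate in bit/s, computed from coded-frame sizes only."""
    if not bits_per_frame:
        raise ValueError("at least one coded frame is required")
    return sum(bits_per_frame) / (len(bits_per_frame) * frame_duration_s)


def select_classifier(bit_rate: float,
                      classifiers: Dict[int, Classifier],
                      tolerance_bps: float = 100.0) -> Classifier:
    """Return the classifier trained for the nominal rate nearest bit_rate.

    Raises LookupError when no classifier was trained within tolerance_bps,
    instead of silently using a model trained for a different rate.
    """
    if not classifiers:
        raise LookupError("no VAD classifiers are registered")
    nearest = min(classifiers, key=lambda rate: abs(rate - bit_rate))
    if abs(nearest - bit_rate) > tolerance_bps:
        raise LookupError(f"no classifier trained near {bit_rate:.0f} bit/s")
    return classifiers[nearest]


if __name__ == "__main__":
    # Placeholder models: thresholds on a single hypothetical feature.
    models: Dict[int, Classifier] = {
        4750: lambda feats: feats[0] > 0.4,
        12200: lambda feats: feats[0] > 0.6,
    }
    rate = estimate_bit_rate([244] * 50)  # 50 frames x 244 bits over 1 s
    vad = select_classifier(rate, models)
    print(round(rate), vad([0.7]))  # prints: 12200 True
```

Raising an error when no model lies within the tolerance mirrors the claim's requirement that the selected classifier was trained for the determined bit rate, rather than falling back to a model trained for some other rate.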
2 Assignments
0 Petitions
Abstract
A system, method and computer program product are described for voice activity detection (VAD) within a digitally encoded bitstream. A parameter extraction module is configured to extract parameters from a sequence of coded frames from a digitally encoded bitstream containing speech. A VAD classifier is configured to operate with input of the digitally encoded bitstream to evaluate each coded frame based on bitstream coding parameter classification features to output a VAD decision indicative of whether or not speech is present in one or more of the coded frames.
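The parameter extraction step reads codec parameters directly out of each coded frame, without synthesizing any audio. The sketch below assumes a purely hypothetical fixed layout of four MSB-first bit fields (`lsf_index`, `pitch_lag`, `pitch_gain_index`, `codebook_gain_index`); a real codec's field order and widths come from its own specification and often differ from one bit rate to another.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical frame layout: (parameter name, width in bits) in bitstream order.
# A real codec defines its own layout, which usually changes with the bit rate.
EXAMPLE_LAYOUT: List[Tuple[str, int]] = [
    ("lsf_index", 7),
    ("pitch_lag", 8),
    ("pitch_gain_index", 4),
    ("codebook_gain_index", 5),
]


def read_fields(frame: bytes, layout: Sequence[Tuple[str, int]]) -> Dict[str, int]:
    """Unpack MSB-first bit fields from one coded frame; no audio is decoded."""
    total_bits = len(frame) * 8
    needed = sum(width for _, width in layout)
    if needed > total_bits:
        raise ValueError(f"layout needs {needed} bits, frame has {total_bits}")
    value = int.from_bytes(frame, "big")
    params: Dict[str, int] = {}
    offset = 0
    for name, width in layout:
        # Shift the field down to the least significant bits, then mask it.
        shift = total_bits - offset - width
        params[name] = (value >> shift) & ((1 << width) - 1)
        offset += width
    return params


def extract_parameters(frames: Sequence[bytes],
                       layout: Sequence[Tuple[str, int]] = EXAMPLE_LAYOUT
                       ) -> List[Dict[str, int]]:
    """Return one dictionary of raw parameter indices per coded frame."""
    return [read_fields(frame, layout) for frame in frames]
```

Any trailing bits beyond the fields named in the layout, such as byte padding, are simply ignored.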
17 Claims
1. A system for voice activity detection (VAD) within a digitally encoded bitstream (full text identical to the First Claim above).
Dependent claims: 2, 3, 4, 5, 6.
7. A method for voice activity detection implemented as a plurality of computer processes executing on at least one hardware processor, the method comprising:
extracting parameters from a sequence of coded frames from a digitally encoded bitstream containing speech, the parameters extracted being parameters of a codec used in encoding the sequence of coded frames;
determining a bit rate of the digitally encoded bitstream;
selecting a given VAD classifier from among a plurality of VAD classifiers based on the determined bit rate, the given VAD classifier having been trained for the determined bit rate of the digitally encoded bitstream with a training file corresponding to the determined bit rate;
evaluating one or more of the coded frames with the given VAD classifier, the given VAD classifier configured to operate exclusively in a bitstream domain with input of the digitally encoded bitstream and make a VAD decision for the one or more of the coded frames based on bitstream coding parameter classification features and the parameters extracted; and
outputting the VAD decision indicating whether or not speech is present in the one or more of the coded frames.
Dependent claims: 8, 9, 10, 11, 12.
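One possible realization of the evaluating and outputting steps is a per-frame linear score over the extracted parameters. The sketch assumes a logistic-regression classifier; the class name `LinearVAD`, the feature names, weights, and bias are hypothetical, and the patent does not state which classifier type is used.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


def _sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class LinearVAD:
    """Hypothetical logistic-regression VAD over bitstream parameters."""
    feature_names: Sequence[str]
    weights: Sequence[float]
    bias: float
    threshold: float = 0.5

    def speech_probability(self, params: Dict[str, float]) -> float:
        # Weighted sum of the selected parameters plus bias, squashed to (0, 1).
        z = self.bias + sum(w * params[name]
                            for name, w in zip(self.feature_names, self.weights))
        return _sigmoid(z)

    def decide(self, params: Dict[str, float]) -> bool:
        return self.speech_probability(params) >= self.threshold


def run_vad(frame_params: Sequence[Dict[str, float]],
            classifier: LinearVAD) -> List[bool]:
    """Evaluate each frame's extracted parameters; output one decision per frame."""
    return [classifier.decide(p) for p in frame_params]
```

For example, `run_vad(frames, LinearVAD(["pitch_gain", "codebook_gain"], [4.0, 0.5], -2.0))` returns a list of booleans, one per coded frame, with `True` marking frames judged to contain speech.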
13. A computer program product implemented in a non-transitory computer readable storage medium for voice activity detection, the product comprising:
program code for extracting parameters from a sequence of coded frames from a digitally encoded bitstream containing speech, the parameters extracted being parameters of a codec used in encoding the sequence of coded frames;
program code for determining a bit rate of the digitally encoded bitstream;
program code for selecting a given VAD classifier from among a plurality of VAD classifiers based on the determined bit rate, the given VAD classifier having been trained for the determined bit rate of the digitally encoded bitstream with a training file corresponding to the determined bit rate;
program code for evaluating one or more of the coded frames with the given VAD classifier, the given VAD classifier configured to operate exclusively in a bitstream domain with input of the digitally encoded bitstream and make a VAD decision for the one or more of the coded frames based on bitstream coding parameter classification features and the parameters extracted; and
program code for outputting the VAD decision indicating whether or not speech is present in the one or more of the coded frames.
Dependent claims: 14, 15, 16, 17.
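The claims require each classifier to be trained with a training file corresponding to its bit rate. Assuming, hypothetically, one CSV file per bit rate with a header row and a last column labelling each frame as speech (1) or non-speech (0), a per-rate logistic-regression model could be fitted with plain full-batch gradient descent, as sketched below. The file format, function names, learning rate, and epoch count are illustrative choices, not details from the patent.

```python
import csv
import io
import math
from typing import Dict, List, Sequence, TextIO, Tuple

# (feature names, weights, bias) for one bit rate.
Model = Tuple[List[str], List[float], float]


def load_training_file(handle: TextIO) -> Tuple[List[str], List[List[float]], List[int]]:
    """Read a CSV with a header row; the last column is the label (1 = speech)."""
    reader = csv.reader(handle)
    header = next(reader)
    xs: List[List[float]] = []
    ys: List[int] = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        xs.append([float(v) for v in row[:-1]])
        ys.append(int(row[-1]))
    return header[:-1], xs, ys


def _sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs: Sequence[Sequence[float]], ys: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by full-batch gradient descent on the log loss."""
    if not xs:
        raise ValueError("training file contains no examples")
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log loss w.r.t. the logit is (prediction - label).
            err = _sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def train_per_rate(files: Dict[int, TextIO]) -> Dict[int, Model]:
    """Train one classifier per bit rate, each from its own training file."""
    models: Dict[int, Model] = {}
    for rate, handle in files.items():
        names, xs, ys = load_training_file(handle)
        w, b = train_logistic(xs, ys)
        models[rate] = (names, w, b)
    return models


def predict_speech(model: Model, features: Sequence[float]) -> bool:
    _, w, b = model
    return _sigmoid(b + sum(wi * xi for wi, xi in zip(w, features))) >= 0.5


if __name__ == "__main__":
    # io.StringIO stands in for a hypothetical per-rate training file.
    data = "pitch_gain,label\n0.8,1\n0.9,1\n0.7,1\n0.1,0\n0.2,0\n0.0,0\n"
    models = train_per_rate({12200: io.StringIO(data)})
    print(predict_speech(models[12200], [0.85]), predict_speech(models[12200], [0.05]))
```

Keying the trained models by bit rate produces exactly the kind of dictionary a bit-rate-based selection step can look up at run time.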
Specification