Voice activity detection using vocal tract area information
First Claim
1. A voice activity detection (VAD) system, comprising:
a microphone interface circuit configured for coupling to a microphone to receive an acoustic signal and to convert the acoustic signal to an analog signal;
an analog-to-digital converter configured to receive the analog signal to generate a digital signal; and
a signal processing circuit configured to receive the digital signal and to determine if the digital signal represents a human voice, wherein the signal processing circuit comprises:
an acoustic-energy-based detection module configured to receive one of the analog signal or the digital signal and to provide a sound activity decision that indicates if the acoustic signal is in an audible energy range;
an area-function-based detection module configured to extract features of the acoustic signal from the digital signal based on area-related functions, and to use a machine-learning method to determine an area-based decision that indicates if the acoustic signal represents a human voice, wherein the machine-learning method comprises a plurality of coefficients trained by a plurality of labeled area-related functions; and
a voice activity detection (VAD) decision module configured to make a final VAD decision based on the sound activity decision from the acoustic-energy-based detection module and the area-based decision from the area-function-based detection module; and
a resource-limited device configured to receive the final VAD decision to change an operating mode of the resource-limited device.
Abstract
A voice activity detection (VAD) system includes an input processing module configured to receive an acoustic signal and convert it into an analog signal and subsequently into a digital signal; an energy-based detection module configured to receive one of the analog signal or the digital signal and determine a sound activity decision; an area-function-based detection module configured to derive an area-related function from the digital signal and use a machine learning method to output an area-based decision according to the area-related function; and a VAD decision module configured to make a final VAD decision based on the sound activity decision from the energy-based detection module and the area-based decision from the area-function-based detection module.
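The abstract does not say how the area-related function is derived from the digital signal. As a rough illustration only, the sketch below assumes one common textbook route: linear prediction on a single frame, with the resulting reflection coefficients mapped to relative cross-sectional areas of a lossless tube model. The function names, the prediction order, the unit lip area, and the sign convention of the area ratio are all assumptions made for this example and are not taken from the patent text.

```python
import math

def autocorrelation(frame, order):
    """Return autocorrelation lags r[0..order] of one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]

def reflection_coefficients(r):
    """Levinson-Durbin recursion; returns reflection coefficients k_1..k_p."""
    order = len(r) - 1
    a = [1.0] + [0.0] * order      # coefficients of the prediction polynomial
    err = r[0]                     # prediction error energy
    ks = []
    for m in range(1, order + 1):
        if err <= 0.0:             # silent or degenerate frame: stop refining
            ks.append(0.0)
            continue
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        ks.append(k)
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= (1.0 - k * k)
    return ks

def area_function(ks, lip_area=1.0):
    """Map reflection coefficients to relative tube areas.

    Assumed convention: A[i+1] = A[i] * (1 - k_i) / (1 + k_i).
    Other texts use the opposite sign for k.
    """
    areas = [lip_area]
    for k in ks:
        k = max(-0.999, min(0.999, k))   # keep the ratio finite and positive
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas

def area_features(frame, order=8):
    """Hypothetical helper: one frame in, order + 1 relative areas out."""
    return area_function(reflection_coefficients(autocorrelation(frame, order)))
```

A feature vector produced this way could then be passed to whatever trained classifier forms the area-based decision; in practice the logarithm of the areas or of the area ratios is often used instead of the raw areas.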
13 Citations
27 Claims
1. A voice activity detection (VAD) system, comprising:
a microphone interface circuit configured for coupling to a microphone to receive an acoustic signal and to convert the acoustic signal to an analog signal;
an analog-to-digital converter configured to receive the analog signal to generate a digital signal; and
a signal processing circuit configured to receive the digital signal and to determine if the digital signal represents a human voice, wherein the signal processing circuit comprises:
an acoustic-energy-based detection module configured to receive one of the analog signal or the digital signal and to provide a sound activity decision that indicates if the acoustic signal is in an audible energy range;
an area-function-based detection module configured to extract features of the acoustic signal from the digital signal based on area-related functions, and to use a machine-learning method to determine an area-based decision that indicates if the acoustic signal represents a human voice, wherein the machine-learning method comprises a plurality of coefficients trained by a plurality of labeled area-related functions; and
a voice activity detection (VAD) decision module configured to make a final VAD decision based on the sound activity decision from the acoustic-energy-based detection module and the area-based decision from the area-function-based detection module; and
a resource-limited device configured to receive the final VAD decision to change an operating mode of the resource-limited device.
Dependent claims: 2, 3, 4
5. A voice activity detection (VAD) system, comprising:
an input processing module configured to receive an acoustic signal via a microphone, the input processing module configured to convert the acoustic signal into an analog signal, and subsequently, a digital signal;
an energy-based detection module configured to receive one of the analog signal or the digital signal and determine a sound activity decision;
an area-function-based detection module configured to derive an area-related function from the digital signal and use a machine learning method to output an area-based decision according to the area-related function, wherein the machine learning method comprises a plurality of coefficients trained by a plurality of labeled area-related functions; and
a VAD decision module configured to make a final VAD decision based on the sound activity decision from the energy-based detection module and the area-based decision from the area-function-based detection module, wherein the final VAD decision is subsequently sent to a resource-limited device to change an operating mode of the resource-limited device.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27
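The claimed decision flow (an energy-based decision, an area-based decision from a trained model, a final decision, and a mode change in a resource-limited device) can be pictured with a minimal sketch. Everything concrete here is an assumption made for illustration: the dB range, the use of logistic regression as the machine learning method, the AND rule for combining the two decisions, and the `Device` class with its "sleep" and "active" modes. The claims only require that the final decision be based on both intermediate decisions; the weights passed in would come from training on labeled area-related functions, which this sketch does not perform.

```python
import math

def energy_decision(frame, min_db=-50.0, max_db=0.0):
    """Sound activity decision: is the mean frame energy inside an assumed dB range?"""
    energy = sum(x * x for x in frame) / len(frame)
    energy_db = 10.0 * math.log10(energy + 1e-12)   # small offset avoids log10(0)
    return min_db <= energy_db <= max_db

def area_decision(features, weights, bias, threshold=0.5):
    """Area-based decision from a logistic model over area-related features.

    `weights` and `bias` stand in for the trained coefficients.
    """
    z = bias + sum(w * f for w, f in zip(weights, features))
    z = max(-60.0, min(60.0, z))                    # guard math.exp against overflow
    probability = 1.0 / (1.0 + math.exp(-z))
    return probability > threshold

def final_vad(sound_active, area_voice):
    """Assumed combination rule: voice only if both decisions agree."""
    return sound_active and area_voice

class Device:
    """Hypothetical resource-limited device with two operating modes."""
    def __init__(self):
        self.mode = "sleep"

    def on_vad(self, voice_detected):
        self.mode = "active" if voice_detected else "sleep"
```

One reason to gate on the energy decision first is cost: in a low-power device the cheaper energy check can run continuously, and the area-based classifier only needs to run on frames that pass it.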
Specification