Voice activity detection using vocal tract area information

US 10,460,749 B1
Filed: 06/28/2018
Issued: 10/29/2019
Est. Priority Date: 06/28/2018
Status: Active Grant

First Claim

1. A voice activity detection (VAD) system, comprising:

a microphone interface circuit configured for coupling to a microphone to receive an acoustic signal and to convert the acoustic signal to an analog signal;

an analog-to-digital converter configured to receive the analog signal to generate a digital signal; and

a signal processing circuit configured to receive the digital signal and to determine if the digital signal represents a human voice, wherein the signal processing circuit comprises:

an acoustic-energy-based detection module configured to receive one of the analog signal or the digital signal and to provide a sound activity decision that indicates if the acoustic signal is in an audible energy range;

an area-function-based detection module configured to extract features of the acoustic signal from the digital signal based on area-related functions, and to use a machine-learning method to determine an area-based decision that indicates if the acoustic signal represents a human voice, wherein the machine-learning method comprises a plurality of coefficients trained by a plurality of labeled area-related functions; and

a voice activity detection (VAD) decision module configured to make a final VAD decision based on the sound activity decision from the acoustic-energy-based detection module and the area-based decision from the area-function-based detection module; and

a resource-limited device configured to receive the final VAD decision to change an operating mode of the resource-limited device.

1 Assignment

0 Petitions

Abstract

A voice activity detection (VAD) system includes an input processing module configured to receive an acoustic signal, convert the acoustic signal into an analog signal, and subsequently into a digital signal; an energy-based detection module configured to receive one of the analog or digital signals and determine a sound activity decision; an area-function-based detection module configured to derive an area-related function from the digital signal and use a machine learning method to output an area-based decision according to the area-related function; and a VAD decision module configured to make a final VAD decision based on the sound activity decision from the energy-based detection module and the area-based decision from the area-function-based detection module.
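
The abstract describes two detectors feeding one decision module. The minimal Python sketch below illustrates only that data flow. The mean-square energy gate, the -40 dB threshold, and the AND-style fusion rule are illustrative assumptions, not the patented implementation, and `area_score` is a stand-in for whatever the area-function-based module outputs.

```python
import math
from typing import Sequence


def frame_energy_db(frame: Sequence[float]) -> float:
    """Mean-square energy of one frame in dB, floored at -120 dB to avoid log10(0)."""
    mean_square = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(mean_square, 1e-12))


def sound_activity_decision(frame: Sequence[float], threshold_db: float = -40.0) -> bool:
    """Hypothetical energy-based module: True when the frame is loud enough to count as sound."""
    return frame_energy_db(frame) > threshold_db


def final_vad_decision(sound_active: bool, area_score: float,
                       area_threshold: float = 0.5) -> bool:
    """Assumed fusion rule: report voice only if the energy gate is open AND the
    area-based classifier score (0..1) reaches its threshold."""
    return sound_active and area_score >= area_threshold


if __name__ == "__main__":
    fs = 16000  # assumed sampling rate (Hz)
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / fs) for n in range(320)]
    silence = [0.0] * 320
    print(final_vad_decision(sound_activity_decision(tone), area_score=0.8))     # True
    print(final_vad_decision(sound_activity_decision(silence), area_score=0.8))  # False
```

An AND rule lets a cheap energy gate veto frames before the classifier result matters; a real system might instead weight or sequence the two decisions.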

13 Citations

27 Claims

1. A voice activity detection (VAD) system, comprising:
- a microphone interface circuit configured for coupling to a microphone to receive an acoustic signal and to convert the acoustic signal to an analog signal;
  
  an analog-to-digital converter configured to receive the analog signal to generate a digital signal; and
  
  a signal processing circuit configured to receive the digital signal and to determine if the digital signal represents a human voice, wherein the signal processing circuit comprises:
  
  an acoustic-energy-based detection module configured to receive one of the analog signal or the digital signal and to provide a sound activity decision that indicates if the acoustic signal is in an audible energy range;
  
  an area-function-based detection module configured to extract features of the acoustic signal from the digital signal based on area-related functions, and to use a machine-learning method to determine an area-based decision that indicates if the acoustic signal represents a human voice, wherein the machine-learning method comprises a plurality of coefficients trained by a plurality of labeled area-related functions; and
  
  a voice activity detection (VAD) decision module configured to make a final VAD decision based on the sound activity decision from the acoustic-energy-based detection module and the area-based decision from the area-function-based detection module; and
  
  a resource-limited device configured to receive the final VAD decision to change an operating mode of the resource-limited device.
- Dependent Claims (2, 3, 4)
  - 2. The system of claim 1, wherein the area-related function comprises one of a plurality of log-area-ratios, a log area function, an area function, and a sagittal distance function.
  - 3. The system of claim 1, wherein the area-function-based detection module is configured to perform:
    - filtering the digital signal with a pre-emphasis factor to obtain a pre-emphasized signal;
      
      weighting a frame of the pre-emphasized signal to a windowed signal by a window function;
      
      converting the windowed signal to a plurality of reflection coefficients;
      
      converting the plurality of reflection coefficients to the area-related function;
      
      feeding the area-related function to a trained classifier to identify onsets of voice; and
      
      issuing the area-based decision.
  - 4. The system of claim 3, wherein the trained classifier is trained offline by a neural network or a logistic regression.
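
Claim 3 lists the processing steps of the area-function-based module. The pure-Python sketch below walks through them under stated assumptions: a 16 kHz sampling rate; 20 ms frames with a 10 ms shift (inside the 1 to 20 ms range of claim 15); a pre-emphasis factor of 0.97 (inside the 0.5 to 0.99 range of claim 14); a Hamming window (one of the windows in claim 16); the autocorrelation method with Levinson-Durbin recursion to obtain reflection coefficients; log-area ratios as the area-related function (claim 2); and a logistic-regression score (claim 4). The `weights` and `bias` arguments are hypothetical placeholders for offline-trained coefficients, and sign conventions for reflection coefficients differ between texts. The recursion also returns the final linear-prediction error, which claim 21 suggests could serve as an extra feature.

```python
import math
from typing import Iterator, List, Sequence, Tuple

FS = 16000          # assumed sampling rate (Hz)
FRAME_LEN = 320     # 20 ms frames (assumption)
FRAME_SHIFT = 160   # 10 ms shift, inside the 1-20 ms range of claim 15


def pre_emphasize(x: Sequence[float], alpha: float = 0.97) -> List[float]:
    """y[n] = x[n] - alpha * x[n-1]; alpha = 0.97 is an assumed value in claim 14's range."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def frames(x: Sequence[float], length: int = FRAME_LEN,
           shift: int = FRAME_SHIFT) -> Iterator[Sequence[float]]:
    """Yield overlapping frames; a trailing partial frame is dropped."""
    for start in range(0, len(x) - length + 1, shift):
        yield x[start:start + length]


def hamming(frame: Sequence[float]) -> List[float]:
    """Weight a frame by a Hamming window (one of the windows listed in claim 16)."""
    n_len = len(frame)
    return [v * (0.54 - 0.46 * math.cos(2 * math.pi * n / (n_len - 1)))
            for n, v in enumerate(frame)]


def reflection_coefficients(x: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Autocorrelation method plus Levinson-Durbin recursion.

    Returns (reflection coefficients k_1..k_p, final linear-prediction error).
    """
    r = [sum(x[n] * x[n - lag] for n in range(lag, len(x))) for lag in range(order + 1)]
    if r[0] <= 0.0:                      # all-zero frame: nothing to model
        return [0.0] * order, 0.0
    a = [1.0] + [0.0] * order            # predictor A(z) = 1 + sum_j a_j z^-j
    err = r[0]
    ks: List[float] = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        # Clamp |k| < 1 for numerical safety so the prediction error stays positive.
        k = max(-0.9999, min(0.9999, -acc / err))
        ks.append(k)
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return ks, err


def log_area_ratios(ks: Sequence[float]) -> List[float]:
    """LAR_i = ln((1 + k_i) / (1 - k_i)); the sign convention is an assumption."""
    return [math.log((1.0 + k) / (1.0 - k)) for k in ks]


def logistic_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Numerically stable logistic-regression output in [0, 1]."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def area_based_decisions(signal: Sequence[float], weights: Sequence[float], bias: float,
                         order: int = 10, threshold: float = 0.5) -> List[Tuple[float, bool]]:
    """Per-frame (soft score, hard decision) pairs from log-area-ratio features."""
    emphasized = pre_emphasize(signal)
    results = []
    for frame in frames(emphasized):
        ks, _pred_err = reflection_coefficients(hamming(frame), order)
        score = logistic_score(log_area_ratios(ks), weights, bias)
        results.append((score, score >= threshold))
    return results
```

Called with `weights=[0.0] * 10` and `bias=0.0`, every frame scores exactly 0.5; usable coefficients would have to come from offline training on labeled area-related functions, as claims 1 and 4 describe.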

5. A voice activity detection (VAD) system, comprising:
- an input processing module configured to receive an acoustic signal via a microphone, the input processing module configured to convert the acoustic signal into an analog signal, and subsequently, a digital signal;
  
  an energy-based detection module configured to receive one of the analog signal or the digital signal and determine a sound activity decision;
  
  an area-function-based detection module configured to derive an area-related function from the digital signal and use a machine learning method to output an area-based decision according to the area-related function, wherein the machine learning method comprises a plurality of coefficients trained by a plurality of labeled area-related functions;
  
  and a VAD decision module configured to make a final VAD decision based on the sound activity decision from the energy-based detection module and the area-based decision from the area-function-based detection module, wherein the final VAD decision is subsequently sent to a resource-limited device to change an operating mode of the resource-limited device.
- Dependent Claims (6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27)
  - 6. The system of claim 5, wherein the energy-based detection module is a software module receiving the digital signal.
  - 7. The system of claim 5, wherein the energy-based detection module is a digital hardware block receiving the digital signal.
  - 8. The system of claim 5, wherein the energy-based detection module is an analog hardware block receiving the analog signal.
  - 9. The system of claim 5, wherein the area-related function is a plurality of log-area-ratios.
  - 10. The system of claim 5, wherein the area-related function comprises one of a plurality of log-area-ratios, a log area function, an area function, and a sagittal distance function.
  - 11. The system of claim 5, wherein the sound activity decision is a soft decision value.
  - 12. The system of claim 5, wherein the sound activity decision is a hard decision value.
  - 13. The system of claim 5, wherein the area-function-based detection module is configured to perform the steps of:
    - filtering the digital signal with a pre-emphasis factor to obtain a pre-emphasized signal;
      
      weighting a frame of the pre-emphasized signal to a windowed signal by a window function;
      
      converting the windowed signal to a plurality of reflection coefficients;
      
      converting the plurality of reflection coefficients to the area-related function;
      
      feeding the area-related function to a trained classifier to identify onsets of voice; and
      
      issuing the area-based decision.
  - 14. The system of claim 13, wherein the pre-emphasis factor ranges from 0.5 to 0.99.
  - 15. The system of claim 13, wherein a frame shift ranges from 1 millisecond to 20 milliseconds.
  - 16. The system of claim 13, wherein the window function is one of Blackman, Blackman-Harris, Bohman, Chebyshev, Gaussian, Hamming, Hanning, Kaiser, Nuttall, Parzen, Taylor, and Tukey.
  - 17. The system of claim 13, wherein the trained classifier is trained by a neural network offline.
  - 18. The system of claim 13, wherein the trained classifier is trained by a logistic regression offline.
  - 19. The system of claim 13, wherein the area-based decision is a soft decision value.
  - 20. The system of claim 13, wherein the area-based decision is a hard decision value.
  - 21. The system of claim 13, wherein the area-function-based detection module is configured to further generate a linear predictive error and include this error value to be a feature in the area-based decision.
  - 22. The system of claim 5, further comprising a zero-crossing-based detection module configured to generate a second decision based on a zero crossing rate, wherein the VAD decision module includes the second decision in a final decision process.
  - 23. The system of claim 22, wherein the second decision is a soft decision value.
  - 24. The system of claim 22, wherein the second decision is a hard decision value.
  - 25. The system of claim 5, wherein the resource-limited device is a low power device and the operating mode comprises an idle mode and a wake up mode.
  - 26. The system of claim 5, wherein the resource-limited device is a voice storage device and the operating mode comprises an idle mode and a recording mode.
  - 27. The system of claim 5, wherein the resource-limited device is a voice transmitting device and the operating mode comprises an idle mode and a transmitting mode.
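
Claims 11, 12, 19, 20, and 22 to 27 add a zero-crossing-rate detector, soft or hard decision values, and a mode change on the resource-limited device. The hedged Python sketch below shows one way those pieces could fit together. The ZCR band limits, the weighted-average fusion rule, and the two-state `Mode` enum (mirroring the idle/wake-up pair of claim 25) are assumptions for illustration only.

```python
from enum import Enum
from typing import Sequence


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0))
    return crossings / (len(frame) - 1)


def zcr_decision(frame: Sequence[float], low: float = 0.02, high: float = 0.35) -> float:
    """Hypothetical hard decision (claim 24): 1.0 if the ZCR lies in an assumed
    speech-like band, else 0.0. The band limits are illustrative, not from the patent."""
    return 1.0 if low <= zero_crossing_rate(frame) <= high else 0.0


def final_vad(decisions: Sequence[float], weights: Sequence[float],
              threshold: float = 0.5) -> bool:
    """Assumed fusion rule: weighted mean of soft (0..1) or hard (0/1) decisions, thresholded."""
    score = sum(w * d for w, d in zip(weights, decisions)) / sum(weights)
    return score >= threshold


class Mode(Enum):
    IDLE = "idle"
    WAKE_UP = "wake up"  # claim 25; claims 26-27 would use recording / transmitting instead


def next_mode(vad_active: bool) -> Mode:
    """Map the final VAD decision to the operating mode of a low-power device."""
    return Mode.WAKE_UP if vad_active else Mode.IDLE


if __name__ == "__main__":
    energy, area = 0.9, 0.7                              # made-up soft decisions
    zcr = zcr_decision(([1.0] * 10 + [-1.0] * 10) * 2)   # 3 crossings / 39 pairs -> 1.0
    print(next_mode(final_vad([energy, area, zcr], [1.0, 2.0, 1.0])))  # Mode.WAKE_UP
```

Hard decisions enter the fusion as 0.0 or 1.0, so the same function accepts either kind of value without a separate code path.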

Current Assignee
Nuvoton Technology Corporation (Winbond Electronics Corporation)
Original Assignee
Nuvoton Technology Corporation (Winbond Electronics Corporation)
Inventors
Ru, Powen; Paiz, Alex
Primary Examiner(s)
Shah, Bharatkumar S

Application Number
US16/021,724
Time in Patent Office
488 Days
Field of Search
704/232
US Class Current
CPC Class Codes

G10L 15/063   Training

G10L 15/16   using artificial neural net...

G10L 15/22   Procedures used during a sp...

G10L 25/12   the extracted parameters be...

G10L 25/21   the extracted parameters be...

G10L 25/30   using neural networks

G10L 25/78   Detection of presence or ab...