System for detecting voice activity and background noise/silence in a speech signal using pitch and signal to noise ratio information

US 6,275,794 B1
Filed: 12/22/1998
Issued: 08/14/2001
Est. Priority Date: 09/18/1998
Status: Expired due to Term

First Claim

1. In a speech communication system comprising:

(a) a speech encoder for receiving and encoding an incoming speech signal to generate a bit stream for transmission to a speech decoder;

(b) a communication channel for transmission; and

(c) a speech decoder for receiving the bit stream from the speech encoder to decode the bit stream to generate a reconstructed speech signal, the incoming speech signal comprising periods of active voice and non-active voice, a method for generating a frame voicing decision comprising the steps of:

i. extracting a predetermined set of parameters, including a pitch gain and a pitch lag, from the incoming speech signal for each frame;

ii. estimating a signal-to-noise ratio; and

iii. making a frame voicing decision according to the predetermined set of parameters and the signal-to-noise ratio.
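Step i relies on a per-frame pitch lag and pitch gain. In a CELP-type encoder these would normally come from the encoder's own pitch analysis. The sketch below does not reproduce that analysis. It shows a generic open-loop estimate instead: pick the lag that maximizes the normalized correlation between a buffer and its delayed copy, then report the least-squares gain for that lag. The function name `open_loop_pitch` and the lag range of 20 to 143 samples (typical of 8 kHz narrowband coders) are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def open_loop_pitch(frame: Sequence[float],
                    min_lag: int = 20,
                    max_lag: int = 143) -> Tuple[int, float]:
    """Estimate (pitch_lag, pitch_gain) for one analysis buffer.

    Hypothetical helper: chooses the lag that maximizes the normalized
    correlation between the buffer and its delayed copy, then returns the
    least-squares gain for that lag.
    """
    n = len(frame)
    if n <= max_lag:
        raise ValueError("buffer must be longer than max_lag")
    best_lag, best_score, best_gain = min_lag, -math.inf, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Cross-correlation of x[i] with x[i - lag] over the overlapping part.
        corr = sum(frame[i] * frame[i - lag] for i in range(lag, n))
        # Energy of the delayed segment; skip lags where it is zero.
        energy = sum(frame[i - lag] ** 2 for i in range(lag, n))
        if energy <= 0.0:
            continue
        score = corr / math.sqrt(energy)
        if score > best_score:
            best_lag, best_score = lag, score
            # Gain g minimizing sum((x[i] - g * x[i - lag]) ** 2) for this lag.
            best_gain = corr / energy
    # A negative best gain means no useful periodicity; report it as 0.
    return best_lag, max(best_gain, 0.0)


# Usage: a pulse train with a 40-sample period.
pulses = [1.0 if i % 40 == 0 else 0.0 for i in range(240)]
lag, gain = open_loop_pitch(pulses)
```

For voiced speech the gain typically approaches 1 and the lag follows the pitch period from frame to frame. For noise the gain tends to stay small and the lag jumps around. A pitch-based voicing decision relies on this contrast.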

14 Assignments

0 Petitions

Abstract

A method and apparatus for generating frame voicing decisions for an incoming speech signal having periods of active voice and non-active voice for a speech encoder in a speech communications system. A predetermined set of parameters is extracted from the incoming speech signal, including a pitch gain and a pitch lag. A frame voicing decision is made for each frame of the incoming speech signal according to values calculated from the extracted parameters. The predetermined set of parameters further includes a partial residual frame full band energy, and a set of spectral parameters called Line Spectral Frequencies (LSF). A signal-to-noise value is estimated and tracked to adaptively set threshold values, thereby improving performance under various noise conditions.
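The abstract states that the tracked SNR is used to set thresholds adaptively, but it does not give the decision rule. The following is a minimal sketch that assumes a purely pitch-based rule. A frame is marked active when its pitch lag stays close to the previous frame's lag and its pitch gain exceeds a threshold interpolated from the SNR. The threshold is lowered at low SNR, on the assumption that noise reduces the measured pitch gain of voiced frames. A hangover counter provides smoothing, and a forced-active start-up period plays the role of initialization. Both ideas appear in simplified form in claims 6 and 7. Every numeric value and the names `gain_threshold` and `PitchVad` are hypothetical. The patent's rule also uses energy and LSF-based features, and this sketch omits them.

```python
from dataclasses import dataclass
from typing import Optional


def gain_threshold(snr_db: float,
                   low_snr: float = 5.0, high_snr: float = 30.0,
                   thr_low: float = 0.4, thr_high: float = 0.6) -> float:
    """Map SNR (dB) to a pitch-gain threshold by clamped linear interpolation.

    All four breakpoints are hypothetical illustration values.
    """
    if snr_db <= low_snr:
        return thr_low
    if snr_db >= high_snr:
        return thr_high
    frac = (snr_db - low_snr) / (high_snr - low_snr)
    return thr_low + frac * (thr_high - thr_low)


@dataclass
class PitchVad:
    """Toy frame-level VAD driven by pitch lag, pitch gain, and SNR."""
    init_frames: int = 4      # start-up frames forced to active voice (assumed count)
    hangover: int = 3         # frames kept active after the raw decision drops (assumed)
    lag_tolerance: int = 2    # max lag change, in samples, treated as "stable" (assumed)
    frame_count: int = 0
    prev_lag: Optional[int] = None
    hang: int = 0

    def decide(self, pitch_lag: int, pitch_gain: float, snr_db: float) -> bool:
        self.frame_count += 1
        stable = (self.prev_lag is not None
                  and abs(pitch_lag - self.prev_lag) <= self.lag_tolerance)
        self.prev_lag = pitch_lag
        if self.frame_count <= self.init_frames:
            return True  # initialization period: always report active voice
        # Initial (raw) decision: stable lag and gain above the SNR-dependent threshold.
        raw = stable and pitch_gain >= gain_threshold(snr_db)
        if raw:
            self.hang = self.hangover
            return True
        # Smoothing: hold the active decision for a few frames after speech ends.
        if self.hang > 0:
            self.hang -= 1
            return True
        return False
```

The hangover keeps weak trailing syllables from being clipped. Interpolating the threshold, rather than switching between fixed values, avoids abrupt changes in behavior as the SNR estimate drifts.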

42 Citations

16 Claims

1. In a speech communication system comprising:

(a) a speech encoder for receiving and encoding an incoming speech signal to generate a bit stream for transmission to a speech decoder;

(b) a communication channel for transmission; and

(c) a speech decoder for receiving the bit stream from the speech encoder to decode the bit stream to generate a reconstructed speech signal, the incoming speech signal comprising periods of active voice and non-active voice, a method for generating a frame voicing decision comprising the steps of:

i. extracting a predetermined set of parameters, including a pitch gain and a pitch lag, from the incoming speech signal for each frame;

ii. estimating a signal-to-noise ratio; and

iii. making a frame voicing decision according to the predetermined set of parameters and the signal-to-noise ratio.
2. The method according to claim 1, wherein the predetermined set of parameters further comprises a partial residual full band energy and line spectral frequencies (LSF).
3. A method according to claim 2, wherein the step of making a frame voicing decision further comprises the steps of:
4. A method according to claim 3, wherein the step of making a frame voicing decision further comprises the steps of:

i) calculating a spectral difference SD₁ using a normalized Itakura-Saito measure;

ii) calculating a spectral difference SD₂ using a mean square error method;

iii) calculating a spectral difference SD₃ using a mean square error method; and

iv) calculating a long-term mean of SD₂.
5. A method according to claim 4, wherein an initial frame voicing decision is made according to the calculated values.
6. A method according to claim 5, wherein the initial frame voicing decision is smoothed.
7. A method according to claim 6, wherein an initialization routine is performed for a predetermined number of initial frames, such that the voicing decision is set to active voice.
8. A method according to claim 1, wherein the step of estimating the signal-to-noise ratio comprises the step of subtracting a running mean of energy of a noise signal $\bar{E}_N$ from a running mean of energy of a voice signal $\mathrm{R\_MEAN\_E}$.

9. A voice activity detector (VAD) for making a voicing decision on an incoming speech signal frame, the VAD comprising:

an extractor for extracting a predetermined set of parameters, including a pitch gain and a pitch lag, from the incoming speech signal for each frame;

a calculator unit for calculating a set of predetermined values, including a signal-to-noise ratio SNR, based on the extracted predetermined set of parameters and for adaptively determining threshold values according to the SNR value; and

a decision unit for making a frame voicing decision according to the predetermined set of values.
10. The VAD according to claim 9, wherein the predetermined set of parameters further comprises a partial residual full band energy and line spectral frequencies (LSF).
11. The VAD according to claim 10, wherein the calculator unit calculates:
12. The VAD according to claim 11, wherein the calculator unit further calculates:

a spectral difference SD₁ using a normalized Itakura-Saito measure;

a spectral difference SD₂ using a mean square error method;

a spectral difference SD₃ using a mean square error method; and

a long-term mean of SD₂.
13. The VAD according to claim 12, wherein the decision unit makes an initial frame voicing decision according to the values calculated by the calculator unit.
14. The VAD according to claim 13, wherein the initial frame voicing decision is smoothed.

15. A voice activity detection method for detecting voice activity in an incoming speech signal frame, the improvement comprising making a voicing decision based on a pitch lag and a pitch gain of the speech signal frame and using a signal-to-noise ratio to adaptively set threshold values.
16. The voice activity detection method of claim 15, further comprising making the voicing decision based on a partial residual frame full band energy and a set of spectral parameters called Line Spectral Frequencies (LSF).
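Claims 4, 8, and 12 name the quantities the calculator unit works with but not their exact formulas. The sketch below rests on several assumptions. SD₁ is taken to be the Itakura-Saito divergence between two power spectra, averaged over frequency bins; this is one plausible reading of "normalized". SD₂ and SD₃ are mean square errors between parameter vectors such as LSF vectors, though the claims do not state which vectors are compared. Long-term means are first-order exponential averages with a hypothetical factor `beta`. Energies are expressed in dB, so claim 8's difference of running means becomes an SNR in dB. All function names are hypothetical.

```python
import math
from typing import Sequence


def itakura_saito(p: Sequence[float], q: Sequence[float]) -> float:
    """Itakura-Saito divergence between two power spectra, averaged over bins.

    Averaging over bins is an assumed normalization. The result is >= 0 and
    is 0 only when the spectra match bin for bin.
    """
    if len(p) != len(q) or not p:
        raise ValueError("spectra must be non-empty and of equal length")
    total = 0.0
    for pk, qk in zip(p, q):
        r = pk / qk  # both spectra must be strictly positive
        total += r - math.log(r) - 1.0
    return total / len(p)


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean square error between two parameter vectors (e.g. two LSF vectors)."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def running_mean(prev: float, value: float, beta: float = 0.98) -> float:
    """First-order exponential running mean; beta is a hypothetical factor."""
    return beta * prev + (1.0 - beta) * value


def frame_energy_db(frame: Sequence[float], floor: float = 1e-10) -> float:
    """Mean frame energy in dB; the floor avoids log10(0) on digital silence."""
    energy = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(energy, floor))


def snr_db(voice_mean_db: float, noise_mean_db: float) -> float:
    """Claim 8 in dB form: running voice-energy mean minus running noise-energy mean."""
    return voice_mean_db - noise_mean_db
```

In a frame loop, `running_mean` would typically be applied to SD₂ on every frame to obtain its long-term mean. It would be applied to the voice-energy mean only on frames judged active and to the noise-energy mean only on frames judged inactive, and the two means would then be passed to `snr_db`.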

Current Assignee
MACOM Technology Solutions Holdings, Inc.
Original Assignee
Conexant Systems Incorporated (Synaptics Incorporated)
Inventors
Shlomot, Eyal; Benyassine, Adil
Primary Examiner(s)
Korzuch, William
Assistant Examiner(s)
Chawan, Vijay B

Application Number

US09/218,334
Time in Patent Office

966 Days
Field of Search

704/207, 704/208, 704/225, 704/223, 704/219, 704/226, 704/227, 704/228, 704/214, 704/258, 704/266, 704/201
US Class Current

704/207
CPC Class Codes

G10L 25/78 Detection of presence or ab...
