Signal processing apparatus and method

US 7,756,707 B2
Filed: 03/18/2005
Issued: 07/13/2010
Est. Priority Date: 03/26/2004
Status: Expired due to Fees

1 Assignment

0 Petitions

Abstract

A signal processing apparatus and method for performing robust endpoint detection of a signal are provided. An input signal sequence is divided into frames, each of which has a predetermined time length. The presence of the signal in each frame is detected. A filter process that smooths the detection result, using the detection result for a past frame, is then applied to the detection result for the current frame. The filter output is compared with a predetermined threshold value, and the state of the signal sequence in the current frame is determined on the basis of the comparison result.
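The pipeline the abstract describes can be sketched in four steps: framing, per-frame detection, recursive smoothing and threshold comparison. This is a minimal Python sketch under stated assumptions, not the patented implementation. The function names, the parameter values and the mean-energy detector in `detect` are all hypothetical stand-ins. The abstract does not specify the detection measure, and the claims only call it a VAD metric. The smoother uses the first-order recursion recited in the claims, V_f = ρV_{f−1} + (1 − ρ)X_f.

```python
from typing import List, Sequence


def split_into_frames(samples: Sequence[float], frame_len: int) -> List[List[float]]:
    """Divide the input into non-overlapping frames of frame_len samples.

    A trailing partial frame is dropped (an assumption; the abstract only
    says each frame has a predetermined time length).
    """
    n_frames = len(samples) // frame_len
    return [list(samples[i * frame_len:(i + 1) * frame_len]) for i in range(n_frames)]


def detect(frame: Sequence[float], energy_threshold: float) -> int:
    """Hypothetical stand-in detector: 1 if the mean energy exceeds the threshold."""
    energy = sum(s * s for s in frame) / len(frame)
    return 1 if energy > energy_threshold else 0


def smooth(flags: Sequence[int], rho: float, v0: float = 0.0) -> List[float]:
    """First-order recursive smoother: V_f = rho * V_{f-1} + (1 - rho) * X_f."""
    out: List[float] = []
    v = v0
    for x in flags:
        v = rho * v + (1.0 - rho) * x
        out.append(v)
    return out


def endpoint_decisions(samples: Sequence[float], frame_len: int,
                       energy_threshold: float, rho: float,
                       threshold: float) -> List[bool]:
    """Frame, detect, smooth, then compare the filter output with a threshold."""
    frames = split_into_frames(samples, frame_len)
    flags = [detect(frame, energy_threshold) for frame in frames]
    return [v > threshold for v in smooth(flags, rho)]


samples = [0.0] * 40 + [1.0] * 80 + [0.0] * 40
print(endpoint_decisions(samples, frame_len=10, energy_threshold=0.1, rho=0.5, threshold=0.6))
```

In this example the raw flags are 1 for frames 4 to 11 (zero-based). The smoothed decision is True only for frames 5 to 11, because at frame 4 the filter output is just 0.5, below the 0.6 threshold. The smoother trades a short onset delay for insensitivity to isolated detection errors.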

126 Citations

3 Claims

1. A speech signal processing apparatus comprising:
- a dividing unit which divides an input speech signal into frames, each of which has a predetermined time length;
  
  a calculation unit which calculates a VAD metric for a current frame;
  
  a determination unit which determines whether a signal in the current frame contains speech or non-speech by using the VAD metric and outputs a VAD flag of 1 or 0 indicating whether the current frame contains speech or non-speech, respectively;
  
  a filter unit which smooths the VAD flags output from said determination unit, wherein said filter unit executes a filter process expressed as follows:
  
  $$V_f = \rho V_{f-1} + (1 - \rho) X_f,$$
  
  where:
  
  f is a frame index;
  
  V_f is the filter output of the frame f;
  
  X_f is the filter input of the frame f, which is the VAD flag of the frame f; and
  
  ρ is a constant value serving as a pole of the filter; and
  
  a state evaluation unit which, according to the output from said filter unit, V_f, evaluates a current state of the speech signal from among a silence state, a speech state, a possible speech state representing an intermediate state from the silence state to the speech state, and a possible silence state representing an intermediate state from the speech state to the silence state, wherein said state evaluation unit performs the following operations:
  
  in the silence state, when the VAD flag becomes 1, the state moves to the possible speech state; in the possible speech state, when V_f exceeds a first threshold value, the state moves to the speech state and V_f is set to 1, and when V_f is below a second threshold value that is smaller than the first threshold value, the state moves to the silence state; in the speech state, when the VAD flag becomes 0, the state moves to the possible silence state; and in the possible silence state, when V_f is below the second threshold value, the state moves to the silence state and V_f is set to 0, and when the VAD flag becomes 1, the state moves to the speech state.

2. A speech signal processing method comprising the steps of:
- (a) dividing an input speech signal into frames, each of which has a predetermined time length;
  
  (b) calculating a VAD metric for a current frame;
  
  (c) determining whether a signal in the current frame contains speech or non-speech by using the VAD metric and outputting a VAD flag of 1 or 0 indicating whether the current frame contains speech or non-speech, respectively;
  
  (d) smoothing the VAD flags output from said determination step, wherein said smoothing step executes a filter process expressed as follows:
  
  $$V_f = \rho V_{f-1} + (1 - \rho) X_f,$$
  
  where:
  
  f is a frame index;
  
  V_f is the filter output of the frame f;
  
  X_f is the filter input of the frame f, which is the VAD flag of the frame f; and
  
  ρ is a constant value serving as a pole of the filter; and
  
  (e) evaluating, according to the output of said smoothing step, V_f, a current state of the speech signal from among a silence state, a speech state, a possible speech state representing an intermediate state from the silence state to the speech state, and a possible silence state representing an intermediate state from the speech state to the silence state, wherein said evaluating step performs the following operations:
  
  in the silence state, when the VAD flag becomes 1, the state moves to the possible speech state; in the possible speech state, when V_f exceeds a first threshold value, the state moves to the speech state and V_f is set to 1, and when V_f is below a second threshold value that is smaller than the first threshold value, the state moves to the silence state; in the speech state, when the VAD flag becomes 0, the state moves to the possible silence state; and in the possible silence state, when V_f is below the second threshold value, the state moves to the silence state and V_f is set to 0, and when the VAD flag becomes 1, the state moves to the speech state.

3. A computer-readable medium storing program code for causing a computer to perform the steps of:
- (a) dividing an input speech signal sequence into frames, each of which has a predetermined time length;
  
  (b) calculating a VAD metric for a current frame;
  
  (c) determining whether a signal in the current frame contains speech or non-speech by using the VAD metric and outputting a VAD flag of 1 or 0 indicating whether the current frame contains speech or non-speech, respectively;
  
  (d) smoothing the VAD flags output from said determination step, wherein said smoothing step executes a filter process expressed as follows:
  
  $$V_f = \rho V_{f-1} + (1 - \rho) X_f,$$
  
  where:
  
  f is a frame index;
  
  V_f is the filter output of the frame f;
  
  X_f is the filter input of the frame f, which is the VAD flag of the frame f; and
  
  ρ is a constant value serving as a pole of the filter; and
  
  (e) evaluating, according to the output of said smoothing step, V_f, a current state of the speech signal from among a silence state, a speech state, a possible speech state representing an intermediate state from the silence state to the speech state, and a possible silence state representing an intermediate state from the speech state to the silence state, wherein said evaluating step performs the following operations:
  
  in the silence state, when the VAD flag becomes 1, the state moves to the possible speech state; in the possible speech state, when V_f exceeds a first threshold value, the state moves to the speech state and V_f is set to 1, and when V_f is below a second threshold value that is smaller than the first threshold value, the state moves to the silence state; in the speech state, when the VAD flag becomes 0, the state moves to the possible silence state; and in the possible silence state, when V_f is below the second threshold value, the state moves to the silence state and V_f is set to 0, and when the VAD flag becomes 1, the state moves to the speech state.
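The four-state evaluation recited in claims 1 to 3 can be sketched as a small state machine driven by the smoothed VAD flag. This is a minimal Python sketch under stated assumptions, not Canon's implementation. The names `State` and `track_states` are hypothetical, and `t1`/`t2` stand for the first and second threshold values. The claims also leave several details open, and the sketch fills them with illustrative choices:

- the filter starts at V = 0 in the silence state;
- V_f is computed before the state is evaluated;
- at most one transition happens per frame;
- "the VAD flag becomes 1/0" is read as "the current frame's flag is 1/0".

```python
from enum import Enum
from typing import List, Sequence


class State(Enum):
    SILENCE = "silence"
    POSSIBLE_SPEECH = "possible speech"
    SPEECH = "speech"
    POSSIBLE_SILENCE = "possible silence"


def track_states(vad_flags: Sequence[int], rho: float,
                 t1: float, t2: float) -> List[State]:
    """Hypothetical sketch: smooth the VAD flags and run the four-state evaluator.

    Assumptions (not fixed by the claims): V starts at 0 in SILENCE, V_f is
    updated before the state is evaluated, and at most one transition
    happens per frame.
    """
    if not t2 < t1:
        raise ValueError("the second threshold must be smaller than the first")
    state, v = State.SILENCE, 0.0
    states: List[State] = []
    for x in vad_flags:
        v = rho * v + (1.0 - rho) * x  # V_f = rho * V_{f-1} + (1 - rho) * X_f
        if state is State.SILENCE:
            if x == 1:
                state = State.POSSIBLE_SPEECH
        elif state is State.POSSIBLE_SPEECH:
            if v > t1:
                state, v = State.SPEECH, 1.0  # claimed: V_f is set to 1
            elif v < t2:
                state = State.SILENCE
        elif state is State.SPEECH:
            if x == 0:
                state = State.POSSIBLE_SILENCE
        else:  # State.POSSIBLE_SILENCE
            if v < t2:
                state, v = State.SILENCE, 0.0  # claimed: V_f is set to 0
            elif x == 1:
                state = State.SPEECH
        states.append(state)
    return states


for s in track_states([0, 0, 1, 1, 1, 1, 0, 0, 0, 0], rho=0.5, t1=0.8, t2=0.2):
    print(s.value)
```

With these parameters the tracker enters the speech state on the fifth frame, when V reaches 0.875, and returns to silence on the ninth, when V has decayed to 0.125. Starting from V = 0 with the flag held at 1, the recursion gives V_f = 1 − ρ^f, so with ρ = 0.5 three consecutive speech flags are needed to pass a first threshold of 0.8. Resetting V to 1 on entering speech means the output must decay all the way from 1 to below the second threshold before silence is declared. That is one reading of how the claimed scheme resists brief dropouts.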

Current Assignee
Canon Kabushiki Kaisha (Canon Inc.)
Original Assignee
Canon Kabushiki Kaisha (Canon Inc.)
Inventors
Garner, Philip; Komori, Yasuhiro; Fukada, Toshiaki
Primary Examiner(s)
Opsasnick, Michael N.

Application Number
US11/082,931
Publication Number
US 20050216261A1
Time in Patent Office
1,943 Days
Field of Search
704/233
US Class Current
704/233
CPC Class Codes
G10L 25/87 Detection of discrete point...
