Speech section detection apparatus

US 20050015244A1
Filed: 07/14/2003
Published: 01/20/2005
Est. Priority Date: 07/14/2003
Status: Abandoned Application

First Claim

1. A speech section detection apparatus comprising:

preprocessing means for removing noise contained in a speech signal;

signal-to-noise ratio improving means for improving the signal-to-noise ratio of said speech signal from which noise has been removed by said preprocessing means; and

speech section extracting signal generating means for generating a speech section extracting signal based on said speech signal whose signal-to-noise ratio has been improved by said signal-to-noise improving means.

Abstract

A speech section detection apparatus is provided that can reliably detect a speech section even in a speech signal with a low signal-to-noise ratio. The speech signal collected by a microphone and amplified by a line amplifier is converted by an A/D converter into a digital value, which is then stored in a memory. After noise is removed from the digitized speech signal, the signal-to-noise ratio is improved by taking a short-time auto-correlation and, when the signal level has stayed above a threshold value for a predetermined period, it is determined that a speech section has been detected. Further, a prescribed period before and after the speech section thus determined is also forcibly set as a target for extraction, so that the beginning and end of the speech section can be reliably detected. Furthermore, to prevent noise from accumulating and causing the threshold value to increase excessively, the threshold value is updated as appropriate by multiplying a moving average, taken over a prescribed period in a non-speech section, by a predetermined factor and setting the resulting product as the threshold value.
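The detection step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicants' implementation: the function names `short_time_autocorr` and `detect_open` are hypothetical, samples before the start of the buffer are assumed to be zero, the "signal level" is assumed to be the absolute value of the auto-correlation, and the "predetermined period" is expressed as a count of consecutive samples.

```python
def short_time_autocorr(x, J, M):
    """Short-time auto-correlation at lag M, following the form given in claim 2:
    X_C(n) = (1/J) * sum over j = 0..J of x[n-j] * x[n-j-M].
    Assumption: samples before the start of the buffer are treated as 0."""
    def at(i):
        return x[i] if i >= 0 else 0.0

    out = []
    for n in range(len(x)):
        acc = 0.0
        for j in range(J + 1):
            acc += at(n - j) * at(n - j - M)
        out.append(acc / J)
    return out


def detect_open(levels, threshold, hold):
    """Per-sample gate: True once the level has stayed above `threshold`
    for at least `hold` consecutive samples, False otherwise."""
    gate = []
    run = 0  # length of the current above-threshold run
    for v in levels:
        run = run + 1 if v > threshold else 0
        gate.append(run >= hold)
    return gate


# Toy input: a constant signal correlates perfectly with its delayed copy.
x = [1.0] * 6
levels = [abs(v) for v in short_time_autocorr(x, J=2, M=1)]
gate = detect_open(levels, threshold=1.0, hold=2)
```

Because uncorrelated noise tends to average toward zero in the lagged product while periodic speech components do not, the gate is driven by the correlated part of the signal; the values of `J`, `M`, `threshold`, and `hold` here are arbitrary toy choices.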

10 Claims

1. A speech section detection apparatus comprising:
- preprocessing means for removing noise contained in a speech signal;
  
  signal-to-noise ratio improving means for improving the signal-to-noise ratio of said speech signal from which noise has been removed by said preprocessing means; and
  
  speech section extracting signal generating means for generating a speech section extracting signal based on said speech signal whose signal-to-noise ratio has been improved by said signal-to-noise improving means.
  - 2. A speech section detection apparatus as claimed in claim 1, wherein said signal-to-noise ratio improving means is a short-time auto-correlation value calculating means for calculating a short-time auto-correlation value of said speech signal from which noise has been removed by said preprocessing means, in accordance with the equation

    $X_{C} = \frac{1}{J} \sum_{j=0}^{J} X_{L}(n - j) \times X_{L}(n - j - M)$

    where X_C = short-time auto-correlation value, X_L = low-pass filter output, n = sampling number, J = number of correlated samples, and M = number of independent samples.
  - 3. A speech section detection apparatus as claimed in claim 1, wherein said preprocessing means comprises:
    - a high-pass filter for cutting off low-frequency noise contained in said speech signal; and
      
      a low-pass filter for cutting off high-frequency noise contained in said speech signal.
  - 4. A speech section detection apparatus as claimed in claim 1, wherein said speech section extracting signal generating means sets said speech section extracting signal open when the level of said speech signal whose signal-to-noise ratio has been improved by said signal-to-noise ratio improving means has continued to stay above a predetermined threshold value for a predetermined length of time.
  - 5. A speech section detection apparatus as claimed in claim 2, wherein said speech section extracting signal generating means sets said speech section extracting signal open when the level of said short-time auto-correlation value calculated by said short-time auto-correlation value calculating means has continued to stay above a predetermined threshold value for a predetermined length of time.
  - 6. A speech section detection apparatus as claimed in claim 4 or 5, wherein said speech section extracting signal generating means includes threshold value setting means for setting as said threshold value the product between an average level of said speech signal when said speech section extracting signal is in a closed state and a predetermined factor.
  - 7. A speech section detection apparatus as claimed in claim 5, wherein said speech section extracting signal generating means includes:
    - root-mean-square value calculating means for calculating a root-mean-square value of said short-time auto-correlation value calculated by said short-time auto-correlation value calculating means;
      
      smoothing means for smoothing the root-mean-square value of said short-time auto-correlation value, calculated by said root-mean-square value calculating means; and
      
      threshold value setting means for setting, as said threshold value, the product between the root-mean-square value of said short-time auto-correlation value smoothed by said smoothing means when said speech section extracting signal is in a closed state and a predetermined factor.
  - 8. A speech section detection apparatus as claimed in claim 2, wherein said speech section extracting signal generating means comprises:
    - extracting signal opening means for setting said extracting signal open when said short-time auto-correlation value calculated by said short-time auto-correlation value calculating means has continued to stay above a predetermined threshold value for a predetermined length of time; and
      
      extracting signal retroactively opening means for outputting said speech section extracting signal by setting said extracting signal open retroactively over a predetermined period when said extracting signal has been set open by said extracting signal opening means.
  - 9. A speech section detection apparatus as claimed in claim 2, wherein said speech section extracting signal generating means comprises:
    - extracting signal opening means for setting said extracting signal open when said short-time auto-correlation value calculated by said short-time auto-correlation value calculating means has continued to stay above a predetermined threshold value for a predetermined length of time; and
      
      extracting signal open state maintaining means for outputting said speech section extracting signal by maintaining said extracting signal in an open state for a predetermined period, even after said extracting signal is closed, when said extracting signal has been set open by said extracting signal opening means.
  - 10. A speech section detection apparatus as claimed in claim 2, wherein said speech section extracting signal generating means comprises:
    - extracting signal opening means for setting said extracting signal open when said short-time auto-correlation value calculated by said short-time auto-correlation value calculating means has continued to stay above a predetermined threshold value for a predetermined length of time;
      
      extracting signal retroactively opening means for setting said extracting signal open retroactively over a predetermined period when said extracting signal has been set open by said extracting signal opening means; and
      
      extracting signal open state maintaining means for outputting said speech section extracting signal by maintaining said extracting signal in an open state for a predetermined period, even after said retroactively opened extracting signal is closed, when said extracting signal has been set open retroactively by said retroactively opening means.
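Two mechanisms recited in the dependent claims lend themselves to a short sketch: the threshold derived from the average level while the extracting signal is closed (claims 6 and 7), and the retroactive opening and open-state maintenance of the extracting signal (claims 8 to 10). The Python below is an illustration only, under assumptions not stated in the claims: the names `AdaptiveThreshold` and `extend_gate` are hypothetical, the "predetermined periods" are expressed as sample counts, a plain moving average stands in for the smoothing means, and the whole signal is assumed to be buffered so the gate can be extended backwards in time.

```python
from collections import deque


class AdaptiveThreshold:
    """Threshold = factor * moving average of the levels observed while the gate is closed."""

    def __init__(self, window, factor, initial):
        self.buf = deque(maxlen=window)  # most recent closed-state levels
        self.factor = factor
        self.value = initial             # used until the first closed-state sample arrives

    def update(self, level, gate_open):
        # Only non-speech (closed-gate) samples feed the average, so speech
        # itself does not push the threshold upward.
        if not gate_open:
            self.buf.append(level)
            self.value = self.factor * sum(self.buf) / len(self.buf)
        return self.value


def extend_gate(gate, pre, post):
    """Open the gate `pre` samples before and keep it open `post` samples
    after every sample at which the input gate is open."""
    n = len(gate)
    out = [False] * n
    for i, is_open in enumerate(gate):
        if is_open:
            for k in range(max(0, i - pre), min(n, i + post + 1)):
                out[k] = True
    return out


# Toy usage: a detected run at samples 3..4, widened by 2 samples before and 1 after.
raw_gate = [False, False, False, True, True, False, False, False]
wide_gate = extend_gate(raw_gate, pre=2, post=1)
```

Widening the gate compensates for the detection delay introduced by requiring the level to stay above the threshold for a hold time, which would otherwise clip the onset of an utterance.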

Current Assignee: Fujitsu Ten Limited (DENSO Corporation), Tsuru Gakuen
Original Assignee: Fujitsu Ten Limited (DENSO Corporation), Tsuru Gakuen
Inventors: Terao, Kazuya; Iwata, Osamu; Kodama, Satomi; Nakamura, Masataka; Kitao, Hideki
Application Number: US10/619,874
Publication Number: US 20050015244A1
US Class Current: 704/226
CPC Class Codes: G10L 25/78 Detection of presence or ab...
