A method for indicating the presence of speech in an audio signal

US 4,959,865 A
Filed: 02/03/1988
Issued: 09/25/1990
Est. Priority Date: 12/21/1987
Status: Expired due to Term

Abstract

A voice operated switch employs digital signal processing techniques to examine audio signal frames having harmonic content to identify voiced phonemes and to determine whether the signal frame contains primarily speech or noise. The method and apparatus employ a multiple-stage, delayed-decision adaptive digital signal processing algorithm implemented through the use of commonly available electronic circuit components. Specifically, the method and apparatus comprise a plurality of stages, including (1) a low-pass filter to limit examination of input signals to below about one kHz, (2) a digital center-clipped autocorrelation processor which recognizes that the presence of periodic components of the input signal below and above a peak-related threshold identifies a frame as containing speech or noise, and (3) a nonlinear filtering processor which includes nonlinear smoothing of the frame-level decisions and incorporates a delay, and further incorporates a forward and backward decision extension at the speech-segment level of several tenths of milliseconds to determine whether adjacent frames are primarily speech or primarily noise.
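
The center-clipped autocorrelation stage named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the patent: the function names `center_clip` and `normalized_acf`, the 0.6 clipping ratio, and the biased autocorrelation estimate. The low-pass stage is omitted.

```python
import math

def center_clip(frame, ratio=0.6):
    """Center-clip a frame: zero every sample whose magnitude is below
    ratio * peak and shift the remaining samples toward zero.
    The 0.6 ratio is an illustrative assumption."""
    peak = max((abs(x) for x in frame), default=0.0)
    level = ratio * peak
    clipped = []
    for x in frame:
        if x > level:
            clipped.append(x - level)
        elif x < -level:
            clipped.append(x + level)
        else:
            clipped.append(0.0)
    return clipped

def normalized_acf(frame, max_lag):
    """Biased autocorrelation for lags 0..max_lag, divided by the
    lag-0 energy so that acf[0] == 1.0 for any non-silent frame."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return [0.0] * (max_lag + 1)
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag)) / energy
        for lag in range(max_lag + 1)
    ]

# Synthetic "voiced" frame: a sinusoid with a period of 40 samples.
frame = [math.sin(2 * math.pi * i / 40) for i in range(240)]
acf = normalized_acf(center_clip(frame), max_lag=100)
```

For a periodic input such as this one, the autocorrelation shows a strong peak at the lag equal to the period (40 samples here), which is the kind of peak the later decision stages examine.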

11 Claims

1. A method for indicating the presence of speech in an audio signal in each of a plurality of time invariant frames, said method comprising the steps of:
- digitizing, low pass filtering and clipping an input audio signal to obtain a digitized, filtered and clipped signal;
  
  thereafter autocorrelating the clipped signal to obtain an autocorrelation function ACF for each of said plurality of frames;
  
  thereafter (1) examining said ACF of each of said plurality of frames for the presence of peaks indicative of pitch to obtain a pitch/no pitch decision for each of said plurality of frames, said examining step comprising the steps of:
  
  determining the amplitude of the highest ACF peak;
  
  determining the amplitude of the second highest ACF peak; and
  
  determining the periodicity of ACF peaks within each of said plurality of frames, whose amplitudes exceed a predetermined threshold, noting how many ACF peaks having the determined periodicity are detected; and
  
  providing a pitch/no pitch decision based on a weighted sum of non-linear functions of the amplitudes of the highest and second highest ACF peak and the number of detected ACF peaks having the determined periodicity;
  
  (2) analyzing said ACF of each of said plurality of frames to detect for a tone in said frame to obtain a tone/no-tone decision for said frame; and
  
  rendering a speech/no-speech decision for said frame, providing a speech decision upon coincidence of a pitch decision with a no-tone decision.
- 2. The method of claim 1 further including the step of overlappingly segmenting said frames after said digitizing step.
- 3. The method according to claim 1 wherein said autocorrelation step includes normalizing said autocorrelation function.
- 4. The method according to claim 3 wherein said examining step comprises:
  - obtaining a first preliminary quantitative value corresponding to a first likelihood of pitch detection, and comparing said second highest ACF peak with a second threshold to obtain a second preliminary quantitative value corresponding to a second likelihood of pitch detection.
- 5. The method according to claim 4 wherein said analyzing step further includes detecting for a consistent tone over a plurality of frames for application in said rendering step.
- 6. The method according to claim 1 further including the step, prior to said rendering step, of smoothing pitch/no-pitch decisions over a plurality of frames to suppress excessive transitions between pitch and no-pitch decisions.
- 7. The method according to claim 1 further including the steps of storing a plurality of speech/no-speech decisions to accumulate a sufficient number to produce speech-segment-level decisions, and producing speech-segment-level decisions of sufficient duration to include unvoiced speech preceding and following voiced speech.
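
The frame-level decision recited in claim 1 (highest peak, second highest peak, count of periodic peaks, weighted sum of non-linear functions, then combination with the tone decision) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the sigmoid non-linearity, the weights, every threshold, the periodicity tolerance and all function names are hypothetical. The tone detector itself is not sketched; its result is simply passed in as a boolean.

```python
import math

def find_peaks(acf, threshold):
    """Return (lag, amplitude) pairs for local maxima above threshold,
    skipping lag 0 (which is always the global maximum)."""
    peaks = []
    for lag in range(1, len(acf) - 1):
        if acf[lag] > threshold and acf[lag - 1] < acf[lag] >= acf[lag + 1]:
            peaks.append((lag, acf[lag]))
    return peaks

def sigmoid(x, center, slope=10.0):
    """Smooth threshold: a hypothetical choice of non-linear function."""
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))

def pitch_decision(acf, threshold=0.3, weights=(0.5, 0.3, 0.2), cutoff=0.5):
    """Pitch/no-pitch decision from a weighted sum of non-linear functions
    of the two largest peak amplitudes and the periodic-peak count.
    All numeric values are illustrative assumptions."""
    peaks = find_peaks(acf, threshold)
    if not peaks:
        return False
    amplitudes = sorted((amp for _, amp in peaks), reverse=True)
    highest = amplitudes[0]
    second = amplitudes[1] if len(amplitudes) > 1 else 0.0
    # Assume the first peak's lag is the candidate period, then count the
    # peaks that fall close to an integer multiple of that period.
    period = peaks[0][0]
    tolerance = max(1, period // 10)
    periodic = sum(
        1 for lag, _ in peaks
        if min(lag % period, period - lag % period) <= tolerance
    )
    w1, w2, w3 = weights
    score = (w1 * sigmoid(highest, 0.5)
             + w2 * sigmoid(second, 0.4)
             + w3 * min(periodic / 3.0, 1.0))  # saturating count term
    return score >= cutoff

def speech_decision(pitch, tone):
    """Speech is declared only when pitch coincides with no tone."""
    return pitch and not tone

# Synthetic normalized ACFs: a decaying periodic one and a flat one.
voiced_acf = [math.cos(2 * math.pi * k / 40) * (1 - k / 200) for k in range(130)]
noise_acf = [1.0] + [0.05] * 129
```

With these inputs, `pitch_decision(voiced_acf)` yields a pitch decision and `pitch_decision(noise_acf)` does not; `speech_decision` then vetoes any frame that a separate tone detector has flagged.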

8. An apparatus for indicating the presence of speech in an audio signal comprising:
- a digital low-pass filter and clipping means coupled to filter time-invariant frames of an audio input signal;
  
  means coupled to receive signals processed by said filter and clipping means for obtaining an autocorrelation function for each of a plurality of said frames of said audio signal;
  
  means coupled to process said autocorrelation function for detecting peaks indicative of the presence of pitch of each of said frames of said audio input signal, said processing means comprising:
  
  a first peak decision processor for determining the amplitude of the highest ACF peak;
  
  a second peak decision processor for determining the amplitude of the second highest ACF peak; and
  
  a periodicity detector means for determining the periodicity of ACF peaks within each of said plurality of frames, whose amplitude exceeds a predetermined threshold, noting how many ACF peaks having the determined periodicity are detected; and
  
  providing a pitch/no pitch decision based on a weighted sum of non-linear functions of the amplitudes of the highest and second highest ACF peak and the number of detected ACF peaks having the determined periodicity;
  
  means for analyzing said ACF of each of said plurality of frames to detect a tone in each of said plurality of frames and to obtain a tone/no tone decision for said frame;
  
  an autocorrelation function periodicity detection means coupled to process said autocorrelation function for detecting the presence of pitch and tone in said audio input signal; and
  
  decision combining means coupled to receive a pitch/no-pitch decision and a tone/no-tone decision for indicating the presence of voice speech upon coincidence of a no-tone decision and a pitch decision.
- 9. The apparatus according to claim 8 further including speech-segment-level decision means responsive to the output of said decision combining means indicating the presence of voice speech in a given frame, said speech-segment-level decision means including means for capturing and processing a sufficient number of frames to produce speech-segment-level decisions, including an initial backward extension means, an initial forward extension means, a final backward extension means, a final forward extension means, a short voice segments testing means and a short silence interval testing means, said extension means and said testing means for expanding a time base of said speech-segment-level decision means to include unvoiced speech and gaps between words.
- 10. The apparatus according to claim 9 further including means for synchronizing said speech-segment-level decisions with corresponding speech segments.
- 11. The apparatus according to claim 8 further including means for segmenting said frames into time-overlapping frames.
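
The smoothing of claim 6 and the speech-segment-level processing of claims 7 and 9 (short voice segment test, short silence interval test, backward and forward extension) can be sketched on a list of per-frame boolean decisions. This is a simplified illustration under assumptions: the median filter, the single symmetric `extend` parameter standing in for the four separate extension means, and all default values are hypothetical choices, not figures from the patent.

```python
def median_smooth(decisions, width=3):
    """Majority vote over a sliding window; suppresses isolated flips.
    A median filter is an assumed choice of non-linear smoother."""
    half = width // 2
    smoothed = []
    for i in range(len(decisions)):
        window = decisions[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) * 2 > len(window))
    return smoothed

def runs(flags):
    """Yield (value, start, end_exclusive) for each run of equal flags."""
    start = 0
    for i in range(1, len(flags) + 1):
        if i == len(flags) or flags[i] != flags[start]:
            yield flags[start], start, i
            start = i

def segment_decisions(frames, min_speech=2, max_gap=2, extend=1):
    """Turn frame-level decisions into segment-level decisions."""
    flags = list(frames)
    # Short voice segment test: discard speech runs that are too brief.
    for value, start, end in list(runs(flags)):
        if value and end - start < min_speech:
            flags[start:end] = [False] * (end - start)
    # Short silence interval test: bridge brief gaps between speech runs.
    for value, start, end in list(runs(flags)):
        if not value and start > 0 and end < len(flags) and end - start <= max_gap:
            flags[start:end] = [True] * (end - start)
    # Backward and forward extension, so that unvoiced speech around
    # each voiced segment is included in the segment.
    extended = list(flags)
    for value, start, end in runs(flags):
        if value:
            for i in range(max(0, start - extend), min(len(flags), end + extend)):
                extended[i] = True
    return extended

T, F = True, False
frame_flags = [F, T, F, F, T, T, F, T, T, F, F, F, F]
segments = segment_decisions(frame_flags)
```

Because the extension looks backward as well as forward, a real-time version has to buffer frames before emitting a decision, which is consistent with the delayed-decision design described in the abstract.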

Current Assignee: DSP Group Incorporated (Synaptics Incorporated)
Original Assignee: DSP Group Incorporated (Synaptics Incorporated)
Inventors: Stettiner, Yoram; Adlersberg, Shabtai; Aizner, Mendel
Primary Examiner(s): Not defined
Assistant Examiner(s): Not defined
Application Number: US07/151,740
Time in Patent Office: 965 Days
Field of Search: 381/46-47, 381/41-45, 381/49, 381/110, 369/513.5, 379/80, 455/116
US Class Current: 704/233
CPC Class Codes:
G10L 21/04 Time compression or expansion
G10L 25/78 Detection of presence or ab...