A method for indicating the presence of speech in an audio signal
First Claim
1. A method for indicating the presence of speech in an audio signal in each of a plurality of time invariant frames, said method comprising the steps of:
digitizing, low pass filtering and clipping an input audio signal to obtain a digitized, filtered and clipped signal;
thereafter autocorrelating the clipped signal to obtain an autocorrelation function ACF for each of said plurality of frames;
thereafter (1) examining said ACF of each of said plurality of frames for the presence of peaks indicative of pitch to obtain a pitch/no pitch decision for each of said plurality of frames, said examining step comprising the steps of:
determining the amplitude of the highest ACF peak;
determining the amplitude of the second highest ACF peak; and
determining the periodicity of ACF peaks within each of said plurality of frames, whose amplitudes exceed a predetermined threshold, noting how many ACF peaks having the determined periodicity are detected; and
providing a pitch/no pitch decision based on a weighted sum of non-linear functions of the amplitudes of the highest and second highest ACF peak and the number of detected ACF peaks having the determined periodicity;
(2) analyzing said ACF of each of said plurality of frames to detect a tone in said frame to obtain a tone/no-tone decision for said frame; and
rendering a speech/no-speech decision for said frame, providing a speech decision upon coincidence of a pitch decision with a no-tone decision.
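The front end of this claim (low-pass filtering, clipping, and per-frame autocorrelation) can be sketched in Python as below. This is an illustrative sketch, not the patented implementation. The input is assumed to be already digitized as a list of floats. The 8 kHz sample rate, the 31-tap Hamming-windowed FIR filter with a 1 kHz cutoff (suggested by the abstract's "below about one kHz"), the three-level center clipper at 30% of the frame peak, the 256-sample non-overlapping frames, and all function names are assumptions.

```python
import math


def lowpass_fir(cutoff_hz, fs_hz, taps=31):
    """Hamming-windowed sinc FIR low-pass taps, normalized to unity DC gain."""
    fc = cutoff_hz / fs_hz               # cutoff in cycles per sample
    mid = (taps - 1) / 2
    h = []
    for n in range(taps):
        m = n - mid
        ideal = 2 * fc if m == 0 else math.sin(2 * math.pi * fc * m) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        h.append(ideal * window)
    total = sum(h)
    return [v / total for v in h]


def fir_filter(x, h):
    """Direct-form FIR filter: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def center_clip(frame, fraction=0.3):
    """Three-level center clipper: +1 / 0 / -1 around a clip level of
    `fraction` times the frame's peak magnitude (fraction is an assumption)."""
    level = fraction * max((abs(v) for v in frame), default=0.0)
    return [1 if v > level else -1 if v < -level else 0 for v in frame]


def autocorrelation(clipped, max_lag):
    """ACF for lags 0..max_lag, normalized so that lag 0 equals 1."""
    r0 = sum(v * v for v in clipped)
    if r0 == 0:                          # silent frame: nothing survives clipping
        return [0.0] * (max_lag + 1)
    return [sum(clipped[n] * clipped[n + k] for n in range(len(clipped) - k)) / r0
            for k in range(max_lag + 1)]


def frame_acfs(samples, fs_hz=8000, frame_len=256, max_lag=160):
    """Low-pass filter below 1 kHz, then center-clip and autocorrelate each
    non-overlapping frame. Sample rate and frame size are assumptions."""
    filtered = fir_filter(samples, lowpass_fir(1000, fs_hz))
    return [autocorrelation(center_clip(filtered[s:s + frame_len]), max_lag)
            for s in range(0, len(filtered) - frame_len + 1, frame_len)]


tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(1024)]
acfs = frame_acfs(tone)
# Strongest lag between 20 and 60 in the second frame (past the filter start-up):
print(max(range(20, 61), key=lambda k: acfs[1][k]))  # 40, one 200 Hz period at 8 kHz
```

Three-level clipping keeps every autocorrelation product in {-1, 0, +1}, which makes the autocorrelation cheap to compute in hardware. The claim itself only says "clipping", so a classic center clipper that subtracts the clip level from the surviving samples would also be a reasonable choice.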
Abstract
A voice-operated switch employs digital signal processing techniques to examine audio signal frames having harmonic content to identify voiced phonemes and to determine whether the signal frame contains primarily speech or noise. The method and apparatus employ a multiple-stage, delayed-decision adaptive digital signal processing algorithm implemented through the use of commonly available electronic circuit components. Specifically, the method and apparatus comprise a plurality of stages, including (1) a low-pass filter to limit examination of input signals to below about one kHz, (2) a digital center-clipped autocorrelation processor which recognizes that the presence of periodic components of the input signal below and above a peak-related threshold identifies a frame as containing speech or noise, and (3) a nonlinear filtering processor which includes nonlinear smoothing of the frame-level decisions and incorporates a delay, and further incorporates a forward and backward decision extension at the speech-segment level of several tenths of milliseconds to determine whether adjacent frames are primarily speech or primarily noise.
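The third stage described in the abstract, nonlinear smoothing of frame decisions followed by forward and backward extension of speech segments, can be sketched as below. The abstract does not name the smoother; this sketch assumes a three-frame majority vote (equivalent to a running median on 0/1 decisions), and the extension lengths of 2 frames back and 3 frames forward are hypothetical. It processes a whole list offline. A streaming version cannot mark earlier frames as speech after the fact, so it must hold its output back by at least the backward extension, which is consistent with the delay the abstract mentions.

```python
def majority_smooth(decisions, width=3):
    """Nonlinear smoothing of 0/1 frame decisions by majority vote.

    For binary inputs and odd width this equals a running median. Windows
    are truncated at the ends of the list; ties (possible only there)
    resolve to 0 (no speech).
    """
    half = width // 2
    smoothed = []
    for i in range(len(decisions)):
        window = decisions[max(0, i - half):i + half + 1]
        smoothed.append(1 if 2 * sum(window) > len(window) else 0)
    return smoothed


def extend_segments(decisions, back=2, forward=3):
    """Mark `back` frames before and `forward` frames after every speech frame
    as speech too. back/forward are hypothetical hangover lengths in frames."""
    extended = list(decisions)
    for i, is_speech in enumerate(decisions):
        if is_speech:
            for j in range(max(0, i - back), min(len(decisions), i + forward + 1)):
                extended[j] = 1
    return extended


raw = [0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
smoothed = majority_smooth(raw)   # the isolated speech frame at index 2 is removed
print(extend_segments(smoothed))  # [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
```

Smoothing before extending matters: extending first would turn the isolated frame at index 2 into a six-frame false speech segment.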
11 Claims
1. A method for indicating the presence of speech in an audio signal in each of a plurality of time invariant frames, said method comprising the steps of:
digitizing, low pass filtering and clipping an input audio signal to obtain a digitized, filtered and clipped signal;
thereafter autocorrelating the clipped signal to obtain an autocorrelation function ACF for each of said plurality of frames;
thereafter (1) examining said ACF of each of said plurality of frames for the presence of peaks indicative of pitch to obtain a pitch/no pitch decision for each of said plurality of frames, said examining step comprising the steps of:
determining the amplitude of the highest ACF peak;
determining the amplitude of the second highest ACF peak; and
determining the periodicity of ACF peaks within each of said plurality of frames, whose amplitudes exceed a predetermined threshold, noting how many ACF peaks having the determined periodicity are detected; and
providing a pitch/no pitch decision based on a weighted sum of non-linear functions of the amplitudes of the highest and second highest ACF peak and the number of detected ACF peaks having the determined periodicity;
(2) analyzing said ACF of each of said plurality of frames to detect a tone in said frame to obtain a tone/no-tone decision for said frame; and
rendering a speech/no-speech decision for said frame, providing a speech decision upon coincidence of a pitch decision with a no-tone decision.
(Dependent claims: 2, 3, 4, 5, 6, 7.)
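The pitch/no-pitch test in the examining step can be sketched as below. The claim names the three features (amplitude of the highest ACF peak, amplitude of the second highest peak, and the count of above-threshold peaks at the detected periodicity) and says they are combined as a weighted sum of non-linear functions, but it gives no functions, weights or thresholds. Everything numeric here is a hypothetical placeholder: the clipped-ramp nonlinearity, the weights, the minimum lag of 20 samples (400 Hz at an assumed 8 kHz rate), and taking the lag of the highest peak as the periodicity.

```python
import math


def find_peaks(acf, min_lag):
    """Local maxima of the ACF at lags >= min_lag, as (lag, amplitude) pairs."""
    return [(k, acf[k]) for k in range(max(1, min_lag), len(acf) - 1)
            if acf[k] > acf[k - 1] and acf[k] >= acf[k + 1]]


def ramp(x, lo, hi):
    """Clipped linear ramp (0 below lo, 1 above hi): one simple nonlinearity."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def pitch_decision(acf, min_lag=20, peak_threshold=0.3, tolerance=2,
                   weights=(0.5, 0.2, 0.3), decision_threshold=0.5):
    """Return (has_pitch, score) for one frame's normalized ACF.

    All parameter values are hypothetical placeholders.
    """
    peaks = find_peaks(acf, min_lag)
    if not peaks:
        return False, 0.0
    ranked = sorted(peaks, key=lambda p: p[1], reverse=True)
    period, a1 = ranked[0]                          # highest peak
    a2 = ranked[1][1] if len(ranked) > 1 else 0.0   # second highest peak
    # Count above-threshold peaks lying near an integer multiple of the period.
    n_periodic = 0
    for lag, amp in peaks:
        multiple = round(lag / period)
        if (amp > peak_threshold and multiple >= 1
                and abs(lag - multiple * period) <= tolerance):
            n_periodic += 1
    w1, w2, w3 = weights
    score = (w1 * ramp(a1, 0.2, 0.8) + w2 * ramp(a2, 0.1, 0.7)
             + w3 * ramp(n_periodic, 0, 3))
    return score > decision_threshold, score


def synthetic_acf(period, decay, max_lag=160):
    """Test helper: cosine ACF whose peaks shrink linearly by `decay` per lag."""
    return [(1.0 - decay * k) * math.cos(2 * math.pi * k / period)
            for k in range(max_lag + 1)]


voiced = synthetic_acf(period=40, decay=0.0025)   # peaks near 0.9, 0.8, 0.7
noise = [1.0] + [0.05 * (-1) ** k for k in range(1, 161)]
print(pitch_decision(voiced)[0], pitch_decision(noise)[0])  # True False
```

The count of periodic peaks adds evidence that a single strong peak alone cannot: a noise frame can produce one chance peak, but rarely a train of above-threshold peaks at multiples of the same lag.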
8. An apparatus for indicating the presence of speech in an audio signal comprising:
a digital low-pass filter and clipping means coupled to filter time-invariant frames of an audio input signal;
means coupled to receive signals processed by said filter and clipping means for obtaining an autocorrelation function for each of a plurality of said frames of said audio signal;
means coupled to process said autocorrelation function for detecting peaks indicative of the presence of pitch of each of said frames of said audio input signal, said processing means comprising:
a first peak decision processor for determining the amplitude of the highest ACF peak;
a second peak decision processor for determining the amplitude of the second highest ACF peak; and
a periodicity detector means for determining the periodicity of ACF peaks within each of said plurality of frames, whose amplitude exceeds a predetermined threshold, noting how many ACF peaks having the determined periodicity are detected; and
providing a pitch/no pitch decision based on a weighted sum of non-linear functions of the amplitudes of the highest and second highest ACF peak and the number of detected ACF peaks having the determined periodicity;
means for analyzing said ACF of each of said plurality of frames to detect a tone in each of said plurality of frames and to obtain a tone/no tone decision for said frame;
an autocorrelation function periodicity detection means coupled to process said autocorrelation function for detecting the presence of pitch and tone in said audio input signal; and
decision combining means coupled to receive a pitch/no-pitch decision and a tone/no-tone decision for indicating the presence of voiced speech upon coincidence of a no-tone decision and a pitch decision.
(Dependent claims: 9, 10, 11.)
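The claims say only that the ACF is analyzed to detect a tone and that speech is indicated when a pitch decision coincides with a no-tone decision. The sketch below implements that combining rule directly, but the tone test is a hypothetical heuristic, not taken from the patent: a steady tone keeps producing strong ACF peaks of nearly equal height at every multiple of its period, whereas the peaks of a voiced-speech ACF typically decay with lag. It assumes an unbiased (lag-normalized) ACF, since the plain biased estimate shrinks even a pure tone's peaks as the lag grows. The thresholds (at least 3 peaks above 0.5, weakest-to-strongest ratio of at least 0.9) and all names are illustrative.

```python
import math


def find_peaks(acf, min_lag=20):
    """Local maxima of the ACF at lags >= min_lag, as (lag, amplitude) pairs."""
    return [(k, acf[k]) for k in range(max(1, min_lag), len(acf) - 1)
            if acf[k] > acf[k - 1] and acf[k] >= acf[k + 1]]


def tone_decision(acf, min_lag=20, peak_threshold=0.5, min_peaks=3, flatness=0.9):
    """Hypothetical tone test on an unbiased (lag-normalized) ACF.

    Declares a tone when at least min_peaks peaks exceed peak_threshold and
    the weakest of them is at least `flatness` times the strongest.
    """
    strong = [amp for _, amp in find_peaks(acf, min_lag) if amp > peak_threshold]
    if len(strong) < min_peaks:
        return False
    return min(strong) / max(strong) >= flatness


def speech_decision(has_pitch, has_tone):
    """Combining rule from the claims: speech only when pitch and no tone."""
    return has_pitch and not has_tone


def synthetic_acf(period, decay, max_lag=160):
    """Test helper: cosine ACF whose peaks shrink linearly by `decay` per lag."""
    return [(1.0 - decay * k) * math.cos(2 * math.pi * k / period)
            for k in range(max_lag + 1)]


tone = synthetic_acf(period=40, decay=0.0)        # level peaks
voiced = synthetic_acf(period=40, decay=0.0025)   # decaying peaks
# Assume the pitch test fired for both frames; only one should count as speech.
print(speech_decision(True, tone_decision(tone)))    # False
print(speech_decision(True, tone_decision(voiced)))  # True
```

This is why the claims need a separate tone path: a steady tone is strongly periodic and would pass a pitch test on its own, so pitch alone cannot separate speech from tones.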
Specification