Method and apparatus for detecting speech activity using cepstrum vectors
Abstract
A method and apparatus for detecting speech activity in an input signal. The present invention includes performing begin point detection using power/zero crossing. Once the begin point has been detected, the present invention uses the cepstrum of the input signal to determine the endpoint of the sound in the signal. After both the beginning and ending of the sound are detected, the present invention uses vector quantization distortion to classify the sound as speech or noise.
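The begin point step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the frame length, the thresholds, or the rule for combining power and zero crossings, so `frame_len`, `power_thresh`, `zc_thresh`, and the "either measure exceeds its threshold" rule below are assumptions chosen only for illustration.

```python
def frame_power(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossings(frame):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))


def find_begin_frame(samples, frame_len=160, power_thresh=0.01, zc_thresh=40):
    """Return the index of the first frame that looks like sound, or None.

    frame_len, power_thresh and zc_thresh are hypothetical values, and the
    OR-combination of the two measures is an assumption of this sketch.
    """
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if frame_power(frame) > power_thresh or zero_crossings(frame) > zc_thresh:
            return start // frame_len
    return None
```

With two silent frames followed by a frame of audible signal, `find_begin_frame` would report frame index 2 as the begin point.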
31 Claims
1. A method for detecting an endpoint of speech in an input signal, wherein the input signal is sampled, said method comprising the steps of:
generating cepstrum vectors representing each spectrum of individual samples of the input signal;
generating a cepstrum vector for a steady state portion of the input signal; and
comparing the cepstrum vectors of individual samples with the cepstrum vector for the steady state portion of the input signal to identify the endpoint of speech as that portion of the input signal having a spectrum that converges to the steady state portion of the input signal.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
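To make the term "cepstrum vector" concrete, the following sketch computes a textbook real cepstrum (inverse DFT of the log-magnitude spectrum) for one frame. The claim does not say how the cepstrum is derived, so this particular definition, the function name `real_cepstrum`, and the parameter `n_coeffs` are assumptions; an LPC-derived cepstrum would be an equally plausible reading.

```python
import cmath
import math


def real_cepstrum(frame, n_coeffs=8):
    """Return the first n_coeffs real-cepstrum coefficients of one frame.

    Uses a direct O(N^2) DFT so the sketch needs no third-party packages.
    n_coeffs is a hypothetical vector length, not a value from the source.
    """
    n = len(frame)
    # Log-magnitude spectrum; the small floor avoids log(0) on silent frames.
    log_mag = []
    for k in range(n):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        log_mag.append(math.log(abs(s) + 1e-12))
    # Inverse DFT of the log-magnitude spectrum. For real input the magnitude
    # spectrum is symmetric, so only the cosine terms contribute.
    return [
        sum(log_mag[k] * math.cos(2 * math.pi * k * q / n) for k in range(n)) / n
        for q in range(n_coeffs)
    ]
```

Each frame of the sampled input would yield one such vector, and those vectors are what the comparing step operates on.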
9. A method for detecting speech activity in an input signal comprising the steps of:
detecting a beginning point of speech in the input signal;
detecting an ending point of speech in the input signal, wherein the step of detecting an ending point of speech comprises the steps of
  computing an average cepstrum vector for each frame to represent a steady state portion of the input signal,
  comparing cepstrum vectors for individual speech samples with the average cepstrum vector, including the step of determining distance of a current cepstrum vector for an individual speech sample from the average cepstrum vector to determine a variance, and
  identify the ending point of speech when the variance is at least at a predetermined variance indicative of whether the ending point of speech has been detected.
Dependent claims: 10, 11, 12, 13.
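The averaging and distance steps of this claim can be sketched as follows. Several details are assumptions rather than anything stated here: the average is kept as an exponential moving average with weight `alpha`, the distance is Euclidean, and the ending point is declared once the distance stays below `threshold` for `hold` consecutive frames (that is, once the current vector has converged to the average). The claim's own threshold wording is not spelled out precisely, so the comparison direction in particular should be read as one interpretation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_end_frame(cepstra, alpha=0.5, threshold=0.1, hold=3):
    """Return the first frame of a converged run, or None if none is found.

    cepstra is a list of per-frame cepstrum vectors. alpha, threshold and
    hold are hypothetical tuning values used only for this sketch.
    """
    mean = list(cepstra[0])  # running estimate of the steady state vector
    run = 0                  # consecutive frames close to the average
    for i, current in enumerate(cepstra[1:], start=1):
        dist = euclidean(current, mean)
        run = run + 1 if dist < threshold else 0
        if run >= hold:
            return i - hold + 1
        # Update the average after measuring, so the current frame is
        # compared against the history that preceded it.
        mean = [(1 - alpha) * m + alpha * x for m, x in zip(mean, current)]
    return None
```

Under these assumptions, frames whose cepstrum keeps changing hold the distance high, while a run of nearly identical frames pulls the distance toward zero and triggers the ending point.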
14. A method for detecting speech activity in an input signal having a beginning point and an ending point, said method comprising the steps of:
detecting the beginning point of speech in the input signal;
detecting the ending point of speech in the input signal using cepstrum vectors, wherein the step of detecting the ending point of speech comprises the step of comparing the cepstrum vectors of individual speech samples of the input signal with a cepstrum vector for a steady state portion of the input signal to identify the ending point of speech;
classifying the sound as speech or noise, such that speech recognition occurs on the input signal when the sound is classified as speech and speech recognition does not occur on the input signal when the sound is classified as noise.
Dependent claims: 15, 16, 17, 18.
19. A method for detecting speech activity in an input signal comprising the steps of:
detecting the power and zero crossings of the input signal to determine a beginning point of sound in the input signal;
detecting an end point of sound in the input signal, wherein the step of detecting an end point of sound comprises the steps of
  generating cepstrum vectors representing each spectrum of individual samples of the input signal,
  generating a cepstrum vector for a steady state portion of the input signal, and
  comparing the cepstrum vectors of individual speech samples for each frame with the cepstrum vector representing a steady state portion of the input signal and identifying the end point of sound as the point of the input signal where the current cepstrum vector converges to the cepstrum vector representing the steady state; and
comparing the current cepstral vector with a speech codebook and a noise codebook, such that the sound is classified as speech or noise according to the distortion between current cepstral vector and a speech codebook and a noise codebook.
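The codebook comparison in the final step can be illustrated with a small vector quantization sketch. It assumes that a codebook is a list of codeword vectors, that distortion is the squared Euclidean distance to the nearest codeword, and that the sound is labeled with whichever codebook gives the lower distortion. None of these specifics is stated in the claim, and the codebook contents in the usage note are made up.

```python
def vq_distortion(vector, codebook):
    """Squared Euclidean distance from vector to its nearest codeword."""
    return min(
        sum((x - y) ** 2 for x, y in zip(vector, codeword))
        for codeword in codebook
    )


def classify(vector, speech_codebook, noise_codebook):
    """Label a cepstral vector by the codebook that quantizes it better.

    Ties go to "noise"; that tie-break is an arbitrary choice of this sketch.
    """
    speech_d = vq_distortion(vector, speech_codebook)
    noise_d = vq_distortion(vector, noise_codebook)
    return "speech" if speech_d < noise_d else "noise"
```

For example, with the hypothetical codebooks `[[1.0, 0.0], [0.0, 1.0]]` for speech and `[[0.0, 0.0]]` for noise, the vector `[0.9, 0.1]` is closer to a speech codeword and `[0.1, 0.0]` is closer to the noise codeword.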
20. A system for recognizing speech from an input signal comprising:
speech activity detection means for detecting speech in the input signal, wherein said speech activity detection means comprises
  means for detecting power and zero crossings of the input signal to determine a beginning point of sound in the input signal;
  means for generating cepstral vectors representing each spectrum of individual samples of the input signal;
  means for generating a cepstral vector for a steady state portion of the input signal;
  means for comparing cepstral vectors of individual samples with the cepstral vector for the steady state portion of the input signal to identify the endpoint of speech as that portion of the input signal having a spectrum that converges to the steady state portion of the input signal; and
  means for comparing a current cepstral vector with a speech codebook and a noise codebook, such that sound in the input signal is classified as speech or noise according to a distortion between the current cepstral vector and a speech codebook and a noise codebook, wherein if the sound is classified as speech then the current cepstral vector is output as an output speech signal; and
a recognition engine for receiving the output speech signal and recognizing the speech, such that at least one recognized word is generated.
21. A method of detecting speech activity in a data input stream comprising the steps of:
(a) generating a set of spectral representation vectors to represent the data input stream, wherein each spectral representation vector of the set of spectral representation vectors represents a predetermined portion of the data input stream;
(b) generating a steady state spectral representation vector indicative of the state of the data input stream at a first predetermined portion of the data input stream;
(c) comparing a spectral representation vector corresponding to the first predetermined portion of the data input stream to the steady state spectral representation vector; and
(d) determining a first end point of speech activity when the set of spectral representation vectors converges toward the steady state spectral representation vector.
Dependent claims: 22, 23, 24, 25, 26.
27. An apparatus for detecting speech activity in a data input stream comprising:
a memory unit;
an input device for receiving the data input stream;
a processor coupled to the memory unit and the input device, wherein the processor generates a set of spectral representation vectors to represent the data input stream and stores the set of spectral representation vectors in the memory unit, wherein each spectral representation vector of the set of spectral representation vectors represents a predetermined portion of the data input stream, wherein the processor also generates a steady state spectral representation vector indicative of the state of the data input stream at a first predetermined portion of the data input stream and compares a spectral representation vector corresponding to the first predetermined portion of the data input stream to the steady state spectral representation vector, and determines a first end point of speech activity when the set of spectral representation vectors converges toward the steady state spectral representation vector.
Dependent claims: 28, 29, 30, 31.
Specification