Method and apparatus for detecting end points of speech activity
Abstract
A method and apparatus for detecting end points of speech activity in an input signal using spectral representation vectors performs beginning point detection using spectral representation vectors for the spectrum of each sample of the input signal and a spectral representation vector for the steady state portion of the input signal. The beginning point of speech is detected when the spectrum diverges from the steady state portion of the input signal. Once the beginning point has been detected, the spectral representation vectors of the input signal are used to determine the ending point of the sound in the signal. The ending point of speech is detected when the spectrum converges towards the steady state portion of the input signal. After both the beginning and ending of the sound are detected, vector quantization distortion can be used to classify the sound as speech or noise.
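The final classification step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the codebook is built or which way the decision goes, so the following assumes a hypothetical codebook trained on speech vectors and a hypothetical threshold `max_distortion`: a detected segment whose average quantization distortion against that codebook is low is labeled speech, otherwise noise. The function names are illustrative only.

```python
def vq_distortion(vectors, codebook):
    """Average squared Euclidean distance from each vector to its nearest codeword."""
    total = 0.0
    for v in vectors:
        total += min(sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook)
    return total / len(vectors)


def classify_segment(vectors, speech_codebook, max_distortion):
    """Label a detected segment as "speech" or "noise".

    Assumption: the codebook represents speech, so low distortion means
    the segment resembles speech. The threshold is a hypothetical tuning value.
    """
    if vq_distortion(vectors, speech_codebook) <= max_distortion:
        return "speech"
    return "noise"
```

In this sketch the segment passed to `classify_segment` would be the spectral representation vectors lying between the detected beginning and ending points.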
131 Citations
28 Claims
1. A method of detecting speech activity in a data input stream comprising the steps of:
(a) generating a set of spectral representation vectors to represent the data input stream, wherein each spectral representation vector of the set of spectral representation vectors represents a predetermined portion of the data input stream;
(b) generating a steady state spectral representation vector indicative of the state of the data input stream at a first predetermined portion of the data input stream;
(c) comparing a spectral representation vector corresponding to the first predetermined portion of the data input stream to the steady state spectral representation vector;
(d) determining a first end point of speech activity when the set of spectral representation vectors diverges from the steady state spectral representation vector; and
(e) determining a second end point of speech activity when a predetermined number of spectral representation vectors of the set of spectral representation vectors are within a predetermined distance of the steady state spectral representation vector for a continuous predetermined period of time.
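Steps (a) through (e) of claim 1 can be sketched as follows. This is one possible reading, not the claimed implementation: it assumes the steady state vector is the mean of the first `n_steady` vectors, that "diverges" means a Euclidean distance above `begin_dist`, and that the second end point is the start of the first run of `hold` consecutive vectors within `end_dist` of the steady state. All parameter names and default values are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def detect_endpoints(vectors, n_steady=3, begin_dist=1.0, end_dist=0.5, hold=3):
    """Return (begin_index, end_index); either may be None if not found."""
    steady = mean_vector(vectors[:n_steady])   # step (b), assumed to be a mean
    begin = None
    end = None
    run = 0                                    # consecutive near-steady vectors
    for i, v in enumerate(vectors):
        d = euclidean(v, steady)               # step (c)
        if begin is None:
            if d > begin_dist:                 # step (d): divergence
                begin = i
            continue
        if d <= end_dist:                      # step (e): convergence run
            run += 1
            if run == hold:
                end = i - hold + 1             # first vector of the run
                break
        else:
            run = 0                            # run interrupted, start over
    return begin, end
```

Requiring an unbroken run before declaring the second end point keeps a brief pause inside an utterance from being mistaken for the end of speech.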
2. A method of detecting speech activity in a data input stream comprising the steps of:
(a) generating a set of autocorrelation vectors to represent the data input stream, wherein each autocorrelation vector of the set of autocorrelation vectors represents a predetermined portion of the data input stream;
(b) generating a steady state autocorrelation vector indicative of the state of the data input stream at a first predetermined portion of the data input stream;
(c) comparing an autocorrelation vector corresponding to the first predetermined portion of the data input stream to the steady state autocorrelation vector; and
(d) determining a first end point of speech activity when the set of autocorrelation vectors diverges from the steady state autocorrelation vector.
Dependent claims: 3, 4, 5, 6
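As an illustration of the autocorrelation vectors named in claim 2, the sketch below computes one vector per frame of samples. The frame contents, the `order` parameter, and the unnormalized definition r[k] = sum of x[t] * x[t + k] are assumptions made for the example; the claim does not fix any of them.

```python
def autocorr_vector(frame, order):
    """Return [r[0], ..., r[order]] where r[k] = sum over t of frame[t] * frame[t + k]."""
    n = len(frame)
    return [
        sum(frame[t] * frame[t + k] for t in range(n - k))
        for k in range(order + 1)
    ]


def autocorr_vectors(frames, order):
    """One autocorrelation vector per frame (a frame is a list of samples)."""
    return [autocorr_vector(f, order) for f in frames]
```

For example, `autocorr_vector([1, 2, 3], 2)` gives `[14, 8, 3]`: lag 0 is 1 + 4 + 9, lag 1 is 1*2 + 2*3, and lag 2 is 1*3.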
7. A method of detecting speech activity in a data input stream comprising the steps of:
(a) generating a set of Fourier Transform vectors to represent the data input stream, wherein each Fourier Transform vector of the set of Fourier Transform vectors represents a predetermined portion of the data input stream;
(b) generating a steady state Fourier Transform vector indicative of the state of the data input stream at a first predetermined portion of the data input stream;
(c) comparing a Fourier Transform vector corresponding to the first predetermined portion of the data input stream to the steady state Fourier Transform vector; and
(d) determining a first end point of speech activity when the set of Fourier Transform vectors diverges from the steady state Fourier Transform vector.
Dependent claims: 8, 9, 10, 11
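One way to obtain the Fourier Transform vectors of claim 7 is to split the samples into fixed-size frames and take the magnitude of a discrete Fourier transform of each frame. The sketch below uses a naive O(n^2) DFT from the standard library so that it stays self-contained; the non-overlapping framing, the frame size, and the use of magnitudes only are assumptions for illustration, and a practical system would more likely use an FFT routine.

```python
import cmath
import math


def split_frames(samples, size):
    """Split samples into non-overlapping frames, dropping any short tail."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def dft_magnitudes(frame):
    """Magnitudes |X[k]| for k = 0 .. n // 2 of a real-valued frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(frame)
        )
        mags.append(abs(acc))
    return mags


def fourier_vectors(samples, size):
    """One magnitude-spectrum vector per frame of the input samples."""
    return [dft_magnitudes(f) for f in split_frames(samples, size)]
```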
12. An apparatus for detecting speech activity in a data input stream comprising:
a memory unit;
an input device for receiving the data input stream; and
a processor coupled to the memory unit and the input device, wherein the processor generates a set of spectral representation vectors to represent the data input stream and stores the set of spectral representation vectors in the memory unit, wherein each spectral representation vector of the set of spectral representation vectors represents a predetermined portion of the data input stream, wherein the processor also generates a steady state spectral representation vector indicative of the state of the data input stream at a first predetermined portion of the data input stream and compares a spectral representation vector corresponding to the first predetermined portion of the data input stream to the steady state spectral representation vector, determines a first end point of speech activity when the set of spectral representation vectors diverges from the steady state spectral representation vector, and determines a second end point of speech activity when a predetermined number of spectral representation vectors of the set of spectral representation vectors are within a predetermined distance of the steady state spectral representation vector for a continuous predetermined period of time.
13. An apparatus for detecting speech activity in a data input stream comprising:
a memory unit;
an input device for receiving the data input stream;
a processor coupled to the memory unit and the input device, wherein the processor generates a set of autocorrelation vectors to represent the data input stream and stores the set of autocorrelation vectors in the memory unit, wherein each autocorrelation vector of the set of autocorrelation vectors represents a predetermined portion of the data input stream, wherein the processor also generates a steady state autocorrelation vector indicative of the state of the data input stream at a first predetermined portion of the data input stream and compares an autocorrelation vector corresponding to the first predetermined portion of the data input stream to the steady state autocorrelation vector, and determines a first end point of speech activity when the set of autocorrelation vectors diverges from the steady state autocorrelation vector.
Dependent claims: 14, 15, 16
17. An apparatus for detecting speech activity in a data input stream comprising:
a memory unit;
an input device for receiving the data input stream;
a processor coupled to the memory unit and the input device, wherein the processor generates a set of Fourier Transform vectors to represent the data input stream and stores the set of Fourier Transform vectors in the memory unit, wherein each Fourier Transform vector of the set of Fourier Transform vectors represents a predetermined portion of the data input stream, wherein the processor also generates a steady state Fourier Transform vector indicative of the state of the data input stream at a first predetermined portion of the data input stream and compares a Fourier Transform vector corresponding to the first predetermined portion of the data input stream to the steady state Fourier Transform vector, and determines a first end point of speech activity when the set of Fourier Transform vectors diverges from the steady state Fourier Transform vector.
Dependent claims: 18, 19, 20
21. A method of detecting speech activity in a data input stream comprising the steps of:
(a) generating a set of spectral representation vectors to represent a plurality of portions of the data input stream;
(b) generating a steady state spectral representation vector indicative of the state of the data input stream at a first portion of the data input stream, wherein the first portion is one of the plurality of portions;
(c) comparing a first spectral representation vector representing the first portion of the data input stream to the steady state spectral representation vector; and
(d) determining a first end point of speech activity when the set of spectral representation vectors diverges from the steady state spectral representation vector.
Dependent claims: 22, 23, 24
25. An apparatus for detecting speech activity in a data input stream comprising:
a memory unit;
an input device for receiving the data input stream; and
a processor coupled to the memory unit and the input device, wherein the processor generates a set of spectral representation vectors to represent a plurality of portions of the data input stream and stores the set of spectral representation vectors in the memory unit, wherein the processor also generates a steady state spectral representation vector indicative of the state of the data input stream at a first portion of the data input stream, wherein the first portion is one of the plurality of portions, wherein the processor also compares a first spectral representation vector representing the first portion of the data input stream to the steady state spectral representation vector and determines a first end point of speech activity when the set of spectral representation vectors diverges from the steady state spectral representation vector.
Dependent claims: 26, 27, 28
Specification