Speech detection in presence of noise by determining variance over time of frequency band limited energy
Abstract
The device detects the beginning and ending portions of speech contained within an input signal based on the variance of frequency band limited energy within the signal. Using the variance makes detection relatively independent of the absolute signal-to-noise ratio of the signal, and allows accurate detection against a wide variety of backgrounds, such as music, motor noise, and other speakers. The device can be easily implemented using off-the-shelf hardware along with a high-speed special purpose digital signal processor integrated circuit.
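The quantity the abstract relies on, the variance over time of band limited frame energy, can be illustrated with a minimal sketch. This is not the patented implementation: the function names `band_energy` and `energy_variance`, the naive DFT used as a band limiter, and all parameters (frame length, window length, band edges) are assumptions chosen for illustration only.

```python
import cmath
import math
from statistics import pvariance


def band_energy(frame, sample_rate, lo_hz, hi_hz):
    """Energy of one frame restricted to [lo_hz, hi_hz].

    Assumption: a naive DFT over the positive-frequency bins inside the
    band stands in for whatever band limiter a real device would use.
    """
    n = len(frame)
    k_lo = math.ceil(lo_hz * n / sample_rate)
    k_hi = math.floor(hi_hz * n / sample_rate)
    energy = 0.0
    for k in range(k_lo, k_hi + 1):
        x_k = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(frame))
        energy += abs(x_k) ** 2
    return energy / n


def energy_variance(signal, sample_rate, frame_len, window, lo_hz, hi_hz):
    """Variance of band limited frame energy over a sliding window of frames.

    frame_len is the number of samples per frame; window is the number of
    consecutive frames over which each variance value is computed.
    """
    # Split the signal into non-overlapping frames (trailing samples dropped).
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    energies = [band_energy(f, sample_rate, lo_hz, hi_hz) for f in frames]
    # One variance value per position of the sliding window.
    return [pvariance(energies[i - window + 1:i + 1])
            for i in range(window - 1, len(energies))]
```

A steady in-band tone yields nearly constant frame energy and therefore a variance near zero, however loud it is, while a signal whose in-band energy fluctuates from frame to frame yields a large variance. That is the sense in which the measure does not depend on the absolute signal level.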
17 Claims
1. A device for detecting speech in an input signal comprising:
first determining means for determining a plurality of values representative of a plurality of frequency band limited energy within the signal, wherein the signal is sampled at a predetermined sampling rate in a single frequency band over a first plurality of frames, wherein each frame comprises a plurality of samples;
second determining means for receiving the plurality of values from said first determining means, and determining a variance of the frequency band limited energy of the signal in the single frequency band over a second plurality of frames; and
third determining means for determining beginning and ending points of speech within the signal using the variance of the frequency band limited energy, wherein the third determining means comprises:
fourth determining means for determining a beginning of speech as occurring when the variance of the frequency band limited energy exceeds an upper threshold level and for determining an ending of speech as occurring when the variance of the frequency band limited energy falls below a lower threshold level, the upper threshold level being greater than the lower threshold level.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. In a device for recognizing speech within an input signal, with the device having means for receiving the signal, means for determining beginning and ending points of speech within the signal, and means for determining content of speech within the signal between the beginning and ending points, an improvement to the means for determining the beginning and ending points of the speech comprising:
first determining means for determining a plurality of values representative of a plurality of frequency band limited energy within the signal, wherein the signal is sampled at a predetermined sampling rate in a single frequency band over a first plurality of frames, wherein each frame comprises a plurality of samples;
second determining means for receiving the plurality of values from said first determining means, and determining a variance of the frequency band limited energy of the signal in the single frequency band over a second plurality of frames; and
third determining means for determining beginning and ending points of speech within the signal based on the variance of the frequency band limited energy, wherein the third determining means comprises:
fourth determining means for determining a beginning of speech as occurring when the variance of the frequency band limited energy exceeds an upper threshold level and for determining an ending of speech as occurring when the variance of the frequency band limited energy falls below a lower threshold level, the upper threshold level being greater than the lower threshold level.
13. A device for the detection of speech in an input signal, comprising:
first determining means for determining a variance of a frequency band limited energy of the signal; and
speech interval decision means for deciding start and end points of speech within the signal based on said variance, wherein said speech interval decision means comprises:
second determining means for determining a beginning of speech as occurring when the variance of the frequency band limited energy exceeds an upper threshold level and for determining an ending of speech as occurring when the variance of the frequency band limited energy falls below a lower threshold level, the upper threshold level being greater than the lower threshold level,
wherein the first determining means for determining a variance comprises:
third means for providing a plurality of determined values representative of a plurality of frequency band limited energy at a predetermined sampling rate in a single frequency band over a first plurality of frames, wherein each frame comprises a plurality of samples; and
fourth means for calculating the variance from the plurality of determined values provided from the third means in the single frequency band over a second plurality of frames.
Dependent claims: 14, 15, 16
17. A device for detecting speech in an input signal comprising:
first determining means for determining a plurality of values representative of a plurality of frequency band limited energy within the signal, wherein the signal is sampled at a predetermined sampling rate in a single frequency band over a first plurality of frames, wherein each frame comprises a plurality of samples, the first determining means using a frequency band limiter;
second determining means for receiving the plurality of values from said first determining means, and determining a variance of the frequency band limited energy of the signal in the single frequency band over a second plurality of frames; and
third determining means for determining beginning and ending points of speech within the signal using the variance of the frequency band limited energy.
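The two-threshold decision recited in claims 1, 12, and 13 (beginning of speech when the variance exceeds an upper threshold, ending when it falls below a lower threshold) amounts to a hysteresis comparison. The following is a minimal sketch under stated assumptions, not the claimed apparatus: the function name `find_speech_segments`, the list-of-variances input, and the handling of speech still active at the end of the input are all hypothetical choices.

```python
def find_speech_segments(variances, upper, lower):
    """Return (begin, end) index pairs found by hysteresis thresholding.

    variances: one variance value per analysis step (hypothetical input).
    A segment begins at the first index where the variance exceeds `upper`
    and ends at the first later index where it falls below `lower`.
    """
    if upper <= lower:
        raise ValueError("upper threshold must be greater than lower threshold")

    segments = []
    start = None  # index where the current speech segment began, if any
    for i, v in enumerate(variances):
        if start is None and v > upper:
            start = i                      # beginning of speech
        elif start is not None and v < lower:
            segments.append((start, i))    # ending of speech
            start = None

    # Assumption: speech still active at the end of the input is closed there.
    if start is not None:
        segments.append((start, len(variances)))
    return segments
```

Because the upper threshold is greater than the lower one, a variance that dips between the two thresholds during an utterance does not end the segment, which avoids splitting speech at brief lulls.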
Specification