System for detecting speech with background voice estimates and noise estimates
Abstract
A system detects a speech segment that may include unvoiced, fully voiced, or mixed voice content. The system includes a digital converter that converts a time-varying input signal into a digital-domain signal. A window function passes signals within a programmed aural frequency range while substantially blocking signals above and below the programmed aural frequency range when multiplied by an output of the digital converter. A frequency converter converts the signals passing within the programmed aural frequency range into a plurality of frequency bins. A background voice detector estimates the strength of a background speech segment relative to the noise of selected portions of the aural spectrum. A noise estimator estimates a maximum distribution of noise to an average of an acoustic noise power of some of the plurality of frequency bins. A voice detector compares the strength of a desired speech segment to a criterion based on an output of the background voice detector and an output of the noise estimator.
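The windowing and frequency-conversion stages described in the abstract can be illustrated with a minimal sketch. It is not taken from the patent: the Hann window, the 300 to 3400 Hz band edges, and the function name `band_limited_bins` are assumptions chosen for illustration, and the band is limited here by discarding out-of-band bins after a plain DFT, which simplifies the window function the abstract describes.

```python
import cmath
import math


def band_limited_bins(frame, sample_rate, f_lo=300.0, f_hi=3400.0):
    """Return {frequency_hz: (amplitude, phase)} for DFT bins inside [f_lo, f_hi].

    Hypothetical helper: the band edges and the Hann window are assumed values,
    not parameters stated in the source. Requires len(frame) >= 2.
    """
    n = len(frame)
    # Hann window (assumed choice) tapers the frame edges to reduce leakage.
    windowed = [
        x * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    bins = {}
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if not (f_lo <= freq <= f_hi):
            continue  # drop bins outside the assumed aural band
        # Direct DFT of bin k; each bin carries an amplitude and a phase.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        bins[freq] = (abs(coeff), cmath.phase(coeff))
    return bins


# Usage: a 1 kHz tone sampled at 8 kHz peaks in the 1000 Hz bin.
sample_rate = 8000
frame = [math.sin(2.0 * math.pi * 1000.0 * i / sample_rate) for i in range(64)]
bins = band_limited_bins(frame, sample_rate)
peak_freq = max(bins, key=lambda f: bins[f][0])
```

A production implementation would normally use an FFT rather than the direct DFT shown here; the direct form is used only to keep the sketch dependency-free.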
17 Claims
1. A process that improves speech detection by processing a limited frequency band comprising:
encoding a limited frequency band of an input into a signal by varying an amplitude of a pulse width modulated signal that is limited to a plurality of predefined values;
separating the signal into frequency bins in which each frequency bin identifies an amplitude and a phase;
estimating a signal strength of a background voice segment in time;
estimating a distribution of noise to an average acoustic power of one or a plurality of frequency bins;
comparing a signal-to-noise ratio of each frequency bin to a maximum of the estimated signal strength of the background voice segment and the estimated distribution of noise to the average acoustic power; and
identifying a speech segment from noise that surrounds the speech segment based on the comparison.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A process that improves speech processing by processing a limited frequency band comprising:
converting a limited frequency band of a continuously varying input into a digital-domain signal;
converting the digital-domain signal into a frequency-domain signal;
estimating a signal strength of a smoothed background voice segment in time of the digital-domain signal relative to noise;
estimating a noise-variance of a segment of the digital-domain signal;
comparing an instant signal-to-noise ratio of the digital-domain signal to the estimated signal strength of the smoothed background voice segment in time of the digital domain signal relative to noise and the estimated noise-variance; and
identifying a speech segment when the instant signal-to-noise ratio of the digital-domain signal exceeds a maximum of the estimated signal strength of the smoothed background voice segment relative to noise and the estimated noise variance.
Dependent claims: 10, 11, 12, 13, 14, 15
16. A system that detects a speech segment that includes an unvoiced, a fully voiced, or a mixed voice content comprising:
a digital converter that converts a time-varying input signal into a digital-domain signal;
a window function configured to pass signals within a programmed aural frequency range while substantially blocking signals above and below the programmed aural frequency range when multiplied by an output of the digital converter;
a frequency converter that converts the signals passing within the programmed aural frequency range into a plurality of frequency bins;
a background voice detector configured to estimate a strength of a background speech segment relative to noise of selected portions of an aural spectrum;
a noise estimator configured to estimate a maximum distribution of noise to an average of an acoustic noise power of some of the plurality of frequency bins; and
a voice detector configured to compare an instant signal-to-noise ratio of a desired speech segment to a maximum of an output of the background voice detector and an output of the noise estimator.
Dependent claims: 17
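The comparison recited in the independent claims (an instant signal-to-noise ratio tested against the maximum of a background-voice estimate and a noise estimate) can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the exponential smoother, its `alpha` value, the function names, and the convention that all three quantities share a dB scale are hypothetical, since the claims do not say how the estimates are computed.

```python
def smooth(previous, current, alpha=0.9):
    """Exponential smoothing (an assumed estimator for the background voice level)."""
    return alpha * previous + (1.0 - alpha) * current


def is_speech(instant_snr_db, background_voice_db, noise_variance_db):
    """Flag speech when the instant SNR exceeds the larger of the two estimates.

    All three inputs are assumed to be expressed on the same dB scale.
    """
    threshold = max(background_voice_db, noise_variance_db)
    return instant_snr_db > threshold


# Usage: build a smoothed background-voice estimate, then classify three frames.
background = 0.0
for level in [4.0, 5.0, 6.0]:  # hypothetical background-voice measurements (dB)
    background = smooth(background, level)

decisions = [
    is_speech(snr, background, noise_variance_db=3.0)
    for snr in [1.0, 2.5, 12.0]  # hypothetical instant SNR values (dB)
]
# decisions → [False, False, True]
```

Taking the maximum of the two estimates means the stricter criterion always governs: a frame must stand out from both the background talkers and the noise fluctuation before it is treated as the desired speech.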
Specification