Targeted speech
Abstract
A system detects a speech segment that may include unvoiced, fully voiced, or mixed voice content. The system includes a digital converter that converts a time-varying input signal into a digital-domain signal. A window function passes signals within a programmed aural frequency range while substantially blocking signals above and below the programmed aural frequency range when multiplied by an output of the digital converter. A frequency converter converts the signals passing within the programmed aural frequency range into a plurality of frequency bins. A background voice detector estimates the strength of a background speech segment relative to the noise of selected portions of the aural spectrum. A noise estimator estimates a maximum distribution of noise to an average of an acoustic noise power of some of the plurality of frequency bins. A voice detector compares the strength of a desired speech segment to a criterion based on an output of the background voice detector and an output of the noise estimator.
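The windowing and frequency-conversion stages described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the Hann window, the 8 kHz sample rate, the 300 to 3400 Hz band, and the function names `hann` and `band_limited_bins` are all assumptions chosen for illustration, and the band limiting is done here by simply discarding out-of-band bins after a direct DFT.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def band_limited_bins(frame, sample_rate, f_lo, f_hi):
    """Return (frequency, amplitude, phase) for each DFT bin inside [f_lo, f_hi].

    The frame is multiplied by a window, transformed with a direct DFT,
    and bins outside the assumed aural band are dropped.
    """
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hann(n))]
    bins = []
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if not (f_lo <= freq <= f_hi):
            continue  # outside the programmed band (hypothetical limits)
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        bins.append((freq, abs(acc), cmath.phase(acc)))
    return bins


# Usage: a 1 kHz test tone sampled at an assumed 8 kHz.
sample_rate = 8000
n = 256
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(n)]
bins = band_limited_bins(frame, sample_rate, 300.0, 3400.0)
peak = max(bins, key=lambda b: b[1])  # bin with the largest amplitude
```

Each returned tuple carries an amplitude and a phase, which is the per-bin information the claims refer to. A production system would normally use an FFT rather than this direct DFT.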
21 Claims
1. A process that improves speech detection by processing a limited frequency band comprising:
encoding a limited frequency band of an input into a signal by varying the amplitude of pulse width modulated signal that is limited to a plurality of predefined values;
separating the signal into frequency bins in which each bin identifies an amplitude and a phase;
estimating a signal strength of a background voice segment in time;
estimating a distribution of noise to an average acoustic power of one or a plurality of frequency bins;
comparing a signal-to-noise ratio of each frequency bin to the estimated signal strength of the background voice segment and the estimated distribution of noise to the average acoustic power; and
identifying a speech segment from the noise that surrounds it based on the comparison.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
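The comparison step of claim 1 can be sketched as follows. The claim does not state how the background voice strength and the noise distribution are combined, so everything here is an assumption made for illustration: the additive threshold in dB, the `base_margin_db` parameter, the use of the peak-to-average noise ratio as the "distribution of noise" term, and the function names `db` and `is_speech`.

```python
import math


def db(x):
    """Power in decibels, floored to avoid log of zero."""
    return 10.0 * math.log10(max(x, 1e-12))


def is_speech(bin_power, noise_power, background_voice_db, base_margin_db=3.0):
    """Hypothetical decision rule: mean per-bin SNR against an adaptive threshold.

    bin_power           -- power of the current frame in each frequency bin
    noise_power         -- estimated noise power in the same bins
    background_voice_db -- assumed estimate of background voice strength (dB)
    """
    # Per-bin signal-to-noise ratio in dB.
    snr_db = [db(p) - db(q) for p, q in zip(bin_power, noise_power)]

    # Assumed stand-in for the noise distribution relative to average power:
    # ratio of the largest noise bin to the mean noise power.
    avg_noise = sum(noise_power) / len(noise_power)
    noise_spread_db = db(max(noise_power)) - db(avg_noise)

    # Assumed combination: the threshold rises with background voice
    # strength and with the unevenness of the noise.
    threshold_db = base_margin_db + background_voice_db + noise_spread_db
    mean_snr_db = sum(snr_db) / len(snr_db)
    return mean_snr_db > threshold_db, mean_snr_db, threshold_db


# Usage with flat noise and an assumed 5 dB background voice estimate.
loud, loud_snr, loud_thr = is_speech([100.0] * 4, [1.0] * 4, 5.0)
quiet, quiet_snr, quiet_thr = is_speech([2.0] * 4, [1.0] * 4, 5.0)
```

Raising the threshold when background voices are strong is one plausible way to keep a detector focused on the desired talker; the specification may define the criterion differently.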
10. A process that improves speech processing by processing a limited frequency band comprising:
converting a limited frequency band of a continuously varying input into a digital-domain signal;
converting the digital domain signal into a frequency-domain signal;
estimating the signal strength of a smoothed background voice segment in time;
estimating the noise-variance of a segment of the digital domain signal;
comparing a potential speech segment to the estimated signal strength of the smoothed background voice segment and the estimated noise variance; and
identifying a speech segment from the noise that surrounds it based on the comparison.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
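The two estimates named in claim 10 can be sketched with standard tools. The claim says only that the background voice segment is "smoothed" and that a noise variance is estimated; the exponential smoothing, the `alpha` parameter, the use of population variance, and the function names `smooth` and `noise_variance` are assumptions for illustration.

```python
import statistics


def smooth(values, alpha=0.9):
    """Exponentially smooth a sequence of per-frame strengths over time.

    alpha close to 1 gives heavy smoothing (an assumed choice).
    """
    out = []
    state = values[0]
    for v in values:
        state = alpha * state + (1.0 - alpha) * v
        out.append(state)
    return out


def noise_variance(segment):
    """Population variance of the samples in a digital-domain segment."""
    return statistics.pvariance(segment)


# Usage: a step in frame strength and a small alternating noise segment.
smoothed = smooth([0.0, 10.0, 10.0, 10.0], alpha=0.5)
variance = noise_variance([1.0, -1.0, 1.0, -1.0])
```

The smoothed track reacts gradually to the step, which is the property that lets a slowly varying background estimate be compared against a fast-changing potential speech segment.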
19. A system that detects a speech segment that includes an unvoiced, a fully voiced, or a mixed voice content comprising:
a digital converter that converts a time-varying input signal into a digital-domain signal;
a window function configured to pass signals within a programmed aural frequency range while substantially blocking signals above and below the programmed aural frequency range when multiplied by an output of the digital converter;
a frequency converter that converts the signals passing within the programmed aural frequency range into a plurality of frequency bins;
a background voice detector configured to estimate a strength of a background speech segment relative to noise of selected portions of an aural spectrum;
a noise estimator configured to estimate a maximum distribution of noise to an average of an acoustic noise power of some of the plurality of frequency bins; and
a voice detector configured to compare the strength of a desired speech segment to a criterion based on an output of the background voice detector and an output of the noise estimator.
Dependent claims: 20, 21
Specification