Speech detection and recognition apparatus for use with background noise of varying levels
First Claim
1. Apparatus for detecting whether a portion of an audio signal generated over successive time periods contains speech to be recognized, said apparatus comprising:
speech detection means for comparing the amplitude of the audio signal during successive time periods with one or more amplitude thresholds, and for generating, in response to said comparisons, an indication of whether or not a given portion of said audio signal contains speech to be recognized;
means for deriving a background amplitude level from the amplitude of said audio signal for one or more time periods in which the signal does not contain speech to be recognized, which level indicates the amplitude of the audio signal when it does not represent speech to be recognized;
means for deriving a measure of the spread of the distribution of the background amplitude level; and
means for altering, for purposes of the comparisons of the speech detection means, the relative magnitude of the audio signal amplitudes and the amplitude thresholds as a function of the background amplitude level and spread.
Abstract
A speech detection system compares the amplitude of an audio signal during successive time periods with speech detection thresholds, and generates an indication of whether the signal contains speech. It derives a background amplitude level from portions of the signal which it indicates do not contain speech, and improves its speech detection by altering the amplitude of the audio signal relative to the speech detection thresholds as a function of this background level. Preferably the background amplitude level is a moving average, which is repeatedly recalculated and repeatedly used to alter the relative amplitude of the audio signal and the detection thresholds. The apparatus uses a measure of the variability of the background amplitude to improve its speech detection. It generates start-of-speech and end-of-speech indications when the amplitude crosses respective thresholds for specified numbers of frames. The background amplitude level is calculated from frames which precede the start-of-speech indication by a predetermined amount and which follow the end-of-speech indication. The invention also provides a speech recognition system which compares the amplitudes of an audio signal against the amplitudes of acoustic models of vocabulary words to determine which vocabulary words correspond to the signal. The system compensates for background noise by using the background amplitude level, described above, to alter the audio signal amplitudes relative to the acoustic model amplitudes.
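The detection scheme summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `SpeechDetector`, every parameter and default value, the use of log-scale (dB-like) frame amplitudes, the exponentially weighted form of the moving average, and the use of a standard deviation as the variability measure are all assumptions made for illustration. Frames update the background estimate only after they have waited `lag` frames without a start-of-speech indication, which approximates using frames that precede the start of speech by a predetermined amount.

```python
from collections import deque
from math import sqrt


class SpeechDetector:
    """Illustrative start/end-of-speech detector with an adaptive background level.

    All names and default values are hypothetical; amplitudes are assumed to be
    per-frame log amplitudes so that subtraction expresses a relative level.
    """

    def __init__(self, start_margin=10.0, end_margin=5.0, spread_weight=2.0,
                 start_frames=3, end_frames=5, lag=4, decay=0.9):
        self.start_margin = start_margin    # margin above background for start-of-speech
        self.end_margin = end_margin        # margin below which speech may have ended
        self.spread_weight = spread_weight  # how strongly variability raises the shift
        self.start_frames = start_frames    # consecutive frames needed to declare a start
        self.end_frames = end_frames        # consecutive frames needed to declare an end
        self.lag = lag                      # frames held back before updating background
        self.decay = decay                  # weight of history in the moving average
        self.mean = None                    # background amplitude level
        self.var = 0.0                      # variance of the background amplitude
        self.pending = deque()              # recent frames not yet trusted as background
        self.in_speech = False
        self.run = 0                        # consecutive threshold crossings so far

    def _update_background(self, amp):
        # Exponentially weighted moving mean and variance.
        if self.mean is None:
            self.mean = amp
            return
        delta = amp - self.mean
        self.mean += (1 - self.decay) * delta
        self.var = self.decay * (self.var + (1 - self.decay) * delta * delta)

    def process(self, amp):
        """Consume one frame amplitude; return 'start', 'end', or None."""
        if self.mean is None:
            self._update_background(amp)  # assumption: the first frame is background
            return None
        # Shift the signal relative to the fixed margins by background level and spread.
        relative = amp - (self.mean + self.spread_weight * sqrt(self.var))
        if not self.in_speech:
            self.pending.append(amp)
            if len(self.pending) > self.lag:
                self._update_background(self.pending.popleft())
            self.run = self.run + 1 if relative > self.start_margin else 0
            if self.run >= self.start_frames:
                self.in_speech, self.run = True, 0
                self.pending.clear()  # frames just before the start never reach the background
                return "start"
        else:
            self.run = self.run + 1 if relative < self.end_margin else 0
            if self.run >= self.end_frames:
                self.in_speech, self.run = False, 0
                return "end"
        return None


def detect(frames, **kwargs):
    """Run a fresh detector over a sequence and list (frame_index, event) pairs."""
    detector = SpeechDetector(**kwargs)
    events = []
    for i, amp in enumerate(frames):
        event = detector.process(amp)
        if event:
            events.append((i, event))
    return events
```

Because the comparison uses the amplitude relative to the estimated background, the same burst is detected at the same frames whether the noise floor sits at 20 or at 40; for example, `detect([20] * 10 + [50] * 8 + [20] * 10)` and `detect([40] * 10 + [70] * 8 + [40] * 10)` report identical events under these assumed settings.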
15 Claims
1. Apparatus for detecting whether a portion of an audio signal generated over successive time periods contains speech to be recognized, said apparatus comprising:
speech detection means for comparing the amplitude of the audio signal during successive time periods with one or more amplitude thresholds, and for generating, in response to said comparisons, an indication of whether or not a given portion of said audio signal contains speech to be recognized;
means for deriving a background amplitude level from the amplitude of said audio signal for one or more time periods in which the signal does not contain speech to be recognized, which level indicates the amplitude of the audio signal when it does not represent speech to be recognized;
means for deriving a measure of the spread of the distribution of the background amplitude level; and
means for altering, for purposes of the comparisons of the speech detection means, the relative magnitude of the audio signal amplitudes and the amplitude thresholds as a function of the background amplitude level and spread.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A speech recognition system comprising:
means for receiving a representation of an audio signal, including amplitude measurements of successive parts of said signal;
means for storing acoustic models, including amplitude descriptions, associated with the sounds of vocabulary words;
recognition means for comparing the representation of a portion of the audio signal against the acoustic models, and for determining, as a result of those comparisons, which one or more vocabulary words most probably correspond to that representation, the comparison being based, at least in part, on the comparison of the amplitude measurements of the signal representation against the amplitude descriptions of the acoustic models;
means for deriving a background amplitude description from one or more amplitude measurements taken from a portion of the signal representation which does not contain speech to be recognized, which description provides a model of said one or more amplitude measurements; and
normalization means for altering the magnitude of the amplitude measurements from the signal representation relative to the magnitude of the amplitude descriptions from the acoustic models as a function of the background amplitude description.
Dependent claims: 14, 15
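The effect of the normalization recited in claim 13 can be sketched as follows. This is an illustrative toy, not the claimed system: the function names, the log-scale amplitudes, the subtract-and-clamp normalization, the absolute-difference score, and the two example word models are all assumptions, and no time alignment between signal and model is attempted.

```python
def normalize_amplitudes(frames, background_level):
    """Shift log amplitudes so the background maps to zero, clamping below at zero."""
    return [max(a - background_level, 0.0) for a in frames]


def amplitude_distance(frames, model):
    """Sum of absolute amplitude differences; assumes equal-length sequences."""
    return sum(abs(a - m) for a, m in zip(frames, model))


def recognize(frames, background_level, models):
    """Return the vocabulary word whose amplitude model best matches the signal."""
    normalized = normalize_amplitudes(frames, background_level)
    return min(models, key=lambda word: amplitude_distance(normalized, models[word]))


# Hypothetical amplitude models, expressed relative to a silent background.
MODELS = {"yes": [0, 30, 25, 0], "no": [0, 15, 10, 0]}

# A quiet utterance recorded over a background level of 40.
signal = [40, 56, 51, 40]
best = recognize(signal, background_level=40, models=MODELS)
```

With the background level removed, the signal becomes `[0, 16, 11, 0]` and matches the hypothetical "no" model most closely. Scoring the raw amplitudes instead (a background level of 0) would favor the louder "yes" model, which is the kind of noise-induced mismatch the normalization means is meant to avoid.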
Specification