Noise compensation in speech recognition
Abstract
In speech recognition it is advantageous to take account of noise levels both in recognition and in training. In both processes, signals reaching a microphone are digitized and passed through a filter bank to be separated into frequency channels. In training, a noise estimator and a masker are used with a recognizer to prepare and store probability density functions (p.d.f.s) for each channel, partially defining Markov models of the words to be recognized. The p.d.f.s are derived only from input signals above noise levels, but the derivation is such that the whole of each p.d.f. is represented. In recognition, the "distance" measurements on which recognition is based are derived for each channel. If the signal in a channel is above noise, the recognizer determines the distance from the negative logarithm of the p.d.f.; if a channel signal is below noise, the distance is determined from the negative logarithm of the cumulative distribution of the p.d.f. up to the noise level.
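The abstract does not say how the filter bank is built. As a minimal sketch, assuming one frame of digitized samples, equal-width frequency channels and channel levels expressed in dB, the per-channel input signals could be computed as below. The function name `channel_levels_db` and its parameters are hypothetical, and a naive DFT stands in for whatever filter bank the apparatus actually uses.

```python
import cmath
import math


def channel_levels_db(frame, num_channels):
    """Return the level in dB of each of `num_channels` equal-width bands
    of one frame's power spectrum (hypothetical filter bank)."""
    n = len(frame)
    half = n // 2
    # Power of DFT bins 0 .. half-1 using a naive O(n^2) DFT.
    power = []
    for k in range(half):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2)
    # Any bins left over when half is not a multiple of num_channels are ignored.
    width = half // num_channels
    levels = []
    for c in range(num_channels):
        band = power[c * width:(c + 1) * width]
        # A tiny floor keeps log10 defined for silent bands.
        levels.append(10.0 * math.log10(sum(band) + 1e-12))
    return levels
```

For a 256-sample frame and 8 channels, each channel spans 16 DFT bins, so a pure tone at bin 40 shows up as the loudest level in channel 2.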
59 Citations
20 Claims
1. Apparatus for use in sound recognition comprising:
means for deriving a plurality of input signals during recognition, each of which is representative of a signal level in a corresponding region of a frequency spectrum in which frequency components appear when sounds to be recognized occur; means for storing a plurality of groups of values representing respective probability density functions, indicating likelihoods that input signals arise from states in finite state machine models of groups of sounds to be recognized; means for estimating an input noise level in each of said regions of said frequency spectrum; and means for recognizing sounds from the input signals, employing respective distance measures, each derived from one of said input signals and one of said probability density functions as represented by one group of said values, each distance measure representing, in a first circumstance, a likelihood, and, in a second circumstance, a cumulative likelihood which is cumulative from minus infinity to an upper limit, of obtaining the input signal from which that distance measure is derived from the probability function from which it is also derived, the first circumstance arising when the input signal from which the distance measure is derived is above a predetermined level equal to said upper limit, and set substantially at the estimated noise level in said region corresponding to that input signal, and the second circumstance arising when the input signal from which the distance measure is derived is below said predetermined level.
6. A method for use in sound recognition comprising the steps of:
deriving a plurality of input signals during recognition, each of which is representative of a signal level in a corresponding region in a frequency spectrum, said frequency spectrum being one in which frequency components appear when sounds to be recognized occur; storing a plurality of groups of values representing respective probability density functions indicating likelihoods that input signals arise from states in finite state machine models of groups of sounds to be recognized; estimating an input noise level in each of said regions of said frequency spectrum; and recognizing sounds from the input signals, employing respective distance measures, each derived from one of said input signals and one of the said probability density functions as represented by one group of the said values, each distance measure representing, in a first circumstance, a likelihood, and, in a second circumstance, a cumulative likelihood which is cumulative from minus infinity to an upper limit, of obtaining the input signal from which that distance measure is derived from the probability function from which it is also derived, the first circumstance arising when the input signal from which the distance measure is derived is above a predetermined level equal to said upper limit and set substantially at the estimated noise level in said region corresponding to that input signal; and the second circumstance arising when the input signal from which the distance measure is derived is below said predetermined level.
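The claims require a noise level estimate for each region but do not specify the estimator. The sketch below uses a simplified minimum-tracking approach, which is an assumption and not taken from this document: the estimate for each channel is the lowest level seen over the last `window` frames plus a fixed offset. The class name `MinimumNoiseTracker` and the parameters `window` and `offset_db` are all hypothetical.

```python
from collections import deque


class MinimumNoiseTracker:
    """Hypothetical per-channel noise estimator based on a running minimum."""

    def __init__(self, num_channels, window=50, offset_db=3.0):
        # One bounded history of recent levels per channel.
        self.history = [deque(maxlen=window) for _ in range(num_channels)]
        self.offset_db = offset_db

    def update(self, levels_db):
        """Add one frame of channel levels and return the noise estimates."""
        for hist, level in zip(self.history, levels_db):
            hist.append(level)
        return [min(hist) + self.offset_db for hist in self.history]
```

Because each deque is bounded by `maxlen`, old frames drop out automatically, so the estimate can rise again when the background noise gets louder.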
9. A method of training a sound recognition system comprising:
deriving a plurality of groups of input signals from repetitions of nominally a same sound, each said group being representative of signal levels in a corresponding region in a frequency spectrum in which frequency components appear when sounds are to be recognized; estimating noise levels in each of said regions of the frequency spectrum; selecting only those of said input signals in each of said groups of input signals which represent signal levels above the estimated noise level in said corresponding regions; and deriving, from the selected input signals obtained from input signals above noise levels only, a plurality of groups of values representing respective substantially whole probability density functions, the probability density functions indicating likelihoods that input signals arise from states in finite state machine models for a vocabulary of groups of sounds to be recognized.
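Claim 9 derives substantially whole p.d.f.s from only the above-noise inputs. One standard way to do this, assumed here rather than taken from the specification, is to treat the masked inputs as left-censored at the noise level and fit a normal distribution by expectation-maximization, using only the count of masked samples. `fit_censored_normal` is a hypothetical name.

```python
import math
from statistics import NormalDist


def fit_censored_normal(above, num_below, noise_level, iterations=200):
    """Estimate (mean, variance) of a normal p.d.f. from the samples kept
    above `noise_level`, given only the count of masked samples below it.
    EM for a left-censored normal: an assumed technique, not the patent's."""
    n_above = len(above)
    # Start from the (biased) estimates that use the kept samples alone.
    mean = sum(above) / n_above
    var = sum((x - mean) ** 2 for x in above) / n_above
    if num_below == 0:
        return mean, var  # nothing was masked: ordinary estimates
    n = n_above + num_below
    sum_x = sum(above)
    sum_x2 = sum(x * x for x in above)
    std = NormalDist()
    for _ in range(iterations):
        sigma = math.sqrt(var)
        beta = (noise_level - mean) / sigma
        # Ratio phi/Phi for a normal truncated above at the noise level.
        lam = std.pdf(beta) / std.cdf(beta)
        # E-step: expected first and second moments of one masked sample.
        e1 = mean - sigma * lam
        e2 = var * (1.0 - beta * lam - lam * lam) + e1 * e1
        # M-step: combine observed sums with the expected masked moments.
        mean = (sum_x + num_below * e1) / n
        var = (sum_x2 + num_below * e2) / n - mean * mean
    return mean, var
```

Averaging only the kept samples would bias the mean upward, because the low tail of the distribution has been discarded; the E-step restores the expected contribution of that tail.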
13. A method according to claim 11 wherein a derived mean and a derived variance are substituted for the estimated true mean and the estimated true variance in all said groups of said values derived in training from every said region in which the proportion of input signals which are below said estimated highest noise level for that region exceeds a predetermined value greater than 0.5.
14. A method according to claim 13 wherein the predetermined value is 0.8.
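Claims 13 and 14 can be read as a switch on the proportion of training inputs that fell below the noise estimate. In this sketch the function and parameter names are hypothetical, and where the derived mean and variance come from (claims 15 to 17) is left to the caller.

```python
def choose_parameters(num_below, num_total, est_mean, est_var,
                      derived_mean, derived_var, threshold=0.8):
    """Return (mean, variance) for one region of one state.

    If the proportion of training inputs below the noise estimate exceeds
    `threshold` (claim 13 requires > 0.5; claim 14 uses 0.8), the estimated
    true values are replaced by the derived ones."""
    proportion_below = num_below / num_total
    if proportion_below > threshold:
        return derived_mean, derived_var
    return est_mean, est_var
```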
15. A method according to claim 13 wherein in order to derive the derived variance a standard minimum variance is added to all true variances, the standard minimum variance having a value which is small enough to ensure that the derived variances for different states differ significantly where true variances are significantly different.
16. A method according to claim 13 wherein in deriving said derived variance, a scaled minimum variance is added to each true variance, the scaling of the scaled minimum variance for a particular p.d.f. being derived from the number of input signal samples used to derive the said group of values for that p.d.f.
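Claims 15 and 16 both form the derived variance by adding a minimum variance to each true variance; claims 19 and 20 use the same form. The sketch assumes, for claim 16, that the added amount grows as the sample count shrinks, using a hypothetical `reference_count / num_samples` scale; the claim says only that the scaling is derived from the number of samples.

```python
def floored_variance(true_var, min_var):
    """Claim 15 style: add the same standard minimum variance to every true variance."""
    return true_var + min_var


def scaled_floored_variance(true_var, min_var, num_samples, reference_count=100):
    """Claim 16 style: scale the minimum variance by the sample count.

    The rule reference_count / num_samples is a hypothetical choice:
    fewer training samples give a larger added variance."""
    scale = reference_count / max(num_samples, 1)
    return true_var + min_var * scale
```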
17. A method according to claim 13 wherein the derived mean and the derived variance have predetermined fixed values.
18. A method according to claim 13 wherein for every said region in which said proportion is between the predetermined value and a lower further predetermined value, a substitute mean and a substitute variance are substituted for the estimated true mean and the estimated true variance, over the range between the two predetermined values, the substitute mean and the substitute variance being taken from a smooth transition from said derived mean to a predetermined fixed mean and from said derived variance to a predetermined fixed variance, respectively, according to position in said range.
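Claim 18 calls for a smooth transition over a range of proportions but does not fix its shape or say which end of the range corresponds to which values. The sketch below assumes linear interpolation, with the derived values at the lower predetermined value and the fixed values at the upper one; `blend_parameters`, `lower` and `upper` are hypothetical names.

```python
def blend_parameters(proportion_below, lower, upper,
                     derived_mean, derived_var, fixed_mean, fixed_var):
    """Interpolate (mean, variance) according to position in [lower, upper].

    Assumed direction: derived values at `lower`, fixed values at `upper`."""
    t = (proportion_below - lower) / (upper - lower)
    t = min(max(t, 0.0), 1.0)  # clamp to the range
    mean = (1.0 - t) * derived_mean + t * fixed_mean
    var = (1.0 - t) * derived_var + t * fixed_var
    return mean, var
```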
19. A method according to claim 10 wherein each probability density function is assumed to have a normal distribution and each group of said values comprises an estimated true mean and a modified variance which is the sum of an estimated true variance and a predetermined minimum variance scaled by a fixed value.
20. A method according to claim 10 wherein each probability density function is assumed to have a normal distribution and each group of said values comprises an estimated true mean and a modified variance which is the sum of an estimated true variance, calculated from a number of said input signals, and a predetermined minimum variance scaled by a value dependent on said number of input signals used to calculate the estimated true variance.
Specification