Noise compensation in speech recognition

US 4,905,286 A
Filed: 04/01/1987
Issued: 02/27/1990
Est. Priority Date: 04/04/1986
Status: Expired due to Fees

Abstract

In speech recognition it is advantageous to take account of noise levels both in recognition and in training. In both processes, signals reaching a microphone are digitized and passed through a filter bank to be separated into frequency channels. In training, a noise estimator and a masker are used with a recognizer to prepare and store probability density functions (p.d.f.s) for each channel, partially defining Markov models of the words to be recognized. The p.d.f.s are derived only from input signals above noise levels, but the derivation is such that the whole of each p.d.f. is represented. In recognition, the "distance" measurements on which recognition is based are derived for each channel. If the signal in a channel is above the noise level, the recognizer determines the distance from the negative logarithm of the p.d.f.; if the channel signal is below the noise level, the distance is determined from the negative logarithm of the cumulative distribution of the p.d.f. up to the noise level.
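The noise estimator and masker mentioned in the abstract can be pictured with a minimal Python sketch. The text does not say how noise is estimated, so everything here is an assumption for illustration: the function names, the running-mean update, the `rate` parameter, and the externally supplied `speech_present` flag are all hypothetical, and channel levels are assumed to be plain numbers such as log energies.

```python
from typing import List, Tuple


def update_noise_estimate(noise: List[float], frame: List[float],
                          speech_present: bool, rate: float = 0.1) -> List[float]:
    """Track per-channel noise as a running mean over noise-only frames.

    The running mean and the `rate` value are assumptions, not the patent's method.
    """
    if speech_present:
        return list(noise)  # hold the estimate while sounds to be recognized occur
    return [(1.0 - rate) * n + rate * x for n, x in zip(noise, frame)]


def mask_frame(frame: List[float], noise: List[float]) -> Tuple[List[float], List[bool]]:
    """Replace each channel level that is below noise with the noise level.

    Returns the masked levels and, per channel, True where masking occurred.
    """
    masked = [max(x, n) for x, n in zip(frame, noise)]
    was_masked = [x < n for x, n in zip(frame, noise)]
    return masked, was_masked


# One noise-only frame updates the estimate; a later frame is then masked.
noise = update_noise_estimate([20.0, 20.0, 20.0], [30.0, 20.0, 10.0],
                              speech_present=False)
masked, flags = mask_frame([35.0, 18.0, 25.0], noise)
```

The per-channel flags matter later: a masked channel only tells the recognizer that the true level was somewhere at or below the noise level, which is why a cumulative probability is used for it instead of a point likelihood.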

59 Citations

20 Claims

1. Apparatus for use in sound recognition comprising:
- means for deriving a plurality of input signals during recognition, each of which is representative of a signal level in a corresponding region of a frequency spectrum in which frequency components appear when sounds to be recognized occur;
  
  means for storing a plurality of groups of values representing respective probability density functions, indicating likelihoods that input signals arise from states in finite state machine models of groups of sounds to be recognized;
  
  means for estimating an input noise level in each of said regions of said frequency spectrum; and
  
  means for recognizing sounds from the input signals, employing respective distance measures, each derived from one of said input signals and one of said probability density functions as represented by one group of said values, each distance measure representing, in a first circumstance, a likelihood, and, in a second circumstance, a cumulative likelihood which is cumulative from minus infinity to an upper limit, of obtaining the input signal from which that distance measure is derived from the probability function from which it is also derived, the first circumstance arising when the input signal from which the distance measure is derived is above a predetermined level equal to said upper limit, and set substantially at the estimated noise level in said region corresponding to that input signal, and the second circumstance arising when the input signal from which the distance measure is derived is below said predetermined level.
- - 2. Apparatus according to claim 1 wherein the means for deriving input signals comprises a bank of filters, and said regions are channels corresponding to the filters.
  - 3. Apparatus according to claim 1 wherein the means for recognizing sounds comprises means for deriving masked input signals during recognition by representing each said input signal representative of one of said signal levels which is below said predetermined level with a masking level which is representative of a noise level in said region corresponding to said each input signal, and the means for recognizing sounds employs the masked input signals, when the second circumstance arises, to derive the said distance measure representing the cumulative likelihood.
  - 4. Apparatus according to claim 1, wherein the means for deriving input signals include a microphone and the means for estimating noise level is connected to receive signals derived from an output of said microphone, and includes means for differentiating between noise only and noise plus sounds to be recognized.
  - 5. Apparatus according to claim 1 wherein the means for recognizing sounds includes means for deriving each said distance measure from ##EQU5## when said first circumstance occurs and from -Ln(erf((A-m)/s)), where ##EQU6## when said second circumstance occurs, where A is the noise level in said region corresponding to the input signal from which that distance measure would be derived in said first circumstance, f is the input level in said region due to sounds to be recognized, m and s² are a mean and a variance, which form one group of said values and represent a probability density function (p.d.f.) from which that distance measure is derived, and ##EQU7## where N(x, 0, 1) corresponds to a normally distributed p.d.f. with independent variable x, mean equal to zero and variance equal to one.
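A minimal Python sketch of the two-circumstance distance measure of claims 1 and 5, under stated assumptions. The equation images (##EQU5## to ##EQU7##) are not reproduced in this text, so the first-circumstance formula below is simply the full negative logarithm of a normal density, which may differ from the patent's expression by a constant. The claims' erf is taken to be the cumulative standard normal, since claim 12 applies its inverse to a proportion; this is not the same function as Python's `math.erf`. The function names, the floor of 1e-300, and the summation over channels are illustrative choices.

```python
import math
from statistics import NormalDist
from typing import Sequence


def channel_distance(f: float, A: float, m: float, s: float) -> float:
    """Distance for one channel: input level f, noise level A, and a normal
    p.d.f. with mean m and standard deviation s."""
    if f > A:
        # First circumstance: negative log of the normal density at f.
        return 0.5 * math.log(2.0 * math.pi * s * s) + (f - m) ** 2 / (2.0 * s * s)
    # Second circumstance: negative log of the probability mass from minus
    # infinity up to the noise level A, i.e. -Ln(erf((A-m)/s)) in the claim's notation.
    cumulative = NormalDist().cdf((A - m) / s)
    return -math.log(max(cumulative, 1e-300))  # floor avoids log(0) (assumption)


def frame_distance(levels: Sequence[float], noise: Sequence[float],
                   means: Sequence[float], stds: Sequence[float]) -> float:
    """Sum of channel distances; treating channels as independent is an assumption."""
    return sum(channel_distance(f, A, m, s)
               for f, A, m, s in zip(levels, noise, means, stds))


d_above = channel_distance(12.0, 5.0, 10.0, 2.0)  # level above noise
d_below = channel_distance(3.0, 5.0, 5.0, 2.0)    # level below noise
```

In the second call the noise level equals the model mean, so half of the probability mass lies below it and the distance is Ln(2). A level exactly equal to the noise level is treated here as the second circumstance; the claims leave that boundary case open.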

6. A method for use in sound recognition comprising the steps of:
- deriving a plurality of input signals during recognition, each of which is representative of a signal level in a corresponding region in a frequency spectrum, said frequency spectrum being one in which frequency components appear when sounds to be recognized occur;
  
  storing a plurality of groups of values representing respective probability density functions indicating likelihoods that input signals arise from states in finite state machine models of groups of sounds to be recognized;
  
  estimating an input noise level in each of said regions of said frequency spectrum; and
  
  recognizing sounds from the input signals;
  
  employing respective distance measures, each derived from one of said input signals and one of the said probability density functions as represented by one group of the said values, each distance measure representing, in a first circumstance, a likelihood, and, in a second circumstance, a cumulative likelihood which is cumulative from minus infinity to an upper limit, of obtaining the input signal from which that distance measure is derived from the probability function from which it is also derived, the first circumstance arising when the input signal from which the distance measure is derived is above a predetermined level equal to said upper limit and set substantially at the estimated noise level in said region corresponding to that input signal; and
  
  the second circumstance arising when the input signal from which the distance measure is derived is below said predetermined level.
- - 7. A method according to claim 6 wherein the groups of sounds are words and the regions are channels defined by filtering.
  - 8. A method according to claim 6 wherein each said distance measure is derived from ##EQU8## when said first circumstance occurs and from -Ln(erf((A-m)/s)) ##EQU9## when said second circumstance occurs;
    - where A is the noise level in said region of the frequency spectrum corresponding to the input signal from which that distance measure is derived, f is the input level in said region due to sounds to be recognized, and m and s² are a mean and a variance, respectively, which form one group of said values and represent the probability density function (p.d.f.) from which that distance measure is derived, where ##EQU10## and N(x, 0, 1) corresponds to a normally distributed p.d.f. with independent variable x, mean equal to zero and variance equal to one.
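The claims do not say how the per-frame distance measures are combined across the states of a finite state machine model. One common way, sketched below in Python as an assumption and not as the patent's procedure, is Viterbi-style dynamic programming over a left-to-right model: each frame either stays in the current state or advances by one, and the word with the smallest accumulated distance is chosen. The names `best_path_distance` and `recognize`, the absence of transition penalties, and the toy distance tables are all hypothetical.

```python
from typing import Dict, List


def best_path_distance(dist: List[List[float]]) -> float:
    """Smallest accumulated distance through a left-to-right model.

    dist[t][s] is the distance of frame t to state s. The path starts in
    state 0, ends in the last state, and may stay or advance by one state
    per frame. No transition penalties are applied (assumption).
    """
    n_states = len(dist[0])
    inf = float("inf")
    prev = [inf] * n_states
    prev[0] = dist[0][0]
    for frame in dist[1:]:
        cur = [inf] * n_states
        for s in range(n_states):
            best = prev[s]                      # stay in the same state
            if s > 0 and prev[s - 1] < best:    # or advance from the previous one
                best = prev[s - 1]
            cur[s] = best + frame[s]
        prev = cur
    return prev[-1]


def recognize(dists_by_word: Dict[str, List[List[float]]]) -> str:
    """Pick the word whose model gives the smallest accumulated distance."""
    return min(dists_by_word, key=lambda w: best_path_distance(dists_by_word[w]))


# Toy distance tables: three frames, two states per word model.
tables = {
    "yes": [[1.0, 9.0], [9.0, 1.0], [9.0, 1.0]],
    "no": [[5.0, 5.0], [5.0, 5.0], [5.0, 5.0]],
}
winner = recognize(tables)
```

Each entry of a table would be a frame distance summed over channels, with the likelihood or cumulative likelihood chosen per channel as the claim describes.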

9. A method of training a sound recognition system comprising:
- deriving a plurality of groups of input signals from repetitions of nominally a same sound, each said group being representative of signal levels in a corresponding region in a frequency spectrum in which frequency components appear when sounds are to be recognized;
  
  estimating noise levels in each of said regions of the frequency spectrum;
  
  selecting only those of said input signals in each of said groups of input signals which represent signal levels above the estimated noise level in said corresponding regions; and
  
  deriving, from the selected input signals obtained from input signals above noise levels only, a plurality of groups of values representing respective substantially whole probability density functions, the probability density functions indicating likelihoods that input signals arise from states in finite state machine models for a vocabulary of groups of sounds to be recognized.
- - 10. A method according to claim 9 wherein said groups of said input signals are derived for each sound in a vocabulary of sounds, said estimating noise levels step includes the step of finding a highest estimated noise level for each region for all repetitions of all sounds in said vocabulary, and said selecting step includes, for each region, choosing said estimated noise level, as said highest noise level estimated for that region.
  - 11. A method according to claim 10 comprising the further step of estimating each said probability density function (p.d.f.) to have a normal distribution and each group of said values comprises an estimated true mean m and an estimated variance S² of the distribution.
  - 12. A method according to claim 11 wherein for each region and each said p.d.f., m and S² are determined from ##EQU11## where B is said noise level in said each region, M is the mean of input signal levels in that region above the noise level B, F is a proportion of input signals below the noise level B, erf(F) is as defined above, and Q(F) = N(erf⁻¹(F), 0, 1), where ##EQU12## and N(x, 0, 1) corresponds to a normally distributed p.d.f. with independent variable x, mean equal to zero and variance equal to one.
  - 13. A method according to claim 11 wherein a derived mean and a derived variance are substituted for the estimated true mean and the estimated true variance in all said groups of said values derived in training from every said region in which the proportion of input signals which are below said estimated highest noise level for that region exceeds a predetermined value greater than 0.5.
  - 14. A method according to claim 13 wherein the predetermined value is 0.8.
  - 15. A method according to claim 13 wherein in order to derive the derived variance a standard minimum variance is added to all true variances, the standard minimum variance having a value which is small enough to ensure that the derived variances for different states differ significantly where true variances are significantly different.
  - 16. A method according to claim 13 wherein in deriving said derived variance, a scaled minimum variance is added to each true variance, the scaling of the scaled minimum variance for a particular p.d.f. being derived from the number of input signal samples used to derive the said group of values for that p.d.f.
  - 17. A method according to claim 13 wherein the derived mean and the derived variance have predetermined fixed values.
  - 18. A method according to claim 13 wherein for every said region in which said proportion is between the predetermined value and a lower further predetermined value, a substitute mean and a substitute variance are substituted for the estimated true mean and the estimated true variance, over the range between the two predetermined values, the substitute mean and the substitute variance being taken from a smooth transition from said derived mean to a predetermined fixed mean and from said derived variance to a predetermined fixed variance, respectively, according to position in said range.
  - 19. A method according to claim 10 wherein each probability density function is assumed to have a normal distribution and each group of said values comprises an estimated true mean and a modified variance which is the sum of an estimated true variance and a predetermined minimum variance scaled by a fixed value.
  - 20. A method according to claim 10 wherein each probability density function is assumed to have a normal distribution and each group of said values comprises an estimated true mean and a modified variance which is the sum of an estimated true variance, calculated from a number of said input signals, and a predetermined minimum variance scaled by a value dependent on said number of input signals used to calculate the estimated true variance.
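The training steps of claims 9 to 12 and 20 can be sketched in Python under stated assumptions. The equation of claim 12 (##EQU11##) is not reproduced in this text, so `estimate_true_normal` below uses the standard moment identity for a normal distribution truncated from below, written with the claim's symbols B, M, F and Q(F); it is a plausible reading, not a confirmed copy of the patent's formula. The function names are hypothetical, and the 1/n scaling in `modified_variance` is only one possible "value dependent on said number of input signals".

```python
from statistics import NormalDist, fmean
from typing import Sequence, Tuple

STD_NORMAL = NormalDist()  # N(x, 0, 1)


def highest_noise(estimates: Sequence[float]) -> float:
    """Claim 10: use the highest noise level estimated for a region."""
    return max(estimates)


def truncated_stats(levels: Sequence[float], B: float) -> Tuple[float, float]:
    """Return M (mean of levels above B) and F (proportion not above B)."""
    above = [x for x in levels if x > B]
    M = fmean(above)  # raises StatisticsError if no level is above the noise
    F = 1.0 - len(above) / len(levels)
    return M, F


def estimate_true_normal(B: float, M: float, F: float) -> Tuple[float, float]:
    """Estimate the mean m and variance S^2 of the whole normal p.d.f.
    from the part observed above the noise level B."""
    if not 0.0 < F < 1.0:
        # With F == 0 nothing was masked and ordinary sample statistics apply.
        raise ValueError("F must lie strictly between 0 and 1")
    z = STD_NORMAL.inv_cdf(F)          # erf^-1(F) in the claim's notation
    Q = STD_NORMAL.pdf(z)              # Q(F) = N(erf^-1(F), 0, 1)
    S = (M - B) / (Q / (1.0 - F) - z)  # from M = m + S*Q/(1-F) and B = m + S*z
    m = B - S * z
    return m, S * S


def modified_variance(true_var: float, min_var: float, n_samples: int) -> float:
    """Claim 20 style floor: the 1/n scaling is an assumption."""
    return true_var + min_var / n_samples


# Levels from repetitions of one sound in one channel, noise level B = 9.
B = highest_noise([7.5, 9.0, 8.0])
M, F = truncated_stats([8.0, 10.0, 12.0, 14.0], B)
m_est, var_est = estimate_true_normal(B, M, F)
```

Because F fixes where the noise level sits on the standardized axis and M fixes the mean of the surviving upper part, two unknowns (m and S) are recovered from two observed quantities. The estimate becomes unreliable as F approaches 1, which is consistent with claims 13 and 14 substituting other values once F exceeds a threshold such as 0.8.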

Current Assignee
National Research Development Corporation
Original Assignee
National Research Development Corporation
Inventors
Holmes, John N., Sedgwick, Nigel C.
Primary Examiner(s)
NOT, DEFINED
Assistant Examiner(s)
NOT, DEFINED

Application Number

US07/032,566
Time in Patent Office

1,063 Days
Field of Search

381/46, 381/47, 381/41-45, 364/513.5
US Class Current

704/233
CPC Class Codes

G10L 15/20 Speech recognition techniqu...
