Automatic gain control in a speech recognition system
Abstract
Energy normalization in a speech recognition system is achieved by adaptively tracking the high, mid, and low energy envelopes, wherein the adaptive high energy tracking value adapts with weighting enhanced for high energies, and the adaptive low energy tracking value adapts with weighting enhanced for low energies. A tracking method is also provided for classifying waveform segments as either “speech” or “silence”, and measures of the signal-to-noise ratio and the absolute noise floor are used as feedback to achieve optimal speech recognition accuracy.
21 Claims
1. A speech recognition preprocessor, comprising:
an analyzer for receiving a digital speech signal and generating therefrom a sequence of frames, each frame having a plurality of samples from said digital speech signal;
means coupled to said analyzer means for tracking an upper energy envelope, an average energy envelope, and a lower energy envelope by a plurality of energy tracks in one or more consecutive frames of said digital speech signal, wherein said energy tracks are based on a high biased running mean, a low biased running mean and a nominally unbiased running mean; and
means coupled to said tracking means for computing a normalized energy value and providing said normalized energy value to a speech recognition system.
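The analyzer stage of claim 1 splits the digital speech signal into frames, and the tracking means then operate on a per-frame energy value. The sketch below is a minimal illustration under stated assumptions: a fixed frame length and hop (160 and 80 samples, i.e. 10 ms frames with 5 ms hop at an assumed 16 kHz rate), energy expressed in dB, and a small floor to avoid log(0). The function names `frame_signal` and `frame_log_energy` are hypothetical.

```python
import math
from typing import List, Sequence


def frame_signal(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a digital signal into frames of frame_len samples, advancing by hop.

    Trailing samples that do not fill a whole frame are dropped (an assumption).
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [
        list(samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def frame_log_energy(frame: Sequence[float], floor: float = 1e-10) -> float:
    """Mean-square energy of one frame in dB; the floor avoids log(0) on digital silence."""
    energy = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(energy, floor))


if __name__ == "__main__":
    # 320 samples of silence followed by 320 samples of a 440 Hz tone (16 kHz assumed).
    signal = [0.0] * 320 + [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(320)]
    frames = frame_signal(signal, frame_len=160, hop=80)
    print(len(frames), [round(frame_log_energy(f), 1) for f in frames])
```

The per-frame dB values produced this way are what the envelope trackers described in the claims would consume.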
5. A method of normalizing energy in a voice signal, said method comprising the steps of:
dividing the voice signal into a plurality of consecutive time intervals;
calculating a high energy track for tracking the upper energy envelope of said voice signal;
calculating a low energy track for tracking the lower energy envelope of said voice signal;
calculating a mid energy track for tracking the average energy envelope of said voice signal, wherein said high energy track, said low energy track and said mid energy track are based on a high biased running mean, a low biased running mean, and a nominally unbiased running mean, respectively; and
calculating a value of normalized energy from said high energy track to be provided to a speech recognition system.
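In claim 5, the three tracks differ only in how they are biased. One simple way to realize a biased running mean is an asymmetric exponential average. When the observation is above the track, one rate is used; when it is below, another rate is used. The sketch below makes several assumptions. The rates (`fast`, `slow`, `mid_rate`) and initialization from the first frame are illustrative. Normalizing as e_t minus the high track in the log domain is one reading of "normalized energy from said high energy track". All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def biased_update(track: float, x: float, rate_up: float, rate_down: float) -> float:
    """One step of an asymmetric exponential running mean.

    rate_up applies when x is above the track, rate_down when it is not.
    rate_up > rate_down biases the mean high; rate_up < rate_down biases it low.
    """
    rate = rate_up if x > track else rate_down
    return track + rate * (x - track)


@dataclass
class EnergyTracks:
    high: float
    mid: float
    low: float


def track_and_normalize(
    energies: Sequence[float],
    fast: float = 0.5,
    slow: float = 0.01,
    mid_rate: float = 0.1,
) -> Tuple[List[float], List[EnergyTracks]]:
    """Track high/mid/low envelopes of per-frame log energies.

    Returns (normalized, history) where normalized[t] = energies[t] - high_t.
    """
    if not energies:
        return [], []
    first = energies[0]
    tracks = EnergyTracks(high=first, mid=first, low=first)
    history: List[EnergyTracks] = []
    normalized: List[float] = []
    for e in energies:
        tracks = EnergyTracks(
            high=biased_update(tracks.high, e, fast, slow),       # rises fast, decays slowly
            mid=biased_update(tracks.mid, e, mid_rate, mid_rate),  # nominally unbiased
            low=biased_update(tracks.low, e, slow, fast),          # falls fast, rises slowly
        )
        history.append(tracks)
        normalized.append(e - tracks.high)
    return normalized, history


if __name__ == "__main__":
    energies = [-60.0] * 20 + [-20.0] * 20 + [-60.0] * 20
    normalized, history = track_and_normalize(energies)
    last = history[-1]
    print(round(last.high, 2), round(last.mid, 2), round(last.low, 2))
```

After a burst of loud frames followed by quiet frames, the high track still remembers the loud level. The low track sits on the quiet level, and the mid track lies between them.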
12. A method of normalizing an energy value in a PCM voice signal, said voice signal comprising a plurality of frames, each of said plurality of frames defining a fixed interval of said voice signal, said method comprising the steps of:
constructing an observation window whose width defines a current observation interval of said voice signal, said current observation interval encompassing a plurality of digital samples of said voice signal;
shifting said observation window in discrete shift increments along said PCM voice signal;
at each discrete shift increment of the observation window:
computing a feature vector from said plurality of encompassed digital samples of said voice signal contained within said observation interval;
using said feature vector to compute a high biased running mean, a low biased running mean, and a nominally unbiased running mean;
determining whether said current interval is one of a speech interval and a silence interval;
based on the determination step, when said interval is a speech interval:
computing a gated and smoothed high biased running mean from said high biased running mean;
holding a gated and smoothed low biased running mean constant to a value computed at the most recent silence interval;
computing an energy normalization value from said gated and smoothed high biased running mean; and
outputting said energy normalization value to a speech recognition system; and
when said interval is a silence interval:
computing a gated and smoothed low biased running mean from said low biased running mean;
holding a gated and smoothed high biased running mean constant to a value computed at the most recent speech interval;
computing an energy normalization value from said gated and smoothed high biased running mean; and
outputting said energy normalization value to a speech recognition system.
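In claim 12, gating works as follows. In a speech interval, only the high-biased mean is allowed to move, and the low-biased mean is frozen. In a silence interval, the roles swap. The energy normalization value is always derived from the gated high mean.

The sketch below illustrates this logic. It takes the raw biased means and a per-interval speech/silence flag as inputs. It then assumes three details: "smoothing" is a one-pole exponential filter with factor `smooth`, both gated means start at the first interval's raw values, and the normalization value is e_t minus the gated high mean. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def gated_normalization(
    energies: Sequence[float],
    high_biased: Sequence[float],
    low_biased: Sequence[float],
    is_speech: Sequence[bool],
    smooth: float = 0.2,
) -> List[Tuple[float, float, float]]:
    """Return (gated_high, gated_low, normalization_value) for each interval."""
    n = len(energies)
    if not (len(high_biased) == len(low_biased) == len(is_speech) == n):
        raise ValueError("all input sequences must have the same length")
    results: List[Tuple[float, float, float]] = []
    if n == 0:
        return results
    # Initialise both gated means from the first interval (an assumption).
    gated_high, gated_low = high_biased[0], low_biased[0]
    for e, hb, lb, speech in zip(energies, high_biased, low_biased, is_speech):
        if speech:
            # Speech: smooth the gated high mean toward the raw high-biased mean;
            # the gated low mean keeps its value from the most recent silence interval.
            gated_high += smooth * (hb - gated_high)
        else:
            # Silence: smooth the gated low mean toward the raw low-biased mean;
            # the gated high mean keeps its value from the most recent speech interval.
            gated_low += smooth * (lb - gated_low)
        # Normalization is derived from the gated high mean in both cases.
        results.append((gated_high, gated_low, e - gated_high))
    return results


if __name__ == "__main__":
    out = gated_normalization(
        energies=[-60.0, -60.0, -20.0, -20.0, -60.0],
        high_biased=[-30.0, -30.0, -20.0, -20.0, -22.0],
        low_biased=[-60.0, -60.0, -58.0, -56.0, -60.0],
        is_speech=[False, False, True, True, False],
    )
    for row in out:
        print(tuple(round(v, 2) for v in row))
```

Freezing the low mean during speech keeps speech energy from dragging the noise-floor estimate upward. Freezing the high mean during silence keeps the normalization reference from collapsing onto the noise floor.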
14. The method according to claim 12, wherein the determination as to whether said current interval is one of a speech interval and a silence interval is made by comparing the nominally unbiased running mean with the low biased running mean.
15. The method according to claim 13, wherein the current interval is a silence interval whenever the unbiased running mean is within a predefined threshold of the low biased running mean.
16. The method according to claim 13, wherein the current interval is determined to be a speech interval whenever the value of the unbiased running mean exceeds the value of the low biased running mean by a predefined threshold.
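Claims 14 to 16 describe the speech/silence decision as a threshold comparison between the unbiased running mean and the low-biased running mean, which acts as a noise-floor estimate. The sketch below assumes two things: a single shared threshold serves both claims, and that threshold is 6 dB. The function name is hypothetical.

```python
def classify_interval(unbiased_mean: float, low_biased_mean: float,
                      threshold_db: float = 6.0) -> str:
    """Return "speech" if the unbiased mean exceeds the low-biased mean by more
    than threshold_db; otherwise the interval lies within the threshold of the
    noise floor and is labelled "silence"."""
    if unbiased_mean - low_biased_mean > threshold_db:
        return "speech"
    return "silence"


if __name__ == "__main__":
    print(classify_interval(-40.0, -60.0))  # prints "speech"
    print(classify_interval(-57.0, -60.0))  # prints "silence"
```

A practical system might use two thresholds (hysteresis) to avoid rapid toggling near the boundary. The claims as quoted do not specify this.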
17. The method according to claim 12, wherein said feature vector further comprises the RMS energy and the zeroth cepstral coefficient from said plurality of encompassed digital samples.
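Both features named in claim 17 are energy measures computed over the samples in the observation window. The sketch below computes the RMS energy directly. For the zeroth cepstral coefficient, it assumes the real cepstrum, in which c0 is the mean of the log magnitude spectrum, (1/N) times the sum over k of log|X[k]|. Front ends that use mel-frequency cepstra define c0 differently. A plain O(N²) DFT keeps the example dependency-free, and the name `rms_and_c0` is hypothetical.

```python
import cmath
import math
from typing import Sequence, Tuple


def rms_and_c0(frame: Sequence[float], floor: float = 1e-10) -> Tuple[float, float]:
    """Return (RMS energy, zeroth real-cepstrum coefficient) of one frame."""
    n = len(frame)
    if n == 0:
        raise ValueError("frame must not be empty")
    rms = math.sqrt(sum(x * x for x in frame) / n)
    # Direct DFT: X[k] = sum_m frame[m] * exp(-2j*pi*k*m/n).
    log_mags = []
    for k in range(n):
        xk = sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        # Floor the magnitude so spectral zeros do not produce log(0).
        log_mags.append(math.log(max(abs(xk), floor)))
    # c0 of the real cepstrum is the inverse DFT at index 0, i.e. the mean log magnitude.
    c0 = sum(log_mags) / n
    return rms, c0


if __name__ == "__main__":
    impulse = [2.0] + [0.0] * 7  # flat spectrum with |X[k]| = 2 for every k
    rms, c0 = rms_and_c0(impulse)
    print(round(rms, 4), round(c0, 4))
```

Scaling a frame by a gain g shifts c0 by log(g), so c0 tracks overall level in the same way a log energy does. That is why either feature can drive the running means.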
18. The method according to claim 12, wherein said high biased running mean computed at the current shift increment, t, is a weighted running mean computed as
19. The method according to claim 17, wherein the weighting coefficients w1 and (1−w1) are adjustable at each shift increment in response to the relative magnitude of the high biased running mean computed at each shift increment and the currently observed value at the current increment.
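The formula for the weighted running mean is not reproduced in the claim text above. The sketch below therefore assumes one plausible form: H_t = (1 − w1)·H_{t−1} + w1·x_t. Attaching w1 to the new observation is itself an assumption. Following claim 19, w1 is recomputed at each step from the gap between the observation and the current high track. A logistic mapping, an assumed choice, moves w1 smoothly between a small value (slow decay when the observation is below the track) and a large value (fast attack when it is above). The parameter names and defaults are hypothetical.

```python
import math


def high_biased_step(prev_high: float, x: float,
                     w_min: float = 0.01, w_max: float = 0.5,
                     scale_db: float = 3.0) -> float:
    """Return H_t = (1 - w1) * H_{t-1} + w1 * x_t, with w1 recomputed each step."""
    # Signed gap between the observation and the current high track, clamped so
    # math.exp cannot overflow for very large gaps.
    z = max(-50.0, min(50.0, (x - prev_high) / scale_db))
    # Logistic mapping: w1 -> w_max when x is far above the track (fast attack),
    # w1 -> w_min when x is far below it (slow decay).
    w1 = w_min + (w_max - w_min) / (1.0 + math.exp(-z))
    return (1.0 - w1) * prev_high + w1 * x


if __name__ == "__main__":
    h = -60.0
    for x in [-60.0, -20.0, -20.0, -60.0, -60.0]:
        h = high_biased_step(h, x)
        print(round(h, 2))
```

Because w1 depends continuously on the gap, the track responds almost fully to a loud onset. It then releases only a fraction of a dB per increment once the signal drops.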
20. The method according to claim 12, wherein said low biased running mean computed at the current shift increment, t, is a weighted running mean computed as
21. The method according to claim 19, wherein the weighting coefficients w1 and (1−w1) are adjustable at each shift increment in response to the relative magnitude of the low biased running mean computed at each shift increment and the currently observed value at the current increment.
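The low-biased running mean can take the same weighted form with the bias reversed. Because the claim's own formula is not reproduced in this text, the following is only one plausible version, assumed for illustration. Here x_t is the value observed at shift increment t, and w_fast and w_slow are hypothetical constants.

```latex
L_t = (1 - w_1)\,L_{t-1} + w_1\,x_t,
\qquad
w_1 =
\begin{cases}
  w_{\text{fast}}, & x_t < L_{t-1},\\[2pt]
  w_{\text{slow}}, & x_t \ge L_{t-1},
\end{cases}
\qquad 0 < w_{\text{slow}} < w_{\text{fast}} \le 1 .
```

With w_fast much larger than w_slow, L_t drops quickly onto new minima (the noise floor) and rises only slowly while speech energy is present.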
Specification