Signal bias removal for robust telephone speech recognition
4 Assignments
0 Petitions
Abstract
A signal bias removal (SBR) method based on maximum likelihood estimation of the bias is described for minimizing the undesirable effects of an unknown signal bias in speech recognition systems. The technique is readily applicable to various architectures, including discrete (vector-quantization-based), semicontinuous, and continuous-density hidden Markov model (HMM) systems. For example, the SBR method can be integrated into a discrete-density HMM and applied to telephone speech recognition, where the contamination due to extraneous signal components is unknown. To enable real-time implementation, a sequential method for estimating the bias (sequential SBR, or SSBR) is disclosed.
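The maximum likelihood bias estimate can be sketched as follows. This rests on assumptions not stated in this section: the bias is additive in the cepstral feature domain ($y_t = x_t + b$ for $T$ frames), and each clean frame is modeled by its best centroid $\mu_i$ with a shared isotropic Gaussian covariance $\sigma^2 I$. For a fixed best-centroid assignment, setting the gradient of the log-likelihood with respect to $b$ to zero gives the mean residual. Iterating on the tentative, bias-subtracted frames $y_t^{(n)}$ (with the assignment $i_t^{*}$ recomputed at each iteration) refines the estimate:

```latex
\begin{aligned}
\hat{b} &= \arg\max_{b} \sum_{t=1}^{T} \max_{i} \log \mathcal{N}\!\left(y_t - b;\, \mu_i,\, \sigma^2 I\right) \\
i_t^{*} &= \arg\min_{i} \bigl\lVert y_t^{(n)} - \mu_i \bigr\rVert^2 \\
\hat{b}^{(n+1)} &= \frac{1}{T} \sum_{t=1}^{T} \bigl( y_t^{(n)} - \mu_{i_t^{*}} \bigr) \\
y_t^{(n+1)} &= y_t^{(n)} - \hat{b}^{(n+1)}, \qquad y_t^{(0)} = y_t
\end{aligned}
```

Summing the per-iteration estimates $\hat{b}^{(1)} + \hat{b}^{(2)} + \cdots$ gives the total bias removed from the utterance.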
13 Claims
1. A method for reducing the effect of an unknown signal bias in an input speech signal for use by a speech recognition system, comprising:
(1) training the speech recognition system by using the following steps:
(a) generating a set of centroids based on a training speech signal;
(b) computing an estimate of the bias for the training speech signal based on maximizing a likelihood function;
(c) subtracting the estimate of the bias from the training speech signal to obtain a tentative training speech value;
(d) repeating steps (b) and (c), wherein each subsequent computed estimate of the bias is based on the previous tentative training speech value to arrive at a reduced bias training speech signal value;
(e) recomputing the centroids based on the reduced bias training speech signal to generate a new set of centroids;
(f) repeating steps (b) to (e) to compute a processed reduced bias speech signal and to form an enhanced set of centroids;
(g) utilizing the enhanced set of centroids and the processed reduced bias speech signal as training input for a speech recognizer;
(2) testing an input speech signal to minimize the unknown bias by using the following steps:
(h) utilizing the enhanced set of centroids to compute an estimate of the bias for each utterance of the speech signal based on maximizing a likelihood function;
(i) subtracting the estimate of the bias from the speech signal to obtain a tentative speech value;
(j) repeating steps (h) and (i), wherein each subsequent computed estimate of the bias is based on the previous tentative speech value, resulting in a reduced bias speech signal value; and
(3) utilizing the reduced bias speech signal as input to a speech recognizer.
3. A method for minimizing the effect of an unknown signal bias on an input speech signal during the testing phase of a speech recognition system, comprising:
(a) computing an estimate of the bias for each utterance of the speech signal based on maximizing a likelihood function by initially utilizing a set of centroids generated by a training model;
(b) subtracting the estimate of the bias from the input speech signal to obtain a tentative speech value;
(c) repeating steps (a) and (b) a predetermined number of times, wherein each subsequent computed estimate of the bias is based on the previous tentative speech value, resulting in a reduced bias speech signal value; and
(d) utilizing the reduced bias speech signal value as input to a speech recognizer.
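The testing phase of claim 3 can be sketched in Python with NumPy. The sketch assumes cepstral feature frames stored as a (T, D) array, centroids as a (K, D) array, and an equal-variance Gaussian likelihood. Under that model, the maximum likelihood bias is the mean residual between each frame and its nearest centroid. The function names `nearest_centroids` and `remove_bias` and the default of three iterations are hypothetical illustrations, not taken from the patent.

```python
import numpy as np


def nearest_centroids(frames: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """For each frame (row), return the centroid with the smallest squared Euclidean distance."""
    # (T, 1, D) - (1, K, D) -> (T, K, D); summing over D gives a (T, K) distance matrix.
    d2 = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids[d2.argmin(axis=1)]


def remove_bias(utterance, centroids, num_iters: int = 3):
    """Iterative signal bias removal for one test utterance (hypothetical helper).

    Returns the reduced-bias frames and the accumulated bias estimate.
    """
    current = np.asarray(utterance, dtype=float).copy()
    centroids = np.asarray(centroids, dtype=float)
    total_bias = np.zeros(current.shape[1])
    for _ in range(num_iters):  # step (c): a predetermined number of repetitions
        # Step (a): ML bias estimate under the equal-variance Gaussian assumption,
        # i.e. the mean residual between each frame and its nearest centroid.
        bias = (current - nearest_centroids(current, centroids)).mean(axis=0)
        # Step (b): subtract it to obtain the tentative speech value.
        current = current - bias
        total_bias += bias
    return current, total_bias


if __name__ == "__main__":
    codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
    clean = codebook[[0, 1, 0, 1]]
    observed = clean + np.array([1.0, -2.0])  # constant channel bias
    cleaned, bias = remove_bias(observed, codebook)
    print(bias)  # [ 1. -2.]
```

Because every iteration re-derives the nearest centroids from the latest tentative frames, an initially wrong assignment can correct itself once most of the bias has been subtracted.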
5. A method for sequentially reducing the effect of an unknown signal bias of an input speech signal for a speech recognition system, comprising:
(1) training the speech recognition system by using the following steps:
(a) generating a set of centroids based on a training speech signal;
(b) analyzing the speech signal on a frame-by-frame basis or in a batch mode;
(c) computing an estimate of the bias for the training speech signal based on maximizing a likelihood function;
(d) subtracting the estimate of the bias from the training speech signal to obtain a tentative training speech value;
(e) repeating steps (c) and (d), wherein each subsequent computed estimate of the bias is based on the previous tentative training speech value to arrive at a reduced bias training speech signal value;
(f) recomputing the centroids based on the reduced bias training speech signal value to generate a new set of centroids;
(g) repeating steps (c) to (f) to compute a processed reduced bias speech signal and to generate an enhanced set of centroids;
(h) utilizing the enhanced set of centroids and the processed reduced bias speech signal as training input for a speech recognizer;
(2) testing an input speech signal to minimize the unknown bias by using the following steps:
(i) analyzing an utterance on a frame-by-frame basis;
(j) computing a sequential bias estimate for each frame of the speech signal based on maximizing a likelihood function;
(k) subtracting the sequential bias estimate from the input speech signal at every frame to obtain a tentative speech value;
(l) repeating steps (j) and (k), wherein each subsequent computed estimate of the bias is based on the previous tentative speech value, resulting in a reduced bias speech signal value; and
(3) utilizing the reduced bias speech signal as input to a speech recognizer.
7. A method for sequentially reducing the effect of an unknown signal bias on an input speech signal during the testing phase of a speech recognition system, comprising:
(a) analyzing an utterance on a frame-by-frame basis;
(b) computing a sequential bias estimate for each frame of the speech signal based on maximizing a likelihood function by utilizing a set of centroids generated by a training model;
(c) subtracting the sequential bias estimate from the input speech signal at every frame to obtain a tentative speech value;
(d) repeating steps (b) and (c), wherein each subsequent computed estimate of the bias is based on the previous tentative speech value, resulting in a reduced bias speech signal value; and
(e) utilizing the reduced bias speech signal as input to a speech recognizer.
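This section of the patent does not give the sequential update rule, so the following Python/NumPy sketch of claim 7 assumes one. The bias estimate is taken to be an exponentially weighted running mean of the residual between each frame and the nearest centroid of its bias-compensated (tentative) value. The function names and the `forget` and `inner_iters` parameters are hypothetical. Each frame is processed as it arrives, with no look-ahead, which is what suits a sequential estimate to real-time use.

```python
import numpy as np


def nearest_centroid(frame: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Return the centroid closest (squared Euclidean distance) to a single frame."""
    return centroids[((centroids - frame) ** 2).sum(axis=1).argmin()]


def sequential_bias_removal(frames, centroids, forget: float = 0.95, inner_iters: int = 1):
    """Frame-by-frame bias removal (hypothetical SSBR-style sketch).

    `forget` weights the previous estimate; `inner_iters` > 1 repeats the
    estimate/subtract steps on the same frame. Both are assumed parameters.
    """
    frames = np.asarray(frames, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    bias = np.zeros(frames.shape[1])
    cleaned = np.empty_like(frames)
    for t, y in enumerate(frames):              # (a) analyze frame by frame
        for _ in range(inner_iters):            # (d) optionally repeat (b)-(c)
            # (b) the nearest centroid of the current tentative value y - bias
            # gives a per-frame bias observation y - mu, which is blended into
            # the running estimate so that older frames are gradually forgotten.
            mu = nearest_centroid(y - bias, centroids)
            bias = forget * bias + (1.0 - forget) * (y - mu)
        cleaned[t] = y - bias                   # (c) subtract at every frame
    return cleaned, bias
```

A smaller `forget` tracks a changing channel faster at the cost of a noisier estimate. The first few frames are only partially compensated while the running mean warms up.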
11. A method for generating an enhanced set of centroids representative of an input speech signal for use by a speech recognition system, utilizing an initial set of centroids based on a training speech signal, comprising:
(a) computing an estimate of the bias for the training speech signal based on maximizing a likelihood function;
(b) subtracting the estimate of the bias from the training speech signal to obtain a tentative training speech value;
(c) repeating steps (a) and (b), wherein each subsequent computed estimate of the bias is based on the previous tentative training speech value to arrive at a reduced bias training speech signal value;
(d) recomputing the centroids based on the reduced bias training speech signal to generate a new set of centroids; and
(e) repeating steps (a) to (d) to compute a processed reduced bias speech signal and to form an enhanced set of centroids.
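Claim 11 can be sketched in Python with NumPy under several assumptions. Frames are cepstral vectors, each training utterance carries its own constant bias, the likelihood is an equal-variance Gaussian around the nearest centroid, and step (d) recomputes the centroids with a single Lloyd (k-means) pass. The claim itself does not specify the clustering update. The names `assign` and `enhance_centroids` and their iteration counts are hypothetical.

```python
import numpy as np


def assign(frames: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Index of the nearest centroid (squared Euclidean distance) for each frame."""
    d2 = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def enhance_centroids(utterances, centroids, outer_iters: int = 3, bias_iters: int = 3):
    """Alternate per-utterance bias removal with centroid re-estimation.

    Returns the enhanced centroids and the processed reduced-bias utterances.
    """
    centroids = np.asarray(centroids, dtype=float).copy()
    utts = [np.asarray(u, dtype=float).copy() for u in utterances]
    for _outer in range(outer_iters):                # step (e): repeat (a)-(d)
        for n, u in enumerate(utts):
            for _inner in range(bias_iters):         # step (c): repeat (a)-(b)
                # Step (a): ML bias estimate = mean residual to nearest centroids.
                bias = (u - centroids[assign(u, centroids)]).mean(axis=0)
                u = u - bias                         # step (b): tentative value
            utts[n] = u
        # Step (d): one Lloyd pass over all reduced-bias training frames.
        pooled = np.vstack(utts)
        labels = assign(pooled, centroids)
        for k in range(len(centroids)):
            members = pooled[labels == k]
            if len(members) > 0:                     # leave empty cells unchanged
                centroids[k] = members.mean(axis=0)
    return centroids, utts
```

Under this model the centroids are only defined up to a common shift, because a constant added to every centroid can be absorbed into every utterance's bias. What the alternation removes is the bias *difference* between training utterances, so the recomputed centroids are no longer smeared by channel variation.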
Specification