Enhanced likelihood computation using regression in a speech recognition system
Abstract
To achieve low error rates in a speech recognition system (for example, one employing rank-based decoding), we discriminate the most confusable incorrect leaves from the correct leaf by lowering their ranks. That is, we increase the likelihood of the correct leaf of a frame while decreasing the likelihoods of the confusable leaves. To do this, we use auxiliary information from the prediction of neighboring frames to augment the likelihood computation for the current frame. Specifically, we use the residual errors in the predictions of neighboring frames to discriminate between the correct (best) leaf and the incorrect leaves of a given frame. We present a new methodology that incorporates prediction-error likelihoods into the overall likelihood computation to improve the rank of the correct leaf.
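The effect on rank can be illustrated with a small sketch. It assumes that a leaf's rank is its 1-based position when all leaves at a frame are sorted by descending log-likelihood, and that the auxiliary prediction-error evidence is added as a weighted term. The function names `leaf_rank` and `add_weighted`, the weight 0.5, and the scores are hypothetical, chosen only to show how the extra term can move the correct leaf up.

```python
from typing import List, Sequence


def leaf_rank(scores: Sequence[float], correct: int) -> int:
    """1-based rank of leaf `correct` when leaves are sorted by descending score.

    Leaves tied with the correct leaf do not push it down.
    """
    target = scores[correct]
    return 1 + sum(1 for s in scores if s > target)


def add_weighted(base: Sequence[float], extra: Sequence[float], weight: float) -> List[float]:
    """Per-leaf base log-likelihood plus a weighted auxiliary log-likelihood."""
    return [b + weight * e for b, e in zip(base, extra)]


# Made-up log-likelihoods for three leaves at one frame; leaf 0 is the correct one.
acoustic = [-10.0, -9.5, -12.0]
pred_err = [-1.0, -3.0, -2.0]  # hypothetical prediction-error log-likelihoods
print(leaf_rank(acoustic, 0))                               # 2
print(leaf_rank(add_weighted(acoustic, pred_err, 0.5), 0))  # 1
```

Here the confusable leaf 1 outscores the correct leaf on the acoustic term alone, but predicts its neighbors poorly, so the combined score drops it below leaf 0.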
15 Claims
1. A method for use with a speech recognition system in processing a current frame of a speech signal, the method comprising the steps of:
computing a likelihood value for the current frame of the speech signal;
computing a likelihood value for at least one neighboring frame, the likelihood value of the neighboring frame including a likelihood value for at least one frame preceding and a likelihood value for at least one frame succeeding the current frame of the speech signal; and
combining the likelihood values for the current and neighboring frames to form a final likelihood value for assignment in association with the current frame of the speech signal, wherein at least one of the likelihood values is assigned a corresponding weight before being combined.
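The claim does not fix how the weighted values are combined. One common choice, assumed in the sketch below, is to work in the log domain and take a weighted sum of log-likelihoods, which is equivalent to multiplying the likelihoods after raising each to its weight. The function name `final_log_likelihood` and the default neighbor weight of 0.5 are hypothetical.

```python
from typing import Optional, Sequence


def final_log_likelihood(
    current: float,
    preceding: Sequence[float],
    succeeding: Sequence[float],
    w_current: float = 1.0,
    w_preceding: Optional[Sequence[float]] = None,
    w_succeeding: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of log-likelihoods for frame t and its neighbors.

    preceding[k] is the log-likelihood value from frame t-1-k and
    succeeding[k] the value from frame t+1+k.
    """
    if not preceding or not succeeding:
        raise ValueError("need at least one preceding and one succeeding frame")
    if w_preceding is None:
        w_preceding = [0.5] * len(preceding)  # hypothetical default weight
    if w_succeeding is None:
        w_succeeding = [0.5] * len(succeeding)
    if len(w_preceding) != len(preceding) or len(w_succeeding) != len(succeeding):
        raise ValueError("one weight per neighboring frame is required")
    total = w_current * current
    total += sum(w * ll for w, ll in zip(w_preceding, preceding))
    total += sum(w * ll for w, ll in zip(w_succeeding, succeeding))
    return total


print(final_log_likelihood(-10.0, [-4.0], [-6.0]))  # -15.0
```

In practice the weights would be tuned on held-out data; nothing in the claim prescribes particular values.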
5. A method for use with a speech recognition system in modeling one or more frames of a speech signal, the method comprising the steps of:
tagging feature vectors associated with each frame received in a training phase with best aligning gaussian distributions;
estimating forward and backward regression coefficients for the gaussian distributions for each frame;
computing residual error vectors from the regression coefficients for each frame;
modeling prediction errors to form a set of gaussian models for speech associated with each frame;
computing one or more sets of likelihood values for each frame of a speech signal received during a recognition phase, the sets of likelihood values being based, at least in part, on the set of gaussian models; and
generating a final likelihood score for each frame of the speech signal, the final likelihood score being a weighted combination of each likelihood value in a set.
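The training-phase steps of claim 5 (tag, regress, compute residuals, model them) can be sketched as follows. This is a minimal illustration under several assumptions not stated in the claim: the forward regression predicts the succeeding frame from the current frame and the backward regression predicts the preceding frame; both are affine and fitted per Gaussian by ordinary least squares (`numpy.linalg.lstsq`); and the prediction errors of each Gaussian and direction are modeled by one zero-mean, diagonal-covariance Gaussian with a variance floor. The function name, the returned dictionary layout, and `var_floor` are hypothetical.

```python
import numpy as np


def fit_regression_models(X, labels, n_gauss, var_floor=1e-3):
    """Training-phase sketch: per-Gaussian forward/backward regression and residual models.

    X:      (T, d) training feature vectors.
    labels: (T,) index of the best-aligning Gaussian for each frame (the "tags").
    Forward regression predicts x[t+1] from [x[t], 1]; backward predicts x[t-1]
    from [x[t], 1]. Returns {g: {"fwd": (W, var), "bwd": (W, var)}}, where W is a
    (d+1, d) coefficient matrix and var is the per-dimension residual variance.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    d = X.shape[1]
    models = {}
    for g in range(n_gauss):
        # Use interior frames only, so both a preceding and a succeeding frame exist.
        idx = np.flatnonzero(labels[1:-1] == g) + 1
        if idx.size < d + 2:
            continue  # too few tagged frames to fit d+1 coefficients per dimension
        Z = np.hstack([X[idx], np.ones((idx.size, 1))])  # regressors plus bias column
        models[g] = {}
        for name, target in (("fwd", X[idx + 1]), ("bwd", X[idx - 1])):
            W, *_ = np.linalg.lstsq(Z, target, rcond=None)
            resid = target - Z @ W  # residual (prediction error) vectors
            # With a bias column the least-squares residuals have zero mean,
            # so a zero-mean diagonal Gaussian only needs a variance per dimension.
            var = np.maximum(resid.var(axis=0), var_floor)
            models[g][name] = (W, var)
    return models
```

A richer residual model (for example, a mixture of Gaussians per leaf) would fit the claim equally well; a single diagonal Gaussian is used here only to keep the sketch short.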
8. Apparatus for use with a speech recognition system in processing a current frame of a speech signal, the apparatus comprising:
at least one processor operable to compute a likelihood value for the current frame of the speech signal, to compute a likelihood value for at least one neighboring frame, the likelihood value of the neighboring frame including a likelihood value for at least one frame preceding and a likelihood value for at least one frame succeeding the current frame of the speech signal, and to combine the likelihood values for the current and neighboring frames to form a final likelihood value for assignment in association with the current frame of the speech signal, wherein at least one of the likelihood values is assigned a corresponding weight before being combined.
12. Apparatus for use with a speech recognition system in modeling one or more frames of a speech signal, the apparatus comprising:
at least one processor operable to tag feature vectors associated with each frame received in a training phase with best aligning gaussian distributions, to estimate forward and backward regression coefficients for the gaussian distributions for each frame, to compute residual error vectors from the regression coefficients for each frame, to model prediction errors to form a set of gaussian models for speech associated with each frame, to compute one or more sets of likelihood values for each frame of a speech signal received during a recognition phase, the sets of likelihood values being based, at least in part, on the set of gaussian models, and to generate a final likelihood score for each frame of the speech signal, the final likelihood score being a weighted combination of each likelihood value in a set.
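The recognition-phase steps (computing a set of likelihood values per frame from the Gaussian models and forming a weighted final score) might look like the sketch below. It assumes that, for each candidate Gaussian at frame t, the set consists of the ordinary acoustic log-likelihood of the current frame plus the log-likelihoods of the forward and backward prediction residuals, each residual scored under a zero-mean diagonal Gaussian. The model layout (`"mean"`, `"var"`, `"fwd"`, `"bwd"`), the function names, and the weights (1.0, 0.5, 0.5) are hypothetical.

```python
import numpy as np


def diag_gauss_logpdf(x, mean, var):
    """log N(x; mean, diag(var)) for a single feature vector x."""
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var))


def frame_scores(X, t, gaussians, weights=(1.0, 0.5, 0.5)):
    """Final score of each candidate Gaussian for an interior frame t.

    gaussians[g] holds "mean" and "var" (the acoustic model) plus "fwd" and
    "bwd", each a pair (W, resid_var) where W is a (d+1, d) regression matrix
    applied to [x_t, 1] to predict x_{t+1} or x_{t-1}, respectively.
    """
    X = np.asarray(X, dtype=float)
    if not 1 <= t <= len(X) - 2:
        raise ValueError("t must have both a preceding and a succeeding frame")
    w_ac, w_f, w_b = weights
    z = np.append(X[t], 1.0)  # current frame plus bias term
    scores = {}
    for g, m in gaussians.items():
        ac = diag_gauss_logpdf(X[t], m["mean"], m["var"])
        W_f, v_f = m["fwd"]
        W_b, v_b = m["bwd"]
        # Likelihoods of the prediction errors for the succeeding and preceding frames.
        fwd = diag_gauss_logpdf(X[t + 1] - z @ np.asarray(W_f), 0.0, v_f)
        bwd = diag_gauss_logpdf(X[t - 1] - z @ np.asarray(W_b), 0.0, v_b)
        scores[g] = w_ac * ac + w_f * fwd + w_b * bwd
    return scores
```

The best-scoring leaf for frame t is then `max(scores, key=scores.get)`; in a rank-based decoder the same scores would instead be sorted to obtain each leaf's rank.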
15. An article of manufacture for use with a speech recognition system in processing a current frame of a speech signal, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
computing a likelihood value for the current frame of the speech signal;
computing a likelihood value for at least one neighboring frame, the likelihood value of the neighboring frame including a likelihood value for at least one frame preceding and a likelihood value for at least one frame succeeding the current frame of the speech signal; and
combining the likelihood values for the current and neighboring frames to form a final likelihood value for assignment in association with the current frame of the speech signal, wherein at least one of the likelihood values is assigned a corresponding weight before being combined.
Specification