Enhanced likelihood computation using regression in a speech recognition system

US 6,493,667 B1
Filed: 08/05/1999
Issued: 12/10/2002
Est. Priority Date: 08/05/1999
Status: Expired due to Fees

Abstract

In order to achieve low error rates in a speech recognition system, for example, in a system employing rank-based decoding, we discriminate the most confusable incorrect leaves from the correct leaf by lowering their ranks. That is, we increase the likelihood of the correct leaf of a frame, while decreasing the likelihoods of the confusable leaves. In order to do this, we use the auxiliary information from the prediction of the neighboring frames to augment the likelihood computation of the current frame. We then use the residual errors in the predictions of neighboring frames to discriminate between the correct (best) and incorrect leaves of a given frame. We present a new methodology that incorporates prediction error likelihoods into the overall likelihood computation to improve the rank position of the correct leaf.

15 Claims

1. A method for use with a speech recognition system in processing a current frame of a speech signal, the method comprising the steps of:
- computing a likelihood value for the current frame of the speech signal;
- computing a likelihood value for at least one neighboring frame, the likelihood value of the neighboring frame including a likelihood value for at least one frame preceding and a likelihood value for at least one frame succeeding the current frame of the speech signal; and
- combining the likelihood values for the current and neighboring frames to form a final likelihood value for assignment in association with the current frame of the speech signal, wherein at least one of the likelihood values is assigned a corresponding weight before being combined.

2. The method of claim 1, wherein the likelihood value for the neighboring frame is a function of a prediction error computed for the neighboring frame.
3. The method of claim 2, wherein the prediction error is computed from a regression coefficient associated with the neighboring frame.
4. The method of claim 1, wherein computation of the neighboring frame likelihood value includes computing a likelihood value for at least one frame preceding the current frame and a likelihood value for at least one frame succeeding the current frame.
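
As an illustration of the combining step in claim 1, the following Python sketch scores each candidate leaf with a weighted sum of three log-likelihoods (current, preceding, and succeeding frame) and then ranks the leaves. The log-domain weighted sum, the weight values, the function names, and the toy scores are all assumptions made for illustration; the claim only requires that at least one likelihood value be weighted before the values are combined.

```python
from typing import Dict, List, Tuple

# Hypothetical weights for the (current, preceding, succeeding) frame scores.
WEIGHTS: Tuple[float, float, float] = (1.0, 0.5, 0.5)


def combine_likelihoods(
    current: float,
    preceding: float,
    succeeding: float,
    weights: Tuple[float, float, float] = WEIGHTS,
) -> float:
    """Weighted sum of log-likelihoods for the current and neighboring frames."""
    w_cur, w_prev, w_next = weights
    return w_cur * current + w_prev * preceding + w_next * succeeding


def rank_leaves(scores: Dict[str, Tuple[float, float, float]]) -> List[str]:
    """Return leaf names ordered from best (rank 1) to worst by combined score."""
    final = {leaf: combine_likelihoods(*s) for leaf, s in scores.items()}
    return sorted(final, key=lambda leaf: final[leaf], reverse=True)


# Toy log-likelihoods per leaf: (current frame, preceding frame, succeeding frame).
scores = {
    "leaf_a": (-10.0, -4.0, -6.0),
    "leaf_b": (-9.0, -12.0, -10.0),
}
ranking = rank_leaves(scores)
```

In this toy data, `leaf_b` scores higher on the current frame alone, but the neighboring-frame scores move `leaf_a` to the top rank once the three values are combined.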

5. A method for use with a speech recognition system in modeling one or more frames of a speech signal, the method comprising the steps of:
- tagging feature vectors associated with each frame received in a training phase with best aligning gaussian distributions;
- estimating forward and backward regression coefficients for the gaussian distributions for each frame;
- computing residual error vectors from the regression coefficients for each frame;
- modeling prediction errors to form a set of gaussian models for speech associated with each frame;
- computing one or more sets of likelihood values for each frame of a speech signal received during a recognition phase, the sets of likelihood values being based, at least in part, on the set of gaussian models; and
- generating a final likelihood score for each frame of the speech signal, the final likelihood score being a weighted combination of each likelihood value in a set.
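
The training steps of claim 5 can be sketched in Python under strong simplifying assumptions: features are scalars rather than vectors, all frames are taken to be already tagged with a single gaussian, and the regression is an ordinary least-squares affine fit. Treating "forward" as predicting the next frame from the current one and "backward" as predicting the previous frame is also an assumption, as are the function names, the variance floor, and the toy data. This is a minimal illustration, not the patent's formulation.

```python
import math
from typing import List, Tuple


def fit_affine(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y ~ a * x + b.

    In the 1-D case this is a 1 x 2 coefficient: one predictor plus a constant term.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def residuals(xs: List[float], ys: List[float], coeff: Tuple[float, float]) -> List[float]:
    """Prediction errors y - (a * x + b) for each training pair."""
    a, b = coeff
    return [y - (a * x + b) for x, y in zip(xs, ys)]


def fit_gaussian(errors: List[float]) -> Tuple[float, float]:
    """Mean and variance of the prediction errors (variance floor is an assumption)."""
    mean = sum(errors) / len(errors)
    var = sum((e - mean) ** 2 for e in errors) / len(errors)
    return mean, max(var, 1e-6)


def log_likelihood(error: float, model: Tuple[float, float]) -> float:
    """Log-density of a prediction error under a 1-D gaussian error model."""
    mean, var = model
    return -0.5 * (math.log(2 * math.pi * var) + (error - mean) ** 2 / var)


# Toy scalar feature track; each interior frame has a previous and a next frame.
track = [0.0, 1.0, 2.1, 2.9, 4.2, 5.0]
cur, nxt, prev = track[1:-1], track[2:], track[:-2]

forward = fit_affine(cur, nxt)    # predicts the next frame from the current one
backward = fit_affine(cur, prev)  # predicts the previous frame from the current one

fwd_model = fit_gaussian(residuals(cur, nxt, forward))
bwd_model = fit_gaussian(residuals(cur, prev, backward))
```

At recognition time, a small prediction error scores a higher log-likelihood under `fwd_model` or `bwd_model` than a large one, which is what lets the prediction-error likelihoods separate well-matched leaves from poorly matched ones.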

6. The method of claim 5, wherein each regression coefficient is an m × n matrix where m is a dimensionality of a feature vector being predicted and n is a dimensionality of a feature vector used to predict the feature vector being predicted plus a constant term.
7. The method of claim 5, wherein the backward and forward regression coefficients are respectively represented as:

8. Apparatus for use with a speech recognition system in processing a current frame of a speech signal, the apparatus comprising:
- at least one processor operable to compute a likelihood value for the current frame of the speech signal, to compute a likelihood value for at least one neighboring frame, the likelihood value of the neighboring frame including a likelihood value for at least one frame preceding and a likelihood value for at least one frame succeeding the current frame of the speech signal, and to combine the likelihood values for the current and neighboring frames to form a final likelihood value for assignment in association with the current frame of the speech signal, wherein at least one of the likelihood values is assigned a corresponding weight before being combined.

9. The apparatus of claim 8, wherein the likelihood value for the neighboring frame is a function of a prediction error computed for the neighboring frame.
10. The apparatus of claim 9, wherein the prediction error is computed from a regression coefficient associated with the neighboring frame.
11. The apparatus of claim 8, wherein computation of the neighboring frame likelihood value includes computing a likelihood value for at least one frame preceding the current frame and a likelihood value for at least one frame succeeding the current frame.

12. Apparatus for use with a speech recognition system in modeling one or more frames of a speech signal, the apparatus comprising:
- at least one processor operable to tag feature vectors associated with each frame received in a training phase with best aligning gaussian distributions, to estimate forward and backward regression coefficients for the gaussian distributions for each frame, to compute residual error vectors from the regression coefficients for each frame, to model prediction errors to form a set of gaussian models for speech associated with each frame, to compute one or more sets of likelihood values for each frame of a speech signal received during a recognition phase, the sets of likelihood values being based, at least in part, on the set of gaussian models, and to generate a final likelihood score for each frame of the speech signal, the final likelihood score being a weighted combination of each likelihood value in a set.

13. The apparatus of claim 12, wherein each regression coefficient is an m × n matrix where m is a dimensionality of a feature vector being predicted and n is a dimensionality of a feature vector used to predict the feature vector being predicted plus a constant term.
14. The apparatus of claim 12, wherein the backward and forward regression coefficients are respectively represented as:

15. An article of manufacture for use with a speech recognition system in processing a current frame of a speech signal, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
- computing a likelihood value for the current frame of the speech signal;
- computing a likelihood value for at least one neighboring frame, the likelihood value of the neighboring frame including a likelihood value for at least one frame preceding and a likelihood value for at least one frame succeeding the current frame of the speech signal; and
- combining the likelihood values for the current and neighboring frames to form a final likelihood value for assignment in association with the current frame of the speech signal, wherein at least one of the likelihood values is assigned a corresponding weight before being combined.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Ramabhadran, Bhuvana; de Souza, Peter V.; Gao, Yuqing; Picheny, Michael
Primary Examiner(s): Chawan, Vijay
Assistant Examiner(s): Storm, Donald L.
Application Number: US09/368,669
Time in Patent Office: 1,223 Days
Field of Search: 704/240, 704/256, 704/219, 704/236
US Class Current: 704/240
CPC Class Codes:
G10L 15/144 Training of HMMs
G10L 2015/085 Methods for reducing search...
