Hidden conditional random field models for phonetic classification and speech recognition

US 20060085190A1
Filed: 10/15/2004
Published: 04/20/2006
Est. Priority Date: 10/15/2004
Status: Active Grant

2 Assignments

0 Petitions

Abstract

A method and apparatus are provided for training and using a hidden conditional random field model for speech recognition and phonetic classification. The hidden conditional random field model uses features, at least one of which is based on a hidden state in a phonetic unit. Values for the features are determined from a segment of speech, and these values are used to identify a phonetic unit for the segment of speech.

15 Citations

24 Claims

1. A method of training a hidden conditional random field model, the method comprising:
- defining a set of hidden states for each of a plurality of labels;
- identifying a constrained sequence of sets of hidden states, at least one set in the sequence containing fewer than all of the hidden states defined for all of the labels; and
- adjusting parameters of the hidden conditional random field model to make state sequences in the constrained sequence of sets of hidden states more likely than a state sequence in an unconstrained sequence of sets of hidden states, each set in the unconstrained sequence containing all of the hidden states defined for all of the labels.

2. The method of claim 1 wherein adjusting parameters of the hidden conditional random field model comprises determining a conditional log likelihood by determining a recursion score for the constrained sequence of sets of hidden states and a second recursion score for the unconstrained sequence of sets of hidden states.

3. The method of claim 2 wherein determining the recursion score comprises, at each time point in the constrained sequence of sets, determining a score for each hidden state in the set of hidden states at that time point.

4. The method of claim 3 wherein determining a score for a hidden state at a time point comprises taking a sum of scores for at least two hidden states at a previous time frame.

5. The method of claim 1 wherein each label has a different set of hidden states.

6. The method of claim 1 wherein the labels represent phonetic units.

7. The method of claim 1 wherein adjusting the parameters of the hidden conditional random field further comprises determining a gradient of a conditional log likelihood.
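The training computation in claims 1 to 4 can be illustrated with a small forward recursion. The following is a minimal Python sketch under stated assumptions, not the patented implementation: hidden states are hypothetical `(label, index)` pairs, `local[s][t]` is assumed to already hold the weighted feature sum for state `s` at frame `t`, and `trans` holds hypothetical transition weights. The conditional log likelihood is taken as the log score of the constrained trellis (only the correct label's states at each time point) minus the log score of the unconstrained trellis (every state at every time point).

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical state naming: (label, hidden-state index), e.g. ("aa", 0).
State = Tuple[str, int]


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_score(trellis: List[List[State]],
                      local: Dict[State, List[float]],
                      trans: Dict[Tuple[State, State], float]) -> float:
    """Log of the summed path scores through a trellis of allowed state sets.

    trellis[t] lists the hidden states allowed at time t. Transitions missing
    from `trans` are treated as disallowed (an assumption of this sketch).
    """
    alpha = {s: local[s][0] for s in trellis[0]}
    for t in range(1, len(trellis)):
        new_alpha = {}
        for s in trellis[t]:
            # The score for s at time t combines the scores of the states
            # allowed at t - 1: a sum in the linear domain (cf. claims 3 and 4).
            incoming = [alpha[p] + trans[(p, s)]
                        for p in trellis[t - 1] if (p, s) in trans]
            new_alpha[s] = (logsumexp(incoming) + local[s][t]) if incoming else -math.inf
        alpha = new_alpha
    return logsumexp(list(alpha.values()))


def conditional_log_likelihood(label: str,
                               states_per_label: Dict[str, List[State]],
                               local: Dict[State, List[float]],
                               trans: Dict[Tuple[State, State], float]) -> float:
    """log p(label | observations) = constrained score - unconstrained score."""
    num_frames = len(next(iter(local.values())))
    all_states = [s for states in states_per_label.values() for s in states]
    constrained = [states_per_label[label]] * num_frames
    unconstrained = [all_states] * num_frames
    return (forward_log_score(constrained, local, trans)
            - forward_log_score(unconstrained, local, trans))
```

Raising this quantity over the training data makes the constrained state sequences more likely relative to all state sequences, which is the effect claim 1 describes; how the weights are then updated (claim 7 mentions a gradient) is not shown here.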

8. A computer-readable medium having computer-executable instructions for performing steps comprising:
- receiving a speech signal;
- determining values from the speech signal for features that are defined for a hidden conditional random field model, at least one of the features based on a hidden state in a phonetic unit; and
- using the values of the features in the hidden conditional random field model to identify at least one phonetic unit.

9. The computer-readable medium of claim 8 wherein determining a value for a feature comprises determining that a current hidden state matches a particular hidden state and setting the value of the feature equal to one element of an observation vector.

10. The computer-readable medium of claim 8 wherein determining a value for a feature comprises determining that a current hidden state matches a particular hidden state and setting the value of the feature equal to the square of one element of an observation vector.

11. The computer-readable medium of claim 8 wherein determining a value for a feature comprises determining that a current hidden state matches a particular hidden state and setting the value of the feature equal to a value of a formant in the speech signal.

12. The computer-readable medium of claim 8 wherein determining a value for a feature comprises determining that a current hidden state matches a particular hidden state and setting the value of the feature equal to one if the speech signal is voiced and zero if the speech signal is unvoiced.

13. The computer-readable medium of claim 12 wherein determining a value for a feature comprises determining that a current hidden state does not match a particular hidden state and setting the value of the feature equal to zero.

14. The computer-readable medium of claim 8 wherein the hidden conditional random field model is trained using a constrained trellis of hidden states comprising a separate set of hidden states at each of a plurality of time points, each set comprising fewer than all possible hidden states.

15. The computer-readable medium of claim 14 wherein the hidden conditional random field model is trained using an unconstrained trellis.

16. The computer-readable medium of claim 8 wherein the hidden conditional random field model is trained by determining a conditional log likelihood and a gradient of the conditional log likelihood.
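The state-gated features of claims 9 to 13 can be sketched as small functions of the current hidden state and one frame of the speech signal. This is a minimal Python sketch under stated assumptions: the `Frame` fields (observation vector, a formant estimate, a voicing flag) and all function names are hypothetical, and how the observation vector, formant, or voicing decision is computed is not specified here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

State = Tuple[str, int]  # hypothetical (phonetic unit, hidden-state index)


@dataclass
class Frame:
    """One analysis frame of the speech signal (fields are assumptions)."""
    obs: Sequence[float]  # observation vector, e.g. cepstral coefficients
    formant: float        # a formant estimate for the frame, in Hz
    voiced: bool          # voicing decision for the frame


FeatureFn = Callable[[State, Frame], float]


def gated(target: State, value: Callable[[Frame], float]) -> FeatureFn:
    """Feature equal to value(frame) when the current hidden state matches
    `target`, and zero otherwise (cf. claim 13)."""
    def feature(state: State, frame: Frame) -> float:
        return value(frame) if state == target else 0.0
    return feature


def first_moment(target: State, k: int) -> FeatureFn:   # cf. claim 9
    return gated(target, lambda fr: fr.obs[k])


def second_moment(target: State, k: int) -> FeatureFn:  # cf. claim 10
    return gated(target, lambda fr: fr.obs[k] ** 2)


def formant_feature(target: State) -> FeatureFn:        # cf. claim 11
    return gated(target, lambda fr: fr.formant)


def voicing_feature(target: State) -> FeatureFn:        # cf. claim 12
    return gated(target, lambda fr: 1.0 if fr.voiced else 0.0)


def local_score(state: State, frame: Frame,
                features: List[FeatureFn], weights: List[float]) -> float:
    """Weighted feature sum for one hidden state at one time point."""
    return sum(w * f(state, frame) for f, w in zip(features, weights))
```

Evaluating one such feature for two hidden states at the same time point yields the measured value for the matching state and zero for the other, the same pattern that claims 18 to 21 describe.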

17. A method of decoding a speech signal to identify at least one phonetic unit, the method comprising:
- identifying a first value for a feature for a first hidden state at a time point using a segment of the speech signal;
- identifying a second value for the feature for a second hidden state at the time point using the segment of the speech signal;
- using both the first value for the feature and the second value for the feature in a model to identify a phonetic unit for the segment of speech.

18. The method of claim 17 wherein the first value for the feature comprises a formant value and the second value comprises a value of zero.

19. The method of claim 17 wherein the first value for the feature comprises an element of an observation vector generated from the segment of speech and the second value for the feature comprises a value of zero.

20. The method of claim 17 wherein the first value for the feature comprises the square of an element of an observation vector generated from the segment of speech and the second value for the feature comprises a value of zero.

21. The method of claim 17 wherein the first value for the feature comprises a value of one and the second value for the feature comprises a value of zero.

22. The method of claim 17 wherein the model comprises a hidden conditional random field model.

23. The method of claim 17 wherein using both the first value for the feature and the second value for the feature in a model comprises determining a score for the first state using the first value and the second value.

24. The method of claim 23 wherein the score for the first state comprises a sum taken over a set of states in a previous time point.
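The decoding of claims 17, 23 and 24 can be sketched as phonetic classification of a single segment. This minimal Python sketch assumes only first-moment features (as in claim 19), hypothetical `(unit, index)` state names, and illustrative weights and transitions; it scores each candidate unit with a forward recursion over that unit's hidden states and returns the best-scoring unit. Because the normalizer over all units is the same for every candidate, comparing these unnormalized unit scores selects the same unit as comparing conditional probabilities.

```python
import math
from typing import Dict, List, Sequence, Tuple

State = Tuple[str, int]  # hypothetical (phonetic unit, hidden-state index)


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def unit_score(states: List[State], frames: List[List[float]],
               weights: Dict[State, List[float]],
               trans: Dict[Tuple[State, State], float]) -> float:
    """Forward log score of a segment restricted to one unit's hidden states."""
    def local(s: State, o: List[float]) -> float:
        # First-moment features only: features gated on other states are zero,
        # so each state sees just its own weight vector dotted with o.
        return sum(w * x for w, x in zip(weights[s], o))

    alpha = {s: local(s, frames[0]) for s in states}
    for o in frames[1:]:
        # Each state's score sums (in the linear domain) over the states at
        # the previous time point, then adds the local score (cf. claim 24).
        alpha = {s: logsumexp([alpha[p] + trans.get((p, s), -math.inf)
                               for p in states]) + local(s, o)
                 for s in states}
    return logsumexp(list(alpha.values()))


def classify(units: Dict[str, List[State]], frames: List[List[float]],
             weights: Dict[State, List[float]],
             trans: Dict[Tuple[State, State], float]) -> str:
    """Return the phonetic unit whose hidden-state trellis scores highest."""
    return max(units, key=lambda u: unit_score(units[u], frames, weights, trans))
```

For example, with one unit whose states weight the first observation element and another whose states weight the second, a segment dominated by the first element is assigned to the first unit.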

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Acero, Alejandro; Mahajan, Milind; Gunawardana, Asela J.

Granted Patent

US 7,627,473 B2
US Class Current

704/256.300
CPC Class Codes

G10L 15/14 using statistical models, e...
