Discriminative training for language modeling
First Claim
1. A method comprising:
for each value of a feature weight in a set of discrete values for the feature weight:
for each of a set of phonetic sequences:
a processor using a baseline language model to identify a set of candidate word sequences from the phonetic sequence, wherein the baseline language model designates one of the candidate word sequences as a most likely word sequence and wherein the baseline language model provides a probability for each candidate word sequence;
for each candidate word sequence in the set of candidate word sequences:
determining a value for a feature from the candidate word sequence;
multiplying the value of the feature weight by the value for the feature to produce a result and summing the result with the probability for the candidate word sequence provided by the baseline language model to produce a score for the candidate word sequence;
selecting the candidate word sequence with the highest score;
comparing the candidate word sequence with the highest score to an actual word sequence to determine a sum of the number of words in the actual word sequence that are replaced with another word in a candidate word sequence, the number of words in the actual word sequence that are omitted in the candidate word sequence, and the number of words present in the candidate word sequence that are not present in the actual word sequence to produce an error value;
summing the error values for the phonetic sequences together to form a sample risk; and
selecting the value for the feature weight that provides the smallest sample risk as the feature weight value for a feature in a discriminative language model.
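The loop in claim 1 amounts to a grid search: for each discrete weight value, rescore every candidate list, count word errors against the reference transcription, and keep the weight with the lowest summed error (the "sample risk"). A minimal sketch of that procedure follows; all function and variable names are invented for illustration, not taken from the patent.

```python
def word_error(reference, hypothesis):
    # Levenshtein distance over words: substitutions + deletions + insertions,
    # matching the error value described in claim 1.
    m, n = len(reference), len(hypothesis)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                       # all reference words omitted
    for j in range(n + 1):
        d[0][j] = j                       # all hypothesis words inserted
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,  # substitution or match
                          d[i - 1][j] + 1,        # deletion (word omitted)
                          d[i][j - 1] + 1)        # insertion (extra word)
    return d[m][n]

def select_feature_weight(weight_values, training_samples, feature_fn):
    # training_samples: list of (candidates, actual_words) pairs, where
    # candidates is a list of (word_sequence, baseline_log_prob) produced
    # by the baseline language model for one phonetic sequence.
    best_weight, best_risk = None, float("inf")
    for w in weight_values:                       # discrete grid of weights
        sample_risk = 0
        for candidates, actual in training_samples:
            # Score = baseline probability + weight * feature value.
            scored = [(base + w * feature_fn(words), words)
                      for words, base in candidates]
            _, best_words = max(scored)           # highest-scoring candidate
            sample_risk += word_error(actual, best_words)
        if sample_risk < best_risk:               # keep the weight with the
            best_weight, best_risk = w, sample_risk  # smallest sample risk
    return best_weight
```

Because the error counts are integers, the sample risk is a step function of the weight, which is why searching a discrete set of weight values suffices here.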
Abstract
A method of training language model parameters trains discriminative model parameters in the language model based on a performance measure having discrete values.
14 Claims
1. A method comprising: (set forth in full under First Claim above; dependent claims: 2, 3, 4)
5. A computer-readable storage medium storing computer-executable instructions that when executed by a processor cause the processor to perform steps comprising:
selecting a feature having a feature function for inclusion in a discriminative language model comprising a sum of weighted feature functions;
for each of a plurality of different possible values of a weight applied to the feature function for the selected feature in the sum of weighted feature functions in the discriminative language model:
scoring each of a plurality of candidate word sequences to produce a score for each candidate word sequence, wherein each candidate word sequence score is computed through steps comprising determining a value for the feature function of the selected feature from the respective candidate word sequence, multiplying the value of the weight by the value of the feature function of the selected feature and adding the result to a probability of the respective candidate word sequence provided by a baseline language model;
selecting a candidate word sequence of the plurality of word sequences with a best score of the scores for the candidate word sequences;
using the respective selected word sequence to generate a performance measure that is associated with the value of the weight by comparing the selected word sequence to an actual word sequence to determine a value for a discrete error function; and
using the performance measures associated with the values of the weight to select a value of the weight to store in the discriminative language model.
Dependent claims: 6, 7, 8, 9, 10, 11.
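Claim 5 associates each weight value with a performance measure computed from a discrete error function. A 0/1 sentence-level error is one simple instance of such a function; the sketch below uses it to build the weight-to-performance map the claim describes. The names are assumptions for illustration only.

```python
def sentence_error(selected, actual):
    # Discrete error function: 1 if the selected sequence differs from
    # the actual word sequence, 0 if it matches exactly.
    return 0 if selected == actual else 1

def performance_by_weight(weight_values, candidates, baseline_log_probs,
                          feature_values, actual):
    # For each weight value, pick the best-scoring candidate and record
    # the discrete error it incurs, yielding a weight -> performance map.
    measures = {}
    for w in weight_values:
        scores = [p + w * f
                  for p, f in zip(baseline_log_probs, feature_values)]
        best = scores.index(max(scores))          # best-scoring candidate
        measures[w] = sentence_error(candidates[best], actual)
    return measures
```

The stored weight is then the one whose associated measure is smallest, e.g. `min(measures, key=measures.get)`.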
12. A method of selecting features for a discriminative language model, the method comprising:
for each of a set of candidate features, determining a difference between a performance measure associated with a discriminative language model that uses the feature and a performance measure associated with a discriminative language model that does not use the feature, wherein the performance measure associated with the discriminative language model that uses the feature is based on a count of the number of words in an actual word sequence that are omitted in a candidate word sequence selected using the discriminative language model that uses the feature, wherein the discriminative language model is a linear discriminative function that provides a score for a candidate word sequence wherein the linear discriminative function comprises a weighted sum of feature function values that includes a feature function value for the feature and a separate probability for the candidate word sequence;
using each difference to score each candidate feature; and
selecting a candidate feature based on the scores.
Dependent claims: 13, 14.
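Claim 12 scores each candidate feature by the difference between the model's performance measure with and without that feature, then selects a feature from those scores. A minimal sketch of that selection step, with invented names and the performance measures assumed to be precomputed error counts, might look like:

```python
def select_best_feature(candidate_features, risk_with, risk_without):
    # candidate_features: list of feature names.
    # risk_with[f]: performance measure (e.g. a word-error count) of the
    #   model that includes feature f.
    # risk_without[f]: the measure of the model without feature f.
    # Score each feature by the reduction in the performance measure it
    # brings, then select the feature with the largest reduction.
    scores = {f: risk_without[f] - risk_with[f] for f in candidate_features}
    return max(candidate_features, key=lambda f: scores[f])
```

Repeating this step greedily, adding the winning feature and rescoring the remainder, is one natural way to grow the feature set, though the claim itself only requires a single selection.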
Specification