Discriminative training for language modeling
First Claim
1. A method comprising:
for each value of a feature weight in a set of discrete values for the feature weight:
for each of a set of phonetic sequences:
a processor using a baseline language model to identify a set of candidate word sequences from the phonetic sequence, wherein the baseline language model designates one of the candidate word sequences as a most likely word sequence and wherein the baseline language model provides a probability for each candidate word sequence;
for each candidate word sequence in the set of candidate word sequences:
determining a value for a feature from the candidate word sequence;
multiplying the value of the feature weight by the value for the feature to produce a result and summing the result with the probability for the candidate word sequence provided by the baseline language model to produce a score for the candidate word sequence;
selecting the candidate word sequence with the highest score;
comparing the candidate word sequence with the highest score to an actual word sequence to determine a sum of the number of words in the actual word sequence that are replaced with another word in a candidate word sequence, the number of words in the actual word sequence that are omitted in the candidate word sequence, and the number of words present in the candidate word sequence that are not present in the actual word sequence to produce an error value;
summing the error values for the phonetic sequences together to form a sample risk; and
selecting the value for the feature weight that provides the smallest sample risk as the feature weight value for a feature in a discriminative language model.
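The loop in claim 1 amounts to a grid search: for each discrete weight value, rescore every candidate list, count word errors against the reference transcription, and keep the weight with the lowest summed error (the "sample risk"). A minimal sketch of that procedure follows; all function and variable names are invented for illustration, not taken from the patent.

```python
def word_error(reference, hypothesis):
    # Levenshtein distance over words: substitutions + deletions + insertions,
    # matching the error value described in claim 1.
    m, n = len(reference), len(hypothesis)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                       # all reference words omitted
    for j in range(n + 1):
        d[0][j] = j                       # all hypothesis words inserted
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,  # substitution or match
                          d[i - 1][j] + 1,        # deletion (word omitted)
                          d[i][j - 1] + 1)        # insertion (extra word)
    return d[m][n]

def select_feature_weight(weight_values, training_samples, feature_fn):
    # training_samples: list of (candidates, actual_words) pairs, where
    # candidates is a list of (word_sequence, baseline_log_prob) produced
    # by the baseline language model for one phonetic sequence.
    best_weight, best_risk = None, float("inf")
    for w in weight_values:                       # discrete grid of weights
        sample_risk = 0
        for candidates, actual in training_samples:
            # Score = baseline probability + weight * feature value.
            scored = [(base + w * feature_fn(words), words)
                      for words, base in candidates]
            _, best_words = max(scored)           # highest-scoring candidate
            sample_risk += word_error(actual, best_words)
        if sample_risk < best_risk:               # keep the weight with the
            best_weight, best_risk = w, sample_risk  # smallest sample risk
    return best_weight
```

Because the error counts are integers, the sample risk is a step function of the weight, which is why searching a discrete set of weight values suffices here.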
Abstract
A method of training language model parameters trains discriminative model parameters in the language model based on a performance measure having discrete values.
14 Claims
1. A method comprising: (set forth in full under First Claim above; dependent claims: 2, 3, 4)
5. A computer-readable storage medium storing computer-executable instructions that when executed by a processor cause the processor to perform steps comprising:
selecting a feature having a feature function for inclusion in a discriminative language model comprising a sum of weighted feature functions;
for each of a plurality of different possible values of a weight applied to the feature function for the selected feature in the sum of weighted feature functions in the discriminative language model:
scoring each of a plurality of candidate word sequences to produce a score for each candidate word sequence, wherein each candidate word sequence score is computed through steps comprising determining a value for the feature function of the selected feature from the respective candidate word sequence, multiplying the value of the weight by the value of the feature function of the selected feature and adding the result to a probability of the respective candidate word sequence provided by a baseline language model;
selecting a candidate word sequence of the plurality of word sequences with a best score of the scores for the candidate word sequences;
using the respective selected word sequence to generate a performance measure that is associated with the value of the weight by comparing the selected word sequence to an actual word sequence to determine a value for a discrete error function; and
using the performance measures associated with the values of the weight to select a value of the weight to store in the discriminative language model.
Dependent claims: 6, 7, 8, 9, 10, 11.
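Claim 5 associates each weight value with a performance measure computed from a discrete error function. A 0/1 sentence-level error is one simple instance of such a function; the sketch below uses it to build the weight-to-performance map the claim describes. The names are assumptions for illustration only.

```python
def sentence_error(selected, actual):
    # Discrete error function: 1 if the selected sequence differs from
    # the actual word sequence, 0 if it matches exactly.
    return 0 if selected == actual else 1

def performance_by_weight(weight_values, candidates, baseline_log_probs,
                          feature_values, actual):
    # For each weight value, pick the best-scoring candidate and record
    # the discrete error it incurs, yielding a weight -> performance map.
    measures = {}
    for w in weight_values:
        scores = [p + w * f
                  for p, f in zip(baseline_log_probs, feature_values)]
        best = scores.index(max(scores))          # best-scoring candidate
        measures[w] = sentence_error(candidates[best], actual)
    return measures
```

The stored weight is then the one whose associated measure is smallest, e.g. `min(measures, key=measures.get)`.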
12. A method of selecting features for a discriminative language model, the method comprising:
for each of a set of candidate features, determining a difference between a performance measure associated with a discriminative language model that uses the feature and a performance measure associated with a discriminative language model that does not use the feature, wherein the performance measure associated with the discriminative language model that uses the feature is based on a count of the number of words in an actual word sequence that are omitted in a candidate word sequence selected using the discriminative language model that uses the feature, wherein the discriminative language model is a linear discriminative function that provides a score for a candidate word sequence wherein the linear discriminative function comprises a weighted sum of feature function values that includes a feature function value for the feature and a separate probability for the candidate word sequence;
using each difference to score each candidate feature; and
selecting a candidate feature based on the scores.
Dependent claims: 13, 14.
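Claim 12 scores each candidate feature by the difference between the model's performance measure with and without that feature, then selects a feature from those scores. A minimal sketch of that selection step, with invented names and the performance measures assumed to be precomputed error counts, might look like:

```python
def select_best_feature(candidate_features, risk_with, risk_without):
    # candidate_features: list of feature names.
    # risk_with[f]: performance measure (e.g. a word-error count) of the
    #   model that includes feature f.
    # risk_without[f]: the measure of the model without feature f.
    # Score each feature by the reduction in the performance measure it
    # brings, then select the feature with the largest reduction.
    scores = {f: risk_without[f] - risk_with[f] for f in candidate_features}
    return max(candidate_features, key=lambda f: scores[f])
```

Repeating this step greedily, adding the winning feature and rescoring the remainder, is one natural way to grow the feature set, though the claim itself only requires a single selection.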
Specification