Discriminative training for language modeling

  • US 7,680,659 B2
  • Filed: 06/01/2005
  • Issued: 03/16/2010
  • Est. Priority Date: 06/01/2005
  • Status: Active Grant
First Claim

1. A method comprising:

  • for each value of a feature weight in a set of discrete values for the feature weight;

    for each of a set of phonetic sequences;

    a processor using a baseline language model to identify a set of candidate word sequences from the phonetic sequence, wherein the baseline language model designates one of the candidate word sequences as a most likely word sequence and wherein the baseline language model provides a probability for each candidate word sequence;

    for each candidate word sequence in the set of candidate word sequences;

    determining a value for a feature from the candidate word sequence;

    multiplying the value of the feature weight by the value for the feature to produce a result and summing the result with the probability for the candidate word sequence provided by the baseline language model to produce a score for the candidate word sequence;

    selecting the candidate word sequence with the highest score;

    comparing the candidate word sequence with the highest score to an actual word sequence to determine a sum of the number of words in the actual word sequence that are replaced with another word in a candidate word sequence, the number of words in the actual word sequence that are omitted in the candidate word sequence, and the number of words present in the candidate word sequence that are not present in the actual word sequence to produce an error value;

    summing the error values for the phonetic sequences together to form a sample risk; and

    selecting the value for the feature weight that provides the smallest sample risk as the feature weight value for a feature in a discriminative language model.
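
The procedure recited in claim 1 can be read as a one-dimensional grid search over the feature weight. The following is a minimal illustrative sketch, not the patented implementation. It assumes the error value is the word-level edit distance (substitutions plus omissions plus insertions under a minimum-cost alignment), that the baseline model's candidate lists and probabilities are already available, and that ties are broken by candidate order. The names `word_errors`, `sample_risk`, `select_weight`, `count_two`, and the toy data are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Words = List[str]
# One training sample: (candidate word sequences with baseline probabilities,
# actual word sequence). The baseline model is assumed to have produced these.
Sample = Tuple[List[Tuple[Words, float]], Words]


def word_errors(candidate: Sequence[str], actual: Sequence[str]) -> int:
    """Substitutions + omissions + insertions, computed as word edit distance."""
    prev = list(range(len(actual) + 1))
    for i, cand_word in enumerate(candidate, 1):
        cur = [i]
        for j, actual_word in enumerate(actual, 1):
            cur.append(min(
                prev[j] + 1,                               # extra word in candidate
                cur[j - 1] + 1,                            # actual word omitted
                prev[j - 1] + (cand_word != actual_word),  # match or replacement
            ))
        prev = cur
    return prev[-1]


def sample_risk(weight: float, samples: Sequence[Sample],
                feature: Callable[[Words], float]) -> int:
    """Sum of error values of the top-scoring candidate for each sample."""
    total = 0
    for candidates, actual in samples:
        # Score = baseline probability + feature weight * feature value.
        best_words, _ = max(candidates,
                            key=lambda c: c[1] + weight * feature(c[0]))
        total += word_errors(best_words, actual)
    return total


def select_weight(weights: Sequence[float], samples: Sequence[Sample],
                  feature: Callable[[Words], float]) -> float:
    """Pick the discrete weight value that yields the smallest sample risk."""
    return min(weights, key=lambda w: sample_risk(w, samples, feature))


def count_two(words: Words) -> float:
    """Hypothetical feature: number of occurrences of the word 'two'."""
    return float(words.count("two"))


# Toy data (invented for illustration only).
toy_samples: List[Sample] = [
    ([(["to", "cats"], 0.6), (["two", "cats"], 0.4)], ["two", "cats"]),
    ([(["go", "to", "bed"], 0.7), (["go", "two", "bed"], 0.1)], ["go", "to", "bed"]),
]
best_weight = select_weight([0.0, 0.5, 1.0], toy_samples, count_two)
```

In this toy setup a weight of 0.0 leaves the baseline's mistake on the first sample uncorrected, while 1.0 over-promotes "two" and introduces an error on the second sample; the intermediate value avoids both, which is the trade-off the sample-risk criterion is meant to capture.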
