Discriminative language model pruning
First Claim
1. A system for discriminatively pruning a language model, the system comprising:
an electronic data store configured to store a corpus of training texts; and
a computing device in communication with the electronic data store, the computing device configured to:
obtain a confusion matrix of confusable phonemes;
for a first text of the corpus of training texts, compute a first word lattice comprising the first text and an alternative hypothesis for the first text, wherein the first text comprises a first word;
for a second text of the corpus of training texts, compute a second word lattice comprising the second text and an alternative hypothesis for the second text using the confusion matrix, wherein the second text comprises a second word, and wherein the alternative hypothesis for the second text comprises the second text with the first word substituted for the second word;
obtain a language model comprising a plurality of trigrams;
for a first trigram of the plurality of trigrams, wherein the first trigram comprises the first word in a context of two other words, determine a plurality of values using the language model without pruning the first trigram, the plurality of values comprising:
a trigram probability for the first trigram;
a backoff probability for the first trigram, wherein the backoff probability is computed using a backoff weight and a bigram probability, and wherein the backoff probability corresponds to a probability used in the absence of the trigram probability;
a true path probability that a first true path of the first word lattice is correct, wherein the first true path comprises the first text; and
an error path probability that a first error path of the second word lattice is correct, wherein the first error path comprises the alternative hypothesis for the second text;
compute a discriminative objective function value using the plurality of values, wherein the discriminative objective function value is based at least partly on a difference between (i) a first sum of values computed for individual true paths including the first true path, and (ii) a second sum of values computed for individual error paths including the first error path, wherein the value computed for the first true path is computed using the true path probability, the trigram probability, and the backoff probability, and wherein the value computed for the first error path is computed using the error path probability, the trigram probability, and the backoff probability;
based at least in part on the discriminative objective function value, prune the first trigram from the language model to generate a pruned language model;
receive, from a user computing device, an audio signal corresponding to speech of a user; and
recognize the speech, via a speech recognition server, using the pruned language model.
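The discriminative objective recited above can be illustrated with a small sketch. This is an assumption about one plausible form of the computation (the specification would give the exact formula): a trigram's contribution to a path is its log-probability gain over the backoff probability, weighted by the probability that the path is correct, summed over true paths minus error paths. All function and variable names here are illustrative, not from the patent.

```python
import math

def backoff_probability(backoff_weight, bigram_prob):
    # Probability used in the absence of the trigram probability:
    # the backoff weight times the bigram probability, as in the claim.
    return backoff_weight * bigram_prob

def objective_value(true_path_probs, error_path_probs, trigram_prob, backoff_prob):
    # Log-probability gain of keeping the trigram instead of backing off.
    gain = math.log(trigram_prob) - math.log(backoff_prob)
    # First sum over true paths minus second sum over error paths, each value
    # weighted by the probability that its path is correct.
    true_sum = sum(p * gain for p in true_path_probs)
    error_sum = sum(p * gain for p in error_path_probs)
    return true_sum - error_sum
```

Under this reading, a positive value indicates the trigram helps true paths more than error paths, so it would be kept; a low or negative value marks it as a pruning candidate.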
Abstract
A language model for speech recognition may be discriminatively pruned. In some embodiments, the language model is discriminatively pruned by computing a discriminative objective function value for one or more n-grams in the language model, and selecting one or more n-grams to prune based at least in part on a threshold value. In some embodiments, the language model is discriminatively pruned to a sufficiently small number of n-grams such that transcription of audio inputs may occur in real time, or such that the pruned language model may be stored on a device with relatively limited electronic storage capacity.
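The abstract's threshold-based selection can be sketched as follows. The names `ngram_scores` and `threshold` are hypothetical, and a real implementation would also renormalize the model's probabilities and backoff weights after pruning.

```python
def prune_by_threshold(ngram_scores, threshold):
    """Keep the n-grams whose discriminative objective value is at or above
    the threshold; the rest are pruned from the model."""
    kept = {ng: s for ng, s in ngram_scores.items() if s >= threshold}
    pruned = [ng for ng in ngram_scores if ng not in kept]
    return kept, pruned

scores = {
    ("turn", "volume", "up"): 0.8,    # discriminatively useful trigram: keep
    ("turn", "volume", "cup"): -0.3,  # supports a confusable error path: prune
}
kept, pruned = prune_by_threshold(scores, threshold=0.0)
```

Lowering the threshold keeps more n-grams; raising it shrinks the model toward the size needed for real-time transcription or limited-storage devices.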
243 Citations
30 Claims
1. A system for discriminatively pruning a language model, the system comprising:
an electronic data store configured to store a corpus of training texts; and
a computing device in communication with the electronic data store, the computing device configured to:
obtain a confusion matrix of confusable phonemes;
for a first text of the corpus of training texts, compute a first word lattice comprising the first text and an alternative hypothesis for the first text, wherein the first text comprises a first word;
for a second text of the corpus of training texts, compute a second word lattice comprising the second text and an alternative hypothesis for the second text using the confusion matrix, wherein the second text comprises a second word, and wherein the alternative hypothesis for the second text comprises the second text with the first word substituted for the second word;
obtain a language model comprising a plurality of trigrams;
for a first trigram of the plurality of trigrams, wherein the first trigram comprises the first word in a context of two other words, determine a plurality of values using the language model without pruning the first trigram, the plurality of values comprising:
a trigram probability for the first trigram;
a backoff probability for the first trigram, wherein the backoff probability is computed using a backoff weight and a bigram probability, and wherein the backoff probability corresponds to a probability used in the absence of the trigram probability;
a true path probability that a first true path of the first word lattice is correct, wherein the first true path comprises the first text; and
an error path probability that a first error path of the second word lattice is correct, wherein the first error path comprises the alternative hypothesis for the second text;
compute a discriminative objective function value using the plurality of values, wherein the discriminative objective function value is based at least partly on a difference between (i) a first sum of values computed for individual true paths including the first true path, and (ii) a second sum of values computed for individual error paths including the first error path, wherein the value computed for the first true path is computed using the true path probability, the trigram probability, and the backoff probability, and wherein the value computed for the first error path is computed using the error path probability, the trigram probability, and the backoff probability;
based at least in part on the discriminative objective function value, prune the first trigram from the language model to generate a pruned language model;
receive, from a user computing device, an audio signal corresponding to speech of a user; and
recognize the speech, via a speech recognition server, using the pruned language model.
- View Dependent Claims (2, 3, 4)
5. A computer-implemented method comprising:
as implemented by one or more computing devices configured with specific computer-executable instructions,
obtaining a confusion probability that a first phoneme is recognized as a second phoneme;
obtaining a first text comprising a first word;
obtaining a second text comprising a second word;
generating, using the confusion probability, an erroneous hypothesis for the second text, wherein generating the erroneous hypothesis comprises substituting the first word for the second word in the second text;
obtaining a language model comprising a plurality of n-grams;
for a first n-gram of the plurality of n-grams, wherein the first n-gram comprises the first word in a context of one or more other words, determining a plurality of values using the language model, wherein the language model comprises the first n-gram, the plurality of values comprising:
an n-gram probability for the first n-gram;
a backoff probability for the first n-gram;
a first probability that a true hypothesis comprising the first text is correct; and
a second probability that the erroneous hypothesis is correct;
computing an objective function value using the plurality of values, wherein the objective function value is based at least partly on a difference between (i) a first sum of values computed for individual true hypotheses including the true hypothesis, and (ii) a second sum of values computed for individual erroneous hypotheses including the erroneous hypothesis, wherein the value computed for the true hypothesis is computed using the first probability, the n-gram probability, and the backoff probability, and wherein the value computed for the erroneous hypothesis is computed using the second probability, the n-gram probability, and the backoff probability;
based at least in part on the objective function value, pruning the first n-gram from the language model to generate a pruned language model; and
recognizing user speech using the pruned language model and an audio signal corresponding to speech of a user.
- View Dependent Claims (6, 7, 8, 9, 10, 11, 12, 13)
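Generating the erroneous hypothesis in claim 5 amounts to a word substitution guided by the phoneme confusion probability. A minimal sketch, assuming a simple probability gate; the gate, its threshold, and the example words are illustrative, not from the claim:

```python
def generate_erroneous_hypothesis(second_text, second_word, first_word,
                                  confusion_prob, min_confusion=0.05):
    """Substitute the first word for the second word in the second text when
    the underlying phonemes are confusable enough."""
    if confusion_prob < min_confusion:
        return None  # phonemes too dissimilar to yield a plausible error path
    return [first_word if w == second_word else w for w in second_text]

# "sun" vs. "son": acoustically confusable; the probability is a made-up value.
hyp = generate_erroneous_hypothesis(["call", "my", "sun"], "sun", "son", 0.3)
```

The resulting hypothesis differs from the reference text only at the confusable word, which is what lets the objective function isolate that word's n-grams.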
14. A system comprising:
an electronic data store configured to store a language model comprising a plurality of n-grams; and
a computing device in communication with the electronic data store, the computing device configured to:
obtain a confusion probability that a first phoneme is recognized as a second phoneme;
obtain a first text comprising a first word;
obtain a second text comprising a second word;
generate, using the confusion probability, an erroneous hypothesis for the second text, wherein generating the erroneous hypothesis comprises substituting the first word for the second word in the second text;
for a first n-gram of the plurality of n-grams, wherein the first n-gram comprises the first word in a context of one or more other words, compute a plurality of values using the language model, wherein the language model comprises the first n-gram, the plurality of values comprising:
an n-gram probability for the first n-gram;
a backoff probability for the first n-gram;
a first probability that a true hypothesis comprising the first text is correct; and
a second probability that the erroneous hypothesis is correct;
compute an objective function value using the plurality of values, wherein the objective function value is based at least partly on a difference between (i) a first sum of values computed for individual true hypotheses including the true hypothesis, and (ii) a second sum of values computed for individual erroneous hypotheses including the erroneous hypothesis, wherein the value computed for the true hypothesis is computed using the first probability, the n-gram probability, and the backoff probability, and wherein the value computed for the erroneous hypothesis is computed using the second probability, the n-gram probability, and the backoff probability;
based at least in part on the objective function value, prune the first n-gram from the language model to generate a pruned language model; and
recognize user speech using the pruned language model and an audio signal corresponding to speech of a user.
- View Dependent Claims (15, 16, 17, 18, 19, 20, 21, 22)
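The n-gram and backoff probabilities that claim 14 computes follow the standard back-off language-model lookup: use the stored n-gram probability when the n-gram is present, otherwise the history's backoff weight times the lower-order probability. A sketch over a toy trigram model; the dictionary layout is illustrative:

```python
def trigram_probability(trigrams, bigrams, backoff_weights, w1, w2, w3):
    # Use the explicit trigram probability if the model still contains it...
    if (w1, w2, w3) in trigrams:
        return trigrams[(w1, w2, w3)]
    # ...otherwise back off: the history's backoff weight times the bigram prob.
    return backoff_weights.get((w1, w2), 1.0) * bigrams[(w2, w3)]

trigrams = {("turn", "the", "volume"): 0.2}
bigrams = {("the", "volume"): 0.05, ("the", "page"): 0.01}
bows = {("turn", "the"): 0.5}
p_kept = trigram_probability(trigrams, bigrams, bows, "turn", "the", "volume")
p_backoff = trigram_probability(trigrams, bigrams, bows, "turn", "the", "page")
```

Pruning an n-gram means every future lookup for it takes the backoff branch, which is exactly why the objective compares the n-gram probability against the backoff probability.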
23. A non-transitory computer-readable medium having a computer-executable component, the computer-executable component being configured to:
obtain a confusion probability that a first phoneme is recognized as a second phoneme;
obtain a first text comprising a first word;
obtain a second text comprising a second word;
generate, using the confusion probability, an erroneous hypothesis for the second text, wherein generating the erroneous hypothesis comprises substituting the first word for the second word in the second text;
for a first n-gram of a language model, wherein the first n-gram comprises the first word in a context of one or more other words, determine a plurality of values using the language model, wherein the language model comprises the first n-gram, the plurality of values comprising:
an n-gram probability for the first n-gram;
a backoff probability for the first n-gram;
a first probability that a true hypothesis comprising the first text is correct; and
a second probability that the erroneous hypothesis is correct;
compute an objective function value using the plurality of values, wherein the objective function value is based at least partly on a difference between (i) a first sum of values computed for individual true hypotheses including the true hypothesis, and (ii) a second sum of values computed for individual erroneous hypotheses including the erroneous hypothesis, wherein the value computed for the true hypothesis is computed using the first probability, the n-gram probability, and the backoff probability, and wherein the value computed for the erroneous hypothesis is computed using the second probability, the n-gram probability, and the backoff probability;
based at least in part on the objective function value, prune the first n-gram from the language model to generate a pruned language model; and
recognize user speech using the pruned language model and an audio signal corresponding to speech of a user.
- View Dependent Claims (24, 25, 26, 27, 28, 29, 30)
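The first and second probabilities in claim 23 (that the true or erroneous hypothesis is correct) are typically obtained by normalizing recognizer scores over the competing hypotheses; the softmax below is one common choice, not something the claim specifies:

```python
import math

def hypothesis_posteriors(scores):
    """Turn raw hypothesis scores (e.g., log-likelihoods) into probabilities
    that each hypothesis is the correct one."""
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Scores for the true hypothesis and one erroneous hypothesis (made-up values).
p_true, p_err = hypothesis_posteriors([-10.0, -12.0])
```

These posteriors are the per-hypothesis weights that enter the first and second sums of the objective function.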
Specification