Discriminative language model pruning
First Claim
1. A system for discriminatively pruning a language model, the system comprising:
an electronic data store configured to store a corpus of training texts; and
a computing device in communication with the electronic data store, the computing device configured to:
obtain a confusion matrix of confusable phonemes;
for a first text of the corpus of training texts, compute a first word lattice comprising the first text and an alternative hypothesis for the first text, wherein the first text comprises a first word;
for a second text of the corpus of training texts, compute a second word lattice comprising the second text and an alternative hypothesis for the second text using the confusion matrix, wherein the second text comprises a second word, and wherein the alternative hypothesis for the second text comprises the second text with the first word substituted for the second word;
obtain a language model comprising a plurality of trigrams;
for a first trigram of the plurality of trigrams, wherein the first trigram comprises the first word in a context of two other words, determine a plurality of values using the language model without pruning the first trigram, the plurality of values comprising:
a trigram probability for the first trigram;
a backoff probability for the first trigram, wherein the backoff probability is computed using a backoff weight and a bigram probability, and wherein the backoff probability corresponds to a probability used in the absence of the trigram probability;
a true path probability that a first true path of the first word lattice is correct, wherein the first true path comprises the first text; and
an error path probability that a first error path of the second word lattice is correct, wherein the first error path comprises the alternative hypothesis for the second text;
compute a discriminative objective function value using the plurality of values, wherein the discriminative objective function value is based at least partly on a difference between (i) a first sum of values computed for individual true paths including the first true path, and (ii) a second sum of values computed for individual error paths including the first error path, wherein the value computed for the first true path is computed using the true path probability, the trigram probability, and the backoff probability, and wherein the value computed for the first error path is computed using the error path probability, the trigram probability, and the backoff probability;
based at least in part on the discriminative objective function value, prune the first trigram from the language model to generate a pruned language model;
receive, from a user computing device, an audio signal corresponding to speech of a user; and
recognize the speech, via a speech recognition server, using the pruned language model.
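The discriminative objective recited above can be illustrated with a small sketch. This is an assumption about one plausible form of the computation (the specification would give the exact formula): a trigram's contribution to a path is its log-probability gain over the backoff probability, weighted by the probability that the path is correct, summed over true paths minus error paths. All function and variable names here are illustrative, not from the patent.

```python
import math

def backoff_probability(backoff_weight, bigram_prob):
    # Probability used in the absence of the trigram probability:
    # the backoff weight times the bigram probability, as in the claim.
    return backoff_weight * bigram_prob

def objective_value(true_path_probs, error_path_probs, trigram_prob, backoff_prob):
    # Log-probability gain of keeping the trigram instead of backing off.
    gain = math.log(trigram_prob) - math.log(backoff_prob)
    # First sum over true paths minus second sum over error paths, each value
    # weighted by the probability that its path is correct.
    true_sum = sum(p * gain for p in true_path_probs)
    error_sum = sum(p * gain for p in error_path_probs)
    return true_sum - error_sum
```

Under this reading, a positive value indicates the trigram helps true paths more than error paths, so it would be kept; a low or negative value marks it as a pruning candidate.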
Abstract
A language model for speech recognition may be discriminatively pruned. In some embodiments, the language model is discriminatively pruned by computing a discriminative objective function value for one or more n-grams in the language model, and selecting one or more n-grams to prune based at least in part on a threshold value. In some embodiments, the language model is discriminatively pruned to a sufficiently small number of n-grams such that transcription of audio inputs may occur in real time, or such that the pruned language model may be stored on a device with relatively limited electronic storage capacity.
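The abstract's threshold-based selection can be sketched as follows. The names `ngram_scores` and `threshold` are hypothetical, and a real implementation would also renormalize the model's probabilities and backoff weights after pruning.

```python
def prune_by_threshold(ngram_scores, threshold):
    """Keep the n-grams whose discriminative objective value is at or above
    the threshold; the rest are pruned from the model."""
    kept = {ng: s for ng, s in ngram_scores.items() if s >= threshold}
    pruned = [ng for ng in ngram_scores if ng not in kept]
    return kept, pruned

scores = {
    ("turn", "volume", "up"): 0.8,    # discriminatively useful trigram: keep
    ("turn", "volume", "cup"): -0.3,  # supports a confusable error path: prune
}
kept, pruned = prune_by_threshold(scores, threshold=0.0)
```

Lowering the threshold keeps more n-grams; raising it shrinks the model toward the size needed for real-time transcription or limited-storage devices.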
243 Citations
30 Claims
1. A system for discriminatively pruning a language model, the system comprising:
an electronic data store configured to store a corpus of training texts; and
a computing device in communication with the electronic data store, the computing device configured to:
obtain a confusion matrix of confusable phonemes;
for a first text of the corpus of training texts, compute a first word lattice comprising the first text and an alternative hypothesis for the first text, wherein the first text comprises a first word;
for a second text of the corpus of training texts, compute a second word lattice comprising the second text and an alternative hypothesis for the second text using the confusion matrix, wherein the second text comprises a second word, and wherein the alternative hypothesis for the second text comprises the second text with the first word substituted for the second word;
obtain a language model comprising a plurality of trigrams;
for a first trigram of the plurality of trigrams, wherein the first trigram comprises the first word in a context of two other words, determine a plurality of values using the language model without pruning the first trigram, the plurality of values comprising:
a trigram probability for the first trigram;
a backoff probability for the first trigram, wherein the backoff probability is computed using a backoff weight and a bigram probability, and wherein the backoff probability corresponds to a probability used in the absence of the trigram probability;
a true path probability that a first true path of the first word lattice is correct, wherein the first true path comprises the first text; and
an error path probability that a first error path of the second word lattice is correct, wherein the first error path comprises the alternative hypothesis for the second text;
compute a discriminative objective function value using the plurality of values, wherein the discriminative objective function value is based at least partly on a difference between (i) a first sum of values computed for individual true paths including the first true path, and (ii) a second sum of values computed for individual error paths including the first error path, wherein the value computed for the first true path is computed using the true path probability, the trigram probability, and the backoff probability, and wherein the value computed for the first error path is computed using the error path probability, the trigram probability, and the backoff probability;
based at least in part on the discriminative objective function value, prune the first trigram from the language model to generate a pruned language model;
receive, from a user computing device, an audio signal corresponding to speech of a user; and
recognize the speech, via a speech recognition server, using the pruned language model.
- View Dependent Claims (2, 3, 4)
5. A computer-implemented method comprising:
as implemented by one or more computing devices configured with specific computer-executable instructions,
obtaining a confusion probability that a first phoneme is recognized as a second phoneme;
obtaining a first text comprising a first word;
obtaining a second text comprising a second word;
generating, using the confusion probability, an erroneous hypothesis for the second text, wherein generating the erroneous hypothesis comprises substituting the first word for the second word in the second text;
obtaining a language model comprising a plurality of n-grams;
for a first n-gram of the plurality of n-grams, wherein the first n-gram comprises the first word in a context of one or more other words, determining a plurality of values using the language model, wherein the language model comprises the first n-gram, the plurality of values comprising:
an n-gram probability for the first n-gram;
a backoff probability for the first n-gram;
a first probability that a true hypothesis comprising the first text is correct; and
a second probability that the erroneous hypothesis is correct;
computing an objective function value using the plurality of values, wherein the objective function value is based at least partly on a difference between (i) a first sum of values computed for individual true hypotheses including the true hypothesis, and (ii) a second sum of values computed for individual erroneous hypotheses including the erroneous hypothesis, wherein the value computed for the true hypothesis is computed using the first probability, the n-gram probability, and the backoff probability, and wherein the value computed for the erroneous hypothesis is computed using the second probability, the n-gram probability, and the backoff probability;
based at least in part on the objective function value, pruning the first n-gram from the language model to generate a pruned language model; and
recognizing user speech using the pruned language model and an audio signal corresponding to speech of a user.
- View Dependent Claims (6, 7, 8, 9, 10, 11, 12, 13)
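Generating the erroneous hypothesis in claim 5 amounts to a word substitution guided by the phoneme confusion probability. A minimal sketch, assuming a simple probability gate; the gate, its threshold, and the example words are illustrative, not from the claim:

```python
def generate_erroneous_hypothesis(second_text, second_word, first_word,
                                  confusion_prob, min_confusion=0.05):
    """Substitute the first word for the second word in the second text when
    the underlying phonemes are confusable enough."""
    if confusion_prob < min_confusion:
        return None  # phonemes too dissimilar to yield a plausible error path
    return [first_word if w == second_word else w for w in second_text]

# "sun" vs. "son": acoustically confusable; the probability is a made-up value.
hyp = generate_erroneous_hypothesis(["call", "my", "sun"], "sun", "son", 0.3)
```

The resulting hypothesis differs from the reference text only at the confusable word, which is what lets the objective function isolate that word's n-grams.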
14. A system comprising:
an electronic data store configured to store a language model comprising a plurality of n-grams; and
a computing device in communication with the electronic data store, the computing device configured to:
obtain a confusion probability that a first phoneme is recognized as a second phoneme;
obtain a first text comprising a first word;
obtain a second text comprising a second word;
generate, using the confusion probability, an erroneous hypothesis for the second text, wherein generating the erroneous hypothesis comprises substituting the first word for the second word in the second text;
for a first n-gram of the plurality of n-grams, wherein the first n-gram comprises the first word in a context of one or more other words, compute a plurality of values using the language model, wherein the language model comprises the first n-gram, the plurality of values comprising:
an n-gram probability for the first n-gram;
a backoff probability for the first n-gram;
a first probability that a true hypothesis comprising the first text is correct; and
a second probability that the erroneous hypothesis is correct;
compute an objective function value using the plurality of values, wherein the objective function value is based at least partly on a difference between (i) a first sum of values computed for individual true hypotheses including the true hypothesis, and (ii) a second sum of values computed for individual erroneous hypotheses including the erroneous hypothesis, wherein the value computed for the true hypothesis is computed using the first probability, the n-gram probability, and the backoff probability, and wherein the value computed for the erroneous hypothesis is computed using the second probability, the n-gram probability, and the backoff probability;
based at least in part on the objective function value, prune the first n-gram from the language model to generate a pruned language model; and
recognize user speech using the pruned language model and an audio signal corresponding to speech of a user.
- View Dependent Claims (15, 16, 17, 18, 19, 20, 21, 22)
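The n-gram and backoff probabilities that claim 14 computes follow the standard back-off language-model lookup: use the stored n-gram probability when the n-gram is present, otherwise the history's backoff weight times the lower-order probability. A sketch over a toy trigram model; the dictionary layout is illustrative:

```python
def trigram_probability(trigrams, bigrams, backoff_weights, w1, w2, w3):
    # Use the explicit trigram probability if the model still contains it...
    if (w1, w2, w3) in trigrams:
        return trigrams[(w1, w2, w3)]
    # ...otherwise back off: the history's backoff weight times the bigram prob.
    return backoff_weights.get((w1, w2), 1.0) * bigrams[(w2, w3)]

trigrams = {("turn", "the", "volume"): 0.2}
bigrams = {("the", "volume"): 0.05, ("the", "page"): 0.01}
bows = {("turn", "the"): 0.5}
p_kept = trigram_probability(trigrams, bigrams, bows, "turn", "the", "volume")
p_backoff = trigram_probability(trigrams, bigrams, bows, "turn", "the", "page")
```

Pruning an n-gram means every future lookup for it takes the backoff branch, which is exactly why the objective compares the n-gram probability against the backoff probability.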
23. A non-transitory computer-readable medium having a computer-executable component, the computer-executable component being configured to:
obtain a confusion probability that a first phoneme is recognized as a second phoneme;
obtain a first text comprising a first word;
obtain a second text comprising a second word;
generate, using the confusion probability, an erroneous hypothesis for the second text, wherein generating the erroneous hypothesis comprises substituting the first word for the second word in the second text;
for a first n-gram of a language model, wherein the first n-gram comprises the first word in a context of one or more other words, determine a plurality of values using the language model, wherein the language model comprises the first n-gram, the plurality of values comprising:
an n-gram probability for the first n-gram;
a backoff probability for the first n-gram;
a first probability that a true hypothesis comprising the first text is correct; and
a second probability that the erroneous hypothesis is correct;
compute an objective function value using the plurality of values, wherein the objective function value is based at least partly on a difference between (i) a first sum of values computed for individual true hypotheses including the true hypothesis, and (ii) a second sum of values computed for individual erroneous hypotheses including the erroneous hypothesis, wherein the value computed for the true hypothesis is computed using the first probability, the n-gram probability, and the backoff probability, and wherein the value computed for the erroneous hypothesis is computed using the second probability, the n-gram probability, and the backoff probability;
based at least in part on the objective function value, prune the first n-gram from the language model to generate a pruned language model; and
recognize user speech using the pruned language model and an audio signal corresponding to speech of a user.
- View Dependent Claims (24, 25, 26, 27, 28, 29, 30)
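The first and second probabilities in claim 23 (that the true or erroneous hypothesis is correct) are typically obtained by normalizing recognizer scores over the competing hypotheses; the softmax below is one common choice, not something the claim specifies:

```python
import math

def hypothesis_posteriors(scores):
    """Turn raw hypothesis scores (e.g., log-likelihoods) into probabilities
    that each hypothesis is the correct one."""
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Scores for the true hypothesis and one erroneous hypothesis (made-up values).
p_true, p_err = hypothesis_posteriors([-10.0, -12.0])
```

These posteriors are the per-hypothesis weights that enter the first and second sums of the objective function.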
Specification