N-Gram Selection for Practical-Sized Language Models
Abstract
Described is a technology by which a statistical N-gram (e.g., language) model is trained using an N-gram selection technique that helps reduce the size of the final N-gram model. During training, a higher-order probability estimate for an N-gram is added to the model only when the training data justifies adding the estimate. To this end, if a backoff probability estimate is within a maximum likelihood set determined by that N-gram and the N-gram's associated context, or is between the higher-order estimate and the maximum likelihood set, then the higher-order estimate is not included in the model. The backoff probability estimate may be determined via an iterative process, so that it is based on the final model rather than any lower-order model. Also described is additional pruning referred to as modified weighted difference pruning.
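As a concrete illustration of the selection test in the abstract, here is a minimal sketch in Python. It assumes the standard binomial characterization of the maximum likelihood set: an N-gram observed c times in a context observed C times is the most probable outcome exactly when the underlying probability lies in [c/(C+1), (c+1)/(C+1)]. The function names and the interval form are illustrative assumptions, not taken from the patent text.

```python
def mls_interval(count, context_count):
    """Maximum likelihood set for an N-gram seen `count` times in a
    context seen `context_count` times, assuming the binomial
    characterization: the observed count is the most likely outcome
    for any probability p in [c/(C+1), (c+1)/(C+1)]."""
    return count / (context_count + 1), (count + 1) / (context_count + 1)

def include_higher_order(count, context_count, p_backoff, p_higher):
    """Selection rule sketched in the abstract: exclude the
    higher-order estimate when the backoff estimate is inside the
    maximum likelihood set, or falls between the higher-order
    estimate and that set."""
    lo, hi = mls_interval(count, context_count)
    if lo <= p_backoff <= hi:
        return False   # backoff estimate already consistent with the data
    if p_higher <= p_backoff < lo or hi < p_backoff <= p_higher:
        return False   # backoff estimate at least as close to the MLS
    return True        # training data justifies an explicit estimate
```

For example, `include_higher_order(2, 10, 0.20, 0.25)` returns `False`, because 0.20 lies inside the interval [2/11, 3/11] ≈ [0.182, 0.273], so the backoff estimate already explains the observed count.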
20 Claims
- 1. In a computing environment, a method performed on at least one processor, comprising, processing training data to train an N-gram model, including excluding a higher-order probability estimate for an N-gram in the model when a backoff probability estimate for the N-gram is within a maximum likelihood set determined by that N-gram and the N-gram's associated context.
- 14. In a computing environment, a system comprising, a training mechanism that trains an N-gram language model by processing training data into lower-order models relative to the N-gram language model, including using each lower-order model to train a next-higher order model until the N-gram language model is trained, the training mechanism including an N-gram selection process that evaluates, for each N-gram, whether a backoff probability estimate is within a maximum likelihood set determined by that N-gram and the N-gram's associated context, and if so, to exclude the N-gram from the language model.
- 17. One or more computer-readable media having computer-executable instructions, which when executed perform steps, comprising, determining whether to include a higher-order estimated probability in a model for a given N-gram, including computing a final backoff weight, obtaining a difference value corresponding to a difference between a first value representative of a lower-order estimated probability and a second value representative of the higher-order estimated probability times the final backoff weight, comparing the difference value against a pruning threshold, and including the higher-order estimated probability or pruning the higher-order estimated probability based on whether the threshold is met.
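Claim 14 describes training the model order by order, with selection applied at each step. The following is a minimal sketch of that outer loop, in which `train_order` and `select_ngrams` are hypothetical callables standing in for the per-order estimation and the selection test sketched above; neither name comes from the patent.

```python
def train_ngram_model(training_data, max_order, train_order, select_ngrams):
    """Order-by-order training as described in claim 14: each
    lower-order model is used to train the next-higher order model,
    with N-gram selection applied at every step.  `train_order` and
    `select_ngrams` are caller-supplied (hypothetical) callables."""
    model = train_order(training_data, order=1, lower_model=None)
    for order in range(2, max_order + 1):
        candidate = train_order(training_data, order=order, lower_model=model)
        # Drop higher-order estimates whose backoff value is already
        # within (or closer to) the maximum likelihood set.
        model = select_ngrams(candidate, lower_model=model)
    return model
```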
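Claim 17's modified weighted difference pruning compares an explicit higher-order estimate against the value the backoff path would assign. The sketch below makes two assumptions not spelled out in the claim: the comparison is done in the log domain, and the final backoff weight scales the lower-order estimate, as in standard backoff smoothing; the function and parameter names are illustrative.

```python
import math

def prune_higher_order(p_higher, p_lower, final_backoff_weight, threshold):
    """Sketch of the test in claim 17: compute the difference between
    the explicit higher-order estimate and the value the backoff path
    (final backoff weight times the lower-order estimate) would
    assign, then prune when that difference does not meet the
    pruning threshold."""
    difference = math.log(p_higher) - (
        math.log(final_backoff_weight) + math.log(p_lower))
    # Keep the explicit estimate only when it adds enough information
    # over simply backing off; otherwise prune it from the model.
    return difference < threshold
```

Under these assumptions, a small or negative difference means the backoff path already reproduces the higher-order estimate closely, so storing it explicitly buys little at the cost of model size.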