N-Gram Selection for Practical-Sized Language Models
Abstract
Described is a technology by which a statistical N-gram (e.g., language) model is trained using an N-gram selection technique that helps reduce the size of the final N-gram model. During training, a higher-order probability estimate for an N-gram is added to the model only when the training data justifies adding the estimate. To this end, if a backoff probability estimate is within a maximum likelihood set determined by that N-gram and the N-gram's associated context, or is between the higher-order estimate and the maximum likelihood set, then the higher-order estimate is not included in the model. The backoff probability estimate may be determined via an iterative process, so that it is based on the final model rather than any lower-order model. Also described is additional pruning referred to as modified weighted difference pruning.
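As a concrete illustration of the selection test in the abstract, here is a minimal sketch in Python. It assumes the standard binomial characterization of the maximum likelihood set: an N-gram observed c times in a context observed C times is the most probable outcome exactly when the underlying probability lies in [c/(C+1), (c+1)/(C+1)]. The function names and the interval form are illustrative assumptions, not taken from the patent text.

```python
def mls_interval(count, context_count):
    """Maximum likelihood set for an N-gram seen `count` times in a
    context seen `context_count` times, assuming the binomial
    characterization: the observed count is the most likely outcome
    for any probability p in [c/(C+1), (c+1)/(C+1)]."""
    return count / (context_count + 1), (count + 1) / (context_count + 1)

def include_higher_order(count, context_count, p_backoff, p_higher):
    """Selection rule sketched in the abstract: exclude the
    higher-order estimate when the backoff estimate is inside the
    maximum likelihood set, or falls between the higher-order
    estimate and that set."""
    lo, hi = mls_interval(count, context_count)
    if lo <= p_backoff <= hi:
        return False   # backoff estimate already consistent with the data
    if p_higher <= p_backoff < lo or hi < p_backoff <= p_higher:
        return False   # backoff estimate at least as close to the MLS
    return True        # training data justifies an explicit estimate
```

For example, `include_higher_order(2, 10, 0.20, 0.25)` returns `False`, because 0.20 lies inside the interval [2/11, 3/11] ≈ [0.182, 0.273], so the backoff estimate already explains the observed count.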
20 Claims
- 1. In a computing environment, a method performed on at least one processor, comprising, processing training data to train an N-gram model, including excluding a higher-order probability estimate for an N-gram in the model when a backoff probability estimate for the N-gram is within a maximum likelihood set determined by that N-gram and the N-gram's associated context.
- 14. In a computing environment, a system comprising, a training mechanism that trains an N-gram language model by processing training data into lower-order models relative to the N-gram language model, including using each lower-order model to train a next-higher order model until the N-gram language model is trained, the training mechanism including an N-gram selection process that evaluates, for each N-gram, whether a backoff probability estimate is within a maximum likelihood set determined by that N-gram and the N-gram's associated context, and if so, to exclude the N-gram from the language model.
- 17. One or more computer-readable media having computer-executable instructions, which when executed perform steps, comprising, determining whether to include a higher-order estimated probability in a model for a given N-gram, including computing a final backoff weight, obtaining a difference value corresponding to a difference between a first value representative of a lower-order estimated probability and a second value representative of the higher-order estimated probability times the final backoff weight, comparing the difference value against a pruning threshold, and including the higher-order estimated probability or pruning the higher-order estimated probability based on whether the threshold is met.
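Claim 14 describes training the model order by order, with selection applied at each step. The following is a minimal sketch of that outer loop, in which `train_order` and `select_ngrams` are hypothetical callables standing in for the per-order estimation and the selection test sketched above; neither name comes from the patent.

```python
def train_ngram_model(training_data, max_order, train_order, select_ngrams):
    """Order-by-order training as described in claim 14: each
    lower-order model is used to train the next-higher order model,
    with N-gram selection applied at every step.  `train_order` and
    `select_ngrams` are caller-supplied (hypothetical) callables."""
    model = train_order(training_data, order=1, lower_model=None)
    for order in range(2, max_order + 1):
        candidate = train_order(training_data, order=order, lower_model=model)
        # Drop higher-order estimates whose backoff value is already
        # within (or closer to) the maximum likelihood set.
        model = select_ngrams(candidate, lower_model=model)
    return model
```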
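Claim 17's modified weighted difference pruning compares an explicit higher-order estimate against the value the backoff path would assign. The sketch below makes two assumptions not spelled out in the claim: the comparison is done in the log domain, and the final backoff weight scales the lower-order estimate, as in standard backoff smoothing; the function and parameter names are illustrative.

```python
import math

def prune_higher_order(p_higher, p_lower, final_backoff_weight, threshold):
    """Sketch of the test in claim 17: compute the difference between
    the explicit higher-order estimate and the value the backoff path
    (final backoff weight times the lower-order estimate) would
    assign, then prune when that difference does not meet the
    pruning threshold."""
    difference = math.log(p_higher) - (
        math.log(final_backoff_weight) + math.log(p_lower))
    # Keep the explicit estimate only when it adds enough information
    # over simply backing off; otherwise prune it from the model.
    return difference < threshold
```

Under these assumptions, a small or negative difference means the backoff path already reproduces the higher-order estimate closely, so storing it explicitly buys little at the cost of model size.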