Building scalable n-gram language models using maximum likelihood maximum entropy n-gram models
First Claim
1. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform, in a computer based language modelling system receiving data in the form of a series of n-grams, each n-gram comprising a series of "n" words (w1, w2, . . . , wn), each n-gram having an associated count, method steps for classifying the n-grams into non-redundant classes, said method steps comprising:
(a) comparing the count of each n-gram to a first threshold value and classifying each n-gram with a count greater than said first threshold in a first class;
(b) associating all n-grams not classified in step (a) with a putative (n-1)-gram class, each said putative (n-1)-gram class having the same last "n-1" words (w2, w3, . . . , wn);
(c) establishing a complement count for each said putative (n-1)-gram class by summing the counts of each n-gram in said putative (n-1)-gram class; and
(d) comparing said complement count of each said putative (n-1)-gram class to a second threshold value and classifying each said putative (n-1)-gram class with a count greater than said second threshold in a second class.
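Steps (a) through (d) above can be sketched in Python. This is a minimal illustration, not the patent's implementation: the function name `classify_ngrams`, the threshold values, and the toy counts are all hypothetical.

```python
from collections import Counter

def classify_ngrams(ngram_counts, t1, t2):
    """Sketch of steps (a)-(d): n-grams with counts above a first
    threshold form a first class; the rest are pooled into putative
    (n-1)-gram classes (same last n-1 words), and a class is kept
    if its summed ("complement") count exceeds a second threshold."""
    # (a) frequent n-grams go directly into the first class
    first_class = {g: c for g, c in ngram_counts.items() if c > t1}

    # (b)/(c) group the rest by their last n-1 words and sum counts
    complement = Counter()
    for gram, count in ngram_counts.items():
        if gram not in first_class:
            complement[gram[1:]] += count   # complement count

    # (d) keep putative (n-1)-gram classes above the second threshold
    second_class = {h: c for h, c in complement.items() if c > t2}
    return first_class, second_class

# Toy trigram counts (illustrative only)
counts = {("the", "big", "dog"): 7, ("a", "big", "dog"): 2,
          ("one", "big", "dog"): 2, ("the", "red", "car"): 1}
first, second = classify_ngrams(counts, t1=5, t2=3)
```

With these toy counts, ("the", "big", "dog") clears the first threshold on its own, while the two remaining ". . . big dog" trigrams pool into the ("big", "dog") class with a complement count of 4.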
1 Assignment
0 Petitions
Accused Products
Abstract
The present invention is an n-gram language modeler which significantly reduces the memory storage requirement and convergence time for language modelling systems and methods. The present invention assigns each n-gram to one of "n" non-intersecting classes. A count is determined for each n-gram, representing the number of times that n-gram occurred in the training data. The n-grams are separated into classes and complement counts are determined. Using these counts and complement counts, factors are determined, one factor for each class, using an iterative scaling algorithm. The language model probability, i.e., the probability that a word occurs given the occurrence of the previous two words, is determined using these factors.
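The factor training described in the abstract can be sketched as follows. This is a hedged approximation, assuming non-intersecting classes (each n-gram activates exactly one class) and a generalized-iterative-scaling-style multiplicative update of observed over expected class counts; the function `train_factors` and the toy data are illustrative, not the patent's algorithm.

```python
from collections import defaultdict

def train_factors(ngram_counts, class_of, n_iters=50):
    """Iterative scaling sketch: with non-intersecting classes, the
    model probability of an n-gram is proportional to the factor of
    its class, and each iteration scales every factor by the ratio of
    observed to expected class count."""
    total = sum(ngram_counts.values())
    observed = defaultdict(float)            # empirical count per class
    for gram, c in ngram_counts.items():
        observed[class_of(gram)] += c

    factors = {cls: 1.0 for cls in observed}
    for _ in range(n_iters):
        z = sum(factors[class_of(g)] for g in ngram_counts)  # normalizer
        expected = defaultdict(float)        # model's expected class count
        for gram in ngram_counts:
            expected[class_of(gram)] += total * factors[class_of(gram)] / z
        for cls in factors:                  # multiplicative scaling update
            factors[cls] *= observed[cls] / expected[cls]
    return factors

# Toy example: two unigram "classes", one observed 3 times, one once.
factors = train_factors({("a",): 3, ("b",): 1}, class_of=lambda g: g[0])
```

In this toy run the trained factors reproduce the empirical distribution: the model assigns ("a",) probability 3/4.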
Citations
10 Claims
1. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform, in a computer based language modelling system receiving data in the form of a series of n-grams, each n-gram comprising a series of "n" words (w1, w2, . . . , wn), each n-gram having an associated count, method steps for classifying the n-grams into non-redundant classes, said method steps comprising:
(a) comparing the count of each n-gram to a first threshold value and classifying each n-gram with a count greater than said first threshold in a first class;
(b) associating all n-grams not classified in step (a) with a putative (n-1)-gram class, each said putative (n-1)-gram class having the same last "n-1" words (w2, w3, . . . , wn);
(c) establishing a complement count for each said putative (n-1)-gram class by summing the counts of each n-gram in said putative (n-1)-gram class; and
(d) comparing said complement count of each said putative (n-1)-gram class to a second threshold value and classifying each said putative (n-1)-gram class with a count greater than said second threshold in a second class.
- View Dependent Claims (2, 3)
4. A computer program product, comprising:
a computer usable medium having computer readable program code means embodied in said medium for classifying, in a computer based language modelling system receiving data in the form of a series of n-grams, each n-gram comprising a series of "n" words (w1, w2, . . . , wn), each n-gram having an associated count, the n-grams into non-redundant classes, said computer readable program code means comprising:
computer readable program code means for causing a computer to effect a comparison of the count of each n-gram to a first threshold value and classifying each n-gram with a count greater than said first threshold in a first class;
computer readable program code means for causing a computer to effect an association of all n-grams not classified in step (a) with a putative (n-1)-gram class, each said putative (n-1)-gram class having the same last "n-1" words (w2, w3, . . . , wn);
computer readable program code means for causing a computer to effect an establishment of a complement count for each said putative (n-1)-gram class by summing the counts of each n-gram in said putative (n-1)-gram class; and
computer readable program code means for causing a computer to effect a comparison of said complement count of each said putative (n-1)-gram class to a second threshold value and classifying each said putative (n-1)-gram class with a count greater than said second threshold in a second class.
5. A computer program product, comprising:
a computer usable medium having computer readable program code means embodied in said medium for determining a conditional probability of a predicted word given the previous (n-1) words, wherein an n-gram defines a series of "n" words, each n-gram having an associated count, and the history of an n-gram being represented by the initial n-1 words of the n-gram, said computer readable program code means comprising:
computer readable first program code means for causing the computer to effect an examination of each word within each n-gram and classifying each n-gram into one of a plurality of non-redundant classes;
computer readable second program code means for causing the computer to effect a determination of a factor for each of said plurality of non-redundant classes, said factor representing the relative strength of predicting said predicted word given the previous (n-1) words; and
computer readable third program code means for causing the computer to effect a determination of said conditional probability of the occurrence of said predicted word given that a particular sequence of (n-1) previous words has occurred using said factors.
- View Dependent Claims (6)
7. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform, in a computer based language modelling system receiving data in the form of a series of n-grams, each n-gram comprising a series of "n" words (w1, w2, . . . ,wn), each n-gram having an associated count, method steps for determining a conditional probability of a predicted word given the previous (n-1) words, said method steps comprising:
(1) examining each word within each n-gram and classifying each n-gram into one of a plurality of non-redundant classes;
(2) determining a factor for each of said plurality of non-redundant classes, said factor representing the relative strength of predicting said predicted word given the previous (n-1) words; and
(3) determining the conditional probability of the occurrence of said predicted word given that a particular sequence of (n-1) previous words has occurred using said factors.
- View Dependent Claims (8, 9, 10)
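Step (3) above can be sketched as a normalization over the vocabulary. This is an illustrative assumption, not the claimed method: it supposes each candidate n-gram's score is simply the factor of the class it falls into, with the function `conditional_prob` and the toy factors invented for the example.

```python
def conditional_prob(word, history, vocab, factor_of):
    """Sketch of step (3): p(word | history) is proportional to the
    factor of the class containing the full n-gram (history + word),
    normalized over all candidate words in the vocabulary."""
    z = sum(factor_of(history + (w,)) for w in vocab)  # per-history normalizer
    return factor_of(history + (word,)) / z

# Toy factors: assume any trigram ending in "dog" fell into a strong class.
factor_of = lambda gram: 3.0 if gram[-1] == "dog" else 1.0
p = conditional_prob("dog", ("the", "big"), ["dog", "cat"], factor_of)
```

With a two-word vocabulary and factors 3.0 versus 1.0, the predicted word "dog" receives probability 3/4 given the history ("the", "big").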
Specification