Discriminative training of language models for text and speech classification
Abstract
Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.
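The quantity the abstract says is maximized, the conditional likelihood of a class given a word string, can be illustrated with a toy n-gram (here unigram) classifier. The sketch below is a minimal assumption-laden illustration: it builds per-class smoothed unigram models and computes P(class | word string) via Bayes' rule. The training data, smoothing, and model choices are invented for the example, and it does not implement the rational function growth transform itself.

```python
import math
from collections import Counter

# Hypothetical training data: (word string, class label).
DATA = [
    ("show me flights to boston", "travel"),
    ("book a flight from seattle", "travel"),
    ("what is the weather today", "weather"),
    ("will it rain tomorrow", "weather"),
]
CLASSES = ["travel", "weather"]
VOCAB = {w for s, _ in DATA for w in s.split()}

def train_unigram(sentences, alpha=1.0):
    """Maximum-likelihood unigram model with add-alpha smoothing."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values()) + alpha * len(VOCAB)
    return {w: (counts[w] + alpha) / total for w in VOCAB}

# One language model per class, plus class priors estimated from the data.
models = {c: train_unigram([s for s, lab in DATA if lab == c]) for c in CLASSES}
priors = {c: sum(1 for _, lab in DATA if lab == c) / len(DATA) for c in CLASSES}

def log_joint(sentence, c):
    """log P(c) + log P(W | c) under class c's unigram model."""
    return math.log(priors[c]) + sum(math.log(models[c][w]) for w in sentence.split())

def conditional_likelihood(sentence, c):
    """P(c | W) by Bayes' rule: the quantity the training procedure increases."""
    log_scores = {k: log_joint(sentence, k) for k in CLASSES}
    m = max(log_scores.values())
    denom = sum(math.exp(v - m) for v in log_scores.values())
    return math.exp(log_scores[c] - m) / denom

print(conditional_likelihood("show me flights to boston", "travel"))  # ≈ 0.96
```

Discriminative (CML) training would then adjust all of the per-class model parameters jointly so that this conditional likelihood rises on the correct class for each training sentence, rather than fitting each class's model to its own data in isolation.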
80 Citations
25 Claims
1. A computer-implemented method for estimating a set of parameters for each of a plurality of language models that correspond to a plurality of classes, the method comprising:
setting initial values for the sets of parameters; and
adjusting the sets of parameters jointly in relation to one another to increase a conditional likelihood of a class given a word string. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
11. A computer-implemented method of estimating a set of parameters for each of a plurality of n-gram language models, the method comprising:
producing at least two of the sets of parameters jointly in relation to one another. - View Dependent Claims (12, 13, 14, 15, 16, 17)
18. A computer-implemented method of classifying a natural language input, comprising:
training a plurality of statistical classification components, which correspond to a plurality of classes, jointly in relation to one another to increase a conditional likelihood of a class given a word string, the plurality of statistical classification components being n-gram language model classifiers;
receiving a natural language input; and
applying the plurality of statistical classification components to the natural language input to classify the natural language input into one of the plurality of classes. - View Dependent Claims (19, 20, 21, 22, 23)
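At classification time, the recited steps reduce to scoring the received input under each class's n-gram language model plus its class prior, and selecting the best-scoring class. The bigram log-probabilities, priors, and back-off floor below are made-up stand-ins for trained models, not values from the patent.

```python
import math

# Hypothetical per-class bigram log-probabilities (stand-ins for trained models).
LOG_PROBS = {
    "travel": {("book", "flight"): -1.2, ("flight", "to"): -0.8},
    "weather": {("rain", "today"): -0.9, ("book", "flight"): -4.5},
}
LOG_PRIOR = {"travel": math.log(0.5), "weather": math.log(0.5)}
BACKOFF = -6.0  # crude floor for unseen bigrams (illustrative only)

def classify(words):
    """Score the input under each class's bigram model; return the best class."""
    bigrams = list(zip(words, words[1:]))
    def score(c):
        return LOG_PRIOR[c] + sum(LOG_PROBS[c].get(b, BACKOFF) for b in bigrams)
    return max(LOG_PRIOR, key=score)

print(classify(["book", "flight", "to", "boston"]))  # → travel under these toy numbers
```

Because the classifiers were trained jointly to increase the conditional likelihood of the correct class, the margin between the winning score and the runners-up is exactly what the training procedure is shaping.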
24. The method of claim 23, wherein training the plurality of statistical classification components further comprises:
pooling the main and held-out data to form a combined collection of training data; and
training the plurality of statistical classification components on the combined collection of training data using the optimal number of rational function growth transform iterations and the optimal CML weight βmax. - View Dependent Claims (25)
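The pooling recipe in claims 23-24 amounts to a two-stage procedure: select the iteration count and CML weight β on held-out data, then pool the main and held-out sets and retrain with those settings. In the sketch below, `train` and `accuracy` are trivial placeholders (a majority-class picker) standing in for the rational-function-growth-transform trainer; all names and data are illustrative assumptions.

```python
from collections import Counter

def train(data, n_iter, beta):
    """Stand-in trainer: returns a majority-class model, ignoring n_iter and
    beta. In the patent these control the discriminative-training iterations."""
    majority = Counter(label for _, label in data).most_common(1)[0][0]
    return lambda text: majority

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def train_with_pooling(main_data, heldout_data, candidate_betas, max_iters):
    """Tune (n_iter, beta) on held-out data, then pool main + held-out
    (the "combined collection" in the claim) and retrain with the winners."""
    best = None
    for beta in candidate_betas:
        for n in range(1, max_iters + 1):
            acc = accuracy(train(main_data, n, beta), heldout_data)
            if best is None or acc > best[0]:
                best = (acc, n, beta)
    _, n_opt, beta_max = best
    pooled = main_data + heldout_data
    return train(pooled, n_iter=n_opt, beta=beta_max)

main = [("book a flight", "travel"), ("rain today", "weather"), ("fly to boston", "travel")]
held = [("flight to nyc", "travel")]
model = train_with_pooling(main, held, candidate_betas=[0.5, 1.0], max_iters=3)
print(model("any input"))  # → travel, the majority class of the pooled data
```

Retraining on the pooled collection recovers the held-out examples for final training once they have served their purpose of selecting the stopping iteration and βmax.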
Specification