Discriminative training of language models for text and speech classification
First Claim
1. A computer implemented method of classifying a natural language input, comprising:
- training a plurality of statistical classification components jointly in relation to one another to maximize a conditional likelihood of a class given a word string using an application of the rational growth transform, the plurality of statistical classification components being n-gram language model classifiers that each correspond to a different class, wherein each class corresponds to a different category of subject matter, and wherein training the plurality of statistical classification components comprises:
identifying an optimal number of rational function growth transform iterations and an optimal conditional maximum likelihood (CML) weight β_max to facilitate application of the rational function growth transform, wherein identifying comprises:
splitting a collection of training data into a collection of main data and a collection of held-out data;
using the main data to estimate a series of relative frequencies for the statistical classification components; and
using the held-out data to tune the optimal number of rational function growth transform iterations and the optimal CML weight β_max;
receiving a natural language input;
applying the plurality of statistical classification components to the natural language input so as to classify the natural language input into a particular one or more of the plurality of classes that represent the category or categories of subject matter that is best correlated to the natural language input;
wherein using the held-out data to tune comprises:
fixing a preset number N of rational function growth transform iterations to be run;
fixing a range of values to be explored for determining the optimal CML weight β_max;
for each value β_max, running as many rational function growth transform functions as possible up to N such that the conditional likelihood of the main data increases at each iteration; and
identifying as optimal the number of rational function growth transform iterations and the β_max value that maximizes the conditional likelihood of the held-out data; and
wherein training the plurality of statistical classification components further comprises:
pooling the main and held-out data to form a combined collection of training data; and
training the plurality of statistical classification components on the combined collection of training data using the optimal number of rational function growth transform iterations and the optimal CML weight β_max.
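The held-out tuning steps recited in claim 1 can be read as a grid search with early stopping. The following Python sketch shows only that control flow, under stated assumptions: `update` is a hypothetical placeholder for one rational function growth transform iteration (the update formula itself is not reproduced here), `main_cll` and `held_out_cll` are hypothetical callables returning the conditional log-likelihood of the main and held-out data, and the function names are invented for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

Model = TypeVar("Model")


def tune_on_held_out(
    init_model: Model,
    update: Callable[[Model, float], Model],   # hypothetical: one iteration at weight beta
    main_cll: Callable[[Model], float],        # conditional log-likelihood, main data
    held_out_cll: Callable[[Model], float],    # conditional log-likelihood, held-out data
    beta_grid: Iterable[float],                # fixed range of beta_max values to explore
    max_iters: int,                            # preset number N of iterations
) -> Tuple[Optional[float], int, float]:
    """Return (beta, iterations, held-out score) with the best held-out score."""
    # Baseline: zero iterations, i.e. the initial relative-frequency model.
    best_beta, best_iters, best_score = None, 0, held_out_cll(init_model)
    for beta in beta_grid:
        model, prev_main = init_model, main_cll(init_model)
        for it in range(1, max_iters + 1):
            candidate = update(model, beta)
            cur_main = main_cll(candidate)
            if cur_main <= prev_main:
                break  # main-data conditional likelihood stopped increasing
            model, prev_main = candidate, cur_main
            score = held_out_cll(model)
            if score > best_score:
                best_beta, best_iters, best_score = beta, it, score
    return best_beta, best_iters, best_score


def retrain(init_model: Model, update: Callable[[Model, float], Model],
            beta: float, iters: int) -> Model:
    """Run a fixed number of iterations, e.g. starting from pooled-data estimates."""
    model = init_model
    for _ in range(iters):
        model = update(model, beta)
    return model


# Toy stand-ins so the sketch runs: the "model" is a single number,
# not an n-gram classifier, and the update is not the growth transform.
toy_update = lambda x, beta: x + beta * (3.0 - x)
toy_main = lambda x: -(x - 3.0) ** 2
toy_held = lambda x: -(x - 2.0) ** 2

beta, iters, score = tune_on_held_out(
    0.0, toy_update, toy_main, toy_held, [0.25, 0.5, 1.0], 5
)
final_model = retrain(0.0, toy_update, beta, iters)
```

In the claimed method the final `retrain` call would start from relative frequencies estimated on the pooled main and held-out data; the toy example reuses the same starting point only to keep the sketch short.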
Abstract
Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.
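As an illustration of the quantity the abstract refers to, the sketch below computes the conditional likelihood of a class given a word string from one language model per class, using Bayes' rule, and sums its logarithm over labeled training sentences. It is a minimal sketch, not the patented training procedure: it uses unigram models (the n = 1 case of an n-gram model) with add-alpha smoothing, and the smoothing choice, function names, and example sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_unigram(sentences, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model for one class (assumed smoothing)."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def class_posteriors(sentence, models, priors):
    """P(class | words) via Bayes' rule over the per-class models."""
    log_joint = {}
    for c, model in models.items():
        lp = math.log(priors[c])
        for w in sentence.split():
            if w in model:  # out-of-vocabulary words are skipped
                lp += math.log(model[w])
        log_joint[c] = lp
    m = max(log_joint.values())  # subtract the max for numerical stability
    z = sum(math.exp(v - m) for v in log_joint.values())
    return {c: math.exp(v - m) / z for c, v in log_joint.items()}


def conditional_log_likelihood(labeled, models, priors):
    """Sum of log P(correct class | sentence) over (sentence, class) pairs."""
    return sum(math.log(class_posteriors(s, models, priors)[c]) for s, c in labeled)


# Hypothetical two-class example.
data = {
    "flight": ["book a flight to boston", "cheap flight tickets"],
    "weather": ["will it rain today", "weather in boston today"],
}
vocab = {w for sents in data.values() for s in sents for w in s.split()}
models = {c: train_unigram(sents, vocab) for c, sents in data.items()}
priors = {c: 0.5 for c in data}

post = class_posteriors("rain in boston today", models, priors)
predicted = max(post, key=post.get)
```

Maximum likelihood training fits each class model separately; the discriminative approach described in the abstract instead tunes the parameters of all class models jointly so that a sum like `conditional_log_likelihood` increases.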
13 Claims
1. See the text under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
Specification