Method and apparatus for fast machine training
Abstract
A method and apparatus are provided that reduce the training time associated with machine learning systems whose training time is proportional to the number of outputs being trained. Under embodiments of the invention, the number of outputs to be trained is reduced by dividing the objects to be modeled into classes. This produces at least two sets of model parameters. At least one set describes some aspect of the classes given some context, and at least one other set of parameters describes some aspect of the objects given a class and the context. Thus, instead of training a system with a large number of outputs, corresponding to all of the objects, the present invention trains at least two models, each of which has a much smaller number of outputs.
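The factorization described in the abstract can be sketched in code: instead of one model with N outputs, a class model with C outputs is combined with per-class object models of roughly N/C outputs each. This is an illustrative Python sketch, not part of the patent; the round-robin class assignment, the shapes, and names such as `class_of` and `hidden_dim` are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 1000          # number of objects (e.g. vocabulary words)
C = 25            # number of classes; each softmax is now much smaller
hidden_dim = 16   # size of the input/context representation

# Assign each object to a class (round-robin here; the patent only
# requires *some* assignment of objects to classes).
class_of = np.arange(N) % C
members = [np.where(class_of == c)[0] for c in range(C)]

# First model: P(class | input). Second model: P(object | class, input).
W_class = rng.normal(size=(hidden_dim, C))
W_obj = rng.normal(size=(hidden_dim, N))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def object_probs(h):
    """P(object | h) = P(class(object) | h) * P(object | class(object), h)."""
    p_class = softmax(h @ W_class)                 # C outputs
    p = np.zeros(N)
    for c in range(C):
        idx = members[c]
        # within-class softmax has only ~N/C outputs
        p[idx] = p_class[c] * softmax(h[None, :] @ W_obj[:, idx])[0]
    return p

h = rng.normal(size=hidden_dim)
p = object_probs(h)
print(abs(p.sum() - 1.0) < 1e-9)   # the factored model is still a proper distribution
```

The key point is that no single training step ever touches all N outputs: the class model touches C outputs and each object model touches only its class members.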
17 Claims
1. A method of training model parameters that describe a large number of objects wherein the time required to train the model parameters grows with the number of outputs of the system, the method comprising:
assigning objects from the large set of objects to classes;
training a first set of model parameters that describe the likelihood of each of the classes given an input, the training comprising determining a model parameter for each context in a corpus of objects from the large set of objects by generating a term for each class and combining the terms to form the model parameter; and
training a second set of model parameters that describe the likelihood of each of the objects based in part on the classes, the first set of model parameters and the second set of model parameters together describing the likelihood of an object given an input.
(Dependent claims: 2, 3, 4, 5, 6, 7)
generating at least one additional level of classing, each level of classing having classes;
training a highest set of model parameters for a highest level of classing, the highest set of model parameters describing the classes of the highest level of classing; and
training a respective lower set of model parameters at each level of classing other than the highest level of classing, each lower set of model parameters capable of describing the classes for the respective level of classing based in part on a class from a higher level of classing.
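The multi-level classing of the dependent claims above can be sketched as a chain of small distributions: a highest-level model over super classes, a lower model over the classes within each super class, and an object model within each class. This is a hedged illustration only; the random distributions, the modular class assignments, and names such as `p_sup` and `cls_of` are assumptions, not the patent's method.

```python
import numpy as np

rng = np.random.default_rng(1)
N, C, S = 64, 16, 4   # objects, classes, super classes

cls_of = np.arange(N) % C     # object -> class
sup_of = np.arange(C) % S     # class -> super class (higher level of classing)

def rand_dist(n):
    p = rng.random(n)
    return p / p.sum()

# Highest-level model: P(super class | input) -- one small distribution.
p_sup = rand_dist(S)
# Lower-level model: P(class | super class, input), one per super class.
p_cls = {s: rand_dist(int((sup_of == s).sum())) for s in range(S)}
# Object model: P(object | class, input), one per class.
p_obj = {c: rand_dist(int((cls_of == c).sum())) for c in range(C)}

def prob(w):
    """Chain the levels: P(w) = P(s) * P(c | s) * P(w | c)."""
    c = cls_of[w]
    s = sup_of[c]
    ci = int(np.where(np.where(sup_of == s)[0] == c)[0][0])  # rank of c within s
    wi = int(np.where(np.where(cls_of == c)[0] == w)[0][0])  # rank of w within c
    return p_sup[s] * p_cls[s][ci] * p_obj[c][wi]

total = sum(prob(w) for w in range(N))
print(abs(total - 1.0) < 1e-8)   # chained levels still form a distribution
```

Each level's model has only a handful of outputs, so the same training-time argument applies recursively.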
8. A computer-readable medium having computer-executable components for training parameters used in modeling a set of objects, the computer-executable components comprising:
a class defining component capable of designating a class for each of the objects;
a class parameter training component capable of determining parameters that are used to identify an attribute of a class in the context of at least a set of input values, the class parameter training component determining a parameter for each context in a corpus of the objects by generating a separate term for each possible class and combining the terms to form the parameter for the context in the corpus; and
an object parameter training component capable of determining parameters that are used to identify a probability of an object in the context of a set of input values and a class, the object parameter training component executing in a period of time that is proportional to the number of objects in a class.
(Dependent claims: 9, 10, 11, 12, 13)
a multi-level classing component capable of designating multiple class levels;
a plurality of level parameter training components, each level parameter training component capable of determining parameters that are used to identify an attribute of a class in a corresponding class level in the context of a set of input values and at least one class of at least one different level; and
a highest level parameter training component, capable of determining parameters that are used to identify an attribute of a class at a highest class level in the context of a set of input values.
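The class parameter training component above determines one parameter per context by "generating a separate term for each possible class and combining the terms". In a maximum entropy / minimum divergence model this per-context parameter is naturally read as the normalizer Z(context), the sum over classes of one exponential term each. The sketch below is an interpretation under that assumption; the contexts and class scores are made-up numbers for illustration.

```python
import math

def normalizer(scores_for_context):
    """Combine one exponential term per class into the context's normalizer."""
    return sum(math.exp(s) for s in scores_for_context)

# Suppose the corpus contains two contexts, and the model assigns these
# class scores (lambda . f feature sums) in each context:
corpus_scores = {
    "context_a": [0.3, -1.2, 0.8],   # one score per possible class
    "context_b": [0.0, 0.5, -0.4],
}

# One parameter per context in the corpus:
Z = {h: normalizer(s) for h, s in corpus_scores.items()}

# With Z in hand, P(class | context) is a proper distribution:
p = [math.exp(s) / Z["context_a"] for s in corpus_scores["context_a"]]
print(abs(sum(p) - 1.0) < 1e-12)
```

Because the sum runs over classes rather than over all objects, computing each context's normalizer costs C terms instead of N.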
14. A method of reducing training time for a full minimum divergence model capable of providing the probability of a next word given a sequence of preceding words, the method comprising:
dividing words into classes;
training a class minimum divergence model capable of providing the probability of a class given at least a sequence of preceding words, the training comprising training a separate model parameter for each context in a training corpus by generating a term for each possible class and combining the terms to form the model parameter for the context;
training a word minimum divergence model capable of providing the probability of a next word given a class and a sequence of preceding words; and
using the class minimum divergence model and the word minimum divergence model together to represent the full minimum divergence model.
(Dependent claims: 15, 16, 17)
dividing each class into super classes;
training a super class minimum divergence model capable of providing the probability of a super class given at least a preceding sequence of words;
training a special class minimum divergence model capable of providing the probability of a class given a super class and a preceding sequence of words; and
using the super class minimum divergence model and the special class minimum divergence model to represent the class minimum divergence model.
16. The method of claim 15 wherein training a super class minimum divergence model comprises:
generating at least one additional level of classing, each level of classing having classes;
training a minimum divergence model at each level of classing except the highest level of classing, each model being capable of providing the probability of a class given at least a class from a higher level of classing; and
training a minimum divergence model for the highest level of classing, the model being capable of providing the probability of a class given a preceding sequence of words.
17. The method of claim 14 wherein the full minimum divergence model, the class minimum divergence model, and the word minimum divergence model are each maximum entropy models.
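The training-time saving claimed throughout can be made concrete with back-of-the-envelope arithmetic. This is not from the patent text: it assumes balanced classes of size V/C, in which case the class and word models together have C + V/C outputs per context, a quantity minimized at C = sqrt(V).

```python
import math

V = 10_000                             # vocabulary size (assumed for illustration)
flat_outputs = V                       # a flat model trains V outputs per context

best_C = round(math.sqrt(V))           # 100 classes minimizes C + V/C
split_outputs = best_C + V // best_C   # 100 + 100 = 200 outputs per context

print(flat_outputs // split_outputs)   # 50x fewer outputs per training step
```

Since claim 1 states that training time grows with the number of outputs, this ratio is, under the balanced-class assumption, also the rough speedup.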
Specification