Method and apparatus for fast machine training
Abstract
A method and apparatus are provided that reduce the training time associated with machine learning systems whose training time is proportional to the number of outputs being trained. Under embodiments of the invention, the number of outputs to be trained is reduced by dividing the objects to be modeled into classes. This produces at least two sets of model parameters. At least one set describes some aspect of the classes given some context, and at least one other set of parameters describes some aspect of the objects given a class and the context. Thus, instead of training a system with a large number of outputs, corresponding to all of the objects, the present invention trains at least two models, each of which has a much smaller number of outputs.
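The factorization described in the abstract can be sketched in code: instead of one model with N outputs, a class model with C outputs is combined with per-class object models of roughly N/C outputs each. This is an illustrative Python sketch, not part of the patent; the round-robin class assignment, the shapes, and names such as `class_of` and `hidden_dim` are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 1000          # number of objects (e.g. vocabulary words)
C = 25            # number of classes; each softmax is now much smaller
hidden_dim = 16   # size of the input/context representation

# Assign each object to a class (round-robin here; the patent only
# requires *some* assignment of objects to classes).
class_of = np.arange(N) % C
members = [np.where(class_of == c)[0] for c in range(C)]

# First model: P(class | input). Second model: P(object | class, input).
W_class = rng.normal(size=(hidden_dim, C))
W_obj = rng.normal(size=(hidden_dim, N))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def object_probs(h):
    """P(object | h) = P(class(object) | h) * P(object | class(object), h)."""
    p_class = softmax(h @ W_class)                 # C outputs
    p = np.zeros(N)
    for c in range(C):
        idx = members[c]
        # within-class softmax has only ~N/C outputs
        p[idx] = p_class[c] * softmax(h[None, :] @ W_obj[:, idx])[0]
    return p

h = rng.normal(size=hidden_dim)
p = object_probs(h)
print(abs(p.sum() - 1.0) < 1e-9)   # the factored model is still a proper distribution
```

The key point is that no single training step ever touches all N outputs: the class model touches C outputs and each object model touches only its class members.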
17 Claims
1. A method of training model parameters that describe a large number of objects wherein the time required to train the model parameters grows with the number of outputs of the system, the method comprising:
assigning objects from the large set of objects to classes;
training a first set of model parameters that describe the likelihood of each of the classes given an input, the training comprising determining a model parameter for each context in a corpus of objects from the large set of objects by generating a term for each class and combining the terms to form the model parameter; and
training a second set of model parameters that describe the likelihood of each of the objects based in part on the classes, the first set of model parameters and the second set of model parameters together describing the likelihood of an object given an input.
(Dependent claims: 2, 3, 4, 5, 6, 7)
generating at least one additional level of classing, each level of classing having classes;
training a highest set of model parameters for a highest level of classing, the highest set of model parameters describing the classes of the highest level of classing; and
training a respective lower set of model parameters at each level of classing other than the highest level of classing, each lower set of model parameters capable of describing the classes for the respective level of classing based in part on a class from a higher level of classing.
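The multi-level classing of the dependent claims above can be sketched as a chain of small distributions: a highest-level model over super classes, a lower model over the classes within each super class, and an object model within each class. This is a hedged illustration only; the random distributions, the modular class assignments, and names such as `p_sup` and `cls_of` are assumptions, not the patent's method.

```python
import numpy as np

rng = np.random.default_rng(1)
N, C, S = 64, 16, 4   # objects, classes, super classes

cls_of = np.arange(N) % C     # object -> class
sup_of = np.arange(C) % S     # class -> super class (higher level of classing)

def rand_dist(n):
    p = rng.random(n)
    return p / p.sum()

# Highest-level model: P(super class | input) -- one small distribution.
p_sup = rand_dist(S)
# Lower-level model: P(class | super class, input), one per super class.
p_cls = {s: rand_dist(int((sup_of == s).sum())) for s in range(S)}
# Object model: P(object | class, input), one per class.
p_obj = {c: rand_dist(int((cls_of == c).sum())) for c in range(C)}

def prob(w):
    """Chain the levels: P(w) = P(s) * P(c | s) * P(w | c)."""
    c = cls_of[w]
    s = sup_of[c]
    ci = int(np.where(np.where(sup_of == s)[0] == c)[0][0])  # rank of c within s
    wi = int(np.where(np.where(cls_of == c)[0] == w)[0][0])  # rank of w within c
    return p_sup[s] * p_cls[s][ci] * p_obj[c][wi]

total = sum(prob(w) for w in range(N))
print(abs(total - 1.0) < 1e-8)   # chained levels still form a distribution
```

Each level's model has only a handful of outputs, so the same training-time argument applies recursively.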
8. A computer-readable medium having computer-executable components for training parameters used in modeling a set of objects, the computer-executable components comprising:
a class defining component capable of designating a class for each of the objects;
a class parameter training component capable of determining parameters that are used to identify an attribute of a class in the context of at least a set of input values, the class parameter training component determining a parameter for each context in a corpus of the objects by generating a separate term for each possible class and combining the terms to form the parameter for the context in the corpus; and
an object parameter training component capable of determining parameters that are used to identify a probability of an object in the context of a set of input values and a class, the object parameter training component executing in a period of time that is proportional to the number of objects in a class.
(Dependent claims: 9, 10, 11, 12, 13)
a multi-level classing component capable of designating multiple class levels;
a plurality of level parameter training components, each level parameter training component capable of determining parameters that are used to identify an attribute of a class in a corresponding class level in the context of a set of input values and at least one class of at least one different level; and
a highest level parameter training component, capable of determining parameters that are used to identify an attribute of a class at a highest class level in the context of a set of input values.
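The class parameter training component above determines one parameter per context by "generating a separate term for each possible class and combining the terms". In a maximum entropy / minimum divergence model this per-context parameter is naturally read as the normalizer Z(context), the sum over classes of one exponential term each. The sketch below is an interpretation under that assumption; the contexts and class scores are made-up numbers for illustration.

```python
import math

def normalizer(scores_for_context):
    """Combine one exponential term per class into the context's normalizer."""
    return sum(math.exp(s) for s in scores_for_context)

# Suppose the corpus contains two contexts, and the model assigns these
# class scores (lambda . f feature sums) in each context:
corpus_scores = {
    "context_a": [0.3, -1.2, 0.8],   # one score per possible class
    "context_b": [0.0, 0.5, -0.4],
}

# One parameter per context in the corpus:
Z = {h: normalizer(s) for h, s in corpus_scores.items()}

# With Z in hand, P(class | context) is a proper distribution:
p = [math.exp(s) / Z["context_a"] for s in corpus_scores["context_a"]]
print(abs(sum(p) - 1.0) < 1e-12)
```

Because the sum runs over classes rather than over all objects, computing each context's normalizer costs C terms instead of N.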
14. A method of reducing training time for a full minimum divergence model capable of providing the probability of a next word given a sequence of preceding words, the method comprising:
dividing words into classes;
training a class minimum divergence model capable of providing the probability of a class given at least a sequence of preceding words, the training comprising training a separate model parameter for each context in a training corpus by generating a term for each possible class and combining the terms to form the model parameter for the context;
training a word minimum divergence model capable of providing the probability of a next word given a class and a sequence of preceding words; and
using the class minimum divergence model and the word minimum divergence model together to represent the full minimum divergence model.
(Dependent claims: 15, 16, 17)
dividing each class into super classes;
training a super class minimum divergence model capable of providing the probability of a super class given at least a preceding sequence of words;
training a special class minimum divergence model capable of providing the probability of a class given a super class and a preceding sequence of words; and
using the super class minimum divergence model and the special class minimum divergence model to represent the class minimum divergence model.
16. The method of claim 15 wherein training a super class minimum divergence model comprises:
generating at least one additional level of classing, each level of classing having classes;
training a minimum divergence model at each level of classing except the highest level of classing, each model being capable of providing the probability of a class given at least a class from a higher level of classing; and
training a minimum divergence model for the highest level of classing, the model being capable of providing the probability of a class given a preceding sequence of words.
17. The method of claim 14 wherein the full minimum divergence model, the class minimum divergence model, and the word minimum divergence model are each maximum entropy models.
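The training-time saving claimed throughout can be made concrete with back-of-the-envelope arithmetic. This is not from the patent text: it assumes balanced classes of size V/C, in which case the class and word models together have C + V/C outputs per context, a quantity minimized at C = sqrt(V).

```python
import math

V = 10_000                             # vocabulary size (assumed for illustration)
flat_outputs = V                       # a flat model trains V outputs per context

best_C = round(math.sqrt(V))           # 100 classes minimizes C + V/C
split_outputs = best_C + V // best_C   # 100 + 100 = 200 outputs per context

print(flat_outputs // split_outputs)   # 50x fewer outputs per training step
```

Since claim 1 states that training time grows with the number of outputs, this ratio is, under the balanced-class assumption, also the rough speedup.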
Specification