Exponential priors for maximum entropy models

US 7,340,376 B2
Filed: 07/21/2005
Issued: 03/04/2008
Est. Priority Date: 01/28/2004
Status: Expired due to Fees

Abstract

The subject invention provides for systems and methods that facilitate optimizing one or more sets of training data by utilizing an Exponential distribution as the prior on one or more parameters in connection with a maximum entropy (maxent) model to mitigate overfitting. Maxent is also known as logistic regression. More specifically, the systems and methods can facilitate optimizing probabilities that are assigned to the training data for later use in machine learning processes, for example. In practice, training data can be assigned their respective weights and then a probability distribution can be assigned to those weights.
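
As an illustration of the idea in the abstract, the following minimal Python sketch trains a binary logistic regression (maxent) model under an Exponential prior. It assumes non-negative weights with prior density alpha * exp(-alpha * w), so the log prior contributes a constant -alpha to each weight's gradient, and weights are clipped at zero after every step. The function names, the projected gradient ascent procedure, the learning rate, and the toy data are illustrative assumptions and are not taken from the patent.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_exponential_prior(X, y, alpha=0.1, lr=0.1, epochs=500):
    """MAP training of binary logistic regression (hypothetical helper).

    Each weight w_j is assumed to satisfy w_j >= 0 and to have prior
    density alpha * exp(-alpha * w_j).
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(epochs):
        # Gradient of the log prior is -alpha for every weight.
        grad = [-alpha] * n_features
        # Add the gradient of the log-likelihood over the whole batch.
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += (yi - p) * xj
        # Ascent step (scaled by batch size), then project onto w_j >= 0.
        w = [max(0.0, wj + lr * gj / len(X)) for wj, gj in zip(w, grad)]
    return w


# Toy data (assumed): feature 0 tracks the label, feature 1 does not.
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
y = [1, 1, 0, 0]
weights = train_exponential_prior(X, y)
print(weights)
```

In this sketch the constant pull of -alpha combined with the projection tends to hold uninformative weights at exactly zero, which is one way such a prior can mitigate overfitting.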

72 Citations

20 Claims

1. A computer implemented method for maximizing probability values to facilitate training a machine learning system comprising:
- receiving a data set;
- determining an Exponential distribution as an Exponential prior, comprising;
  - graphing a distribution of parameter values that have at least 30 training instances; and
  - determining the Exponential prior by examining the distribution of parameter values;
- defining one or more parameters; and
- training a model based at least in part upon a subset of the data set, the Exponential prior, and the one or more parameters.

Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
- 2. The method of claim 1, the act of determining an Exponential prior further comprising at least one of the following acts:
  - providing a relatively large data set; and
  - training a model using the large data set and a Gaussian prior.
- 3. The method of claim 1, the Exponential prior being determined based at least in part upon a particular feature of interest.
- 4. The method of claim 3, the feature is an IP address.
- 5. The method of claim 3, the feature is an email address.
- 6. The method of claim 3, the feature is subject line content.
- 7. The method of claim 3, the feature is a message size.
- 8. The method of claim 3, the feature is body text of a message.
- 9. The method of claim 3, the feature is an embedded image of a message.
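
The determining step of claim 1 can be sketched roughly as follows. The sketch keeps only parameters backed by at least 30 training instances, bins their values so the shape of the distribution can be inspected, and fits an Exponential rate. Fitting the rate by maximum likelihood (1 / mean) and using weight magnitudes are assumptions of this sketch; the claim itself only says the prior is determined by examining the distribution. The function names and sample numbers are hypothetical.

```python
from collections import Counter


def fit_exponential_rate(weights, counts, min_instances=30):
    """Fit an Exponential rate to well-supported parameter values.

    weights: learned parameter values (for example from a model trained
             with a Gaussian prior on a large data set).
    counts:  number of training instances behind each parameter.
    Assumption: magnitudes are used and the rate is the MLE, 1 / mean.
    """
    kept = [abs(w) for w, c in zip(weights, counts) if c >= min_instances]
    if not kept:
        raise ValueError("no parameter has enough training instances")
    mean = sum(kept) / len(kept)
    if mean == 0.0:
        raise ValueError("all retained parameter values are zero")
    return 1.0 / mean


def text_histogram(values, bin_width=0.5):
    """Count values per bin; keys are the left edge of each bin."""
    bins = Counter(int(v // bin_width) for v in values)
    return {b * bin_width: bins[b] for b in sorted(bins)}


# Hypothetical learned weights and their training-instance counts.
weights = [0.5, 1.5, 1.0, 9.0]
counts = [40, 30, 100, 3]  # the last parameter is too rare and is dropped
rate = fit_exponential_rate(weights, counts)
print(rate, text_histogram([abs(w) for w, c in zip(weights, counts) if c >= 30]))
```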

10. A computer implemented method for maximizing probability values to facilitate training a machine learning system comprising:
- identifying one or more parameters from a data set, each parameter comprises at least 30 teaching instances;
- plotting a distribution of teaching instances for each of the one or more parameters identified;
- establishing an Exponential distribution as an Exponential prior for each of the one or more parameters by examining the distribution of teaching instances; and
- teaching a model based at least in part upon a subset of the data set, the Exponential prior, and the one or more parameters.

Dependent claims (11, 12, 13, 14, 15, 16, 17, 18, 19):
- 11. The method of claim 10, the act of teaching the model further comprising employing a double sided style.
- 12. The method of claim 11, further comprising defining two weights as 0 and 1, respectively for the double sided style.
- 13. The method of claim 12, the act of establishing being based at least in part upon a particular feature of interest.
- 14. The method of claim 13, the feature is at least one of an IP address, subject line content, a message size, body text of a message, or an embedded image of a message.
- 15. The method of claim 13, further comprising assigning a λ value weight to at least one of a word, word pair, word phrase, or text or image data for the feature.
- 16. The method of claim 13, further comprising assigning a σ² variance to the Exponential prior.
- 17. The method of claim 16, the act of assigning the σ² variance is based at least in part upon a type of the feature.
- 18. The method of claim 10, further comprising employing at least one of cross-validation or held out data for finding a variance of the Exponential prior.
- 19. The method of claim 18, further comprising finding the variance for minimizing entropy on the at least one of cross-validation or held out data.
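
Claims 18 and 19 describe choosing the prior's variance with held out data. One possible reading is sketched below: train once per candidate variance and keep the candidate whose model gives the lowest cross-entropy on the held out examples. Interpreting "entropy" as held out cross-entropy is an assumption of this sketch, and `train_fn` is a hypothetical caller-supplied function that takes the training data and a variance and returns a predictor of P(y = 1 | x). For an Exponential distribution with rate alpha the variance is 1 / alpha², so a candidate variance can be converted to a rate inside `train_fn`.

```python
import math


def heldout_cross_entropy(predict, heldout):
    """Average negative log2 probability assigned to held out labels."""
    total = 0.0
    for x, y in heldout:
        # Clip so that a confident wrong prediction cannot produce log(0).
        p = min(max(predict(x), 1e-12), 1.0 - 1e-12)
        total -= math.log2(p if y == 1 else 1.0 - p)
    return total / len(heldout)


def select_prior_variance(train_fn, candidates, train, heldout):
    """Return (variance, score) with the lowest held out cross-entropy.

    train_fn(train, variance) is assumed to return a callable mapping an
    input x to the model's probability that its label is 1.
    """
    best = None
    for variance in candidates:
        predict = train_fn(train, variance)
        score = heldout_cross_entropy(predict, heldout)
        if best is None or score < best[1]:
            best = (variance, score)
    return best
```

The same loop could be run once per cross-validation fold, averaging the scores for each candidate before picking the minimum.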

20. A computer implemented method for maximizing probability values to facilitate training a machine learning system comprising:
- defining one or more parameters from a data set, each parameter comprises at least 30 training instances;
- mapping a distribution of training instances for each of the one or more parameters defined;
- determining an Exponential distribution as an Exponential prior for each of the one or more parameters by examining the distribution of training instances;
- computing an σ² variance for the Exponential prior; and
- teaching a model based at least in part upon a subset of the data set, the σ² variance for the Exponential prior, and the one or more parameters.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Goodman, Joshua T.
Primary Examiner(s): Ramos-Feliciano; Eliseo
Assistant Examiner(s): Huynh; Phuong
Application Number: US11/186,287
Publication Number: US 20050256685A1
Time in Patent Office: 957 Days
Field of Search: 702/181, 702/180, 702/182, 702/184, 702/179, 702/188, 702/189, 709/206, 709/207, 709/224, 709/204
US Class Current: 702/181
CPC Class Codes:
G06F 18/21   Design or setup of recognit...
G06N 20/00   Machine learning
G06N 20/10   using kernel methods, e.g. ...
