Discriminative training using boosted lasso

US 20080147579A1
Filed: 12/14/2006
Published: 06/19/2008
Est. Priority Date: 12/14/2006
Status: Abandoned Application

Abstract

Word sequences that contain a selected feature are identified using an index that comprises a separate entry for each of a collection of features in the language model, each entry identifying word sequences that contain the feature. The identified word sequences are used to compute a best value for a feature weight of the selected feature. A selection is made between the best value and a step-change value for the feature weight to produce a new value for the feature weight. The new value for the feature weight is then stored in a current set of feature weights for the language model.
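
The index described in the abstract can be pictured as an inverted index from features to word sequences. The following is a minimal sketch, not the application's implementation: the function names, the choice of unigram and bigram features, and the sample sentences are all assumptions made for illustration.

```python
from collections import defaultdict

def extract_features(words):
    """Hypothetical feature extractor: the unigrams and bigrams of a word sequence."""
    unigrams = [(w,) for w in words]
    bigrams = list(zip(words, words[1:]))
    return set(unigrams + bigrams)

def build_feature_index(sequences):
    """Map each feature to the sorted ids of the word sequences that contain it."""
    index = defaultdict(set)
    for seq_id, words in enumerate(sequences):
        for feature in extract_features(words):
            index[feature].add(seq_id)
    return {feature: sorted(ids) for feature, ids in index.items()}

# Illustrative data only; these sentences do not come from the application.
sequences = [
    ["recognize", "speech"],
    ["wreck", "a", "nice", "beach"],
    ["a", "nice", "speech"],
]
index = build_feature_index(sequences)
# index[("a", "nice")] == [1, 2]: only those two sequences need to be
# revisited when the weight for the bigram "a nice" is updated.
```

With one entry per feature, updating a single feature weight only touches the sequences listed in that entry instead of the whole training set.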

20 Claims

1. A method comprising:
- setting a limit for the amount by which feature weights can be changed during a single iteration of training of feature weights in a language model;
  
  selecting a feature weight from the set of feature weights;
  
  computing a best value for the selected feature weight, wherein the best value comprises a value that results in the greatest change in a function, and wherein the best value differs from a previous value for the selected feature weight by a change amount;
  
  determining if the absolute value of the change amount is less than the limit;
  
  selecting the best value for the selected feature weight instead of a step-change value for the selected feature weight as a new value for the selected feature weight if the absolute value of the change amount is less than the limit, wherein the step-change value is formed by increasing the absolute value of the previous value of the feature weight by the limit; and
  
  storing the new value for the feature weight as part of a current set of feature weights for the language model.
  - 2. The method of claim 1 wherein computing the best value for the selected feature weight comprises:
    - identifying word sequences that contain the selected feature;
      
      computing at least two word sequence exponential losses based on the identified word sequences; and
      
      using the word sequence exponential losses to compute the best value.
  - 3. The method of claim 2 wherein identifying word sequences comprises applying the feature associated with the selected feature weight to an index that has an entry for each feature, wherein each entry identifies candidate word sets in which the feature appears, wherein the candidate word sets comprise a plurality of word sequences.
  - 4. The method of claim 2 wherein identifying word sequences comprises applying the feature associated with the selected feature weight to an index that has an entry for each feature, wherein each entry identifies word sequences in which the feature appears.
  - 5. The method of claim 1 further comprising:
    - forming a first set of feature weights by changing a value for a first feature weight in the current set of feature weights, the first feature weight being changed by the limit amount such that the absolute value of the first feature weight decreases;
      
      determining a first value for a loss function based on the first set of feature weights;
      
      forming a second set of feature weights by changing a second feature weight in the current set of feature weights, the second feature weight being changed by the limit amount such that the absolute value of the second feature weight decreases;
      
      determining a second value for the loss function based on the second set of feature weights; and
      
      selecting one of the sets of feature weights based on the values for the loss function.
  - 6. The method of claim 5 further comprising:
    - determining a current value for a lasso loss function based on the current set of feature weights, wherein the lasso loss function is a combination of the loss function and a penalty based on the size of the feature weights;
      
      determining an updated value for the lasso loss function based on the selected set of feature weights;
      
      if the current value of the lasso loss function is greater than the updated lasso loss function, setting the selected set of feature weights as the current set of feature weights.
  - 7. The method of claim 6 wherein if the current value of the lasso loss function is less than the updated lasso loss function, keeping the current set of feature weights as the current set of feature weights.
  - 8. The method of claim 1 further comprising initializing a value for a base feature weight to minimize a loss function with the values for all other feature weights set to zero.
  - 12. The computer-readable medium of claim 1 wherein selecting one of the best value and the step-change value comprises selecting the value with the smallest absolute value.
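
The selection between the best value and the step-change value in claim 1 can be sketched as follows. This is a hedged illustration: `select_new_weight` and `update_weights` are hypothetical names, and the sketch assumes the step is taken in the direction of the best value, which matches "increasing the absolute value" of the previous weight whenever the best value points away from zero.

```python
import math

def select_new_weight(previous, best, limit):
    """Choose between the best value and a step-change value for one weight.

    Returns `best` when it moves the weight by less than `limit`; otherwise
    returns `previous` moved by exactly `limit`.
    Assumption: the step is taken in the direction of `best`.
    """
    change = best - previous
    if abs(change) < limit:
        return best
    return previous + math.copysign(limit, change)

def update_weights(weights, feature, best, limit):
    """Return a copy of the current weight set with one feature weight replaced."""
    updated = dict(weights)
    updated[feature] = select_new_weight(weights.get(feature, 0.0), best, limit)
    return updated

# Illustrative values only.
current = {"bigram:nice speech": 0.5}
current = update_weights(current, "bigram:nice speech", best=2.0, limit=0.25)
# The best value is 1.5 away, which exceeds the limit, so the weight moves by 0.25 to 0.75.
```

Capping each update at the limit keeps any single iteration from moving a weight far, which is the behavior claim 12 expresses as picking the candidate with the smaller change.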

9. A computer-readable medium having computer-executable instructions for performing steps comprising:
- selecting a feature of a language model;
  
  identifying word sequences that contain the feature using an index that comprises a separate entry for each of a collection of features in the language model, each entry identifying word sequences that contain the feature;
  
  using the identified word sequences to compute a best value for a feature weight of the selected feature;
  
  selecting one of the best value and a step-change value for the feature weight as a new value for the feature weight; and
  
  storing the new value for the feature weight in a current set of feature weights for the language model.
  - 10. The computer-readable medium of claim 9 wherein at least one entry identifies a candidate word set, wherein the candidate word set comprises at least one word sequence that contains the selected feature.
  - 11. The computer-readable medium of claim 10 wherein at least one entry comprises a list of individual word sequences that each contain the selected feature.
  - 13. The computer-readable medium of claim 9 wherein before storing the updated value:
    - computing an exponential loss based on the new value for the feature weight;
      
      comparing the exponential loss to an exponential loss computed based on a current value for the weight and a possible new value for a feature weight associated with another feature; and
      
      determining whether to store the new value based on the comparison.
  - 14. The computer-readable medium of claim 9 wherein selecting a feature further comprises excluding a base feature from being selected, the feature having a base feature weight that is set when feature weights for all other features are equal to zero.
  - 15. The computer-readable medium of claim 9 further comprising determining whether to reduce the absolute value of a feature weight based on a lasso loss function that includes a penalty factor that is based on the absolute value of feature weights.
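
Claims 5 to 7 and claim 15 describe shrinking a weight toward zero and keeping the change only if a lasso loss improves. The sketch below illustrates that acceptance test under stated assumptions: `loss_fn`, `alpha`, and the toy quadratic loss are placeholders rather than the application's exponential loss, and stopping at zero instead of crossing it is a choice made here, not something the claims state.

```python
import math

def lasso_loss(weights, loss_fn, alpha):
    """Loss plus a penalty proportional to the sum of absolute weight values."""
    return loss_fn(weights) + alpha * sum(abs(w) for w in weights.values())

def backward_step(weights, loss_fn, alpha, limit):
    """Try shrinking one weight by `limit`; keep it only if the lasso loss drops."""
    candidates = []
    for name, value in weights.items():
        if value == 0.0:
            continue
        shrunk = dict(weights)
        # Reduce the absolute value by `limit` (assumption: never cross zero).
        shrunk[name] = math.copysign(max(abs(value) - limit, 0.0), value)
        candidates.append((loss_fn(shrunk), shrunk))
    if not candidates:
        return weights
    # Pick the candidate weight set with the smallest plain loss.
    _, best = min(candidates, key=lambda c: c[0])
    if lasso_loss(weights, loss_fn, alpha) > lasso_loss(best, loss_fn, alpha):
        return best
    return weights

# Toy loss for illustration: feature "a" is useful, feature "b" is not.
def toy_loss(w):
    return (w["a"] - 1.0) ** 2 + w["b"] ** 2

shrunk = backward_step({"a": 1.0, "b": 0.5}, toy_loss, alpha=0.1, limit=0.25)
kept = backward_step({"a": 1.0, "b": 0.0}, toy_loss, alpha=0.1, limit=0.25)
```

In the first call, shrinking `b` lowers both the loss and the penalty, so the new set is accepted. In the second call the only candidate shrinks the useful weight `a`, the lasso loss rises, and the current set is kept, as in claim 7.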

16. A method comprising:
- applying a feature for a language model to an index comprising a separate entry for each feature of the language model to identify a plurality of word sequences that contain the feature;
  
  using features contained in at least one of the identified word sequences to compute a word sequence exponential loss function;
  
  using the word sequence exponential loss function to determine a value for a feature weight for the feature; and
  
  storing the value for the feature weight as part of a language model.
  - 17. The method of claim 16 wherein using the word sequence exponential loss function to determine a value comprises using the word sequence exponential loss function to determine a value that results in the largest possible change in an exponential loss function.
  - 18. The method of claim 17 further comprising determining if the determined value for the feature weight has an absolute value that is greater than an absolute value of a step-change value for the feature weight and storing the step-change value instead of the determined value if the absolute value of the determined value is greater than the absolute value of the step-change value.
  - 19. The method of claim 16 wherein each entry in the index provides a list of candidate word sets, each candidate word set comprising a plurality of word sequences wherein at least one of the word sequences contains the feature for the entry.
  - 20. The method of claim 16 further comprising changing a feature weight to reduce its absolute value by the maximum value, determining that the change in the feature weight reduces a lasso loss function that is based in part on the absolute values of feature weights, and storing the change in the feature weight as part of the language model.
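
Claims 16 and 17 use a word sequence exponential loss to pick a weight value. The claims do not give a formula, so the sketch below substitutes a standard closed form from boosting-style reranking: for binary features, the change that minimizes the exponential loss is half the log ratio of two partial losses. All names, the set-based feature representation, and the `smoothing` term are assumptions; for brevity the sketch scans every pair rather than looking them up through a feature index.

```python
import math

def score(features, weights):
    """Linear score: the sum of the weights of the features present in a sequence."""
    return sum(weights.get(f, 0.0) for f in features)

def exp_loss(pairs, weights):
    """Sum of exp(-margin) over (reference, competitor) feature-set pairs."""
    return sum(math.exp(-(score(ref, weights) - score(cand, weights)))
               for ref, cand in pairs)

def best_change(feature, pairs, weights, smoothing=1e-3):
    """Change to one weight that minimizes exp_loss (up to the smoothing term)."""
    helps = hurts = 0.0
    for ref, cand in pairs:
        term = math.exp(-(score(ref, weights) - score(cand, weights)))
        if feature in ref and feature not in cand:
            helps += term   # raising the weight widens this margin
        elif feature in cand and feature not in ref:
            hurts += term   # raising the weight narrows this margin
    # `smoothing` guards against division by zero when one side is empty.
    return 0.5 * math.log((helps + smoothing) / (hurts + smoothing))

# Illustrative pairs: each is (features of reference, features of competitor).
pairs = [({"x"}, {"y"}), ({"x"}, {"y"}), ({"y"}, {"x"})]
delta = best_change("x", pairs, weights={}, smoothing=0.0)
before = exp_loss(pairs, {})
after = exp_loss(pairs, {"x": delta})
```

Here the two partial losses are 2 and 1, so `delta` is `0.5 * log(2)` and the exponential loss falls from `before` to `after`. A value found this way would then be compared against the step-change value, as in claim 18.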

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Gao, Jianfeng
Application Number: US11/638,887
Publication Number: US 20080147579A1
US Class Current: 706/25
CPC Class Codes: G10L 15/197 Probabilistic grammars, e.g...
