Training machine learning by sequential conditional generalized iterative scaling

US 7,107,207 B2
Filed: 06/19/2002
Issued: 09/12/2006
Est. Priority Date: 06/19/2002
Status: Expired due to Fees

Abstract

A system and method facilitating training machine learning systems utilizing sequential conditional generalized iterative scaling is provided. The invention includes an expected value update component that modifies an expected value based, at least in part, upon a feature function of an input vector and an output value, a sum of lambda variable and a normalization variable. The invention further includes an error calculator that calculates an error based, at least in part, upon the expected value and an observed value. The invention also includes a parameter update component that modifies a trainable parameter based, at least in part, upon the error. A variable update component that updates at least one of the sum of lambda variable and the normalization variable based, at least in part, upon the error is also provided.
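
The four components named in the abstract can be sketched as one training loop. This is a minimal illustration, not the patented implementation: it assumes binary feature functions, and the error formula (a log ratio of observed to expected value) is an assumption, because this record does not reproduce the claimed error equation. The names `scgis_train`, `nonzero`, `observed`, and the toy data are hypothetical.

```python
import math

def scgis_train(nonzero, observed, num_instances, num_outputs, iterations=100):
    """Train weights for binary feature functions, one feature at a time.

    nonzero[i]  -- list of (j, y) pairs for which f_i(x_j, y) == 1
    observed[i] -- how often f_i fires on the training data with the gold
                   outputs (assumed > 0 so the logarithm is defined)
    """
    lam = [0.0] * len(nonzero)                                # trainable parameters
    s = [[0.0] * num_outputs for _ in range(num_instances)]   # s[j][y]: sum of lambdas
    z = [float(num_outputs)] * num_instances                  # z[j] = sum over y of e^{s[j][y]}

    for _ in range(iterations):
        for i, pairs in enumerate(nonzero):
            # Expected value update: only the non-zero entries of f_i are visited.
            expected = 0.0
            for j, y in pairs:
                expected += math.exp(s[j][y]) / z[j]

            # Error calculation. ASSUMPTION: log-ratio step for binary features;
            # the source does not reproduce the claimed equation.
            delta = math.log(observed[i] / expected)

            # Parameter update.
            lam[i] += delta

            # Variable update: refresh s and z in place for the touched entries.
            for j, y in pairs:
                z[j] -= math.exp(s[j][y])
                s[j][y] += delta
                z[j] += math.exp(s[j][y])
    return lam, s, z


# Hypothetical toy data: 5 instances, 2 outputs, 4 binary features.
# Instances 0-2 share attribute A (gold outputs 0, 0, 1);
# instances 3-4 share attribute B (gold outputs 1, 0).
toy_nonzero = [
    [(0, 0), (1, 0), (2, 0)],   # attribute A with output 0
    [(0, 1), (1, 1), (2, 1)],   # attribute A with output 1
    [(3, 0), (4, 0)],           # attribute B with output 0
    [(3, 1), (4, 1)],           # attribute B with output 1
]
toy_observed = [2.0, 1.0, 1.0, 1.0]
lam, s, z = scgis_train(toy_nonzero, toy_observed, num_instances=5, num_outputs=2)
p_output0_given_A = math.exp(s[0][0]) / z[0]
```

On this toy data the model probability of output 0 for the first three instances should approach the empirical frequency of 2/3. No smoothing is applied, so on data where a feature perfectly predicts an output the weights would keep growing.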

39 Citations

20 Claims

1. A system for training a machine learning system, comprising:
- an expected value update component that, for a plurality of outputs and for a plurality of instances in which a single feature function is non-zero, modifies an expected value based, at least in part, upon the single feature function of an input vector and an output value, a sum of lambda variable and a normalization variable;
  
  an error calculator that calculates an error based, at least in part, upon the expected value and an observed value, the error calculation further employing, at least in part, the following equation: [equation not reproduced in the source]
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The system of claim 1, the error calculation further employing, at least in part, the following equation: [equation not reproduced in the source]
  - 3. The system of claim 1, modification of the expected value being based, at least in part, upon the following equation:
    - $\text{expected value} = \text{expected value} + f_i(\overline{x}_j, y)\, e^{s[j,y]} / z[j]$, where $f_i(\overline{x}_j, y)$ is the feature function, $\overline{x}_j$ is the input vector, $y$ is the output value, $s[j,y]$ is the sum of lambda variable, and $z[j]$ is the normalization variable.
  - 4. The system of claim 1, the error being based, at least in part, upon the following equation: [equation not reproduced in the source]
  - 5. The system of claim 1, modification of the trainable parameter being based, at least in part, upon the following equation:
    - $\lambda_i = \lambda_i + \delta_i$, where $\lambda_i$ is the trainable parameter and $\delta_i$ is the error.
  - 6. The system of claim 1, updating of the sum of lambda variable and the normalization variable being based, at least in part, upon the following equations:
    - $z[j] = z[j] - e^{s[j,y]}$,
      $s[j,y] = s[j,y] + \delta_i$,
      $z[j] = z[j] + e^{s[j,y]}$,
      where $s[j,y]$ is the sum of lambda variable, $z[j]$ is the normalization variable, and $\delta_i$ is the error.
  - 7. The system of claim 1, further comprising a training data store that stores at least one of the observed value and the input vector.
  - 8. The system of claim 7, at least one of the observed value and the input vector being stored in a sparse representation.
  - 9. The system of claim 1, further comprising a parameter store that stores at least one trainable parameter.
  - 10. A machine learning system trained using the system of claim 1.
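
Claim 8 mentions a sparse representation of the training data, and claims 1 and 3 visit only the instances in which a feature function is non-zero. One way this could be laid out is sketched below. The feature layout is an assumption (one binary feature per pair of input attribute and output, created only when that pair occurs in the training data), and `build_sparse_index` is a hypothetical name.

```python
from collections import defaultdict

def build_sparse_index(instances, labels):
    """Build per-feature lists of non-zero entries plus observed counts.

    instances[j] -- iterable of attribute ids active in input vector x_j
    labels[j]    -- gold output for instance j
    Returns (nonzero, observed), both keyed by the feature (attribute, output).
    """
    attr_to_instances = defaultdict(list)   # inverted index: attribute -> instance ids
    observed = defaultdict(int)             # observed value of each feature
    for j, (attrs, gold) in enumerate(zip(instances, labels)):
        for a in set(attrs):
            attr_to_instances[a].append(j)
            observed[(a, gold)] += 1

    # Feature (a, y) is non-zero exactly for the pairs (j, y) where a is active in x_j.
    nonzero = {
        (a, y): [(j, y) for j in attr_to_instances[a]]
        for (a, y) in observed
    }
    return nonzero, dict(observed)


# Hypothetical toy data: attribute "A" is active in instances 0-2, "B" in 3-4.
toy_instances = [{"A"}, {"A"}, {"A"}, {"B"}, {"B"}]
toy_labels = [0, 0, 1, 1, 0]
toy_nonzero, toy_observed = build_sparse_index(toy_instances, toy_labels)
```

With this layout, the cost of processing one feature is proportional to the number of instances in which its attribute is active, not to the size of the whole training set.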

11. A system for training a machine learning system, comprising:
- an expected value update component that, for a plurality of outputs and for a plurality of instances in which a single feature function is non-zero, modifies an expected value based, at least in part, upon the single feature function of an input vector and an output value, a sum of lambda variable and a normalization variable, modification of the expected value being based, at least in part, upon the following equation;
  
  $\text{expected value} = \text{expected value} + f_i(\overline{x}_j, y)\, e^{s[j,y]} / z[j]$, where $f_i(\overline{x}_j, y)$ is the feature function, $\overline{x}_j$ is the input vector, $y$ is the output value, $s[j,y]$ is the sum of lambda variable, and $z[j]$ is the normalization variable;
  
  an error calculator that calculates an error based, at least in part, upon the expected value and an observed value;
  
  a parameter update component that modifies class trainable parameters or word trainable parameters based, at least in part, upon the error; and
  
  a variable update component that, for the plurality of outputs and for the plurality of instances in which the feature function is non-zero, sequentially updates at least one of the sum of lambda variable and the normalization variable based, at least in part, upon the error.
- View Dependent Claims (12, 13)
- - 12. The system of claim 11, the class trainable parameters being trained before the word trainable parameters are trained.
  - 13. The system of claim 11, further comprising a training data store that stores at least one of the observed value and the input vector.
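
Claim 12 orders the training: class parameters first, then word parameters. One reading, assumed here and not spelled out in this record, is a factored model in which each word belongs to one class and P(word | context) = P(class | context) × P(word | context, class). The sketch below shows only the training order and how the two models combine. It deliberately uses a relative-frequency estimator as a stand-in trainer, not sequential conditional generalized iterative scaling, and every name in it is hypothetical.

```python
from collections import Counter

def train_relative_frequency(pairs):
    """Stand-in trainer (NOT SCGIS): conditional relative frequencies P(out | ctx)."""
    joint = Counter(pairs)
    ctx_totals = Counter(ctx for ctx, _ in pairs)
    return {(ctx, out): n / ctx_totals[ctx] for (ctx, out), n in joint.items()}

def train_class_then_word(data, word_class):
    """data: list of (context, word); word_class: hypothetical word -> class map."""
    # Step 1: class parameters, with the classes as the outputs.
    class_model = train_relative_frequency(
        [(ctx, word_class[w]) for ctx, w in data]
    )
    # Step 2: word parameters, conditioned on the context and the word's class.
    word_model = train_relative_frequency(
        [((ctx, word_class[w]), w) for ctx, w in data]
    )
    return class_model, word_model

def word_probability(class_model, word_model, word_class, ctx, w):
    """P(w | ctx) = P(class(w) | ctx) * P(w | ctx, class(w))."""
    c = word_class[w]
    return class_model.get((ctx, c), 0.0) * word_model.get(((ctx, c), w), 0.0)


# Hypothetical toy data.
toy_word_class = {"cat": "NOUN", "dog": "NOUN", "run": "VERB"}
toy_data = [("the", "cat"), ("the", "dog"), ("the", "run"), ("a", "cat")]
class_model, word_model = train_class_then_word(toy_data, toy_word_class)
p_cat_given_the = word_probability(class_model, word_model, toy_word_class, "the", "cat")
```

The appeal of such a factorization is that each of the two models has far fewer outputs than a single model over the whole vocabulary, so each normalization sum is shorter.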

14. A method for training a machine learning system, comprising:
- for each feature function, updating an expected value based, at least in part, upon a feature function of an input vector and an output value, a sum of lambda variable and a normalization variable;
  
  for each feature function, calculating an error based, at least in part, upon the expected value and an observed value, the error calculation being based, at least in part, upon the following equation: [equation not reproduced in the source]
- View Dependent Claims (15)
- - 15. The method of claim 14, further comprising at least one of word clustering, smoothing and improved iterative scaling.

16. A method for training a machine learning system, comprising:
- updating an expected value based, at least in part, upon a feature function of an input vector and an output value, a sum of lambda variable and a normalization variable, for each output, for each instance that the feature function is not zero;
  
  calculating an error based, at least in part, upon the expected value and an observed value, the error calculation further employing, at least in part, the following equation: [equation not reproduced in the source]
- View Dependent Claims (17)
- - 17. The method of claim 16, further comprising at least one of the following acts:
    - performing general initialization;
      
      resetting an expected value;
      
      determining whether there are more outputs; and
      
      determining whether there are more feature functions.

18. A computer implemented method for training a learning system, comprising the following computer executable acts:
- training trainable class parameters based, at least in part, upon sequential conditional generalized iterative scaling an input vector, an output value, and calculating an error employing, at least in part, the following equation: [equation not reproduced in the source]

19. A computer readable medium storing computer executable components of a system facilitating training of a machine learning system, comprising:
- an expected value update component that modifies an expected value for a plurality of outputs and for a plurality of instances in which a single feature function is non-zero based, at least in part, upon the single feature function of an input vector and an output value, a sum of lambda variable and a normalization variable;
  
  an error calculator component that calculates an error based, at least in part, upon the expected value and an observed value;
  
  a parameter update component that modifies a trainable parameter based, at least in part, upon the error; and
  
  a variable update component that sequentially updates at least one of the sum of lambda variable and the normalization variable for the plurality of outputs and for the plurality of instances in which the feature function is non-zero based, at least in part, upon the error, the updating of the sum of lambda variable and the normalization variable being based, at least in part, upon the following equations;
  
  $z[j] = z[j] - e^{s[j,y]}$,
  $s[j,y] = s[j,y] + \delta_i$,
  $z[j] = z[j] + e^{s[j,y]}$,
  where $s[j,y]$ is the sum of lambda variable, $z[j]$ is the normalization variable, and $\delta_i$ is the error.
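
The three assignments in claim 19 can be read as maintaining a running normalizer. The short derivation below assumes, as an interpretation not stated in the claim text, that $z[j]$ is meant to equal the sum of $e^{s[j,y']}$ over all outputs $y'$ before the update; primes mark values after the update.

```latex
\begin{align*}
z[j]  &= \sum_{y'} e^{s[j,y']}
        && \text{(assumed invariant before the update)} \\
s'[j,y] &= s[j,y] + \delta_i
        && \text{(only the entry for output } y \text{ changes)} \\
z'[j] &= z[j] - e^{s[j,y]} + e^{s[j,y] + \delta_i} \\
      &= \sum_{y' \neq y} e^{s[j,y']} + e^{s'[j,y]}
       = \sum_{y'} e^{s'[j,y']}
\end{align*}
```

Under that assumption the invariant still holds after the update, and refreshing $z[j]$ takes a constant number of operations per touched entry instead of a full sum over all outputs.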

20. A training system for a machine learning system, comprising:
- means for modifying an expected value for a plurality of outputs and for a plurality of instances in which a feature function is non-zero based, at least in part, upon the feature function of an input vector and an output value, a sum of lambda variable and a normalization variable;
  
  means for calculating an error based, at least in part, upon the expected value and an observed value, the means for error calculation further employing, at least in part, the following equation: [equation not reproduced in the source]

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Goodman, Joshua Theodore
Primary Examiner(s): Storm, Donald L.
Application Number: US10/175,430
Publication Number: US 20030236662A1
Time in Patent Office: 1,546 Days
Field of Search: None
US Class Current: 704/9
CPC Class Codes: G06N 20/00 (Machine learning); G10L 15/063 (Training)
