Optimized training of linear machine learning models

US 10,318,882 B2
Filed: 09/11/2014
Issued: 06/11/2019
Est. Priority Date: 09/11/2014
Status: Active Grant

Abstract

An indication of a data source to be used to train a linear prediction model is obtained. The model is to generate predictions using respective parameters assigned to a plurality of features derived from observation records of the data source. The parameter values are stored in a parameter vector. During a particular learning iteration of the training phase of the model, one or more features for which parameters are to be added to the parameter vector are identified. In response to a triggering condition, parameters for one or more features are removed from the parameter vector based on an analysis of relative contributions of the features represented in the parameter vector to the model's predictions. After the parameters are removed, at least one parameter is added to the parameter vector.
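
The flow described in the abstract can be illustrated with a minimal sketch. It is not taken from the patent: the dictionary-based parameter vector, the squared-error update, the population threshold used as the triggering condition, and the names `train`, `prune_smallest`, `max_population`, and `keep` are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[Dict[str, float], float]  # (feature values, label); hypothetical layout


def prune_smallest(weights: Dict[str, float], keep: int) -> List[str]:
    """Drop the smallest-magnitude weights until only `keep` remain.

    Repeated min() selection is used only for brevity; it stands in for
    whatever contribution analysis a real implementation would use.
    """
    victims = []
    while len(weights) > keep:
        victim = min(weights, key=lambda f: abs(weights[f]))
        del weights[victim]
        victims.append(victim)
    return victims


def train(iterations: Iterable[List[Record]], max_population: int,
          keep: int, learning_rate: float = 0.1):
    """Grow a sparse parameter vector, pruning it when it gets too large."""
    weights: Dict[str, float] = {}
    pruned: List[str] = []
    for records in iterations:  # one learning iteration per batch
        for features, label in records:
            prediction = sum(weights.get(f, 0.0) * v for f, v in features.items())
            error = prediction - label
            for f, v in features.items():
                # Unseen features are added here, so the vector grows.
                weights[f] = weights.get(f, 0.0) - learning_rate * error * v
        if len(weights) > max_population:  # assumed triggering condition
            pruned.extend(prune_smallest(weights, keep))
    return weights, pruned
```

In this sketch a feature first seen in a later iteration still receives a parameter after earlier pruning, which mirrors the last sentence of the abstract.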

99 Citations

21 Claims

1. A system, comprising:
- one or more computing devices configured to:
  
  receive, at a machine learning service of a provider network, an indication of a data source to be used for generating a linear prediction model, wherein, to generate a prediction, the linear prediction model is to utilize respective weights assigned to individual ones of a plurality of features derived from observation records of the data source, wherein the respective weights are stored in a parameter vector of the linear prediction model and updated in-memory during a machine training phase of the linear prediction model;
  
  determine, based at least in part on examination of a particular set of observation records of the data source, respective weights for one or more features to be added to the parameter vector during a particular learning iteration of a plurality of learning iterations of the training phase of the linear prediction model, wherein the addition increases memory consumption during the machine training phase;
  
  check, during one or more of the plurality of learning iterations, for a triggering condition to prune the parameter vector;
  
  in response to a determination that the triggering condition has been met during the training phase, identify one or more pruning victims from a set of features whose weights are included in the parameter vector, based at least in part on a quantile analysis of the weights, wherein the quantile analysis is performed without a sort operation; and
  
  remove at least a particular weight corresponding to a particular pruning victim of the one or more pruning victims from the parameter vector, wherein the removal reduces memory consumption during the training phase; and
  
  generate, during a post-training-phase prediction run of the linear prediction model, a prediction using at least one feature for which a weight is determined after the particular weight of the particular pruning victim is removed from the parameter vector.
- Dependent claims (2, 3, 4, 5):
  - 2. The system as recited in claim 1, wherein the triggering condition is based at least in part on a population of the parameter vector.
  - 3. The system as recited in claim 1, wherein the triggering condition is based at least in part on a goal indicated by a client.
  - 4. The system as recited in claim 1, wherein the one or more computing devices are further configured to:
    - during a subsequent learning iteration of the plurality of learning iterations, performed after the particular learning iteration, determine that a weight for the particular pruning victim is to be re-added to the parameter vector; and
      
      add the weight corresponding to the particular pruning victim to the parameter vector.
  - 5. The system as recited in claim 1, wherein a first feature of the one or more features whose weights are to be added to the parameter vector during the particular learning iteration is derived from one or more variables of the observation records of the data source via a transformation that comprises a use of one or more of:
    - (a) a quantile bin function, (b) a Cartesian product function, (c) a bi-gram function, (d) an n-gram function, (e) an orthogonal sparse bigram function, (f) a calendar function, (g) an image processing function, (h) an audio processing function, (i) a bio-informatics processing function, (j) a natural language processing function or (k) a video processing function.
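
Claim 1 recites a quantile analysis of the weights that is performed without a sort operation. One way this could be approximated is with a fixed-size histogram of absolute weights, sketched below. The histogram approach, the bucket count, and the names `approx_quantile_cutoff` and `prune_below` are assumptions for illustration; the claims do not specify how the quantile boundary is estimated.

```python
from typing import Dict, List


def approx_quantile_cutoff(weights: Dict[str, float], fraction: float,
                           buckets: int = 64) -> float:
    """Estimate the |weight| boundary below which about `fraction` of weights fall.

    Two passes over the parameter vector and a fixed-size histogram;
    the weights are neither sorted nor copied.
    """
    if not weights or fraction <= 0.0:
        return 0.0
    largest = max(abs(w) for w in weights.values())  # pass 1: value range
    if largest == 0.0:
        return 0.0
    counts = [0] * buckets
    for w in weights.values():  # pass 2: bucket counts
        index = min(int(abs(w) / largest * buckets), buckets - 1)
        counts[index] += 1
    target = fraction * len(weights)
    seen = 0
    for index, count in enumerate(counts):
        seen += count
        if seen >= target:
            return (index + 1) * largest / buckets  # upper edge of this bucket
    return largest


def prune_below(weights: Dict[str, float], cutoff: float) -> List[str]:
    """Remove every feature whose |weight| is strictly below `cutoff`."""
    victims = [f for f, w in weights.items() if abs(w) < cutoff]  # names only
    for f in victims:
        del weights[f]
    return victims
```

The result is approximate: its resolution is one bucket width, so more buckets give a tighter boundary at the cost of a larger (but still fixed-size) histogram.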

6. A method, comprising:
- performing, by one or more computing devices:
  
  receiving an indication of a data source to be used for training a machine learning model, wherein, to generate a prediction, the machine learning model is to utilize respective parameters assigned to individual ones of a plurality of features derived from observation records of the data source, wherein the respective parameters are stored in a parameter vector of the machine learning model and updated in-memory during a training phase of the machine learning model;
  
  identifying one or more features for which respective parameters are to be added to the parameter vector during a particular learning iteration of a plurality of learning iterations of the training phase of the machine learning model, wherein the addition increases memory consumption during the training phase;
  
  checking, during one or more of the plurality of learning iterations, for a triggering condition to prune the parameter vector;
  
  in response to determining that the triggering condition has been met in the training phase, removing respective parameters of one or more pruning victim features from the parameter vector, wherein the removal reduces memory consumption during the training phase, and wherein the one or more pruning victim features are selected based at least in part on an analysis of relative contributions of features whose parameters are included in the parameter vector to predictions made using the machine learning model; and
  
  generating, during a post-training-phase prediction run of the machine learning model, a particular prediction using at least one feature for which a parameter is determined after the one or more pruning victim features are selected.
- Dependent claims (7, 8, 9, 10, 11, 12, 13, 14, 15, 16):
  - 7. The method as recited in claim 6, wherein the analysis of relative contributions comprises a quantile analysis of weights included in the parameter vector.
  - 8. The method as recited in claim 6, wherein the analysis of relative contributions (a) does not comprise a sort operation and (b) does not comprise copying values of the parameters included in the parameter vector.
  - 9. The method as recited in claim 6, wherein said determining that the triggering condition has been met comprises determining that a population of the parameter vector exceeds a threshold.
  - 10. The method as recited in claim 6, wherein the triggering condition is based at least in part on a resource capacity constraint of a server of a machine learning service.
  - 11. The method as recited in claim 6, wherein the triggering condition is based at least in part on a goal indicated by a client.
  - 12. The method as recited in claim 6, further comprising performing, by the one or more computing devices:
    - during a subsequent learning iteration of the plurality of learning iterations, performed after the particular learning iteration, determining that a parameter for a particular feature which was previously selected as a pruning victim feature is to be re-added to the parameter vector; and
      
      adding the parameter for the particular feature to the parameter vector.
  - 13. The method as recited in claim 6, wherein a first feature of the one or more features for which respective parameters are to be added to the parameter vector during the particular learning iteration is determined from one or more variables of observation records of the data source via a transformation that comprises a use of one or more of:
    - (a) a quantile bin function, (b) a Cartesian product function, (c) a bi-gram function, (d) an n-gram function, (e) an orthogonal sparse bigram function, (f) a calendar function, (g) an image processing function, (h) an audio processing function, (i) a bio-informatics processing function, (j) a natural language processing function, or (k) a video processing function.
  - 14. The method as recited in claim 6, further comprising performing, by the one or more computing devices:
    - implementing a stochastic gradient descent technique to update, during the particular learning iteration, one or more previously-generated parameters included in the parameter vector.
  - 15. The method as recited in claim 6, wherein the machine learning model comprises a generalized linear model.
  - 16. The method as recited in claim 6, further comprising performing, by the one or more computing devices:
    - receiving, via a programmatic interface of a machine learning service implemented at a provider network, wherein the machine learning service comprises a plurality of training servers at one or more data centers, a client request indicating the data source; and
      
      assigning, to a particular training server of the plurality of training servers by a job scheduler of the machine learning service, asynchronously with respect to said receiving the client request, a job comprising the plurality of learning iterations.
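
Claims 14 and 15 mention a stochastic gradient descent technique and a generalized linear model. As a rough illustration, the sketch below applies one stochastic gradient step of logistic regression (one common generalized linear model) to a sparse, dictionary-based parameter vector. The choice of logistic regression, the log-loss gradient, and the names `sgd_step` and `sigmoid` are assumptions; the claims do not name a specific model or loss.

```python
import math
from typing import Dict


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_step(weights: Dict[str, float], features: Dict[str, float],
             label: int, learning_rate: float = 0.1) -> float:
    """Apply one SGD update for a single observation record.

    Returns the probability predicted before the update. Only the weights
    of features present in the record are touched, and a feature seen for
    the first time is added to the parameter vector.
    """
    z = sum(weights.get(f, 0.0) * v for f, v in features.items())
    p = sigmoid(z)
    gradient_scale = p - label  # derivative of log loss with respect to z
    for f, v in features.items():
        weights[f] = weights.get(f, 0.0) - learning_rate * gradient_scale * v
    return p
```

Because each step touches only the features present in one record, previously generated parameters are updated in place while the vector grows only when new features appear.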

17. A non-transitory computer-accessible storage medium storing program instructions that when executed on one or more processors implements a model generator of a machine learning service, wherein the model generator is configured to:
- determine a data source to be used for generating a model, wherein, to generate a prediction, the model is to utilize respective parameters assigned to individual ones of a plurality of features derived from observation records of the data source, wherein the respective parameters are stored in a parameter vector of the model and updated in-memory during a training phase of the model;
  
  identify one or more features for which parameters are to be added to the parameter vector during a particular learning iteration of a plurality of learning iterations of the training phase of the model, wherein the addition increases memory consumption during the training phase;
  
  check, during one or more of the plurality of learning iterations, for a triggering condition to prune the parameter vector;
  
  in response to a determination that the triggering condition has been met, remove respective parameters assigned to one or more pruning victim features from the parameter vector, wherein the removal reduces memory consumption during the training phase, and wherein the one or more pruning victim features are selected based at least in part on an analysis of relative contributions of features whose parameters are included in the parameter vector to predictions made using the model; and
  
  add, subsequent to a removal from the parameter vector of at least one parameter assigned to a pruning victim feature, at least one parameter to the parameter vector.
- Dependent claims (18, 19, 20, 21):
  - 18. The non-transitory computer-accessible storage medium as recited in claim 17, wherein the analysis of relative contributions comprises a determination of a deviation of a particular parameter value included in the parameter vector from an a priori parameter value.
  - 19. The non-transitory computer-accessible storage medium as recited in claim 18, wherein the particular parameter value comprises a probability distribution, and wherein the determination of the deviation comprises an estimation of a Kullback-Leibler (KL) divergence.
  - 20. The non-transitory computer-accessible storage medium as recited in claim 17, wherein to determine whether the triggering condition has been met, the model generator is configured to determine whether a population of the parameter vector exceeds a threshold.
  - 21. The non-transitory computer-accessible storage medium as recited in claim 17, wherein the data source comprises a source of a stream of observation records transmitted to a network endpoint of a machine learning service.
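
Claims 18 and 19 describe measuring how far a parameter deviates from an a priori value, with a Kullback-Leibler (KL) divergence estimate when the parameter is a probability distribution. The sketch below assumes, purely for illustration, that each parameter is a univariate Gaussian given as a (mean, standard deviation) pair and that the prior is also Gaussian; the claims do not specify the distribution family. The names `gaussian_kl` and `least_informative` are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Tuple


def gaussian_kl(mean: float, std: float,
                prior_mean: float = 0.0, prior_std: float = 1.0) -> float:
    """KL divergence KL(N(mean, std^2) || N(prior_mean, prior_std^2))."""
    return (math.log(prior_std / std)
            + (std ** 2 + (mean - prior_mean) ** 2) / (2.0 * prior_std ** 2)
            - 0.5)


def least_informative(params: Dict[str, Tuple[float, float]],
                      count: int) -> List[str]:
    """Return the `count` features whose parameters deviate least from the prior.

    A small divergence means the parameter has barely moved from its a priori
    value, so the feature is treated here as a candidate pruning victim.
    """
    return heapq.nsmallest(count, params, key=lambda f: gaussian_kl(*params[f]))
```

For example, a parameter equal to the prior has divergence zero and would be ranked first for pruning under this assumption.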

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Brueckner, Michael; Blick, Daniel
Primary Examiner(s): Waldron, Scott A.
Assistant Examiner(s): Seck, Ababacar
Application Number: US14/484,201
Publication Number: US 20160078361A1
Time in Patent Office: 1,734 Days
Field of Search: None
US Class Current
CPC Class Codes

G06F 18/214   Generating training pattern...

G06F 18/2411   based on the proximity to a...

G06N 20/00   Machine learning

G06N 5/025   Extracting rules from data

G06N 7/01   Probabilistic graphical mod...

H04L 67/10   in which an application is ...
