Concurrent binning of machine learning data
First Claim
1. A system, comprising:
- one or more computing devices configured to:
receive, at a machine learning service of a provider network, an indication of a data source comprising observation records to be used to generate a model;
identify one or more variables of the observation records as candidates for quantile binning transformations;
determine a particular concurrent binning plan for at least a particular variable of the one or more variables, wherein, in accordance with the particular concurrent binning plan, a plurality of quantile binning transformations are applied to the particular variable during a training phase of the model, wherein the plurality of quantile binning transformations include a first quantile binning transformation with a first bin count and a second quantile binning transformation with a different bin count;
generate, during the training phase, a parameter vector comprising respective initial weight values corresponding to a plurality of binned features obtained as a result of an implementation of the particular concurrent binning plan, including a first binned feature obtained using the first quantile binning transformation and a second binned feature obtained using the second quantile binning transformation;
reduce, during the training phase, at least one weight value corresponding to a particular binned feature of the plurality of binned features in accordance with a selected optimization strategy; and
obtain, during a post-training-phase prediction run of the model, a particular prediction using at least one of: the first binned feature or the second binned feature.
Abstract
Variables of observation records to be used to generate a machine learning model are identified as candidates for quantile binning transformations. In accordance with a particular concurrent binning plan generated for a particular variable, a plurality of quantile binning transformations are applied to the particular variable, including a first transformation with a first bin count and a second transformation with a different bin count. The first and second transformations result in the inclusion of respective parameters or weights for binned features in a parameter vector of the model. In a post-training phase run of the model, at least one parameter corresponding to a binned feature is used to generate a prediction.
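The concurrent binning idea in the abstract can be illustrated with a minimal sketch. It assumes NumPy, a single numeric variable, one-hot encoding of the bin index, and bin counts of 4 and 10; the function names `quantile_bin_edges` and `concurrent_binning` are hypothetical and are not taken from the patent.

```python
import numpy as np

def quantile_bin_edges(values, bin_count):
    """Interior boundaries that split `values` into `bin_count`
    bins of roughly equal population."""
    probs = np.linspace(0.0, 1.0, bin_count + 1)[1:-1]
    return np.quantile(values, probs)

def concurrent_binning(values, bin_counts):
    """Apply one quantile binning per bin count to the same variable
    and return the one-hot binned features side by side."""
    blocks = []
    for k in bin_counts:
        edges = quantile_bin_edges(values, k)
        # Bin index in 0..k-1 for every observation.
        idx = np.searchsorted(edges, values, side="right")
        blocks.append(np.eye(k)[idx])
    return np.hstack(blocks)

# Illustrative data only: 100 observations of one numeric variable.
values = np.arange(1.0, 101.0)
features = concurrent_binning(values, bin_counts=(4, 10))
print(features.shape)  # one column per bin, across both binnings
```

Each observation then activates exactly one column per binning, so a model trained on `features` carries a separate weight for every bin of every bin count.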
20 Claims
1. A system, comprising:
- one or more computing devices configured to:
receive, at a machine learning service of a provider network, an indication of a data source comprising observation records to be used to generate a model;
identify one or more variables of the observation records as candidates for quantile binning transformations;
determine a particular concurrent binning plan for at least a particular variable of the one or more variables, wherein, in accordance with the particular concurrent binning plan, a plurality of quantile binning transformations are applied to the particular variable during a training phase of the model, wherein the plurality of quantile binning transformations include a first quantile binning transformation with a first bin count and a second quantile binning transformation with a different bin count;
generate, during the training phase, a parameter vector comprising respective initial weight values corresponding to a plurality of binned features obtained as a result of an implementation of the particular concurrent binning plan, including a first binned feature obtained using the first quantile binning transformation and a second binned feature obtained using the second quantile binning transformation;
reduce, during the training phase, at least one weight value corresponding to a particular binned feature of the plurality of binned features in accordance with a selected optimization strategy; and
obtain, during a post-training-phase prediction run of the model, a particular prediction using at least one of: the first binned feature or the second binned feature.
Dependent claims: 2, 3, 4, 5
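The training-phase steps recited in claim 1 (initial weights for the binned features, weight reduction, then prediction) can be sketched as follows. The claim does not name the optimization strategy; this sketch assumes L1 regularization applied by soft-thresholding on a squared-error linear model, and every name, bin count, and hyperparameter below is hypothetical.

```python
import numpy as np

BIN_COUNTS = (2, 10)  # assumed bin counts for the two concurrent binnings

def fit_edges(values, bin_counts):
    """Learn the interior quantile boundaries for each bin count."""
    return [np.quantile(values, np.linspace(0.0, 1.0, k + 1)[1:-1])
            for k in bin_counts]

def binned_features(values, edges_per_plan):
    """One-hot encode `values` under every binning, concatenated column-wise."""
    blocks = []
    for edges in edges_per_plan:
        idx = np.searchsorted(edges, values, side="right")
        blocks.append(np.eye(len(edges) + 1)[idx])
    return np.hstack(blocks)

def soft_threshold(weights, amount):
    """Shrink every weight toward zero by `amount`, clipping at zero."""
    return np.sign(weights) * np.maximum(np.abs(weights) - amount, 0.0)

def train(features, targets, l1=0.01, learning_rate=0.5, steps=500, seed=0):
    """Proximal gradient descent; returns (initial weights, trained weights)."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(scale=0.1, size=features.shape[1])  # initial weight values
    initial = weights.copy()
    for _ in range(steps):
        grad = features.T @ (features @ weights - targets) / len(targets)
        # The L1 step is what reduces weights of binned features.
        weights = soft_threshold(weights - learning_rate * grad, learning_rate * l1)
    return initial, weights

def predict(value, edges_per_plan, weights):
    """Post-training prediction for one new value of the variable."""
    row = binned_features(np.array([value]), edges_per_plan)
    return float((row @ weights)[0])

# Illustrative data only: a step function of one variable.
x = np.linspace(0.0, 1.0, 200)
y = (x > 0.5).astype(float)
edges = fit_edges(x, BIN_COUNTS)
features = binned_features(x, edges)
initial_weights, weights = train(features, y)
mse_before = float(np.mean((features @ initial_weights - y) ** 2))
mse_after = float(np.mean((features @ weights - y) ** 2))
print(mse_before, mse_after, predict(0.9, edges, weights))
```

Because both binnings share one parameter vector, the regularizer can shrink the weights of whichever granularity contributes less, without the bin count having to be chosen before training.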
6. A method, comprising:
- performing, by one or more computing devices:
implementing a respective concurrent binning plan for one or more variables of observation records to be used to generate a machine learning model, wherein, in accordance with a particular concurrent binning plan, a plurality of quantile binning transformations are applied to at least a particular variable of the one or more variables, wherein the plurality of quantile binning transformations include a first quantile binning transformation with a first bin count and a second quantile binning transformation with a different bin count;
determining respective parameter values associated with a plurality of binned features, including a first binned feature obtained using the first quantile binning transformation and a second binned feature obtained using the second quantile binning transformation; and
generating, during a post-training-phase prediction run of the machine learning model, a particular prediction using a parameter value corresponding to at least one of: the first binned feature or the second binned feature.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A non-transitory computer-accessible storage medium storing program instructions that when executed on one or more processors implements a model generator of a machine learning service, wherein the model generator is configured to:
identify one or more variables of observation records to be used to generate a machine learning model as candidates for quantile binning transformations;
determine a respective concurrent binning plan for the one or more variables, wherein, in accordance with a particular concurrent binning plan for at least a particular variable, a plurality of quantile binning transformations are applied to the particular variable, wherein the plurality of quantile binning transformations include a first quantile binning transformation with a first bin count and a second quantile binning transformation with a different bin count; and
include, within a parameter vector of the machine learning model, respective parameters for a plurality of binned features, including a first parameter for a first binned feature obtained from the first quantile binning transformation and a second parameter for a second binned feature obtained from the first quantile binning feature, wherein at least one binned feature of the first and second binned features is used to generate a prediction in a post-training-phase execution of the machine learning model.
Dependent claims: 17, 18, 19, 20
Specification