RE-SIZING DATA PARTITIONS FOR ENSEMBLE MODELS IN A MAPREDUCE FRAMEWORK
First Claim
1. A method comprising:
determining an initial number of base model partitions of data from a plurality of data sources;
determining an initial base model partition size based at least in part on the initial number of base model partitions;
evaluating the initial base model partition size at least in part with reference to at least one base model partition size reference;
determining a finalized number of base model partitions based at least in part on the initial base model partition size;
determining a revised base model partition size; and
generating revised base models based at least in part on the revised base model partition size, wherein generating the revised base models comprises using a predictive modeling framework to randomly assign input data records from the plurality of data sources into the finalized number of base model partitions.
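The sizing steps recited in claim 1 can be illustrated with a short sketch. The claim does not say what the "base model partition size reference" is or how the finalized number is derived from it, so everything specific here is an assumption made for illustration: the function name `resize_partitions`, the use of a minimum and a maximum size as the references, and the rounding rules are all hypothetical and are not taken from the patent.

```python
import math


def resize_partitions(total_records, initial_partitions, min_size, max_size):
    """Return (finalized_partitions, revised_size).

    Assumption: the size references are a minimum and a maximum number of
    records per base model partition. The claim leaves this unspecified.
    """
    if total_records <= 0 or initial_partitions <= 0:
        raise ValueError("total_records and initial_partitions must be positive")
    if not 0 < min_size <= max_size:
        raise ValueError("expected 0 < min_size <= max_size")

    # Initial partition size follows from the initial number of partitions.
    initial_size = total_records / initial_partitions

    # Evaluate the initial size against the assumed references.
    if initial_size > max_size:
        # Partitions too large: use more partitions so each fits under max_size.
        finalized = math.ceil(total_records / max_size)
    elif initial_size < min_size:
        # Partitions too small: use fewer partitions so each reaches min_size.
        finalized = max(1, total_records // min_size)
    else:
        # Initial size is acceptable: keep the initial number.
        finalized = initial_partitions

    # Revised size is the average number of records per finalized partition.
    revised_size = total_records / finalized
    return finalized, revised_size


if __name__ == "__main__":
    print(resize_partitions(1_000_000, 4, 10_000, 100_000))
```

With these assumed references, one million records split initially into 4 partitions (250,000 records each) exceed the 100,000-record maximum, so the sketch finalizes 10 partitions of 100,000 records each. When the minimum and maximum are close together, a single adjustment like this may not satisfy both bounds; a real implementation would need its own policy for that case.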
Abstract
Techniques are described for revising data partition size for use in generating predictive models. In one example, a method includes determining an initial number of base model partitions of data from a plurality of data sources; determining an initial base model partition size based at least in part on the initial number of base model partitions; and evaluating the initial base model partition size at least in part with reference to at least one base model partition size reference. The method further includes determining a finalized number of base model partitions based at least in part on the initial base model partition size; determining a revised base model partition size; and generating revised base models based at least in part on the revised base model partition size, including using a predictive modeling framework to randomly assign input data records from the plurality of data sources into the base model partitions.
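The final step of the abstract, randomly assigning input records from several data sources into the base model partitions, can be sketched as follows. This is a single-process stand-in for what a map phase might do in a MapReduce-style framework, not the patented implementation: the function name `assign_records`, the uniform random key, and the `seed` parameter are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def assign_records(sources, num_partitions, seed=None):
    """Randomly assign every record from every source to one partition.

    Assumption: each record gets a uniformly random partition key, much as a
    mapper might emit (key, record) pairs for reducers that build base models.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")

    rng = random.Random(seed)  # seeded generator for repeatable runs
    partitions = defaultdict(list)

    for source in sources:
        for record in source:
            key = rng.randrange(num_partitions)  # integer in [0, num_partitions)
            partitions[key].append(record)

    return partitions


if __name__ == "__main__":
    # Two toy data sources holding 100 and 150 records.
    sources = [range(100), range(100, 250)]
    parts = assign_records(sources, num_partitions=5, seed=0)
    for key in sorted(parts):
        print(key, len(parts[key]))
```

Because the keys are drawn at random, partition sizes only approximate the revised partition size on average, and each partition mixes records from all sources. In a distributed setting each key would typically route to a reducer that trains one base model on its partition.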
13 Claims
Claim 1 (independent) is reproduced above under First Claim. Claims 2 through 13 depend, directly or indirectly, from claim 1.
Specification