Method for constructing segmentation-based predictive models
Abstract
The present invention generally relates to computer databases and, more particularly, to data mining and knowledge discovery. The invention specifically relates to a method for constructing segmentation-based predictive models, such as decision-tree classifiers, wherein data records are partitioned into a plurality of segments and separate predictive models are constructed for each segment. The present invention contemplates a computerized method for automatically building segmentation-based predictive models that substantially improves upon the modeling capabilities of decision trees and related technologies, and that automatically produces models that are competitive with, if not better than, those produced by data analysts and applied statisticians using traditional, labor-intensive statistical techniques. The invention achieves these properties by performing segmentation and multivariate statistical modeling within each segment simultaneously. Segments are constructed so as to maximize the accuracies of the predictive models within each segment. Simultaneously, the multivariate statistical models within each segment are refined so as to maximize their respective predictive accuracies.
12 Claims
1. A method for a process performed on a computer for constructing segmentation-based predictive models, the method comprising:
1) accessing a collection of training data records comprising examples of input values that are available to the segmentation-based predictive model, together with corresponding desired output value(s) that the segmentation-based predictive model is intended to predict;
2) generating a plurality of data segments defined by tests on some, all, or none of the available inputs, and generating one or more segment models for each generated data segment, the generation method comprising:
   a) for at least one generated data segment, generating a plurality of candidate data segments and associated segment models, wherein at least one segment model for at least one candidate data segment comprises a multivariate segment model; and
   b) selecting, from among the plurality of candidate data segments and associated segment models for that generated data segment, a best candidate data segment and at least one associated segment model that optimizes a degree of fit measure with respect to the segment models associated with the candidate segment models; and
3) pruning the plurality of generated data segments and associated segment models by selecting a subset of generated data segments, together with one generated segment model for each data segment selected, so as to optimize a predictive accuracy of the resulting segmentation-based predictive model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
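As an illustration only, the three steps of claim 1 can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the patented implementation: segments are defined by threshold tests on single inputs, each segment model is an ordinary least-squares linear model over all inputs (one model per segment), the degree-of-fit measure is the training sum of squared errors, and pruning uses a separate holdout set. The names `fit_linear`, `sse`, `grow`, `prune`, and `predict` are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Multivariate segment model: least-squares linear fit with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def sse(coef, X, y):
    """Sum of squared errors of a fitted segment model on the given rows."""
    A = np.column_stack([np.ones(len(X)), X])
    r = y - A @ coef
    return float(r @ r)

def grow(X, y, depth=0, max_depth=2, min_rows=10):
    """Step 2: split segments recursively; every segment keeps its own model."""
    node = {"coef": fit_linear(X, y), "split": None}
    if depth >= max_depth:
        return node
    best = None
    # Step 2a: candidate segments are threshold tests at quartiles of each input.
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            left = X[:, j] <= t
            if left.sum() < min_rows or (~left).sum() < min_rows:
                continue
            score = (sse(fit_linear(X[left], y[left]), X[left], y[left])
                     + sse(fit_linear(X[~left], y[~left]), X[~left], y[~left]))
            # Step 2b: keep the candidate whose segment models fit best.
            if best is None or score < best[0]:
                best = (score, j, t, left)
    if best is not None:
        _, j, t, left = best
        node["split"] = (j, t)
        node["left"] = grow(X[left], y[left], depth + 1, max_depth, min_rows)
        node["right"] = grow(X[~left], y[~left], depth + 1, max_depth, min_rows)
    return node

def prune(node, Xv, yv):
    """Step 3: bottom-up pruning; a split survives only if it lowers holdout error.

    Returns the holdout SSE of the (possibly collapsed) subtree.
    """
    own = sse(node["coef"], Xv, yv) if len(Xv) else 0.0
    if node["split"] is None:
        return own
    j, t = node["split"]
    left = Xv[:, j] <= t
    sub = prune(node["left"], Xv[left], yv[left]) + prune(node["right"], Xv[~left], yv[~left])
    if own <= sub:
        node["split"] = None  # collapse: the parent's single model is at least as good
        return own
    return sub

def predict(node, x):
    """Route one input vector to its segment and apply that segment's model."""
    while node["split"] is not None:
        j, t = node["split"]
        node = node["left"] if x[j] <= t else node["right"]
    return float(node["coef"][0] + node["coef"][1:] @ x)
```

A caller would run `grow` on the training records, then `prune` on held-out records, and finally use `predict` on new inputs. The design point the sketch tries to show is that each candidate split is scored by the fit of the multivariate models built inside the resulting segments, rather than by a purity measure computed on the outputs alone.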
12. A method for a process performed on a computer for constructing segmentation-based predictive models, the method comprising:
generating a plurality of data segments from training data;
generating one or more segment models for each generated data segment, wherein, for at least one generated data segment, a plurality of candidate data segments and associated segment models are generated and at least one segment model for at least one candidate data segment comprises a multivariate segment model; and
selecting, from said plurality of candidate data segments, data segments for a predictive model, as based on which associated segments' models best fit said training data in view of:
   varying a number of said data segments; and
   using different degrees of freedom for a multivariate modeling.
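The selection in claim 12, which weighs both the number of segments and the degrees of freedom of the multivariate models, might look roughly like the following Python sketch. Several choices here are assumptions made for illustration and are not taken from the patent: segments are quantile bins on the first input, the degrees of freedom are varied by fitting a linear model on only the first `k` inputs, and fit is judged by squared error on held-out rows. The function names are hypothetical.

```python
import numpy as np

def fit_predict(Xtr, ytr, Xte, k):
    """Linear model on the first k inputs plus an intercept (k + 1 coefficients)."""
    Atr = np.column_stack([np.ones(len(Xtr)), Xtr[:, :k]])
    Ate = np.column_stack([np.ones(len(Xte)), Xte[:, :k]])
    coef, *_ = np.linalg.lstsq(Atr, ytr, rcond=None)
    return Ate @ coef

def holdout_sse(Xtr, ytr, Xva, yva, n_segments, k):
    """Bin rows on input 0, fit one model per bin, and sum the holdout squared error."""
    qs = [i / n_segments for i in range(1, n_segments)]
    edges = np.array([np.quantile(Xtr[:, 0], q) for q in qs])
    seg_tr = np.searchsorted(edges, Xtr[:, 0])
    seg_va = np.searchsorted(edges, Xva[:, 0])
    total = 0.0
    for s in range(n_segments):
        tr, va = seg_tr == s, seg_va == s
        if tr.sum() <= k + 1:
            return float("inf")  # too few training rows for this configuration
        resid = yva[va] - fit_predict(Xtr[tr], ytr[tr], Xva[va], k)
        total += float(resid @ resid)
    return total

def select_configuration(Xtr, ytr, Xva, yva, segment_counts=(1, 2, 4), dof_options=(0, 1, 2)):
    """Score every (segment count, k) pair and return the best pair with all scores."""
    scores = {(m, k): holdout_sse(Xtr, ytr, Xva, yva, m, k)
              for m in segment_counts for k in dof_options}
    return min(scores, key=scores.get), scores
```

Scoring the two settings jointly matters because they trade off against each other: more segments leave fewer rows per segment, which supports fewer coefficients per model, so the best pair is generally not found by tuning either setting alone.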
Specification