Method and system for predicting customer behavior based on data network geography
Abstract
A method and system for predicting customer behavior based on the geography of a data network are provided. Furthermore, a method and system for evaluating the training of a predictive algorithm to determine if the algorithm does not adequately take into consideration the influences of data network geography are also provided. The method and system generate frequency distributions of a customer database data set, training data set and testing data set and compare the frequency distributions of data network geographical characteristics to determine if there are discrepancies. If the discrepancies are above a predetermined tolerance, one or more of the data sets may not be representative of the customer database taking into account data network geographical influences on customer behavior. Thus, recommendations for improving the training data set and/or testing data set are then provided such that the data set is more representative of the data network geographical influences. Once trained, the predictive algorithm may be utilized to predict customer behavior taking into account the influences of data network geography.
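The comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the field name `net_distance`, its values `near`/`mid`/`far`, the sample records, and the tolerance of 0.15 are all hypothetical, and the absolute difference in relative frequency is only one plausible way to measure a discrepancy.

```python
from collections import Counter


def frequency_distribution(records, key):
    """Return the relative frequency of each value of `key` in `records`."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def discrepancies(reference, candidate, tolerance):
    """Return categories whose share in `candidate` differs from
    `reference` by more than `tolerance` (positive gap = over-represented)."""
    flagged = {}
    for category in sorted(set(reference) | set(candidate)):
        gap = candidate.get(category, 0.0) - reference.get(category, 0.0)
        if abs(gap) > tolerance:
            flagged[category] = gap
    return flagged


def recommend(flagged, key):
    """Turn flagged gaps into plain-text suggestions for adjusting a data set."""
    return [
        f"{'remove' if gap > 0 else 'add'} entries with {key}={category!r}"
        for category, gap in flagged.items()
    ]


def make(key, **counts):
    """Build toy records; each keyword is a category and its record count."""
    return [{key: cat} for cat, n in counts.items() for _ in range(n)]


# Hypothetical data: `net_distance` stands in for a data network
# geographical characteristic of each customer.
KEY = "net_distance"
customers = make(KEY, near=5, mid=3, far=2)
training = make(KEY, near=3, mid=1)
testing = make(KEY, near=2, mid=1, far=1)

TOLERANCE = 0.15  # assumed value for the "predetermined tolerance"

reference = frequency_distribution(customers, KEY)
training_flags = discrepancies(
    reference, frequency_distribution(training, KEY), TOLERANCE
)
testing_flags = discrepancies(
    reference, frequency_distribution(testing, KEY), TOLERANCE
)

for line in recommend(training_flags, KEY):
    print("training set:", line)
```

In this toy data the training set over-represents `near` customers and contains no `far` customers, so both categories are flagged, while the testing set stays within the assumed tolerance.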
43 Claims
1. A method of selecting data sets for use with a predictive algorithm based on data network geographical information, comprising:
generating a first distribution of a training data set;
generating a second distribution of a testing data set;
comparing the first distribution and the second distribution to identify a discrepancy between the first distribution and the second distribution with respect to data network geographical information; and
modifying selection of entries in one or more of the training data set and the testing data set based on the discrepancy between the first distribution and the second distribution.
15. An apparatus for selecting data sets for use with a predictive algorithm based on data network geographical information, comprising:
a statistical engine; and
a comparison engine coupled to the statistical engine, wherein the statistical engine generates a first distribution of a training data set and a second distribution of a testing data set, the comparison engine compares the first distribution and the second distribution to identify a discrepancy between the first distribution and the second distribution with respect to data network geographical information, and modifies selection of entries in one or more of the training data set and the testing data set based on the discrepancy between the first distribution and the second distribution.
29. A computer program product in a computer readable medium for selecting data sets for use with a predictive algorithm based on data network geographical information, comprising:
first instructions for generating a first distribution of a training data set;
second instructions for generating a second distribution of a testing data set;
third instructions for comparing the first distribution and the second distribution to identify a discrepancy between the first distribution and the second distribution with respect to data network geographical information; and
fourth instructions for modifying selection of entries in one or more of the training data set and the testing data set based on the discrepancy between the first distribution and the second distribution.
41. A method of predicting customer behavior based on data network geographical influences, comprising:
obtaining data network geographical information regarding a plurality of customers;
training a predictive algorithm using the data network geographical information; and
using the predictive algorithm to predict customer behavior based on the data network geographical information.
42. An apparatus for predicting customer behavior based on data network geographical influences, comprising:
means for obtaining data network geographical information regarding a plurality of customers;
means for training a predictive algorithm using the data network geographical information; and
means for using the predictive algorithm to predict customer behavior based on the data network geographical information.
43. A computer program product in a computer readable medium for predicting customer behavior based on data network geographical influences, comprising:
first instructions for obtaining data network geographical information regarding a plurality of customers;
second instructions for training a predictive algorithm using the data network geographical information; and
third instructions for using the predictive algorithm to predict customer behavior based on the data network geographical information.
Specification