Method and system for predicting customer behavior based on data network geography
Abstract
A method and system for predicting customer behavior based on the geography of a data network are provided. Furthermore, a method and system for evaluating the training of a predictive algorithm to determine if the algorithm does not adequately take into consideration the influences of data network geography are also provided. The method and system generate frequency distributions of a customer database data set, training data set and testing data set and compare the frequency distributions of data network geographical characteristics to determine if there are discrepancies. If the discrepancies are above a predetermined tolerance, one or more of the data sets may not be representative of the customer database taking into account data network geographical influences on customer behavior. Thus, recommendations for improving the training data set and/or testing data set are then provided such that the data set is more representative of the data network geographical influences. Once trained, the predictive algorithm may be utilized to predict customer behavior taking into account the influences of data network geography.
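The comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the sample link counts, and the tolerance value of 0.15 are all assumptions made for the example. It builds relative frequency distributions of one data network geographical characteristic (here, the number of network links per customer) and reports the bins where a data set departs from the customer database by more than the tolerance.

```python
from collections import Counter


def frequency_distribution(values):
    """Return the relative frequency of each distinct value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def find_discrepancies(reference, candidate, tolerance):
    """Return bins whose relative frequency differs from the
    reference distribution by more than the tolerance."""
    bins = set(reference) | set(candidate)
    gaps = {b: candidate.get(b, 0.0) - reference.get(b, 0.0) for b in bins}
    return {b: gap for b, gap in gaps.items() if abs(gap) > tolerance}


# Hypothetical data: number of network links between each customer
# and the web site, for the customer database and a training set.
customer_db_links = [1, 1, 2, 2, 2, 3, 3, 4, 4, 5]
training_links = [1, 1, 1, 1, 2, 2, 3, 3, 4, 5]

reference = frequency_distribution(customer_db_links)
training = frequency_distribution(training_links)

# Assumed tolerance; the abstract only says it is predetermined.
flagged = find_discrepancies(reference, training, tolerance=0.15)
print(flagged)
```

In this sample, customers one link away make up 40% of the training set but only 20% of the customer database, so that bin is the only one flagged. A flagged bin is the kind of signal that could drive a recommendation to rebalance the data set.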
29 Claims
1. A data processing machine implemented method of selecting data sets for use with a predictive algorithm based on data network geographical information, comprising data processing machine implemented steps of:
generating, by the data processing machine, a first statistical distribution of a training data set;
generating, by the data processing machine, a second statistical distribution of a testing data set;
using, by the data processing machine, the first statistical distribution and the second statistical distribution to identify a discrepancy between the first statistical distribution and the second statistical distribution with respect to the data network geographical information by comparing at least one of the first statistical distribution and the second statistical distribution to a statistical distribution of a customer database to determine if at least one of the training data set and the testing data set are geographically representative of a customer population represented by the customer database;
modifying, by the data processing machine, selection of entries in one or more of the training data set and the testing data set based on the discrepancy between the first statistical distribution and the second statistical distribution; and
using the modified selection of entries by the predictive algorithm.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
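One way to read the modifying step of claim 1 is as a stratified reselection: entries are redrawn so that each geographical bin appears in roughly the proportion seen in the customer database. The sketch below assumes that reading. The function `reselect_entries`, the record layout, and the target shares are hypothetical and are not taken from the patent.

```python
import random
from collections import Counter, defaultdict


def reselect_entries(pool, bin_of, target_distribution, sample_size, seed=0):
    """Draw entries from the pool so that each bin's share of the
    sample approximates its share in the target distribution."""
    rng = random.Random(seed)  # fixed seed keeps the example repeatable
    by_bin = defaultdict(list)
    for entry in pool:
        by_bin[bin_of(entry)].append(entry)

    selected = []
    for bin_key, share in target_distribution.items():
        wanted = round(share * sample_size)
        available = by_bin.get(bin_key, [])
        # Take fewer entries if the pool cannot supply the full quota.
        selected.extend(rng.sample(available, min(wanted, len(available))))
    return selected


# Hypothetical pool: 30 customers, 10 in each link-count bin.
pool = [{"id": i, "links": 1 + i % 3} for i in range(30)]

# Assumed customer database distribution over link counts.
target = {1: 0.5, 2: 0.3, 3: 0.2}

training_set = reselect_entries(pool, lambda e: e["links"], target, sample_size=10)
bin_counts = Counter(entry["links"] for entry in training_set)
print(bin_counts)
```

The reselected set can then be passed to the predictive algorithm in place of the original selection. When a bin has too few entries, this sketch returns a smaller sample rather than oversampling, which is a design choice of the example rather than something the claim specifies.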
14. An apparatus for selecting data sets for use with a predictive algorithm based on data network geographical information, comprising:
a statistical engine;
a comparison engine coupled to the statistical engine, wherein the statistical engine generates a first statistical distribution of a training data set and a second distribution of a testing data set, the comparison engine uses the first statistical distribution and the second distribution to identify a discrepancy between the first statistical distribution and the second distribution with respect to the data network geographical information by comparing at least one of the first statistical distribution and the second statistical distribution to a statistical distribution of a customer database to determine if at least one of the training data set and the testing data set are geographically representative of a customer population represented by the customer database, modifies selection of entries in one or more of the training data set and the testing data set based on the discrepancy between the first statistical distribution and the second distribution, and provides the modified selection of entries for use by the predictive algorithm; and
a predictive algorithm device that uses the modified selection of entries and the predictive algorithm.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
27. A data processing machine implemented method of predicting customer behavior based on data network geographical influences, comprising data processing machine implemented steps of:
obtaining data network geographical information regarding a plurality of customers, the data network geographic information comprising frequency distributions of both (i) number of data network links between a customer geographical location and one or more web site data network geographical locations, and (ii) size of a click stream for arriving at the one or more web site data network geographical locations;
training a predictive algorithm using the data network geographical information; and
using the predictive algorithm to predict customer behavior based on the data network geographical information.
28. An apparatus for predicting customer behavior based on data network geographical influences, comprising:
means for obtaining data network geographical information regarding a plurality of customers, the data network geographic information comprising frequency distributions of both (i) number of data network links between a customer geographical location and one or more web site data network geographical locations, and (ii) size of a click stream for arriving at the one or more web site data network geographical locations;
means for training a predictive algorithm using the data network geographical information; and
means for using the predictive algorithm to predict customer behavior based on the data network geographical information.
29. A computer program product in a computer readable medium comprising instructions for enabling a data processing machine to predict customer behavior based on data network geographical influences, comprising:
first instructions for obtaining data network geographical information regarding a plurality of customers, the data network geographic information comprising frequency distributions of both (i) number of data network links between a customer geographical location and one or more web site data network geographical locations, and (ii) size of a click stream for arriving at the one or more web site data network geographical locations;
second instructions for training a predictive algorithm using the data network geographical information; and
third instructions for using the predictive algorithm to predict customer behavior based on the data network geographical information.
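Claims 27 to 29 recite obtaining the two geographical characteristics (number of data network links and click stream size), training a predictive algorithm on them, and using it to predict customer behavior. The claims do not name a particular algorithm, so the sketch below substitutes a deliberately simple one: an empirical purchase rate tallied per (link count, click stream size) cell, with the overall rate as a fallback for unseen cells. The record layout, the sample values, and the choice of "purchase" as the predicted behavior are assumptions made for illustration.

```python
from collections import defaultdict


def train(records):
    """Tally how often a purchase occurred for each
    (link_count, click_stream_size) cell."""
    totals = defaultdict(int)
    purchases = defaultdict(int)
    for links, clicks, purchased in records:
        totals[(links, clicks)] += 1
        purchases[(links, clicks)] += int(purchased)

    overall_rate = sum(purchases.values()) / sum(totals.values())
    cell_rates = {cell: purchases[cell] / totals[cell] for cell in totals}
    return cell_rates, overall_rate


def predict(model, links, clicks):
    """Return the estimated purchase probability for a customer,
    falling back to the overall rate for unseen cells."""
    cell_rates, overall_rate = model
    return cell_rates.get((links, clicks), overall_rate)


# Hypothetical records: (network links, click stream size, purchased?)
records = [
    (2, 3, True), (2, 3, True), (2, 3, False),
    (8, 12, False), (8, 12, False), (8, 12, True), (8, 12, False),
]

model = train(records)
near_customer = predict(model, links=2, clicks=3)
far_customer = predict(model, links=8, clicks=12)
unseen_customer = predict(model, links=5, clicks=5)
print(near_customer, far_customer, unseen_customer)
```

In this sample the customers with fewer links and shorter click streams purchase more often, so the model assigns them a higher probability. Any real predictive algorithm could replace the lookup table without changing the obtain, train, predict structure.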
Specification