System and method for efficiently generating models for targeting products and promotions using classification method by choosing points to be labeled
First Claim
1. A method of using a computer for targeting products and promotions to candidate sets of customers having attributes, said method iteratively implementing phases with each phase comprising the steps of:
a) storing unlabeled customer data in a storage device, each unlabeled customer data having one or more customer attributes;
b) implementing a means for selecting a subset of unlabeled customer data from said storage device, said selecting means responsive to received guessed labels generated for unlabeled customer data instances in said selected subset according to a first classification method and, further responsive to weights computed for unlabeled data instances using said guessed labels;
c) implementing a means for labeling the selected subset of unlabeled customer data using external information and adding said labeled data subset to a labeled data set, said labeled data set comprising one or more labeled data instances;
d) implementing a model generator device for retrieving said labeled data set and generating one or more classification models, said customer classification model generating comprising steps of;
i) initializing an iteration index r;
ii) initializing a first set of probabilities for each labeled instance in said labeled data set;
iii) choosing a sample S(r) of labeled instances from the labeled data set using said probabilities;
iv) generating a classification model M(r) for data in S(r) using said second classification method;
v) applying said classification model M(r) to the entire labeled data set;
vi) computing a second set of probabilities for including each instance;
vii) incrementing said iteration index r;
viii) repeating steps iii)–vii) until a predetermined termination criterion is satisfied;
f) applying one or more generated classification models M(r) and said guessed labels for unlabeled data instances to compute said weights in step b); and
utilizing said weights for selecting a next subset from remaining unlabeled data stored in said storage device in a subsequent phase; and
g) repeating steps b) through f) in each phase until a termination criterion is satisfied; and
h) implementing a device for combining each of said generated one or more classification models M(r) into a resultant classifier model, said resultant classifier model adapted to determine suitability of potential customers for receiving targeted products and promotions, wherein said resultant classifier model is based on a reduced amount of labeled data set instances with increased classification accuracy.
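The model-generation loop in steps d) i) through viii) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claim does not say how the second set of probabilities is computed, so the sketch assumes a boosting-style rule that doubles the probability of each misclassified instance and renormalizes. The names `train_stump` and `generate_models`, the one-feature threshold classifier standing in for the "second classification method", and the fixed number of rounds used as the termination criterion are all hypothetical.

```python
import random

def train_stump(sample):
    """Fit a one-feature threshold classifier on (features, label) pairs.

    Hypothetical stand-in for the claim's second classification method.
    Labels are assumed to be -1 or +1.
    """
    best = None
    for j in range(len(sample[0][0])):
        for x, _ in sample:
            t = x[j]
            for sign in (1, -1):
                errors = sum(1 for xi, yi in sample
                             if (sign if xi[j] >= t else -sign) != yi)
                if best is None or errors < best[0]:
                    best = (errors, j, t, sign)
    _, j, t, sign = best
    return lambda x: sign if x[j] >= t else -sign

def generate_models(labeled, rounds=5, seed=0):
    rng = random.Random(seed)
    n = len(labeled)
    probs = [1.0 / n] * n                      # step ii: first set of probabilities
    models = []
    for r in range(rounds):                    # steps i, vii, viii: index r, fixed round count
        sample = rng.choices(labeled, weights=probs, k=n)   # step iii: sample S(r)
        model = train_stump(sample)            # step iv: model M(r)
        models.append(model)
        wrong = [model(x) != y for x, y in labeled]         # step v: apply to all labeled data
        # step vi (assumed rule): double the probability of misclassified instances
        probs = [p * (2.0 if w else 1.0) for p, w in zip(probs, wrong)]
        total = sum(probs)
        probs = [p / total for p in probs]
    return models, probs
```

Each returned model corresponds to one M(r); the final `probs` list shows which labeled instances the assumed update rule ended up emphasizing.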
Abstract
A closed loop system is presented for selecting samples for labeling so that they can be used to generate classifiers. The sampling is done in phases. In each phase a subset of samples is chosen using information collected in previous phases and the classification model that has been generated up to that point. The total number of samples and the number of phases can be chosen by the user.
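The phased structure described in the abstract can be sketched as a simple loop. This is an illustrative outline only: `run_phases` and its callback parameters (`label_fn`, `fit_fn`, `weight_fn`) are hypothetical names, the even split of the sample budget across phases is an assumption, and the weighting rule is left to the caller because the abstract does not specify one.

```python
import random

def run_phases(pool, label_fn, fit_fn, weight_fn, total_samples, num_phases, seed=0):
    """Select and label samples in phases, refitting a model after each phase."""
    rng = random.Random(seed)
    remaining = list(pool)          # unlabeled data not yet selected
    labeled = []                    # (instance, label) pairs collected so far
    model = None
    per_phase = max(1, total_samples // num_phases)   # assumed: equal batch per phase
    for _phase in range(num_phases):
        # First phase samples uniformly; later phases use model-driven weights.
        if model is None:
            weights = [1.0] * len(remaining)
        else:
            weights = [max(weight_fn(model, x), 1e-9) for x in remaining]
        for _pick in range(min(per_phase, len(remaining))):
            # Weighted draw without replacement from the remaining pool.
            i = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
            x = remaining.pop(i)
            weights.pop(i)
            labeled.append((x, label_fn(x)))
        model = fit_fn(labeled)     # model generated "up to that point"
    return model, labeled, remaining
```

Here `total_samples` and `num_phases` play the role of the two user-chosen quantities mentioned in the abstract.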
47 Citations
50 Claims
21. A closed-loop system for generating a classifier model to be used for marketing applications including the targeting of products and promotions to potential customers, each customer having one or more associated attributes, said closed-loop system comprising:
selector mechanism for iteratively selecting a subset of said customers to which a marketing application is to be targeted from an unlabeled data set, said mechanism sampling a subset of unlabeled data in a first iteration, and selecting further subsets of unlabeled data in subsequent iterations based on computed weights;
mechanism for collecting responses from said customers related to said targeted marketing application at each iteration;
a model generator for receiving collected responses, generating labeled data instances by labeling said customers according to their responses, and building one or more classification models using said one or more attributes for classifying other subsets of potential customers in each iteration based on said labeled data instances, said selector mechanism receiving said one or more classification models and said collected responses for a current iteration and computing a new set of weights for selecting the next subset for the next iteration, said step including utilizing said weights for selecting a next subset from remaining unlabeled data and said weights representing an importance of a subset to be selected in said next iteration, wherein said model generator comprises;
a first classification device for generating guessed labels used for classifying unlabeled data instances in each iteration;
a second classification device employing a decision tree classifier for generating an ensemble comprising said one or more classification models based on selected labeled data instances in each iteration, said selector mechanism further receiving said guessed labels from said first classification device and said ensemble of classification models output from said second classification device for computing a new set of weights; and
a device for combining each one or more classification models generated at each iteration into a resultant classifier model after a predetermined condition is satisfied, wherein said resultant classifier model is applied to said potential customers to determine their suitability for receiving said products and promotions of said marketing application.
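One way the selector mechanism and the combining device of claim 21 might interact is sketched below. The claim does not give a weight formula or a combination rule, so both are assumptions here: the weight of an unlabeled instance is taken to be the fraction of ensemble members that disagree with its guessed label, and the resultant classifier is an unweighted majority vote. The names `selection_weights` and `combine` are hypothetical, and plain callables returning -1 or +1 stand in for the decision tree classifiers.

```python
def selection_weights(models, guessed):
    """Weight each unlabeled instance by ensemble disagreement with its guessed label.

    models:  list of callables mapping an instance to -1 or +1 (the ensemble)
    guessed: list of (instance, guessed_label) pairs from the first classifier
    """
    weights = []
    for x, guess in guessed:
        disagree = sum(1 for m in models if m(x) != guess)
        weights.append(disagree / len(models))   # assumed importance measure
    return weights

def combine(models):
    """Return a resultant classifier that takes an unweighted majority vote."""
    def resultant(x):
        return 1 if sum(m(x) for m in models) >= 0 else -1
    return resultant

# Example ensemble of three threshold rules on a single numeric attribute.
ensemble = [
    lambda x: 1 if x >= 0 else -1,
    lambda x: 1 if x >= 2 else -1,
    lambda x: 1 if x >= 4 else -1,
]
example_weights = selection_weights(ensemble, [(-1, -1), (3, 1), (5, 1)])
classifier = combine(ensemble)
```

Under this assumed rule, instances on which the ensemble and the guessed label agree receive zero weight, so a caller would typically add a small floor before sampling from the weights.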
37. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for iteratively generating a classifier model for classifying data with increased accuracy, said method steps iteratively implementing phases with each phase including the steps of:
a) retrieving a set of unlabeled data, each unlabeled data having one or more attributes;
b) sampling a subset of unlabeled data from said retrieved set in a first iteration, and, selecting a subset from remaining unlabeled data of said retrieved unlabeled data set in each subsequent iteration;
c) labeling the subset of data using external information and transferring said subset of labeled data to a set of labeled data, said labeled data comprising one or more labeled data instances;
d) classifying the unlabeled data in said retrieved set by employing a first classification method to generate guessed labels;
e) generating one or more classification models employing a second classification method, said second classification method performing steps of;
i) initializing an iteration index r;
ii) initializing a first set of probabilities for each labeled instance in said labeled data set;
iii) choosing a sample S(r) of labeled instances from the labeled data set using said probabilities;
iv) generating a classification model M(r) for data in S(r) using said second classification method;
v) applying said classification model M(r) to the entire labeled data set;
vi) computing a second set of probabilities for including each instance;
vii) incrementing said iteration index r;
viii) repeating steps iii)–vii) until a predetermined termination criterion is satisfied;
f) computing weights using said guessed labels and utilizing said weights for selecting a next subset from remaining unlabeled data in step b) in a next iteration;
g) repeating steps b) through f) in each phase until a termination criterion is satisfied; and
h) combining each of said generated one or more classification models into a resultant classifier model, wherein said resultant classifier model is based on a reduced amount of labeled data set instances with increased classification accuracy.
Specification