Systems and methods for calibrating user and consumer data
Abstract
A system and method that calibrate subject data whose relationship to a target population is not known, so that the calibrated subject data can more accurately represent the target population. In many cases, the calibration involves a differential weighting scheme applied to the data at the constituent level. The system and method allow the values of the observed variables in the subject data set to be weighted so that their incidence is equivalent to that of a reference population represented by a reference data set, even if the variables used in the reference data set to make estimates for the reference population were not collected or measured for the subject data set.
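The constituent-level weighting described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented method: every participant is assumed to carry one partition key observed in both data sets, the weight is assumed to be a post-stratification ratio (reference share divided by subject share), and the segment labels and the `owns_product` variable are hypothetical.

```python
from collections import Counter


def participant_weights(subject_keys, reference_keys):
    """One weight per subject participant, derived from its partition key.

    Assumption: the key is a characteristic observed in both data sets.
    The weighted subject distribution of keys then matches the reference.
    """
    subj_counts = Counter(subject_keys)
    ref_counts = Counter(reference_keys)
    n_subj, n_ref = len(subject_keys), len(reference_keys)
    # Ratio weight: reference share / subject share (0 if key absent in reference).
    return [
        (ref_counts[k] / n_ref) / (subj_counts[k] / n_subj) for k in subject_keys
    ]


def weighted_incidence(flags, weights):
    """Weighted share of subject participants for whom a variable is true.

    The variable need not have been collected in the reference data set.
    """
    return sum(w for f, w in zip(flags, weights) if f) / sum(weights)


subject_keys = ["light", "light", "light", "heavy"]    # hypothetical segments
reference_keys = ["light", "heavy", "heavy", "light"]
owns_product = [True, False, False, True]              # subject-only variable
w = participant_weights(subject_keys, reference_keys)
print(w, weighted_incidence(owns_product, w))
```

In this toy data the unweighted incidence of `owns_product` is 0.5; weighting the over-represented "light" segment down (weight 2/3) and the "heavy" segment up (weight 2) gives 2/3.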
30 Claims
1. A method in a computing system for calibrating a subject data set based on information from a reference data set, each data set containing a plurality of participants and associated transactional data, the method comprising:
using a data partitioning scheme, partitioning the reference data set into a plurality of reference data partitions, each of the plurality of reference data partitions having an associated transactional characteristic and no two reference data partitions sharing a participant in common;
using the data partitioning scheme, partitioning the subject data set into a plurality of subject data partitions, wherein:
each of the plurality of subject data partitions has an associated transactional characteristic that is the same as the transactional characteristic associated with the corresponding reference data partition or has a high degree of correspondence with the transactional characteristic associated with the corresponding reference data partition; and
no two subject data partitions of the plurality of subject data partitions share a participant in common;
calculating weights associated with each of the plurality of subject data partitions to adjust for subject data partitions that are under- or over-represented, the weights calculated to adjust a distribution of the plurality of subject data partitions to be the same as a distribution of the plurality of reference data partitions;
calculating a statistic for each of the plurality of subject data partitions; and
adjusting, by the computing system, the calculated statistics by applying the calculated weight for each subject data partition to the calculated statistic for each subject data partition, the applied weights producing calibrated estimates of the statistics for the plurality of subject data partitions.
(Dependent claims 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13.)
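The claim leaves the weight formula open. One standard choice that meets the stated condition, assumed here for illustration, is the post-stratification ratio weight. With N_k and n_k the numbers of participants in reference and subject partition k, N and n the totals, and \hat{\theta}_k the statistic computed for subject partition k:

```latex
\begin{aligned}
p^{R}_{k} &= \frac{N_k}{N}, \qquad p^{S}_{k} = \frac{n_k}{n}, \qquad w_k = \frac{p^{R}_{k}}{p^{S}_{k}},\\
\frac{w_k\, n_k}{n} &= \frac{N_k}{N}\cdot\frac{n}{n_k}\cdot\frac{n_k}{n} = p^{R}_{k},\\
\tilde{\theta}_k &= w_k\,\hat{\theta}_k, \qquad \tilde{\theta} = \sum_{k} w_k\,\hat{\theta}_k .
\end{aligned}
```

The second line confirms that the weighted subject distribution equals the reference distribution. The summed form is meaningful for additive statistics such as counts or totals, and w_k is undefined when n_k = 0, so a reference partition with no subject participants cannot be recovered by weighting alone.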
14. A non-transitory computer-readable medium encoded with instructions that, when executed by a processor, perform a method in a computing system for calibrating a subject data set based on information from a reference data set, each data set containing a plurality of participants and associated transactional data, the method comprising:
using a data partitioning scheme, partitioning the reference data set into a plurality of reference data partitions, each of the plurality of reference data partitions having an associated transactional characteristic and no two reference data partitions sharing a participant in common;
using the data partitioning scheme, partitioning the subject data set into a plurality of subject data partitions, wherein:
each of the plurality of subject data partitions has an associated transactional characteristic that is the same as the transactional characteristic associated with the corresponding reference data partition or has a high degree of correspondence with the transactional characteristic associated with the corresponding reference data partition; and
no two subject data partitions of the plurality of subject data partitions share a participant in common;
calculating weights associated with each of the plurality of subject data partitions to adjust for subject data partitions that are under- or over-represented, the weights calculated to adjust a distribution of the plurality of subject data partitions to be the same as a distribution of the plurality of reference data partitions;
calculating a statistic for each of the plurality of subject data partitions; and
adjusting, by the computing system, the calculated statistics by applying the calculated weight for each subject data partition to the calculated statistic for each subject data partition, the applied weights producing calibrated estimates of the statistics for the plurality of subject data partitions.
(Dependent claims 15, 16, 17, 18, 19, 20, 21.)
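The steps recited in claims 1 and 14 can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: records are hypothetical `(participant_id, amount)` pairs, the partitioning scheme is a caller-supplied function of each participant's transactions (here a hypothetical spend tier), "high degree of correspondence" is reduced to identical partition labels, the weight is assumed to be the ratio of reference share to subject share, and the statistic is the partition's total spend.

```python
from collections import defaultdict


def partition(records, key):
    """Split (participant_id, amount) records into disjoint partitions.

    All of a participant's transactions are grouped first, then `key` maps
    that transaction list to one label, so no two partitions share a participant.
    """
    by_participant = defaultdict(list)
    for pid, amount in records:
        by_participant[pid].append(amount)
    parts = defaultdict(dict)
    for pid, amounts in by_participant.items():
        parts[key(amounts)][pid] = amounts
    return parts


def calibrate(subject_records, reference_records, key, statistic):
    """Return {label: (weight, calibrated statistic)} per subject partition.

    Reference partitions with no subject participants cannot be weighted and
    are simply absent from the result.
    """
    ref_parts = partition(reference_records, key)
    subj_parts = partition(subject_records, key)
    n_ref = sum(len(p) for p in ref_parts.values())
    n_subj = sum(len(p) for p in subj_parts.values())
    result = {}
    for label, members in subj_parts.items():
        ref_share = len(ref_parts.get(label, {})) / n_ref
        subj_share = len(members) / n_subj
        # Over-represented partitions get weight < 1, under-represented > 1.
        weight = ref_share / subj_share
        result[label] = (weight, weight * statistic(members))
    return result


def spend_tier(amounts):
    # Hypothetical transactional characteristic: total spend tier.
    return "high" if sum(amounts) >= 100 else "low"


def total_spend(members):
    # Hypothetical statistic: total spend of the partition's participants.
    return sum(sum(a) for a in members.values())


reference = [("r1", 150), ("r2", 120), ("r3", 20), ("r4", 30)]
subject = [("s1", 10), ("s1", 15), ("s2", 40), ("s3", 5), ("s4", 200)]
print(calibrate(subject, reference, spend_tier, total_spend))
```

With this toy data, three of four subject participants fall in the low tier versus two of four in the reference, so the low tier receives weight 2/3 and the high tier weight 2; a subject label absent from the reference would receive weight 0.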
22. A method in a computing system for calibrating a subject data set based on information from a reference data set, each data set containing a plurality of participants, the method comprising:
using a data partitioning scheme, partitioning the reference data set into a plurality of reference data partitions, each of the plurality of reference data partitions having one or more variables;
using the data partitioning scheme, partitioning the subject data set into a plurality of subject data partitions, wherein:
each of the plurality of subject data partitions has one or more variables that are the same as the one or more variables associated with the corresponding reference data partition or have a high degree of correspondence to the one or more variables associated with the corresponding reference data partition;
calculating weights associated with each of the plurality of subject data partitions to adjust for subject data partitions that are under- or over-represented with respect to the distribution of the reference data partitions, the weights calculated to adjust a distribution of the plurality of subject data partitions to be the same as a distribution of the plurality of reference data partitions;
calculating a statistic for each of the plurality of subject data partitions; and
adjusting, by the computing system, the calculated statistics by applying the calculated weight for each subject data partition to the calculated statistic for each subject data partition, the applied weights producing calibrated estimates of the statistics for the plurality of subject data partitions.
(Dependent claims 23, 24, 25, 26, 27, 28, 29, 30.)
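Claim 22 partitions on one or more variables rather than on a single transactional characteristic. A minimal Python sketch under stated assumptions: partitions are the cells formed by cross-classifying the chosen variables, participants are hypothetical dict rows, "high degree of correspondence" is modeled by a hypothetical `recode` table mapping subject-side values onto the reference coding, and the weight is assumed to be the ratio of reference share to subject share. The weights are then applied per participant to estimate the mean of an outcome (`spend`) observed only in the subject data.

```python
from collections import Counter


def cell_of(row, variables, recode=None):
    """Cross-classify a participant by several variables.

    `recode` optionally maps a variable's subject-side values onto the
    reference coding (hypothetical stand-in for "high correspondence").
    """
    recode = recode or {}
    return tuple(recode.get(v, {}).get(row[v], row[v]) for v in variables)


def cell_weights(subject_rows, reference_rows, variables, recode=None):
    subj = Counter(cell_of(r, variables, recode) for r in subject_rows)
    ref = Counter(cell_of(r, variables) for r in reference_rows)
    n_s, n_r = len(subject_rows), len(reference_rows)
    # Only cells present in the subject data can be weighted.
    return {c: (ref[c] / n_r) / (subj[c] / n_s) for c in subj}


def calibrated_mean(subject_rows, weights, variables, outcome, recode=None):
    """Weighted mean of an outcome measured only in the subject data."""
    num = den = 0.0
    for r in subject_rows:
        w = weights[cell_of(r, variables, recode)]
        num += w * r[outcome]
        den += w
    return num / den


variables = ("region", "tier")
recode = {"region": {"North": "N", "South": "S"}}
subject = [
    {"region": "North", "tier": "low", "spend": 10},
    {"region": "North", "tier": "low", "spend": 20},
    {"region": "South", "tier": "high", "spend": 100},
]
reference = [
    {"region": "N", "tier": "low"}, {"region": "S", "tier": "high"},
    {"region": "S", "tier": "high"}, {"region": "N", "tier": "low"},
]
weights = cell_weights(subject, reference, variables, recode)
print(weights, calibrated_mean(subject, weights, variables, "spend", recode))
```

Here the two cells receive weights 0.75 and 1.5, and the calibrated mean spend is 57.5 rather than the unweighted 130/3 (about 43.3).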
Specification