Method and system for reconstructing original distributions from randomized numeric data
Abstract
A system and method for mining data while preserving a user's privacy includes perturbing user-related information at the user's computer and sending the perturbed data to a Web site. At the Web site, perturbed data from many users is aggregated, and from the distribution of the perturbed data, the distribution of the original data is reconstructed, although individual records cannot be reconstructed. Based on the reconstructed distribution, a decision tree classification model or a Naive Bayes classification model is developed, with the model then being provided back to the users, who can use the model on their individual data to generate classifications that are then sent back to the Web site such that the Web site can display a page appropriately configured for the user's classification. Or, the classification model need not be provided to users, but the Web site can use the model to, e.g., send search results and a ranking model to a user, with the ranking model being used at the user computer to rank the search results based on the user's individual classification data.
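The client-side perturbation step described in the abstract can be illustrated with a minimal sketch. It assumes additive noise drawn uniformly from a symmetric interval; the excerpt does not state which noise distribution is used, and the names `perturb`, `half_width`, `ages`, and `sent_to_server` are hypothetical, introduced only for illustration.

```python
import random


def perturb(value, half_width, rng=random):
    """Return value plus uniform noise from [-half_width, +half_width].

    Assumption: uniform additive noise. The source only says the data
    is "perturbed" at the user computer.
    """
    return value + rng.uniform(-half_width, half_width)


# Hypothetical numeric attribute held on user computers (e.g. ages).
rng = random.Random(42)
ages = [23.0, 35.0, 47.0, 61.0]

# Only these perturbed values would leave the user computer.
sent_to_server = [perturb(a, 10.0, rng) for a in ages]
```

A wider `half_width` hides each individual value more thoroughly but leaves the server with a noisier picture of the aggregate distribution.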
18 Claims
1. A computer-implemented method for maintaining the privacy of a user of a user computer, comprising the acts of:
at the user computer, perturbing original data to render perturbed data of original data;
receiving, at a Web server, perturbed data of original data associated with the user computer such that the original data is never sent to the Web server; and
reconstructing an estimate of a distribution of the original data using the perturbed data at the Web server.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
partitioning the perturbed data into intervals; and
iteratively determining a partition probability using the intervals.
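The two steps above (partitioning into intervals, then iteratively updating a per-interval probability) can be sketched as follows. This is an illustration under stated assumptions, not the claimed procedure: it assumes uniform additive noise of known half-width and uses a Bayes-style iterative update evaluated at interval midpoints, which the excerpt does not spell out. All function and parameter names are hypothetical.

```python
import random


def uniform_noise_density(d, half_width):
    """Density of the assumed uniform noise on [-half_width, half_width]."""
    return 1.0 / (2.0 * half_width) if abs(d) <= half_width else 0.0


def reconstruct_distribution(perturbed, lo, hi, num_intervals,
                             half_width, iterations=50):
    """Estimate per-interval probabilities of the ORIGINAL values.

    [lo, hi] should cover the perturbed values; values outside are
    clamped into the first or last interval.
    """
    if not perturbed:
        raise ValueError("need at least one perturbed value")
    width = (hi - lo) / num_intervals
    mids = [lo + (i + 0.5) * width for i in range(num_intervals)]

    # Step 1: partition the perturbed data into intervals (a histogram).
    counts = [0] * num_intervals
    for w in perturbed:
        idx = min(max(int((w - lo) / width), 0), num_intervals - 1)
        counts[idx] += 1

    # Step 2: start from a uniform guess and refine it iteratively.
    n = len(perturbed)
    probs = [1.0 / num_intervals] * num_intervals
    for _ in range(iterations):
        updated = [0.0] * num_intervals
        for s, count in enumerate(counts):
            if count == 0:
                continue
            # Posterior weight of each original interval p, given a
            # perturbed value observed in interval s.
            weights = [uniform_noise_density(mids[s] - mids[p], half_width)
                       * probs[p] for p in range(num_intervals)]
            total = sum(weights)
            if total == 0.0:
                continue
            for p in range(num_intervals):
                updated[p] += count * weights[p] / total
        probs = [u / n for u in updated]
    return probs


# Synthetic demonstration: original values cluster at 20 and 60.
rng = random.Random(0)
original = [rng.choice([20.0, 60.0]) for _ in range(2000)]
perturbed = [x + rng.uniform(-15.0, 15.0) for x in original]
estimate = reconstruct_distribution(perturbed, lo=0.0, hi=80.0,
                                    num_intervals=16, half_width=15.0)
```

The function only ever sees the perturbed values and the noise parameters, so it recovers an aggregate distribution without recovering any individual record. A fixed iteration count is used here for simplicity; a convergence test on successive estimates would be an equally reasonable stopping rule.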
8. The method of claim 1, further comprising using the estimate to generate at least one data mining model.
9. A computer system including at least one program of instructions including structure to undertake method acts comprising:
at a user computer, randomizing at least some original values of at least some numeric attributes to render perturbed values;
sending only the perturbed values to a server computer via the Web; and
at the server computer, reconstructing an estimate of a distribution of the original data using the perturbed data, wherein the server computer cannot access the original values.
Dependent claims: 10, 11, 12, 13
14. A system storage device including system readable code readable by a server system, comprising:
logic means for randomizing at least some original values of at least some numeric attributes to render perturbed values at a user computer; and
logic means for sending only the perturbed values to a server computer via the Internet, whereby the server computer cannot access the original values but uses the perturbed values to reconstruct an estimate of a distribution of the original data using the perturbed data.
Dependent claims: 15, 16, 17, 18
Specification