Data anonymization based on guessing anonymity
Abstract
Privacy is defined in the context of a guessing game based on the so-called guessing inequality. The privacy of a sanitized record, i.e., its guessing anonymity, is defined as the number of guesses an attacker needs to correctly guess the original record used to generate that sanitized record. Using this definition, optimization problems are formulated that optimize a second anonymization parameter (privacy or data distortion) subject to constraints on a first anonymization parameter (data distortion or privacy, respectively). The optimization is performed across a spectrum of possible values for at least one noise parameter of a noise model. Noise is then generated based on the resulting noise parameter value(s) and applied to the data, which may comprise real-valued and/or categorical data. Before anonymization, identifiers may be suppressed from the data, and outlier values in the noise-perturbed data may likewise be modified to further ensure privacy.
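The definition of guessing anonymity and the two optimization problems described in the abstract can be written compactly as follows. The notation is introduced here for illustration only and is not taken from the patent: X is an original record, Y its sanitized version, N_θ noise drawn from the selected noise model with parameter θ (an additive model is assumed), d a distortion measure, and D_max and G_min the constraint levels. Ties in the attacker's guessing order are counted in the attacker's favour.

```latex
% Sanitized record produced by an (assumed additive) noise model with parameter \theta
Y = X + N_\theta
% Guessing anonymity of a sanitized record y: position of the true original x
% in the attacker's guessing order (candidates sorted by decreasing P(x' \mid y))
G(x \mid y) = 1 + \left|\{\, x' \ne x : P(x' \mid y) > P(x \mid y) \,\}\right|
% Privacy-maximizing formulation (distortion is the constrained parameter)
\theta^\star = \arg\max_{\theta} \; \mathbb{E}_\theta\!\left[G(X \mid Y)\right]
\quad \text{s.t.} \quad \mathbb{E}_\theta\!\left[d(X, Y)\right] \le D_{\max}
% Distortion-minimizing formulation (privacy is the constrained parameter)
\theta^\star = \arg\min_{\theta} \; \mathbb{E}_\theta\!\left[d(X, Y)\right]
\quad \text{s.t.} \quad \mathbb{E}_\theta\!\left[G(X \mid Y)\right] \ge G_{\min}
```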
20 Claims
1. A method comprising:
invoking, by a processing device, an anonymization function provided via an application interface, the anonymization function being invoked through an activation of a user input mechanism of the application interface;
receiving, by the processing device and based on invoking the anonymization function, a selection of a noise model to be applied to data;
determining, by the processing device, a first anonymization parameter, the first anonymization parameter comprising an expected distortion;
determining, by the processing device, a particular level for the first anonymization parameter, the determined particular level being a maximum expected distortion level;
determining, by the processing device, a second anonymization parameter based on the first anonymization parameter, the second anonymization parameter being an expected guessing anonymity, the expected guessing anonymity being a number of guesses needed to determine an original record from a sanitized record;
determining, by the processing device and based on determining the particular level for the first anonymization parameter, a parameter value for the noise model that optimizes the second anonymization parameter;
generating, by the processing device, noise based on the noise model and the determined parameter value;
applying, by the processing device, the generated noise to the data;
generating, by the processing device, noise perturbed data based on applying the generated noise to the data, the noise perturbed data being associated with:
the determined particular level with respect to the expected distortion, and
the determined parameter value with respect to the expected guessing anonymity; and
providing, by the processing device, the noise perturbed data to a destination.
Dependent claims: 2, 3, 4, 5, 18.
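The steps of claim 1 can be illustrated with a small Python sketch. The claim leaves the noise model, the distortion measure, and the search procedure open, so everything below is an assumption: additive zero-mean Gaussian noise with standard deviation `sigma` as the selected noise model, squared Euclidean error as the distortion, an attacker who knows the original records and the noise model and guesses candidates in order of likelihood, a Monte Carlo estimate of expected guessing anonymity, and a plain grid search over candidate `sigma` values. The function names (`guessing_anonymity`, `expected_guessing_anonymity`, `expected_distortion`, `anonymize`) are hypothetical and do not come from the patent.

```python
import numpy as np

# Hypothetical sketch: Gaussian noise model, squared-error distortion,
# grid search over sigma. None of these choices is specified by the claim.


def guessing_anonymity(original, sanitized_row, true_index):
    """Guesses an attacker needs to recover the original of one sanitized record.

    With additive i.i.d. Gaussian noise, ranking candidates by likelihood is the
    same as ranking them by Euclidean distance to the sanitized record.
    Ties are broken in the attacker's favour.
    """
    sq_dist = np.sum((original - sanitized_row) ** 2, axis=1)
    # Candidates strictly closer than the true record are guessed first.
    return int(np.sum(sq_dist < sq_dist[true_index])) + 1


def expected_guessing_anonymity(original, sigma, rng, trials=10):
    """Monte Carlo estimate of the expected guessing anonymity for a given sigma."""
    n = original.shape[0]
    total = 0
    for _ in range(trials):
        sanitized = original + rng.normal(0.0, sigma, size=original.shape)
        total += sum(guessing_anonymity(original, sanitized[i], i) for i in range(n))
    return total / (trials * n)


def expected_distortion(sigma, n_attributes):
    """E||noise||^2 for zero-mean Gaussian noise: one sigma^2 per attribute."""
    return n_attributes * sigma ** 2


def anonymize(original, max_distortion, sigma_grid, seed=0):
    """Choose the sigma that maximizes expected guessing anonymity subject to
    expected distortion <= max_distortion, then perturb the data with it."""
    rng = np.random.default_rng(seed)
    best_sigma, best_ga = None, -1.0
    for sigma in sigma_grid:
        if expected_distortion(sigma, original.shape[1]) > max_distortion:
            continue  # this noise level exceeds the maximum expected distortion
        ga = expected_guessing_anonymity(original, sigma, rng)
        if ga > best_ga:
            best_sigma, best_ga = sigma, ga
    if best_sigma is None:
        raise ValueError("no candidate sigma satisfies the distortion budget")
    # Generate noise from the chosen parameter value and apply it to the data.
    noisy = original + rng.normal(0.0, best_sigma, size=original.shape)
    return noisy, best_sigma, best_ga
```

For example, `anonymize(data, max_distortion=0.5, sigma_grid=[0.1, 0.3, 0.5, 1.0])` on a two-attribute dataset skips `sigma = 1.0` (expected distortion 2.0) and returns the remaining candidate with the highest estimated guessing anonymity. Because the Gaussian likelihood decreases monotonically with distance, the attacker's ranking reduces to a nearest-neighbour ordering; other noise models, or categorical attributes, would need their own likelihood for ranking candidates.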
6. An apparatus comprising:
a memory to store instructions; and
a processor to execute the instructions to:
invoke an anonymization function provided via an application interface, the anonymization function being invoked through an activation of a user input mechanism of the application interface;
receive, based on invoking the anonymization function, a selection of a noise model to be applied to data;
determine a first anonymization parameter, the first anonymization parameter comprising an expected distortion;
determine a particular level for the first anonymization parameter, the determined particular level being a maximum expected distortion level;
determine a second anonymization parameter based on the first anonymization parameter, the second anonymization parameter being an expected guessing anonymity, the expected guessing anonymity being a number of guesses needed to determine an original record from a sanitized record;
determine, based on determining the particular level for the first anonymization parameter, a parameter value for the noise model that optimizes the second anonymization parameter;
generate noise based on the noise model and the determined parameter value;
apply the generated noise to the data;
generate noise perturbed data based on applying the generated noise to the data, the noise perturbed data having an anonymized structure, and the noise perturbed data being associated with:
the determined particular level with respect to the expected distortion, and
the determined parameter value with respect to the expected guessing anonymity; and
provide the noise perturbed data to a destination.
Dependent claims: 7, 8, 9, 10, 11, 17, 19.
12. A non-transitory computer readable medium storing instructions, the instructions comprising:
one or more instructions which, when executed by a processor, cause the processor to:
invoke an anonymization function provided via an application interface, the anonymization function being invoked through an activation of a user input mechanism of the application interface;
receive, based on invoking the anonymization function, a selection of a noise model to be applied to data;
determine a first anonymization parameter, the first anonymization parameter comprising an expected distortion;
determine a particular level for the first anonymization parameter, the determined particular level being a maximum expected distortion level;
determine a second anonymization parameter based on the first anonymization parameter, the second anonymization parameter being an expected guessing anonymity, the expected guessing anonymity being a number of guesses needed to determine an original record from a sanitized record;
determine, based on determining the particular level for the first anonymization parameter, a parameter value for the noise model that optimizes the second anonymization parameter;
generate noise based on the noise model and the determined parameter value;
apply the generated noise to the data;
generate noise perturbed data based on applying the generated noise to the data, the noise perturbed data being associated with:
the determined particular level with respect to the expected distortion, and
the determined parameter value with respect to the expected guessing anonymity; and
provide the noise perturbed data to a destination.
Dependent claims: 13, 14, 15, 16, 20.
Specification