Method for statistical disclosure limitation
Abstract
A method and system for statistical disclosure limitation (SDL) of categorical or continuous micro data that maintains the analytical quality of the micro data. The new SDL methodology exploits the analogy between (1) taking a sample (instead of a census), along with some adjustments, including imputation, for missing information, and (2) releasing a subset (instead of the original data set), along with some adjustments for records still at disclosure risk. Survey sampling reduces monetary cost in comparison to a census, but entails some loss of information. Similarly, releasing a subset reduces disclosure cost in comparison to releasing the full database, but entails some loss of information. Thus, optimal survey sampling methods can be used for statistical disclosure limitation. The method includes partitioning the database into risk strata, optimal probabilistic substitution, optimal probabilistic subsampling, and optimal sampling weight calibration.
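The last step named in the abstract, sampling weight calibration, can be illustrated with a minimal sketch. It uses plain post-stratification (a ratio adjustment within cells), which is a standard survey technique swapped in here for illustration and not necessarily the "optimal" calibration the abstract refers to. The function name `poststratify`, the `cell_of` callback, and the example data are all hypothetical.

```python
from collections import Counter

def poststratify(sample, base_weights, cell_of, population_counts):
    """Rescale weights so weighted sample counts match known cell totals.

    Assumption: simple post-stratification, one ratio factor per cell.
    `cell_of(record)` returns the record's calibration cell.
    """
    # Sum of the current weights in each calibration cell.
    weighted = Counter()
    for record, w in zip(sample, base_weights):
        weighted[cell_of(record)] += w
    # Multiply each weight by (known total / weighted total) for its cell.
    return [
        w * population_counts[cell_of(record)] / weighted[cell_of(record)]
        for record, w in zip(sample, base_weights)
    ]

# Hypothetical subsample with base weights 1/0.5 = 2.0 each.
sample = [{"sex": "F"}, {"sex": "F"}, {"sex": "M"}]
base_weights = [2.0, 2.0, 2.0]
population_counts = {"F": 5, "M": 3}  # counts in the original database
calibrated = poststratify(sample, base_weights, lambda r: r["sex"], population_counts)
```

After calibration, the weights in each cell sum to that cell's count in the original database, so simple totals computed from the released subset agree with the full data on the calibration variables.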
23 Claims
1. A method of preserving confidentiality and analytical utility of an original database comprising a plurality of records, comprising:
partitioning the plurality of records into a plurality of risk strata based on a plurality of identifying variables, wherein each risk stratum includes at least one record; and
determining a respective rate of unique occurrence for each risk stratum in the plurality of risk strata.
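The two steps of claim 1 can be sketched as follows. The claim does not say how strata are formed or how uniqueness is measured, so this sketch makes two labeled assumptions: a stratum is the set of records sharing values on a chosen subset of identifying variables (`stratum_vars`), and a record counts as unique when its full combination of identifying variables (`id_vars`) occurs exactly once in the database. The function name and the example records are hypothetical.

```python
from collections import Counter

def unique_rates_by_stratum(records, id_vars, stratum_vars):
    """Return {stratum key: fraction of its records that are unique}.

    Assumptions (not from the claim): strata are groups on `stratum_vars`;
    a record is unique if its `id_vars` combination occurs exactly once.
    """
    def key(record, variables):
        return tuple(record[v] for v in variables)

    # How often each full identifying combination occurs in the database.
    combo_counts = Counter(key(r, id_vars) for r in records)

    totals, uniques = Counter(), Counter()
    for r in records:
        stratum = key(r, stratum_vars)
        totals[stratum] += 1
        if combo_counts[key(r, id_vars)] == 1:
            uniques[stratum] += 1
    # Only non-empty strata appear, so each has at least one record.
    return {s: uniques[s] / totals[s] for s in totals}

records = [
    {"age": "30-39", "sex": "F", "zip3": "277"},
    {"age": "30-39", "sex": "F", "zip3": "277"},
    {"age": "30-39", "sex": "M", "zip3": "277"},
    {"age": "40-49", "sex": "F", "zip3": "100"},
    {"age": "40-49", "sex": "M", "zip3": "100"},
]
rates = unique_rates_by_stratum(records, ["age", "sex", "zip3"], ["age", "zip3"])
```

In this toy data, one of the three records in the ("30-39", "277") stratum is unique, while both records in the ("40-49", "100") stratum are unique, so the second stratum carries the higher rate.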
18. A method of substituting at least one data value in at least one record in a database comprising a plurality of records, comprising:
selecting a partner record for each record in the plurality of records; and
partitioning the plurality of records into a plurality of risk strata based on a plurality of identifying variables.
19. The method of claim 18, further comprising:
determining a respective substitution probability for each risk stratum in the plurality of risk strata by minimizing a disclosure loss function subject to a bias constraint; and
replacing data associated with at least one of the plurality of identifying variables in each record in a sample of records selected from the plurality of records, wherein (1) the sample of records is chosen based on the respective substitution probabilities, and (2) the replaced data is obtained from the corresponding partner record.
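The substitution steps of claims 18 and 19 can be sketched as follows, under stated assumptions. The claims do not specify how a partner record is chosen, so the sketch picks the nearest other record by Hamming distance on the identifying variables. The substitution probabilities are taken as given inputs; the optimization that produces them (minimizing a disclosure loss function subject to a bias constraint) is not shown. All function names and data are hypothetical.

```python
import random

def hamming(a, b, variables):
    """Number of listed variables on which two records differ."""
    return sum(a[v] != b[v] for v in variables)

def select_partners(records, id_vars):
    """For each record, index of the closest other record (assumed rule).

    Requires at least two records; ties go to the lowest index.
    """
    partners = []
    for i, r in enumerate(records):
        others = [j for j in range(len(records)) if j != i]
        partners.append(min(others, key=lambda j: hamming(r, records[j], id_vars)))
    return partners

def substitute(records, strata, sub_prob, partners, swap_vars, rng):
    """Return a treated copy; strata[i] is the stratum label of record i."""
    treated = [dict(r) for r in records]
    for i in range(len(records)):
        # Each record is selected with its stratum's substitution probability.
        if rng.random() < sub_prob[strata[i]]:
            donor = records[partners[i]]  # values come from the original data
            for v in swap_vars:
                treated[i][v] = donor[v]
    return treated

records = [
    {"age": "30-39", "zip3": "277", "income": 52000},
    {"age": "30-39", "zip3": "278", "income": 61000},
    {"age": "70-79", "zip3": "100", "income": 18000},
]
partners = select_partners(records, ["age", "zip3"])
strata = ["low", "low", "high"]
# Probabilities of 0 and 1 are used only to make this demo deterministic.
sub_prob = {"low": 0.0, "high": 1.0}
treated = substitute(records, strata, sub_prob, partners, ["zip3"], random.Random(0))
```

Only the identifying variable listed in `swap_vars` is replaced; analysis variables such as `income` stay with the original record, and the input list is left unmodified.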
20. A method of selecting a subsample of records from a database comprising a plurality of records, comprising:
partitioning the plurality of records into a plurality of risk strata based on a plurality of identifying variables; and
determining a respective subsampling probability for each risk stratum in the plurality of risk strata by minimizing a disclosure loss function subject to a variance constraint.
21. The method of claim 20, further comprising:
selecting, from the plurality of records, the subsample of records based on the respective subsampling probabilities and the plurality of risk strata.
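Claims 20 and 21 can be illustrated with a toy optimization. The claims do not give the form of the disclosure loss function or of the variance constraint, so both are assumptions here: the loss is linear, sum of a[h] * p[h], where a[h] is a hypothetical disclosure cost for stratum h, and the constraint is the Poisson-sampling variance of a weighted total, sum of b[h] * (1/p[h] - 1) <= V, where b[h] is the sum of squared values in stratum h. Probabilities that would exceed 1 are capped and the rest re-solved, a common heuristic for this kind of allocation problem.

```python
import math
import random

def subsampling_probs(a, b, V):
    """Minimize sum(a[h]*p[h]) subject to sum(b[h]*(1/p[h]-1)) <= V, p[h] <= 1.

    Assumed loss and constraint forms; requires a[h] > 0, b[h] > 0, V > 0.
    """
    free = set(a)
    probs = {}
    while free:
        # Lagrange solution on the uncapped strata: p[h] = sqrt(b/a) * root_lam.
        root_lam = sum(math.sqrt(a[h] * b[h]) for h in free) / (
            V + sum(b[h] for h in free)
        )
        trial = {h: math.sqrt(b[h] / a[h]) * root_lam for h in free}
        capped = {h for h in free if trial[h] >= 1.0}
        if not capped:
            probs.update(trial)
            break
        for h in capped:
            probs[h] = 1.0  # kept with certainty, contributes no variance
        free -= capped
    return probs

def poisson_subsample(records, strata, probs, rng):
    """Keep each record independently with its stratum's probability."""
    return [r for r, s in zip(records, strata) if rng.random() < probs[s]]

a = {"high": 4.0, "low": 1.0}  # hypothetical disclosure cost per stratum
b = {"high": 1.0, "low": 1.0}  # hypothetical sum of squared values per stratum
probs = subsampling_probs(a, b, 3.0)
variance = sum(b[h] * (1.0 / probs[h] - 1.0) for h in probs)
kept = poisson_subsample(["r1", "r2"], ["high", "low"], {"high": 1.0, "low": 1.0},
                         random.Random(0))
```

With these inputs the high-risk stratum receives the smaller retention probability, and the variance constraint is met with equality, which is what one would expect when the loss is increasing in every probability.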
Specification