DIFFERENTIALLY PRIVATE LINEAR QUERIES ON HISTOGRAMS
First Claim
1. A computer system comprising:
one or more processors; and
one or more computer-readable hardware storage media having stored thereon computer-executable instructions that are executable by the one or more processors to cause the computer system to anonymize data by imposing differential privacy constraints on the data by causing the computer system to:
receive a query directed to a dataset that includes confidential information, wherein the received query is received from a source that is not authorized to directly view the confidential information;
execute the received query against the dataset to obtain an answer to the received query, wherein the answer includes at least some of the confidential information;
after obtaining the answer and before returning the answer to the source, apply a privacy error to the at least some confidential information included in the answer, wherein determining an amount of the privacy error to apply to the at least some confidential information is based on a sparsity of entries included in the dataset, and such that the privacy error is reduced when querying relatively smaller sized datasets than relatively larger datasets; and
after applying the privacy error to the at least some confidential information included in the answer, return the answer to the source.
Abstract
The privacy of linear queries on histograms is protected. A database containing private data is queried. Base decomposition is performed to recursively compute an orthonormal basis for the database space. Using correlated (or Gaussian) noise and/or least squares estimation, an answer having differential privacy is generated and provided in response to the query. In some implementations, the differential privacy is ε-differential privacy (i.e., pure differential privacy) or (ε,δ)-differential privacy (i.e., approximate differential privacy). In some implementations, the data in the database may be dense. Such implementations may use correlated noise without using least squares estimation. In other implementations, the data in the database may be sparse. Such implementations may use least squares estimation with or without using correlated noise.
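As an illustration of the noise-then-estimate idea summarized above, the following is a minimal Python sketch and not the patented procedure. It makes several assumptions that the abstract does not state: the queries are given as a matrix with one row per query and one column per histogram bin, neighboring databases differ by one individual in one bin, the noise is independent Gaussian noise with the classical (ε,δ) calibration (valid for 0 < ε < 1) rather than correlated noise, the number of individuals n is treated as public, and "least squares estimation" is read as projecting the noisy answers onto the set of answer vectors that some histogram with at most n individuals could produce. The projection is computed with the Frank-Wolfe method. All function names are hypothetical.

```python
import math
import random


def l2_sensitivity(queries):
    """Largest Euclidean norm of any column of the query matrix.

    Adding or removing one individual changes one histogram bin by 1,
    so the exact answer vector moves by exactly one column.
    """
    n_bins = len(queries[0])
    return max(
        math.sqrt(sum(row[j] ** 2 for row in queries)) for j in range(n_bins)
    )


def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism calibration (assumes 0 < epsilon < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def noisy_answers(queries, histogram, epsilon, delta, rng):
    """Exact answers plus independent Gaussian noise on each query."""
    sigma = gaussian_sigma(l2_sensitivity(queries), epsilon, delta)
    exact = [sum(q * x for q, x in zip(row, histogram)) for row in queries]
    return [a + rng.gauss(0.0, sigma) for a in exact]


def project_onto_feasible_answers(noisy, queries, n, iters=200):
    """Least squares step: minimize ||y - noisy||^2 over the convex hull
    of the points +n*column_j and -n*column_j, using Frank-Wolfe with
    exact line search. This is post-processing of the noisy answers.
    """
    m, d = len(queries), len(queries[0])
    y = [0.0] * m  # the origin lies in the hull because it is symmetric
    for _ in range(iters):
        grad = [yi - ti for yi, ti in zip(y, noisy)]
        # Inner product of the gradient with every column of the matrix.
        scores = [
            sum(grad[i] * queries[i][j] for i in range(m)) for j in range(d)
        ]
        j = max(range(d), key=lambda k: abs(scores[k]))
        sign = -1.0 if scores[j] > 0 else 1.0
        vertex = [sign * n * queries[i][j] for i in range(m)]
        direction = [vi - yi for vi, yi in zip(vertex, y)]
        denom = sum(di * di for di in direction)
        if denom == 0.0:
            break  # already sitting on the best vertex
        step = -sum(g * di for g, di in zip(grad, direction)) / denom
        step = min(1.0, max(0.0, step))
        if step == 0.0:
            break  # no feasible descent direction: y is optimal
        y = [yi + step * di for yi, di in zip(y, direction)]
    return y


if __name__ == "__main__":
    rng = random.Random(0)
    queries = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
    histogram = [3, 0, 0, 2]  # sparse: 5 individuals spread over 4 bins
    noisy = noisy_answers(queries, histogram, epsilon=0.5, delta=1e-5, rng=rng)
    # n is assumed to be public here; it is taken from the data only for the demo.
    print(project_onto_feasible_answers(noisy, queries, n=sum(histogram)))
```

Because the projection only touches the already-noised answers and the assumed-public n, it does not weaken the privacy guarantee of the Gaussian step. When n is small relative to the noise scale, the projection pulls wildly noisy answers back into a small feasible region, which is one way to see why sparse data can admit smaller error.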
20 Claims
1. A computer system comprising:
one or more processors; and
one or more computer-readable hardware storage media having stored thereon computer-executable instructions that are executable by the one or more processors to cause the computer system to anonymize data by imposing differential privacy constraints on the data by causing the computer system to:
receive a query directed to a dataset that includes confidential information, wherein the received query is received from a source that is not authorized to directly view the confidential information;
execute the received query against the dataset to obtain an answer to the received query, wherein the answer includes at least some of the confidential information;
after obtaining the answer and before returning the answer to the source, apply a privacy error to the at least some confidential information included in the answer, wherein determining an amount of the privacy error to apply to the at least some confidential information is based on a sparsity of entries included in the dataset, and such that the privacy error is reduced when querying relatively smaller sized datasets than relatively larger datasets; and
after applying the privacy error to the at least some confidential information included in the answer, return the answer to the source.
8. A method for anonymizing data by imposing differential privacy constraints on the data, the method being implemented by one or more processors of a computer system and comprising:
receiving a query directed to a dataset that includes confidential information, wherein the received query is received from a source that is not authorized to directly view the confidential information;
executing the query against the dataset to obtain an answer to the received query, wherein the answer includes at least some of the confidential information;
after obtaining the answer and before returning the answer to the source, applying a privacy error to the at least some confidential information included in the answer, wherein determining an amount of the privacy error to apply to the at least some confidential information is based on a function of a number of entries that are included in the dataset, and by causing the amount of privacy error to be reduced when querying relatively smaller sized datasets than larger datasets; and
after applying the privacy error to the at least some confidential information included in the answer, returning the answer to the source.
15. One or more hardware storage devices having stored thereon computer-executable instructions that are executable by one or more processors of a computer system to cause the computer system to anonymize data by imposing differential privacy constraints on the data by causing the computer system to:
receive a query directed to a dataset that includes confidential information, wherein the received query is received from a source that is not authorized to directly view the confidential information;
execute the received query against the dataset to obtain an answer to the received query, wherein the answer includes at least some of the confidential information;
after obtaining the answer and before returning the answer to the source, apply a privacy error to the at least some confidential information included in the answer, wherein determining an amount of the privacy error to apply to the at least some confidential information is based on a function of a number of entries that are included in the dataset, and by causing the amount of privacy error to be reduced when querying relatively smaller sized datasets than larger datasets; and
after applying the privacy error to the at least some confidential information included in the answer, return the answer to the source.
16. The one or more hardware storage devices of claim 15, wherein the received query is a counting query.
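For orientation only, the receive, execute, perturb, return sequence recited in the independent claims can be sketched for a counting query as follows. This is a hedged illustration under stated assumptions, not the claimed calibration: it uses the textbook Laplace mechanism with sensitivity 1 and a fixed ε, and it does not attempt to make the noise depend on the sparsity or size of the dataset as the claims recite. The names `laplace_noise` and `answer_counting_query` are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two independent
    exponential variables that each have mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def answer_counting_query(records, predicate, epsilon, rng=None):
    """Count matching records, then perturb the count before returning it.

    A counting query changes by at most 1 when one record is added or
    removed, so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for this single answer.
    """
    rng = rng or random.Random()
    exact = sum(1 for r in records if predicate(r))  # execute the query
    return exact + laplace_noise(1.0 / epsilon, rng)  # apply the privacy error


if __name__ == "__main__":
    ages = [34, 51, 29, 62, 45]  # made-up records for the demo
    print(answer_counting_query(ages, lambda a: a >= 40, epsilon=0.5))
```

The caller only ever sees the perturbed count; the exact count stays inside the function, mirroring the ordering in the claims where the error is applied after the answer is obtained and before it is returned.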
Specification