DIFFERENTIALLY PRIVATE DATA RELEASE
First Claim
1. A method comprising:
- receiving a list of strings at a computing device through a network, wherein the list of strings comprises a plurality of strings and a count associated with each string;
selecting a threshold count by the computing device;
determining a distribution of a number of times that noise generated by a noise generating function would exceed the threshold count by the computing device;
generating a number of strings according to the determined distribution by the computing device;
generating a count for each generated string by the computing device, wherein the generated count is greater than the threshold count; and
adding the generated strings and generated counts to the list of strings by the computing device.
2 Assignments
0 Petitions
Accused Products
Abstract
A query log includes a list of queries and a count for each query representing the number of times that the query was received by a search engine. In order to provide differential privacy protection to the queries, noise is generated and added to each count, and queries that have counts that fall below a threshold are removed from the query log. A distribution associated with a function used to generate the noise is referenced to determine a distribution of a number of times that a hypothetical query having a zero count would have its count exceed the threshold after the addition of noise. Random queries of an amount equal to a sample from the distribution of number of times are added to the query log with a count that is greater than the threshold count.
63 Citations
20 Claims
-
1. A method comprising:
-
receiving a list of strings at a computing device through a network, wherein the list of strings comprises a plurality of strings and a count associated with each string; selecting a threshold count by the computing device; determining a distribution of a number of times that noise generated by a noise generating function would exceed the threshold count by the computing device; generating a number of strings according to the determined distribution by the computing device; generating a count for each generated string by the computing device, wherein the generated count is greater than the threshold count; and adding the generated strings and generated counts to the list of strings by the computing device. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A method comprising:
-
receiving a list of strings by a computing device, wherein the list of strings comprises a plurality of strings and a count associated with each string and each string further has an associated length; selecting a first length by the computing device; determining a first threshold count based on the first length by the computing device; for each string in the list of strings with an associated length equal to the first length; generating noise using a noise generating function by the computing device; and adding the generated noise to the count associated with the string by the computing device; removing strings from the list of strings of a length equal to the first length with a count that is less than the first threshold count by the computing device; determining a distribution of a number of times that the generated noise by the noise generating function would exceed the first threshold count by the computing device; generating a number of strings of length equal to the first length according to the determined distribution by the computing device; generating a count for each generated string by the computing device, wherein the generated count for each string is greater than the first threshold count; and adding the generated strings and generated counts to the list of strings by the computing device. - View Dependent Claims (12, 13, 14, 15, 16)
-
-
17. A system comprising:
-
a search engine that generates a query log, wherein the query log comprises a plurality of queries and a count associated with each query; a query privacy preservation platform that; receives the query log; selects a threshold count; determines a distribution of a number of times that noise generated by a noise generating function would exceed the threshold count; generates a number of queries according to the distribution; generates a count for each generated query, wherein each generated count is greater than the threshold count; and adds the generated queries and generated counts to the query log. - View Dependent Claims (18, 19, 20)
-
Specification