Learning new words
First Claim
1. A computer-implemented method, comprising:
- receiving, by a term learning server, a batch of differentially private sketches of n-grams, each n-gram a sequence of characters forming a subset of one term in a plurality of terms unknown to the term learning server;
for each received differentially private n-gram sketch, determining a matching differentially private n-gram sketch to the received sketch, wherein the matching differentially private n-gram sketch, W, has k rows and m columns, wherein each row ∈
[k] corresponds to a hash function h in the set of k hash functions, H={h1, . . . , hk};
adding the received differentially private n-gram sketch data to the matching differentially private sketch n-gram sketch;
determining a frequency of each matching differentially private n-gram sketch among the batch;
selecting the matching differentially private n-grams having a frequency greater than a threshold value;
generating a plurality of combinations of differentially private n-grams from the selected matching differentially private sketches of n-grams having a frequency greater than a threshold value;
determining one or more new terms using the plurality of combinations of differentially private n-grams; and
adding at least one of the one or more new terms to an asset catalog to form an updated asset catalog.
0 Assignments
0 Petitions
Accused Products
Abstract
Systems and methods are disclosed for a server learning new words generated by user client devices in a crowdsourced manner while maintaining local differential privacy of client devices. A client device can determine that a word typed on the client device is a new word that is not contained in a dictionary or asset catalog on the client device. New words can be grouped in classifications such as entertainment, health, finance, etc. A differential privacy system on the client device can comprise a privacy budget for each classification of new words. If there is privacy budget available for the classification, then one or more new terms in a classification can be sent to new term learning server, and the privacy budget for the classification reduced. The privacy budget can be periodically replenished.
-
Citations
15 Claims
-
1. A computer-implemented method, comprising:
-
receiving, by a term learning server, a batch of differentially private sketches of n-grams, each n-gram a sequence of characters forming a subset of one term in a plurality of terms unknown to the term learning server; for each received differentially private n-gram sketch, determining a matching differentially private n-gram sketch to the received sketch, wherein the matching differentially private n-gram sketch, W, has k rows and m columns, wherein each row ∈
[k] corresponds to a hash function h in the set of k hash functions, H={h1, . . . , hk};adding the received differentially private n-gram sketch data to the matching differentially private sketch n-gram sketch;
determining a frequency of each matching differentially private n-gram sketch among the batch;selecting the matching differentially private n-grams having a frequency greater than a threshold value;
generating a plurality of combinations of differentially private n-grams from the selected matching differentially private sketches of n-grams having a frequency greater than a threshold value;determining one or more new terms using the plurality of combinations of differentially private n-grams; and adding at least one of the one or more new terms to an asset catalog to form an updated asset catalog. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A system comprising:
a processing system coupled to a memory programmed with executable instructions that, when executed by the processing system perform operations, comprising; receiving, by a term learning server, a batch of differentially private sketches of n-grams, each n-gram a sequence of characters forming a subset of one term in a plurality of terms unknown to the term learning server;
for each received differentially private n-gram sketch, determining a matching differentially private n-gram sketch to the received sketch, wherein the matching differentially private n-gram sketch, W, has k rows and m columns, wherein each row ∈
[k] corresponds to a hash function h in the set of k hash functions, H={h1, . . . , hk};adding the received differentially private n-gram sketch data to the matching differentially private sketch n-gram sketch; determining a frequency of each matching differentially private n-gram sketch among the batch; selecting the matching differentially private n-grams having a frequency greater than a threshold value; generating a plurality of combinations of differentially private n-grams from the selected matching differentially private sketches of n-grams having a frequency greater than a threshold value; determining one or more new terms using the plurality of combinations of differentially private n-grams; and adding at least one of the one or more new terms to an asset catalog to form an updated asset catalog. - View Dependent Claims (7, 8, 9, 10)
-
11. A non-transitory machine-readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising:
-
receiving, by a term learning server, a batch of differentially private sketches of n-grams, each n-gram a sequence of characters forming a subset of one term in a plurality of terms unknown to the term learning server; for each received differentially private n-gram sketch, determining a matching differentially private n-gram sketch to the received sketch, wherein the matching differentially private n-gram sketch, W, has k rows and m columns, wherein each row ∈
[k] corresponds to a hash function h in a set of k hash functions, H={h1, . . . , hk};adding the received differentially private n-gram sketch data to the matching differentially private sketch n-gram sketch;
determining a frequency of each matching differentially private n-gram sketch among the batch;selecting the matching differentially private n-grams having a frequency greater than a threshold value;
generating a plurality of combinations of differentially private n-grams from the selected matching differentially private sketches of n-grams having a frequency greater than a threshold value;determining one or more new terms using the plurality of combinations of differentially private n-grams; and adding at least one of the one or more new terms to an asset catalog to form an updated asset catalog. - View Dependent Claims (12, 13, 14, 15)
-
Specification