Differentially private machine learning using a random forest classifier
First Claim
1. A method, comprising:
- receiving a request from a client to generate a differentially private random forest classifier trained using a set of restricted data stored by a private database system, the request identifying a level of differential privacy corresponding to the request, the identified level of differential privacy comprising privacy parameters ε and δ, wherein ε describes a degree of information released about the set of restricted data due to the request and δ describes an improbability of the request satisfying (ε)-differential privacy;
generating the differentially private random forest classifier in response to the request, generating the classifier comprising:
determining a number of decision trees comprising the differentially private random forest classifier;
generating the determined number of decision trees, wherein a decision tree comprises a plurality of leaf nodes representing classification categories, and generating the decision tree comprises:
generating a set of splits based on features of the set of restricted data;
determining an information gain for each split of the set of splits;
selecting a split from the set of splits using an exponential mechanism based at least in part on the determined information gains of the splits in the set and at least one of the privacy parameters;
adding the selected split to the decision tree at a node; and
determining, for a certain leaf node of the plurality of leaf nodes representing a certain classification category of the classification categories, a differentially private count of entities in the set of restricted data in the certain classification category; and
providing the differentially private random forest classifier to the client, the provided differentially private random forest classifier comprising the differentially private count of entities in the certain classification category represented by the certain leaf node.
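The split-selection and leaf-count steps recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names, the `sensitivity` parameter, and the use of Laplace noise for the leaf count are assumptions made for illustration (the claim names the exponential mechanism but does not name a noise distribution), and the sketch covers only ε, not δ.

```python
import math
import random


def exponential_mechanism(candidates, scores, epsilon, sensitivity, rng=random):
    """Pick one candidate with probability proportional to
    exp(epsilon * score / (2 * sensitivity)).

    `sensitivity` is an assumed bound on how much one entity can change a score.
    """
    # Subtract the maximum score before exponentiating for numerical stability;
    # this rescales all weights by the same factor and leaves probabilities unchanged.
    max_score = max(scores)
    weights = [
        math.exp(epsilon * (s - max_score) / (2.0 * sensitivity)) for s in scores
    ]
    return rng.choices(candidates, weights=weights, k=1)[0]


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_count(true_count, epsilon, rng=random):
    """Noisy count of entities in a leaf (assumption: Laplace noise).

    Adding or removing one entity changes a count by at most 1,
    so the noise scale is 1 / epsilon.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

In this sketch, a split with a higher information gain is more likely to be chosen but is never chosen with certainty, and a smaller ε flattens the selection probabilities and widens the noise on each leaf count.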
Abstract
A request from a client is received to generate a differentially private random forest classifier trained using a set of restricted data. The differentially private random forest classifier is generated in response to the request. Generating the differentially private random forest classifier includes determining a number of decision trees and generating the determined number of decision trees. Generating a decision tree includes generating a set of splits based on the restricted data, determining an information gain for each split, selecting a split from the set using an exponential mechanism, and adding the split to the decision tree. The differentially private random forest classifier is provided to the client.
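As an illustration of the "information gain for each split" step described in the abstract, the following Python sketch scores threshold splits on numeric features using entropy-based information gain. The row format (a list of dicts), the helper names, and the midpoint rule for proposing thresholds are assumptions for illustration only. In particular, this candidate generation reads the data directly and is not itself differentially private.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature, threshold):
    """Entropy reduction from splitting on rows[i][feature] <= threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def candidate_splits(rows, features):
    """Propose (feature, threshold) pairs at midpoints between distinct values."""
    splits = []
    for f in features:
        values = sorted({x[f] for x in rows})
        for lo, hi in zip(values, values[1:]):
            splits.append((f, (lo + hi) / 2.0))
    return splits
```

The gains computed this way would serve as the scores passed to a selection mechanism; the split with the highest gain is the one a non-private tree builder would pick outright.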
200 Citations
17 Claims
1. A method, comprising:
- receiving a request from a client to generate a differentially private random forest classifier trained using a set of restricted data stored by a private database system, the request identifying a level of differential privacy corresponding to the request, the identified level of differential privacy comprising privacy parameters ε and δ, wherein ε describes a degree of information released about the set of restricted data due to the request and δ describes an improbability of the request satisfying (ε)-differential privacy;
generating the differentially private random forest classifier in response to the request, generating the classifier comprising:
determining a number of decision trees comprising the differentially private random forest classifier;
generating the determined number of decision trees, wherein a decision tree comprises a plurality of leaf nodes representing classification categories, and generating the decision tree comprises:
generating a set of splits based on features of the set of restricted data;
determining an information gain for each split of the set of splits;
selecting a split from the set of splits using an exponential mechanism based at least in part on the determined information gains of the splits in the set and at least one of the privacy parameters;
adding the selected split to the decision tree at a node; and
determining, for a certain leaf node of the plurality of leaf nodes representing a certain classification category of the classification categories, a differentially private count of entities in the set of restricted data in the certain classification category; and
providing the differentially private random forest classifier to the client, the provided differentially private random forest classifier comprising the differentially private count of entities in the certain classification category represented by the certain leaf node.
Dependent claims: 2, 3, 4, 5, 6.
7. A non-transitory computer-readable storage medium storing computer program instructions executable by a processor to perform operations, the operations comprising:
- receiving a request from a client to generate a differentially private random forest classifier trained using a set of restricted data stored by a private database system, the request identifying a level of differential privacy corresponding to the request, the identified level of differential privacy comprising privacy parameters ε and δ, wherein ε describes a degree of information released about the set of restricted data due to the request and δ describes an improbability of the request satisfying (ε)-differential privacy;
generating the differentially private random forest classifier in response to the request, generating the classifier comprising:
determining a number of decision trees comprising the differentially private random forest classifier;
generating the determined number of decision trees, wherein a decision tree comprises a plurality of leaf nodes representing classification categories, and generating the decision tree comprises:
generating a set of splits based on features of the set of restricted data;
determining an information gain for each split of the set of splits;
selecting a split from the set of splits using an exponential mechanism based at least in part on the determined information gains of the splits in the set and at least one of the privacy parameters;
adding the selected split to the decision tree at a node; and
determining, for a certain leaf node of the plurality of leaf nodes representing a certain classification category of the classification categories, a differentially private count of entities in the set of restricted data in the certain classification category; and
providing the differentially private random forest classifier to the client, the provided differentially private random forest classifier comprising the differentially private count of entities in the certain classification category represented by the certain leaf node.
Dependent claims: 8, 9, 10, 11, 12.
13. A system comprising:
- a processor for executing computer program instructions; and
a non-transitory computer-readable storage medium storing computer program instructions executable by the processor to perform operations comprising:
receiving a request from a client to generate a differentially private random forest classifier trained using a set of restricted data stored by a private database system, the request identifying a level of differential privacy corresponding to the request, the identified level of differential privacy comprising privacy parameters ε and δ, wherein ε describes a degree of information released about the set of restricted data due to the request and δ describes an improbability of the request satisfying (ε)-differential privacy;
generating the differentially private random forest classifier in response to the request, generating the classifier comprising:
determining a number of decision trees comprising the differentially private random forest classifier;
generating the determined number of decision trees, wherein a decision tree comprises a plurality of leaf nodes representing classification categories, and generating the decision tree comprises:
generating a set of splits based on features of the set of restricted data;
determining an information gain for each split of the set of splits;
selecting a split from the set of splits using an exponential mechanism based at least in part on the determined information gains of the splits in the set and at least one of the privacy parameters;
adding the selected split to the decision tree at a node; and
determining, for a certain leaf node of the plurality of leaf nodes representing a certain classification category of the classification categories, a differentially private count of entities in the set of restricted data in the certain classification category; and
providing the differentially private random forest classifier to the client, the provided differentially private random forest classifier comprising the differentially private count of entities in the certain classification category represented by the certain leaf node.
Dependent claims: 14, 15, 16, 17.
Specification