Method and apparatus for privacy preserving data mining by restricting attribute choice
Abstract
Improved techniques for privacy preserving data mining of multidimensional data records are disclosed. For example, a technique for generating at least one output data set from at least one input data set for use in association with a data mining process comprises the following steps/operations. At least one relevant attribute of the at least one input data set is selected through determination of at least one relevance coefficient. The at least one output data set is generated from the at least one input data set, wherein the at least one output data set comprises the at least one relevant attribute of the at least one input data set, as determined by use of the at least one relevance coefficient.
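As an illustrative sketch only, and not the patent's implementation, the selection and projection steps summarized above might look like the following in Python. The function name `project_relevant`, the threshold rule, the attribute names, and the coefficient values are all assumptions introduced for illustration; the abstract does not say how relevance coefficients are turned into a selection.

```python
from typing import Dict, List

Record = Dict[str, object]


def project_relevant(
    records: List[Record],
    relevance: Dict[str, float],
    threshold: float,
) -> List[Record]:
    """Keep only attributes whose relevance coefficient meets the threshold.

    Hypothetical helper: thresholding is an assumption, chosen as one simple
    way to select attributes "based on" their relevance coefficients.
    """
    selected = [attr for attr, coeff in relevance.items() if coeff >= threshold]
    # Each output entry carries the selected attributes and omits the rest.
    return [{attr: rec[attr] for attr in selected} for rec in records]


# Made-up input data and coefficients, for illustration only.
input_data = [
    {"age": 34, "zip": "10001", "income": 72000},
    {"age": 51, "zip": "94105", "income": 58000},
]
coefficients = {"age": 0.40, "zip": 0.02, "income": 0.35}

output_data = project_relevant(input_data, coefficients, threshold=0.10)
```

Here the low-relevance `zip` attribute is dropped from every output entry, so the output data set exposes fewer attributes than the input while retaining the ones the mining process is assumed to depend on.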
21 Claims
1. A method of generating at least one output data set from at least one input data set for use in association with a data mining process, the input data set comprising at least one entry including each of a plurality of attributes, comprising the steps of:
determining at least one relevance coefficient for at least a subset of the plurality of attributes;
selecting at least one relevant attribute of the at least one input data set based at least in part on the at least one relevance coefficient; and
generating the at least one output data set from the at least one input data set;
wherein the at least one output data set comprises at least one entry not including at least one of the plurality of attributes; and
wherein the at least one entry of the output data set has the at least one relevant attribute of the at least one input data set;
wherein the at least one relevance coefficient is computed using a quantitative measure of an effect on the data mining process of a deletion of at least the given attribute from each entry of the input data set.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. Apparatus for generating at least one output data set from at least one input data set for use in association with a data mining process, the input data set comprising at least one entry including each of a plurality of attributes, the apparatus comprising:
a memory; and
at least one processor coupled to the memory and operative to:
(i) determine at least one relevance coefficient for at least a subset of the plurality of attributes;
(ii) select at least one relevant attribute of the at least one input data set based at least in part on the at least one relevance coefficient; and
(iii) generate the at least one output data set from the at least one input data set;
wherein the at least one output data set comprises at least one entry not including at least one of the plurality of attributes; and
wherein the at least one entry of the output data set includes the at least one relevant attribute of the at least one input data set;
wherein the at least one relevance coefficient is computed using a quantitative measure of an effect on the data mining process of a deletion of at least the given attribute from each entry of the input data set.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
21. An article of manufacture for generating at least one output data set from at least one input data set for use in association with a data mining process, the input data set comprising at least one entry including each of a plurality of attributes, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
determining at least one relevance coefficient for at least a subset of the plurality of attributes;
selecting at least one relevant attribute of the at least one input data set based at least in part on the at least one relevance coefficient; and
generating the at least one output data set from the at least one input data set;
wherein the at least one output data set comprises at least one entry not including at least one of the plurality of attributes; and
wherein the at least one entry of the output data set includes the at least one relevant attribute of the at least one input data set;
wherein the at least one relevance coefficient is computed using a quantitative measure of an effect on the data mining process of a deletion of at least the given attribute from each entry of the input data set.
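The independent claims define the relevance coefficient as a quantitative measure of the effect that deleting an attribute has on the data mining process. One possible reading, offered here as a hedged sketch and not as the claimed method, is to measure how much a classifier's accuracy drops when an attribute is removed from every entry. The choice of a 1-nearest-neighbour classifier, leave-one-out accuracy as the quality measure, the function names, and the toy data are all assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[Sequence[float], int]  # (attribute values, class label)


def loo_1nn_accuracy(rows: List[Row], attrs: Sequence[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that
    sees only the attribute positions listed in `attrs`."""
    correct = 0
    for i, (xi, yi) in enumerate(rows):
        best_label, best_dist = None, float("inf")
        for j, (xj, yj) in enumerate(rows):
            if i == j:
                continue  # never use an entry as its own neighbour
            dist = sum((xi[a] - xj[a]) ** 2 for a in attrs)
            if dist < best_dist:
                best_label, best_dist = yj, dist
        if best_label == yi:
            correct += 1
    return correct / len(rows)


def relevance_coefficients(rows: List[Row], n_attrs: int) -> Dict[int, float]:
    """Assumed coefficient: accuracy with all attributes minus accuracy
    after deleting one attribute from every entry."""
    all_attrs = list(range(n_attrs))
    baseline = loo_1nn_accuracy(rows, all_attrs)
    return {
        a: baseline - loo_1nn_accuracy(rows, [b for b in all_attrs if b != a])
        for a in all_attrs
    }


# Toy data (made up): attribute 0 separates the classes, attribute 1 does not.
toy_rows: List[Row] = [
    ([0.0, 0.3], 0),
    ([0.2, 0.1], 0),
    ([1.0, 0.3], 1),
    ([1.2, 0.1], 1),
]

coeffs = relevance_coefficients(toy_rows, n_attrs=2)
```

On this toy data, deleting attribute 0 drops accuracy from 1.0 to 0.0, giving it a coefficient of 1.0, while deleting attribute 1 changes nothing, giving it 0.0. Under the assumed measure, attribute 1 could therefore be withheld from the output data set at no cost to the mining result.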
Specification