Generation method and device for generating anonymous dataset, and method and device for risk evaluation
Abstract
An anonymous dataset generation method comprises the following steps. A critical attribute set and a quasi-identifier (QID) set are acquired, and one of the critical attributes or one of the quasi-identifiers is set as an anchor attribute. An attribute sequence and an equivalence table are generated according to the quasi-identifier set and the critical attribute set. A data cluster and a cluster table are generated according to the equivalence table. The content of the cluster table is generalized to generate and output an anonymous dataset corresponding to an original dataset. A risk evaluation method for an anonymous dataset calculates data weights to extract distinctive data and attacks defects of the anonymous dataset according to the distinctive data, thereby improving the efficiency of risk evaluation for the anonymous dataset.
48 Claims
1. An anonymous dataset generation method, comprising:
acquiring a critical attribute set and a quasi-identifier set, wherein the critical attribute set comprises at least one critical attribute, the quasi-identifier set comprises a plurality of quasi-identifiers, and one of the at least one critical attribute or one of the quasi-identifiers is set as an anchor attribute;
generating an equivalence table according to the quasi-identifier set, the critical attribute set and an original dataset, wherein the equivalence table comprises a plurality of equivalence classes, each of the equivalence classes comprises at least one equivalence data, and each equivalence data comprises a plurality of original values corresponding to the quasi-identifiers respectively;
generating a plurality of data clusters of a cluster table sequentially according to the equivalence table, wherein each of the data clusters comprises at least one of the equivalence classes; and
generalizing content of the cluster table to generate and output an anonymous dataset corresponding to the original dataset, wherein the original values corresponding to the anchor attribute are maintained originally in the anonymous dataset.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)

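The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say how equivalence classes are formed, how clusters are filled, or how values are generalized. The sketch assumes that an equivalence class is the set of records sharing identical quasi-identifier values, that the anchor attribute is one of the quasi-identifiers, that clusters are filled in sorted order within each anchor value until they hold at least `k` records (`k` is a hypothetical parameter), and that generalization replaces differing values with a `low-high` string. All function names are illustrative.

```python
from collections import defaultdict
from itertools import groupby


def build_equivalence_table(records, qids):
    """Group records that share identical quasi-identifier values (assumed definition)."""
    table = defaultdict(list)
    for rec in records:
        table[tuple(rec[q] for q in qids)].append(rec)
    return table


def build_clusters(table, qids, anchor, k):
    """Fill clusters one after another from the sorted equivalence classes.

    Assumption: classes are only merged when they share the anchor value,
    so the anchor never needs to be generalized.
    """
    a = qids.index(anchor)
    keys = sorted(table, key=lambda key: (key[a], key))
    clusters = []
    for _, group in groupby(keys, key=lambda key: key[a]):
        group_clusters, current = [], []
        for key in group:
            current.extend(table[key])
            if len(current) >= k:
                group_clusters.append(current)
                current = []
        if current:
            if group_clusters:
                group_clusters[-1].extend(current)  # fold leftovers into the last cluster
            else:
                group_clusters.append(current)  # undersized cluster for a rare anchor value
        clusters.extend(group_clusters)
    return clusters


def generalize(clusters, qids, anchor):
    """Replace each non-anchor quasi-identifier with one value per cluster."""
    anonymous = []
    for cluster in clusters:
        general = {}
        for q in qids:
            if q == anchor:
                continue  # the anchor keeps its original value
            values = sorted({rec[q] for rec in cluster})
            general[q] = values[0] if len(values) == 1 else f"{values[0]}-{values[-1]}"
        for rec in cluster:
            row = dict(rec)  # critical attributes are copied unchanged
            row.update(general)
            anonymous.append(row)
    return anonymous


# Illustrative data: "age" and "zip" are quasi-identifiers, "disease" is critical.
records = [
    {"age": 34, "zip": "10001", "disease": "flu"},
    {"age": 36, "zip": "10001", "disease": "cold"},
    {"age": 34, "zip": "10001", "disease": "asthma"},
    {"age": 51, "zip": "10002", "disease": "flu"},
    {"age": 58, "zip": "10002", "disease": "cold"},
]
qids = ["age", "zip"]
table = build_equivalence_table(records, qids)
clusters = build_clusters(table, qids, anchor="zip", k=2)
for row in generalize(clusters, qids, anchor="zip"):
    print(row)
```

Restricting each cluster to a single anchor value is one way to keep the anchor's original values while the remaining quasi-identifiers are coarsened; the claim itself leaves that design choice open.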
15. An anonymous dataset generation device, comprising:
a memory, for storing data or storing data temporarily; and
a processor, coupled to the memory, and comprising:
  an equivalence generation module, for performing steps of:
    acquiring a critical attribute set and a quasi-identifier set, wherein the critical attribute set comprises at least one critical attribute, the quasi-identifier set comprises a plurality of quasi-identifiers, and one of the at least one critical attribute or one of the quasi-identifiers is set as an anchor attribute; and
    generating an equivalence table according to the quasi-identifier set, the critical attribute set and an original dataset, wherein the equivalence table comprises a plurality of equivalence classes, each of the equivalence classes comprises at least one equivalence data, and each equivalence data comprises a plurality of original values corresponding to the quasi-identifiers respectively;
  a cluster generation module, for generating a plurality of data clusters of a cluster table according to the equivalence table sequentially, wherein each of the data clusters comprises at least one of the equivalence classes; and
  a data generalization module, for generalizing content of the cluster table to generate and output an anonymous dataset corresponding to the original dataset, wherein the original values corresponding to the anchor attribute are maintained originally in the anonymous dataset.
Dependent Claims (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28)

29. A risk evaluation method, for evaluating an anonymous dataset generated according to an original dataset, and comprising:
acquiring a plurality of appearing times respectively corresponding to a plurality of original values of the original dataset;
generating a partition set and a weight table according to a sample parameter, an anonymous parameter and the appearing times;
dividing the original dataset into a plurality of data partitions according to the partition set, and generating a penetration dataset according to the weight table and the data partitions, wherein the penetration dataset comprises a plurality of sample data;
comparing each sample data with a plurality of anonymous data of the anonymous dataset to obtain a plurality of matching quantities respectively corresponding to the sample data; and
calculating and outputting a risk evaluation result according to the matching quantities.
Dependent Claims (30, 31, 32, 33, 34, 35, 36, 37, 38)

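The flow of claim 29 can be sketched in Python as follows. The claim does not define the weight formula, the partition rule, or the final risk measure, so everything below is an assumption made for illustration: a record's weight is the sum of the reciprocals of its values' appearing times, a record is placed in the "rare" partition when any of its values appears fewer than `k` times (`k` standing in for the anonymous parameter), `sample_size` stands in for the sample parameter, generalized numeric values are assumed to be written as `low-high` strings, and the risk result is the share of samples that match between 1 and `k - 1` anonymous rows. All function names are hypothetical.

```python
from collections import Counter


def occurrence_counts(records, qids):
    """Count the appearing times of each (attribute, value) pair."""
    return Counter((q, rec[q]) for rec in records for q in qids)


def weight_table(records, qids, counts):
    """Assumed weighting: rarer values contribute more to a record's weight."""
    return [sum(1.0 / counts[(q, rec[q])] for q in qids) for rec in records]


def partition(records, qids, counts, k):
    """Assumed partition set: 'rare' if any value appears fewer than k times."""
    rare, common = [], []
    for i, rec in enumerate(records):
        is_rare = any(counts[(q, rec[q])] < k for q in qids)
        (rare if is_rare else common).append(i)
    return rare, common


def penetration_dataset(records, weights, partitions, sample_size):
    """Pick the highest-weight records, exhausting earlier partitions first."""
    picked = []
    for part in partitions:
        ranked = sorted(part, key=lambda i: weights[i], reverse=True)
        picked.extend(ranked[: sample_size - len(picked)])
    return [records[i] for i in picked]


def matches(value, anon_value):
    """True if an original value equals, or lies inside, an anonymized value."""
    if value == anon_value:
        return True
    # Assumed format: non-negative numeric ranges are stored as "low-high".
    if isinstance(anon_value, str) and "-" in anon_value and not isinstance(value, str):
        low, high = anon_value.split("-", 1)
        return float(low) <= value <= float(high)
    return False


def matching_quantities(samples, anon_rows, qids):
    """For each sample, count the anonymous rows it is compatible with."""
    return [
        sum(all(matches(s[q], row[q]) for q in qids) for row in anon_rows)
        for s in samples
    ]


def risk_score(quantities, k):
    """Assumed result: fraction of samples narrowed to fewer than k candidates."""
    if not quantities:
        return 0.0
    return sum(1 for m in quantities if 0 < m < k) / len(quantities)


def evaluate(original, anonymous, qids, k, sample_size):
    """Run the whole flow and return (matching quantities, risk result)."""
    counts = occurrence_counts(original, qids)
    weights = weight_table(original, qids, counts)
    parts = partition(original, qids, counts, k)
    samples = penetration_dataset(original, weights, parts, sample_size)
    quantities = matching_quantities(samples, anonymous, qids)
    return quantities, risk_score(quantities, k)
```

Sampling the most distinctive records first reflects the abstract's stated aim of attacking the anonymous dataset with distinctive data rather than with the whole original dataset; how the actual method balances the partitions is not stated in the claim.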
39. A risk evaluation device for evaluating an anonymous dataset generated according to an original dataset, comprising:
a memory, for storing data or storing data temporarily; and
a processor, coupled to the memory, and comprising:
  a weight generation module, for acquiring a plurality of appearing times respectively corresponding to a plurality of original values of the original dataset, and for generating a partition set and a weight table according to a sample parameter, an anonymous parameter and the appearing times;
  a sample generation module, for dividing the original dataset into a plurality of data partitions according to the partition set, and for generating a penetration dataset according to the weight table and the data partitions, wherein the penetration dataset comprises a plurality of sample data; and
  a risk evaluation module, for comparing each sample data with a plurality of anonymous data of the anonymous dataset in order to obtain a plurality of matching quantities respectively corresponding to the plurality of sample data, and for calculating and outputting a risk evaluation result according to the matching quantities.
Dependent Claims (40, 41, 42, 43, 44, 45, 46, 47, 48)

Specification