Anonymized data generation method and apparatus
First Claim
1. An anonymized data generation method, comprising:
generating, by using a computer and from a plurality of data blocks each of which includes a secret attribute value and a numeric attribute value, a plurality of groups each of which includes plural data blocks that satisfy a predetermined condition, wherein the predetermined condition includes a first condition and a second condition, the first condition being a condition that frequency distribution of secret attribute values included in the plural data blocks matches a predetermined pattern, and the second condition being a condition that points represented by numeric attribute values included in the plural data blocks are included in a first area that has a predetermined size; and
replacing, by using the computer and for each of the plurality of groups, a numeric attribute value included in each of plural data blocks that belong to the group with a numeric attribute value calculated for the group, and
wherein the generating comprises:
classifying each of the plurality of data blocks into any of a plurality of second areas that have the predetermined size and do not overlap with each other, based on a numeric attribute value included in the data block; and
upon detecting that a set of data blocks included in a certain second area does not satisfy the predetermined condition, changing the set by deleting a data block from the set and/or adding, to the set, a data block included in another second area adjacent to the certain second area, so that changed set satisfies the predetermined condition.
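The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions made only for illustration: the numeric attribute is taken to be a 2-D point, the "predetermined pattern" is taken to be "at least `min_distinct` distinct secret values", the value "calculated for the group" is taken to be the centroid, and only the "adding from an adjacent area" branch of the adjustment is shown. All function and field names (`anonymize`, `cell_of`, `"secret"`, `"point"`) are hypothetical. Blocks that cannot be placed in a satisfying group are simply omitted from the output here.

```python
from collections import defaultdict
from statistics import mean

def cell_of(point, size):
    """Grid index of the non-overlapping square "second area" containing point."""
    return (int(point[0] // size), int(point[1] // size))

def fits_area(blocks, size):
    """Second condition: all points fit inside some size-by-size square."""
    xs = [b["point"][0] for b in blocks]
    ys = [b["point"][1] for b in blocks]
    return max(xs) - min(xs) <= size and max(ys) - min(ys) <= size

def matches_pattern(blocks, min_distinct):
    """First condition (assumed pattern): enough distinct secret values."""
    return len({b["secret"] for b in blocks}) >= min_distinct

def satisfies(blocks, size, min_distinct):
    return bool(blocks) and matches_pattern(blocks, min_distinct) and fits_area(blocks, size)

def anonymize(blocks, size, min_distinct):
    # Classify every block into a grid cell based on its numeric attribute value.
    cells = defaultdict(list)
    for i, b in enumerate(blocks):
        cells[cell_of(b["point"], size)].append(i)

    used, groups = set(), []
    for (cx, cy), members in sorted(cells.items()):
        group = [i for i in members if i not in used]
        if not group:
            continue
        if not satisfies([blocks[i] for i in group], size, min_distinct):
            # Adjust the set by adding still-ungrouped blocks from adjacent cells.
            adjacent = [i
                        for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)
                        for i in cells.get((cx + dx, cy + dy), []) if i not in used]
            for i in adjacent:
                trial = [blocks[j] for j in group] + [blocks[i]]
                if fits_area(trial, size):
                    group.append(i)
                    if matches_pattern(trial, min_distinct):
                        break
        chosen = [blocks[i] for i in group]
        if satisfies(chosen, size, min_distinct):
            used.update(group)
            groups.append(chosen)

    # Replace each block's numeric value with the value calculated for its group.
    out = []
    for g in groups:
        centroid = (mean(b["point"][0] for b in g), mean(b["point"][1] for b in g))
        out.extend({"secret": b["secret"], "point": centroid} for b in g)
    return out

if __name__ == "__main__":
    sample = [
        {"secret": "flu", "point": (8.0, 5.0)},
        {"secret": "flu", "point": (9.0, 5.0)},
        {"secret": "cold", "point": (11.0, 5.0)},
    ]
    for record in anonymize(sample, size=10, min_distinct=2):
        print(record)
```

In the sample, the two "flu" blocks share a cell and do not satisfy the assumed pattern on their own, so the sketch borrows the "cold" block from the adjacent cell before computing the group value.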
Abstract
A method for generating anonymized data includes: (A) extracting, from plural data blocks, each of which includes a secret attribute value and a numeric attribute value, plural groups of data blocks, wherein each of the plural groups includes data blocks that include a first data block, which has not been grouped, whose frequency distribution of the secret attribute value satisfies a predetermined condition and whose numeric attribute values are within a certain area that has a predetermined size; and (B) replacing the numeric attribute values of the data blocks that belong to each group of the plural groups with a numeric attribute value calculated for the group. The certain area for each group is determined independently of the certain areas for the other groups.
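The extraction step (A) of the abstract, in which each group's area is chosen independently of the others, might be sketched as follows. This is an illustrative reading under stated assumptions, not the method of the specification: the area is assumed to be a square of side `size` centred on a not-yet-grouped seed block, the condition on the secret attribute is assumed to be "at least `min_distinct` distinct values", and the names `extract_groups`, `"secret"`, and `"point"` are hypothetical.

```python
def extract_groups(blocks, size, min_distinct):
    """Greedily extract groups, each built around an ungrouped seed block."""
    ungrouped = set(range(len(blocks)))
    groups = []
    for seed in range(len(blocks)):
        if seed not in ungrouped:
            continue
        sx, sy = blocks[seed]["point"]
        # Assumed area: a size-by-size square centred on the seed, placed
        # without reference to the areas used for any other group.
        area = [i for i in sorted(ungrouped)
                if abs(blocks[i]["point"][0] - sx) <= size / 2
                and abs(blocks[i]["point"][1] - sy) <= size / 2]
        # Assumed condition on the secret attribute's frequency distribution.
        if len({blocks[i]["secret"] for i in area}) >= min_distinct:
            groups.append(area)
            ungrouped.difference_update(area)
    return groups
```

The function returns lists of block indices; step (B) would then replace the numeric values in each group with a single value calculated for that group, such as the centroid.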
8 Claims
1. An anonymized data generation method, comprising:
generating, by using a computer and from a plurality of data blocks each of which includes a secret attribute value and a numeric attribute value, a plurality of groups each of which includes plural data blocks that satisfy a predetermined condition, wherein the predetermined condition includes a first condition and a second condition, the first condition being a condition that frequency distribution of secret attribute values included in the plural data blocks matches a predetermined pattern, and the second condition being a condition that points represented by numeric attribute values included in the plural data blocks are included in a first area that has a predetermined size; and
replacing, by using the computer and for each of the plurality of groups, a numeric attribute value included in each of plural data blocks that belong to the group with a numeric attribute value calculated for the group, and
wherein the generating comprises:
classifying each of the plurality of data blocks into any of a plurality of second areas that have the predetermined size and do not overlap with each other, based on a numeric attribute value included in the data block; and
upon detecting that a set of data blocks included in a certain second area does not satisfy the predetermined condition, changing the set by deleting a data block from the set and/or adding, to the set, a data block included in another second area adjacent to the certain second area, so that changed set satisfies the predetermined condition.
Dependent claims: 2, 3, 4, 5, 6
7. A non-transitory computer-readable storage medium storing a program that causes a computer to execute a process, the process comprising:
generating, from a plurality of data blocks each of which includes a secret attribute value and a numeric attribute value, a plurality of groups each of which includes plural data blocks that satisfy a predetermined condition, wherein the predetermined condition includes a first condition and a second condition, the first condition being a condition that frequency distribution of secret attribute values included in the plural data blocks matches a predetermined pattern, and the second condition being a condition that points represented by numeric attribute values included in the plural data blocks are included in a first area that has a predetermined size; and
replacing, for each of the plurality of groups, a numeric attribute value included in each of plural data blocks that belong to the group with a numeric attribute value calculated for the group, and
wherein the generating comprises:
classifying each of the plurality of data blocks into any of a plurality of second areas that have the predetermined size and do not overlap with each other, based on a numeric attribute value included in the data block; and
upon detecting that a set of data blocks included in a certain second area does not satisfy the predetermined condition, changing the set by deleting a data block from the set and/or adding, to the set, a data block included in another second area adjacent to the certain second area, so that changed set satisfies the predetermined condition.
8. An information processing apparatus, comprising:
a memory; and
a processor coupled to the memory and configured to:
generate, from a plurality of data blocks each of which includes a secret attribute value and a numeric attribute value, a plurality of groups each of which includes plural data blocks that satisfy a predetermined condition, wherein the predetermined condition includes a first condition and a second condition, the first condition being a condition that frequency distribution of secret attribute values included in the plural data blocks matches a predetermined pattern, and the second condition being a condition that points represented by numeric attribute values included in the plural data blocks are included in a first area that has a predetermined size; and
replace, for each of the plurality of groups, a numeric attribute value included in each of plural data blocks that belong to the group with a numeric attribute value calculated for the group, and
wherein the generating comprises:
classifying each of the plurality of data blocks into any of a plurality of second areas that have the predetermined size and do not overlap with each other, based on a numeric attribute value included in the data block; and
upon detecting that a set of data blocks included in a certain second area does not satisfy the predetermined condition, changing the set by deleting a data block from the set and/or adding, to the set, a data block included in another second area adjacent to the certain second area, so that changed set satisfies the predetermined condition.
Specification