Anonymized data generation method and apparatus

US 9,747,467 B2
Filed: 06/25/2015
Issued: 08/29/2017
Est. Priority Date: 01/16/2013
Status: Active Grant

Abstract

A method for generating anonymized data includes: (A) extracting, from plural data blocks, each of which includes a secret attribute value and a numeric attribute value, plural groups of data blocks, wherein each of the plural groups includes data blocks that include a first data block not yet grouped, whose secret attribute values have a frequency distribution that satisfies a predetermined condition, and whose numeric attribute values lie within a certain area that has a predetermined size; and (B) replacing the numeric attribute values of the data blocks that belong to each group of the plural groups with a numeric attribute value calculated for the group. The certain area for each group is determined independently of the certain areas for the other groups.
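
Steps (A) and (B) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: each data block is modelled as a hypothetical `(secret, x, y)` tuple, the frequency-distribution condition is simplified to a lower limit on the number of distinct secret values (`min_kinds`), the area is taken to be an axis-aligned square of side `size`, and the value calculated for the group is taken to be the centroid. All names are illustrative.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical record layout: (secret attribute value, x, y).
Block = Tuple[str, float, float]


def satisfies_condition(group: List[Block], min_kinds: int, size: float) -> bool:
    """Check both assumed conditions for one candidate group."""
    if not group:
        return False
    kinds = Counter(secret for secret, _, _ in group)
    xs = [x for _, x, _ in group]
    ys = [y for _, _, y in group]
    # Assumed first condition: at least `min_kinds` distinct secret values.
    diverse = len(kinds) >= min_kinds
    # Assumed second condition: all points fit in a size-by-size square.
    compact = (max(xs) - min(xs) <= size) and (max(ys) - min(ys) <= size)
    return diverse and compact


def replace_numeric(group: List[Block]) -> List[Block]:
    """Replace every point in the group with the group's centroid (an assumption)."""
    cx = sum(x for _, x, _ in group) / len(group)
    cy = sum(y for _, _, y in group) / len(group)
    return [(secret, cx, cy) for secret, _, _ in group]
```

After replacement, every block in a group carries the same numeric value, so a reader of the output cannot tell which secret value belonged to which original position inside the area.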

8 Claims

1. An anonymized data generation method, comprising:
- generating, by using a computer and from a plurality of data blocks each of which includes a secret attribute value and a numeric attribute value, a plurality of groups each of which includes plural data blocks that satisfy a predetermined condition, wherein the predetermined condition includes a first condition and a second condition, the first condition being a condition that frequency distribution of secret attribute values included in the plural data blocks matches a predetermined pattern, and the second condition being a condition that points represented by numeric attribute values included in the plural data blocks are included in a first area that has a predetermined size; and
- replacing, by using the computer and for each of the plurality of groups, a numeric attribute value included in each of plural data blocks that belong to the group with a numeric attribute value calculated for the group, and
- wherein the generating comprises:
  - classifying each of the plurality of data blocks into any of a plurality of second areas that have the predetermined size and do not overlap with each other, based on a numeric attribute value included in the data block; and
  - upon detecting that a set of data blocks included in a certain second area does not satisfy the predetermined condition, changing the set by deleting a data block from the set and/or adding, to the set, a data block included in another second area adjacent to the certain second area, so that the changed set satisfies the predetermined condition.

2. The anonymized data generation method as set forth in claim 1, further comprising deleting, by using the computer and for each of the plurality of groups, secret attribute values included in plural data blocks that belong to the group.

3. The anonymized data generation method as set forth in claim 1, wherein the generating comprises extracting, from a group of the plurality of groups, a first data block that is other than the data blocks that are mandatory for a state where the first condition is satisfied.

4. The anonymized data generation method as set forth in claim 1, wherein the first condition includes a lower limit value for a number of kinds of secret attribute values, and the changing comprises:
- extracting, on a basis of a data block included in the set and from the another second area, a second data block that is to be added to the set so that a number of kinds of secret attribute values included in the data blocks included in the set and the second data block is equal to or greater than the lower limit value; and
- determining the first area that has the predetermined size based on a numeric attribute value included in the extracted second data block.

5. The anonymized data generation method as set forth in claim 1, further comprising, upon detecting a third data block that does not belong to any of the plurality of groups, classifying, by using the computer, the third data block into a group of the plurality of groups, when a distance between a point represented by a numeric attribute value included in the third data block and a reference position of a certain area that includes points represented by numeric attribute values included in plural data blocks that belong to the group is equal to or less than a distance that corresponds to the predetermined size, and the first condition is still satisfied even when the third data block is added to the group.

6. The anonymized data generation method as set forth in claim 1, wherein the replacing comprises:
- randomly generating, for each of the plurality of groups, an area that has the predetermined size and includes points represented by numeric attribute values included in plural data blocks which belong to the group; and
- replacing, for each of the plurality of groups, a numeric attribute value included in each of plural data blocks which belong to the group with a numeric attribute value that corresponds to a position within an area generated for the group.
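
The randomized replacement of claim 6 can be sketched as follows. This is one possible reading, not the patented implementation: the area is assumed to be an axis-aligned square of side `size`, the "position within an area" is assumed to be the square's center, and the function names are hypothetical. On each axis the square's lower corner can slide anywhere between `max - size` and `min` while still covering every point, so the sketch draws the corner uniformly from that interval.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def random_area(points: List[Point], size: float, rng: random.Random) -> Point:
    """Return the lower-left corner of a random size-by-size square covering all points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    if max(xs) - min(xs) > size or max(ys) - min(ys) > size:
        raise ValueError("points do not fit in an area of the given size")
    # The corner may lie anywhere between (max - size) and min on each axis.
    x0 = rng.uniform(max(xs) - size, min(xs))
    y0 = rng.uniform(max(ys) - size, min(ys))
    return x0, y0


def replace_with_area_center(points: List[Point], size: float,
                             rng: random.Random) -> List[Point]:
    """Replace every point with the center of a randomly placed covering square."""
    x0, y0 = random_area(points, size, rng)
    center = (x0 + size / 2, y0 + size / 2)
    return [center for _ in points]
```

Passing a seeded `random.Random` instance keeps the sketch reproducible in tests; a production system would need to decide separately how the randomness is sourced.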

7. A non-transitory computer-readable storage medium storing a program that causes a computer to execute a process, the process comprising:
- generating, from a plurality of data blocks each of which includes a secret attribute value and a numeric attribute value, a plurality of groups each of which includes plural data blocks that satisfy a predetermined condition, wherein the predetermined condition includes a first condition and a second condition, the first condition being a condition that frequency distribution of secret attribute values included in the plural data blocks matches a predetermined pattern, and the second condition being a condition that points represented by numeric attribute values included in the plural data blocks are included in a first area that has a predetermined size; and
- replacing, for each of the plurality of groups, a numeric attribute value included in each of plural data blocks that belong to the group with a numeric attribute value calculated for the group, and
- wherein the generating comprises:
  - classifying each of the plurality of data blocks into any of a plurality of second areas that have the predetermined size and do not overlap with each other, based on a numeric attribute value included in the data block; and
  - upon detecting that a set of data blocks included in a certain second area does not satisfy the predetermined condition, changing the set by deleting a data block from the set and/or adding, to the set, a data block included in another second area adjacent to the certain second area, so that the changed set satisfies the predetermined condition.

8. An information processing apparatus, comprising:
- a memory; and
- a processor coupled to the memory and configured to:
  - generate, from a plurality of data blocks each of which includes a secret attribute value and a numeric attribute value, a plurality of groups each of which includes plural data blocks that satisfy a predetermined condition, wherein the predetermined condition includes a first condition and a second condition, the first condition being a condition that frequency distribution of secret attribute values included in the plural data blocks matches a predetermined pattern, and the second condition being a condition that points represented by numeric attribute values included in the plural data blocks are included in a first area that has a predetermined size; and
  - replace, for each of the plurality of groups, a numeric attribute value included in each of plural data blocks that belong to the group with a numeric attribute value calculated for the group, and
  - wherein the generating comprises:
    - classifying each of the plurality of data blocks into any of a plurality of second areas that have the predetermined size and do not overlap with each other, based on a numeric attribute value included in the data block; and
    - upon detecting that a set of data blocks included in a certain second area does not satisfy the predetermined condition, changing the set by deleting a data block from the set and/or adding, to the set, a data block included in another second area adjacent to the certain second area, so that the changed set satisfies the predetermined condition.
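
The classifying and changing steps shared by claims 1, 7 and 8 might look like the sketch below. It is a simplified illustration under assumptions not found in the claims: data blocks are hypothetical `(secret, x, y)` tuples, the second areas are square grid cells of side `size`, "adjacent" means the eight surrounding cells, the first condition is reduced to a lower limit `min_kinds` on distinct secret values, and only the "adding" branch is shown (the "deleting" branch is omitted).

```python
import math
from typing import Dict, List, Tuple

Block = Tuple[str, float, float]  # hypothetical layout: (secret, x, y)
Cell = Tuple[int, int]


def classify(blocks: List[Block], size: float) -> Dict[Cell, List[Block]]:
    """Assign each block to the non-overlapping size-by-size grid cell containing it."""
    cells: Dict[Cell, List[Block]] = {}
    for block in blocks:
        _, x, y = block
        key = (math.floor(x / size), math.floor(y / size))
        cells.setdefault(key, []).append(block)
    return cells


def kinds(group: List[Block]) -> int:
    """Number of distinct secret values in the group."""
    return len({secret for secret, _, _ in group})


def fits(group: List[Block], size: float) -> bool:
    """True if all points of the group fit in a size-by-size square."""
    if not group:
        return True
    xs = [x for _, x, _ in group]
    ys = [y for _, _, y in group]
    return max(xs) - min(xs) <= size and max(ys) - min(ys) <= size


def repair(cells: Dict[Cell, List[Block]], cell: Cell,
           min_kinds: int, size: float) -> List[Block]:
    """Try to make one cell's set valid by borrowing blocks from adjacent cells.

    Returns the repaired set (and updates `cells`), or [] if no repair was found.
    """
    group = list(cells.get(cell, []))
    borrowed = []
    cx, cy = cell
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            neighbour = (cx + dx, cy + dy)
            if neighbour == cell:
                continue
            for cand in cells.get(neighbour, []):
                if kinds(group) >= min_kinds:
                    break
                # Borrow only blocks that add a new secret value and keep the set compact.
                new_secret = cand[0] not in {s for s, _, _ in group}
                if new_secret and fits(group + [cand], size):
                    group.append(cand)
                    borrowed.append((neighbour, cand))
    if kinds(group) < min_kinds or not fits(group, size):
        return []  # leave `cells` untouched on failure
    for neighbour, cand in borrowed:
        cells[neighbour].remove(cand)
    cells[cell] = group
    return group
```

Note that the repaired set no longer has to coincide with a grid cell: it only has to fit inside some square of the same size, which matches the distinction the claims draw between the first area and the second areas.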

Current Assignee: Fujitsu Limited
Original Assignee: Fujitsu Limited
Inventors: Yamaoka, Yuji
Primary Examiner(s): Hoffman, Brandon
Assistant Examiner(s): Nipa, Wasika
Application Number: US14/749,761
Publication Number: US 20150294121A1
Time in Patent Office: 796 Days
CPC Class Codes:
- G06F 16/254: Extract, transform and load...
- G06F 16/285: Clustering or classification
- G06F 21/6254: by anonymising data, e.g. d...
- H04L 63/04: for providing a confidentia...
- H04L 63/0421: Anonymous communication, i....
