K-anonymity and L-diversity data anonymization in an in-memory database

US 10,565,398 B2
Filed: 10/26/2017
Issued: 02/18/2020
Est. Priority Date: 10/26/2017
Status: Active Grant

Abstract

Disclosed herein are system, method, and computer program product embodiments for data anonymization in an in-memory database. An embodiment operates by receiving an indication to perform data anonymization based on quasi attributes of a data set. Partitioning is recursively performed on the data set based on one or more of the quasi attributes until both a first anonymization threshold corresponding to the quasi attributes is satisfied and a second anonymization threshold corresponding to the one or more sensitive attributes is satisfied for each of a plurality of sub-partitions produced as a result of the partitioning. A resultant data set including a plurality of records of the data set corresponding to the plurality of sub-partitions that satisfy both the first anonymization threshold and the second anonymization threshold is provided.
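
The recursive partitioning described in the abstract can be illustrated with a minimal sketch. The source does not say how a split point is chosen, so this example assumes a Mondrian-style median split and keeps a split only when both halves still meet the two thresholds. It also assumes the first threshold means "at least k records per sub-partition" and the second means "at least l distinct values per sensitive attribute". The names `satisfies`, `partition`, and the sample records are hypothetical.

```python
from typing import Dict, List, Sequence

Record = Dict[str, object]


def satisfies(part: List[Record], sensitive: Sequence[str], k: int, l: int) -> bool:
    """True if `part` has at least k records (assumed reading of K-anonymity)
    and at least l distinct values for every sensitive attribute (L-diversity)."""
    if len(part) < k:
        return False
    return all(len({r[s] for r in part}) >= l for s in sensitive)


def partition(records: List[Record], quasi: Sequence[str],
              sensitive: Sequence[str], k: int, l: int) -> List[List[Record]]:
    """Recursively split on a quasi attribute while both halves still satisfy
    the two thresholds; return the final sub-partitions."""
    # Assumption: try the quasi attribute with the most distinct values first.
    for attr in sorted(quasi, key=lambda a: -len({r[a] for r in records})):
        ordered = sorted(records, key=lambda r: r[attr])
        median = ordered[len(ordered) // 2][attr]
        left = [r for r in ordered if r[attr] < median]
        right = [r for r in ordered if r[attr] >= median]
        # Keep the split only if neither half violates a threshold.
        if left and satisfies(left, sensitive, k, l) and satisfies(right, sensitive, k, l):
            return (partition(left, quasi, sensitive, k, l)
                    + partition(right, quasi, sensitive, k, l))
    return [records]  # no allowable split remains


# Hypothetical sample data: two quasi attributes, one sensitive attribute.
records = [
    {"age": 25, "zip": 10001, "disease": "flu"},
    {"age": 27, "zip": 10002, "disease": "cold"},
    {"age": 31, "zip": 10001, "disease": "flu"},
    {"age": 35, "zip": 10003, "disease": "asthma"},
    {"age": 42, "zip": 10002, "disease": "cold"},
    {"age": 47, "zip": 10003, "disease": "flu"},
    {"age": 53, "zip": 10001, "disease": "asthma"},
    {"age": 58, "zip": 10002, "disease": "cold"},
]

parts = partition(records, ["age", "zip"], ["disease"], k=2, l=2)
for p in parts:
    print(len(p), sorted({r["disease"] for r in p}))
```

Every sub-partition returned satisfies both thresholds as long as the input data set does, because a split is only accepted when both halves pass `satisfies`.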

24 Citations

16 Claims

1. A method comprising:
- receiving an indication to perform data anonymization based on quasi attributes of a data set, wherein the data set includes both the quasi attributes and one or more sensitive attributes;
- recursively performing partitioning of the data set based on one or more of the quasi attributes until both a first anonymization threshold corresponding to the quasi attributes is satisfied, wherein the first anonymization threshold is based on K-anonymity and indicates from how many other records that each record in one of the sub-partitions of the resultant data set is indistinguishable and a second anonymization threshold corresponding to the one or more sensitive attributes is satisfied for each of a plurality of sub-partitions produced as a result of the partitioning, wherein the second anonymization threshold is based on L-diversity and indicates a minimum number of sensitive values that exist in each sub-partition of the resultant data set; and
- providing a resultant data set including a plurality of records of the data set corresponding to the plurality of sub-partitions that satisfy both the first anonymization threshold and the second anonymization threshold.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method of claim 1, wherein the recursively performing comprises:
    - determining after a first partitioning of the data set and prior to performing a second partitioning of the data set that the second anonymization threshold is not satisfied.
  - 3. The method of claim 1, wherein a first partitioning is performed based on a first set of the quasi attributes and a second partitioning is performed on a second set of the quasi attributes, wherein the first set of quasi attributes is different from the second set of quasi attributes.
  - 4. The method of claim 3, wherein both the first partitioning and the second partitioning are performed at least once prior to the second anonymization threshold being satisfied.
  - 5. The method of claim 3, wherein the first partitioning is performed based on a first set of the quasi attributes and the second partitioning is performed on a second set of the quasi attributes, wherein the first set of quasi attributes is the same as the second set of quasi attributes.
  - 6. The method of claim 1, further comprising:
    - determining which attributes of the data set are the quasi attributes and which of the one or more attributes are sensitive attributes based on a designation received from a user.
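
The checks implied by claims 2 and 6 can be sketched as follows: a user designation decides which attributes are quasi and which are sensitive, and after a partitioning step the two thresholds are evaluated separately, so it is possible to find that the second one is not yet satisfied. This is an illustrative sketch only. The `designation` mapping, the helper names, and the sample rows are hypothetical, and the thresholds are read as "minimum sub-partition size" (K-anonymity) and "minimum number of distinct sensitive values" (L-diversity).

```python
from collections import defaultdict

# Hypothetical designation received from a user: attribute name -> role.
designation = {
    "name": "identifier",
    "age_band": "quasi",
    "zip3": "quasi",
    "disease": "sensitive",
}
quasi = [a for a, role in designation.items() if role == "quasi"]
sensitive = [a for a, role in designation.items() if role == "sensitive"]


def sub_partitions(records, quasi_attrs):
    """Group records that share the same values on all quasi attributes."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[a] for a in quasi_attrs)].append(r)
    return list(groups.values())


def k_level(groups):
    """Size of the smallest sub-partition."""
    return min(len(g) for g in groups)


def l_level(groups, attr):
    """Smallest number of distinct values of `attr` in any sub-partition."""
    return min(len({r[attr] for r in g}) for g in groups)


# Hypothetical state of the data set after a first partitioning.
records = [
    {"name": "A", "age_band": "20-29", "zip3": "100", "disease": "flu"},
    {"name": "B", "age_band": "20-29", "zip3": "100", "disease": "cold"},
    {"name": "C", "age_band": "30-39", "zip3": "100", "disease": "flu"},
    {"name": "D", "age_band": "30-39", "zip3": "100", "disease": "flu"},
]

groups = sub_partitions(records, quasi)
first_ok = k_level(groups) >= 2                              # K-anonymity, k = 2
second_ok = all(l_level(groups, s) >= 2 for s in sensitive)  # L-diversity, l = 2
print(first_ok, second_ok)
```

In this sample the first threshold holds while the second does not, because the "30-39" sub-partition contains only one distinct sensitive value. That is the kind of determination claim 2 places between a first and a second partitioning.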

7. A system, comprising:
- a memory; and
- at least one processor coupled to the memory and configured to:
  - receive an indication to perform data anonymization based on quasi attributes of a data set, wherein the data set includes both the quasi attributes and one or more sensitive attributes;
  - recursively partition the data set based on one or more of the quasi attributes until both a first anonymization threshold corresponding to the quasi attributes is satisfied, wherein the first anonymization threshold is based on K-anonymity and indicates from how many other records that each record in one of the sub-partitions of the resultant data set is indistinguishable and a second anonymization threshold corresponding to the one or more sensitive attributes is satisfied for each of a plurality of sub-partitions produced as a result of the partitioning, wherein the second anonymization threshold is based on L-diversity and indicates a minimum number of sensitive values that exist in each sub-partition of the resultant data set; and
  - provide a resultant data set including a plurality of records of the data set corresponding to the plurality of sub-partitions that satisfy both the first anonymization threshold and the second anonymization threshold.
- Dependent claims (8, 9, 10, 11):
  - 8. The system of claim 7, wherein the processor that performs the second partitioning is configured to:
    - determine after a first partitioning of the data set and prior to performing a second partitioning of the data set that the second anonymization threshold is not satisfied.
  - 9. The system of claim 7, wherein a first partitioning is performed based on a first set of the quasi attributes and a second partitioning is performed on a second set of the quasi attributes, wherein the first set of quasi attributes is different from the second set of quasi attributes.
  - 10. The system of claim 9, wherein both the first partitioning and the second partitioning are performed at least once prior to the second anonymization threshold being satisfied.
  - 11. The system of claim 9, wherein the first partitioning is performed based on a first set of the quasi attributes and the second partitioning is performed on a second set of the quasi attributes, wherein the first set of quasi attributes is the same as the second set of quasi attributes.
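
Claims 9 through 11 distinguish a first and a second partitioning by the set of quasi attributes each one uses. The sketch below shows one way to picture that: each round splits every current sub-partition at the median of an attribute from the given set, and a split is kept only if both halves pass a validity check. The median split, the function `split_round`, the `valid` check (k = 2, l = 2), and the sample records are assumptions for illustration and are not taken from the source.

```python
def split_round(parts, attrs, is_valid):
    """One partitioning round: split each sub-partition in two on the first
    attribute in `attrs` that yields two valid halves; otherwise keep it whole."""
    out = []
    for part in parts:
        for a in attrs:
            ordered = sorted(part, key=lambda r: r[a])
            median = ordered[len(ordered) // 2][a]
            low = [r for r in ordered if r[a] < median]
            high = [r for r in ordered if r[a] >= median]
            if low and is_valid(low) and is_valid(high):
                out.extend([low, high])
                break
        else:
            out.append(part)  # no attribute in this set gave a valid split
    return out


def valid(part):
    """Assumed thresholds: at least 2 records and 2 distinct sensitive values."""
    return len(part) >= 2 and len({r["disease"] for r in part}) >= 2


# Hypothetical sample data.
records = [
    {"age": 25, "zip": 10001, "disease": "flu"},
    {"age": 27, "zip": 10005, "disease": "cold"},
    {"age": 31, "zip": 10001, "disease": "cold"},
    {"age": 35, "zip": 10005, "disease": "flu"},
    {"age": 42, "zip": 10002, "disease": "flu"},
    {"age": 47, "zip": 10006, "disease": "cold"},
    {"age": 53, "zip": 10002, "disease": "cold"},
    {"age": 58, "zip": 10006, "disease": "flu"},
]

first = split_round([records], ["age"], valid)

# Second partitioning on a different attribute set than the first.
different = split_round(first, ["zip"], valid)
# Second partitioning on the same attribute set as the first.
same = split_round(first, ["age"], valid)


def ages(parts):
    """Ages per sub-partition, sorted for easy comparison."""
    return sorted(sorted(r["age"] for r in p) for p in parts)


print(ages(different))
print(ages(same))
```

Both variants end with four sub-partitions that meet the assumed thresholds, but the records are grouped differently depending on which attribute set the second round uses.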

12. A non-transitory computer-readable device having instructions stored thereon that, when executed by at least one computing device, causes the at least one computing device to perform operations comprising:
- receiving an indication to perform data anonymization based on quasi attributes of a data set, wherein the data set includes both the quasi attributes and one or more sensitive attributes;
- recursively performing partitioning of the data set based on one or more of the quasi attributes until both a first anonymization threshold corresponding to the quasi attributes is satisfied, wherein the first anonymization threshold is based on K-anonymity and indicates from how many other records that each record in one of the sub-partitions of the resultant data set is indistinguishable and a second anonymization threshold corresponding to the one or more sensitive attributes is satisfied for each of a plurality of sub-partitions produced as a result of the partitioning, wherein the second anonymization threshold is based on L-diversity and indicates a minimum number of sensitive values that exist in each sub-partition of the resultant data set; and
- providing a resultant data set including a plurality of records of the data set corresponding to the plurality of sub-partitions that satisfy both the first anonymization threshold and the second anonymization threshold.
- Dependent claims (13, 14, 15, 16):
  - 13. The non-transitory computer-readable device of claim 12, that performs the second partitioning is configured to perform operations comprising:
    - determining after a first partitioning of the data set and prior to performing a second partitioning of the data set that the second anonymization threshold is not satisfied.
  - 14. The non-transitory computer-readable device of claim 12, wherein a first partitioning is performed based on a first set of the quasi attributes and a second partitioning is performed on a second set of the quasi attributes, wherein the first set of quasi attributes is different from the second set of quasi attributes.
  - 15. The non-transitory computer-readable device of claim 14, wherein both the first partitioning and the second partitioning are performed at least once prior to the second anonymization threshold being satisfied.
  - 16. The non-transitory computer-readable device of claim 14, wherein the first partitioning is performed based on a first set of the quasi attributes and the second partitioning is performed on a second set of the quasi attributes, wherein the first set of quasi attributes is the same as the second set of quasi attributes.

Current Assignee: SAP SE
Original Assignee: SAP SE
Inventors: Huang, Xinrong
Primary Examiner(s): Hirl, Joseph P
Assistant Examiner(s): Gundry, Stephen T
Application Number: US15/794,744
Publication Number: US 20190130129A1
Time in Patent Office: 845 Days
CPC Class Codes:
- G06F 16/285: Clustering or classification
- G06F 21/6254: by anonymising data, e.g. d...
- G16H 10/60: for patient-specific data, ...
- H04L 2209/42: Anonymization, e.g. involvi...
