System and method for data anonymization using hierarchical data clustering and perturbation
Abstract
A system and method for data anonymization using hierarchical data clustering and perturbation is provided. The system includes a computer system and an anonymization program executed by the computer system. The system converts the data of a high-dimensional dataset to a normalized vector space and applies clustering and perturbation techniques to anonymize the data. The conversion results in each record of the dataset being converted into a normalized vector that can be compared to other vectors. The vectors are divided into disjointed, small-sized clusters using hierarchical clustering processes. Multi-level clustering can be performed using suitable algorithms at different clustering levels. The records within each cluster are then perturbed such that the statistical properties of the clusters remain unchanged.
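The pipeline described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the patented implementation: min-max scaling stands in for the vector space mapping, a recursive bisection along the widest attribute stands in for the hierarchical clustering, and an independent shuffle of each attribute within a cluster stands in for the perturbation. The function names (`normalize`, `split_into_groups`, `perturb_group`, `anonymize`) are hypothetical, and records are assumed to be purely numeric.

```python
import random


def normalize(records):
    """Min-max scale each column to [0, 1]; also return per-column (low, span)."""
    bounds = []
    for col in zip(*records):
        low, high = min(col), max(col)
        # A constant column gets a span of 1 so the division below is safe.
        bounds.append((low, (high - low) or 1.0))
    vectors = [[(v - low) / span for v, (low, span) in zip(row, bounds)]
               for row in records]
    return vectors, bounds


def split_into_groups(vectors, indices, k):
    """Assumed clustering: bisect along the widest attribute until a group
    holds fewer than 2k records, so every final group has at least k."""
    if len(indices) < 2 * k:
        return [indices]
    dims = range(len(vectors[0]))
    widest = max(dims, key=lambda d: max(vectors[i][d] for i in indices)
                 - min(vectors[i][d] for i in indices))
    ordered = sorted(indices, key=lambda i: vectors[i][widest])
    half = len(ordered) // 2
    return (split_into_groups(vectors, ordered[:half], k)
            + split_into_groups(vectors, ordered[half:], k))


def perturb_group(vectors, group, rng):
    """Assumed perturbation: shuffle each attribute independently among the
    group's records, which keeps each attribute's values within the group."""
    columns = [[vectors[i][d] for i in group] for d in range(len(vectors[0]))]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def anonymize(records, k, seed=0):
    """Normalize, cluster, perturb, then map back to the original domain."""
    rng = random.Random(seed)
    vectors, bounds = normalize(records)
    groups = split_into_groups(vectors, list(range(len(vectors))), k)
    anonymized = [None] * len(records)
    for group in groups:
        for i, row in zip(group, perturb_group(vectors, group, rng)):
            anonymized[i] = [v * span + low for v, (low, span) in zip(row, bounds)]
    return anonymized, groups


# Hypothetical records: (age, income).
records = [[25, 50000], [27, 52000], [45, 90000], [47, 88000],
           [33, 61000], [36, 64000], [52, 99000], [29, 55000]]
anonymized, groups = anonymize(records, k=2)
```

Shuffling attributes within a cluster preserves each attribute's per-cluster mean and variance but not the correlations between attributes, so a method that must keep joint statistics unchanged would need a different perturbation step.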
30 Claims
1. A system for data anonymization comprising:
a computer system for electronically receiving an original dataset and allowing a user to specify a relative importance of at least one attribute of the dataset; and
an anonymization program executed by the computer system for producing an anonymized dataset from the original dataset, the anonymization program executing:
a vector space mapping sub-process for converting each record of the original dataset to a normalized vector that can be compared to other vectors;
a hierarchical clustering sub-process for dividing the normalized vectors into disjointed k-sized groups of similar records based on a hierarchical clustering technique;
a perturbation sub-process for generating anonymized clusters from individual clusters generated by the hierarchical clustering sub-process; and
an original domain mapping sub-process to combine and remap anonymized clusters back to an original domain of the original dataset.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10

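The user-specified relative importance of an attribute could plausibly enter the vector space mapping as a per-attribute weight, with the original domain mapping inverting that transformation. The sketch below assumes numeric attributes, strictly positive weights, min-max scaling, and Euclidean distance; none of these choices is stated in the claim, and the names `to_weighted_vectors`, `to_original_domain`, and `distance` are hypothetical.

```python
def to_weighted_vectors(records, weights):
    """Min-max scale each attribute to [0, 1], then multiply by its weight."""
    bounds = []
    for col in zip(*records):
        low, high = min(col), max(col)
        # A constant column gets a span of 1 so the division below is safe.
        bounds.append((low, (high - low) or 1.0))
    vectors = [[w * (v - low) / span
                for v, w, (low, span) in zip(row, weights, bounds)]
               for row in records]
    return vectors, bounds


def to_original_domain(vectors, weights, bounds):
    """Invert the mapping: undo the weight, then undo the min-max scaling."""
    return [[(v / w) * span + low
             for v, w, (low, span) in zip(row, weights, bounds)]
            for row in vectors]


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Hypothetical records: (age, income).
records = [[30, 40000], [32, 80000], [50, 45000]]

# Age weighted more heavily: record 0 is closest to record 1 (similar age).
age_first, bounds = to_weighted_vectors(records, [1.0, 0.2])

# Income weighted more heavily: record 0 is closest to record 2 (similar income).
income_first, _ = to_weighted_vectors(records, [0.2, 1.0])

restored = to_original_domain(age_first, [1.0, 0.2], bounds)
```

Because a larger weight stretches an attribute's axis, differences in that attribute contribute more to the distance, so any distance-based clustering would tend to group records that agree on the attributes the user marked as important.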
11. A method for data anonymization comprising:
electronically receiving an original dataset at a computer system;
allowing a user to specify a relative importance of at least one attribute of the dataset; and
executing by the computer system an anonymization program for producing an anonymized dataset from the original dataset, the anonymization program executing:
a vector space mapping sub-process for converting each record of the original dataset to a normalized vector that can be compared to other vectors;
a hierarchical clustering sub-process for dividing the normalized vectors into disjointed k-sized groups of similar records based on a hierarchical clustering technique;
a perturbation sub-process for generating anonymized clusters from individual clusters generated by the hierarchical clustering sub-process; and
an original domain mapping sub-process to combine and remap anonymized clusters back to an original domain of the original dataset.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20

21. A computer-readable medium having computer-readable instructions stored thereon which, when executed by a computer system, cause the computer system to perform the steps of:
electronically receiving an original dataset at a computer system;
allowing a user to specify a relative importance of at least one attribute of the dataset; and
executing by the computer system an anonymization program for producing an anonymized dataset from the original dataset, the anonymization program executing:
a vector space mapping sub-process for converting each record of the original dataset to a normalized vector that can be compared to other vectors;
a hierarchical clustering sub-process for dividing the normalized vectors into disjointed k-sized groups of similar records based on a hierarchical clustering technique;
a perturbation sub-process for generating anonymized clusters from individual clusters generated by the hierarchical clustering sub-process; and
an original domain mapping sub-process to combine and remap anonymized clusters back to an original domain of the original dataset.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30

Specification