Systems and methods for deidentifying entries in a data source
First Claim
1. A system for deidentifying entries in an input data source having field-structured data organized in fields and entries, comprising:
a processor; and
a deidentification module comprising software code which when executed by the processor causes the processor to anonymize entries in a version of the input data source by generalizing at least one entry value of the version of the input data source to yield an output data source having field-structured data organized in fields and entries, wherein the generalization is such that a value of each entry within at least one field of the output data source occurs at least k times, and wherein a value of k is such that entries of the output data source match a specified anonymity requirement, and wherein the processor, when executing the software code of the deidentification module, anonymizes entries in the version of the input data source by at least one of suppressing or replacing entry values in the version of the input data source such that the entries of the output data source match the specified anonymity requirement.
Abstract
Systems and methods for deidentifying, or anonymizing, entries in an input data source are presented. According to one embodiment, the system includes a deidentification module for modifying entries in a version of the input data source to yield an output data source such that the entries of the output data source match a specified anonymity requirement. According to one embodiment, the resulting output data source may match the specified anonymity requirement with respect to a recipient profile that is input to the system. The deidentification module may further modify the entries in the version of the input data source such that the entries in the output data source are minimally distorted given the specified anonymity requirement.
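The anonymity requirement described above can be illustrated with a small check. The following is a minimal sketch, not the patented implementation: it assumes the requirement is expressed as a single integer k and that entries are plain dictionaries. The function name `satisfies_k` and the sample field names are hypothetical.

```python
from collections import Counter

def satisfies_k(rows, fields, k):
    """Return True if every combination of values over `fields`
    occurs in at least k entries (assumed reading of the requirement)."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return all(c >= k for c in counts.values())

# Illustrative entries; the field names are assumptions, not from the source.
rows = [
    {"zip": "0213*", "year": "1965"},
    {"zip": "0213*", "year": "1965"},
    {"zip": "0214*", "year": "1970"},
]

ok_all = satisfies_k(rows, ["zip", "year"], 2)        # third entry is unique
ok_first_two = satisfies_k(rows[:2], ["zip", "year"], 2)
```

Here the check fails for the full set because the third entry's value combination occurs only once, and passes once that entry is excluded.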
30 Claims
1. A system for deidentifying entries in an input data source having field-structured data organized in fields and entries, comprising:
a processor; and a deidentification module comprising software code which when executed by the processor causes the processor to anonymize entries in a version of the input data source by generalizing at least one entry value of the version of the input data source to yield an output data source having field-structured data organized in fields and entries, wherein the generalization is such that a value of each entry within at least one field of the output data source occurs at least k times, and wherein a value of k is such that entries of the output data source match a specified anonymity requirement, and wherein the processor, when executing the software code of the deidentification module, anonymizes entries in the version of the input data source by at least one of suppressing or replacing entry values in the version of the input data source such that the entries of the output data source match the specified anonymity requirement.
Dependent claims: 2, 3, 4, 5
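As an illustration of generalizing and then suppressing values in one field, the sketch below masks trailing characters of ZIP-like strings until each value occurs at least k times, and suppresses whatever is still too rare once a maximum level is reached. This is one possible reading of the claim, not the method the patent specifies; `mask_zip`, `anonymize_field`, `max_level`, and the `SUPPRESSED` marker are hypothetical names.

```python
from collections import Counter

SUPPRESSED = "*"  # hypothetical marker for a suppressed value

def mask_zip(value, level):
    """Generalize a string by replacing its last `level` characters with '*'."""
    if level <= 0:
        return value
    keep = max(len(value) - level, 0)
    return value[:keep] + "*" * (len(value) - keep)

def anonymize_field(values, k, max_level):
    """Raise the generalization level until every value occurs >= k times;
    if max_level is reached first, suppress the values that are still rare."""
    level = 0
    while True:
        generalized = [mask_zip(v, level) for v in values]
        counts = Counter(generalized)
        if all(c >= k for c in counts.values()) or level >= max_level:
            break
        level += 1
    # When every value already occurs >= k times this leaves the list unchanged.
    return [v if counts[v] >= k else SUPPRESSED for v in generalized]

result = anonymize_field(["02138", "02139", "02141", "90210"], k=2, max_level=1)
```

With these inputs, one level of masking groups the first two values and the remaining two are suppressed. Note that this sketch does not guarantee that the suppression marker itself occurs at least k times; a fuller implementation would need to handle that case, for example by dropping such entries.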
6. A system for deidentifying entries in an input data source having field-structured data organized in fields and entries, comprising:
a processor; and a deidentification module comprising software code which when executed by the processor causes the processor to anonymize entries in a version of the input data source by generalizing at least one entry value of the version of the input data source to yield an output data source having field-structured data organized in fields and entries, wherein the generalization is such that a value of each entry within at least one field of the output data source occurs at least k times, and wherein a value of k is such that entries of the output data source match a specified anonymity requirement, wherein the processor, when executing the software code of the deidentification module, anonymizes entries in the version of the input data source by: determining whether each field in the version of the input data source requires one of an equivalent class substitution or a generalization; and replacing an entry value in a field of each entry with a replacement value determined according to a generalization hierarchy when a determination is made that the field requires a generalization.
Dependent claims: 7, 8, 9
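The per-field decision in this claim can be sketched as follows. The claim text here does not say how the determination is made or define equivalent class substitution, so this sketch makes two loud assumptions: the per-field choice is supplied as a configuration mapping (`field_modes`), and substitution means consistently replacing each distinct value with a surrogate. The generalization hierarchy is modeled as a value-to-parent dictionary. All names are hypothetical.

```python
from itertools import count

def substitute_classes(values):
    """Assumed reading of equivalent class substitution: each distinct
    value is replaced, consistently, by a made-up surrogate."""
    surrogates = {}
    ids = count(1)
    out = []
    for v in values:
        if v not in surrogates:
            surrogates[v] = f"ID{next(ids)}"
        out.append(surrogates[v])
    return out

def deidentify(rows, field_modes, hierarchies):
    """Apply substitution or one step of hierarchy-based generalization per field."""
    out = [dict(r) for r in rows]  # copy so the input entries are not modified
    for field, mode in field_modes.items():
        column = [r[field] for r in rows]
        if mode == "substitute":
            new = substitute_classes(column)
        elif mode == "generalize":
            # Values missing from the hierarchy are left unchanged.
            new = [hierarchies[field].get(v, v) for v in column]
        else:
            raise ValueError(f"unknown mode for field {field!r}: {mode!r}")
        for r, v in zip(out, new):
            r[field] = v
    return out

rows = [
    {"ssn": "111", "zip": "02138"},
    {"ssn": "222", "zip": "02139"},
    {"ssn": "111", "zip": "02141"},
]
modes = {"ssn": "substitute", "zip": "generalize"}
hier = {"zip": {"02138": "0213*", "02139": "0213*", "02141": "0214*"}}
released = deidentify(rows, modes, hier)
```

Keeping the hierarchy as data rather than code means the same routine can serve fields with different generalization schemes, such as dates or geographic codes.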
10. A computer readable medium, having stored thereon instructions, which when executed by a processor, cause the processor to:
read a specified anonymity requirement; anonymize entries in a version of an input data source, the input data source having field-structured data organized in fields and entries, by generalizing at least one entry value of the version of the input data source to yield an output data source having field-structured data organized in fields and entries, wherein the generalization is such that a value of each entry within at least one field of the output data source occurs at least k times, and wherein a value of k is such that entries of the output data source match the specified anonymity requirement; determine whether each field in the version of the input data source requires one of an equivalent class substitution or a generalization; and replace an entry value in a field of each entry with a replacement value determined according to a generalization hierarchy when a determination is made that the field requires a generalization.
Dependent claims: 11, 12, 13, 14, 15, 16
17. A method for deidentifying entries in an input data source having field-structured data organized in fields and entries, comprising:
receiving a specified anonymity requirement; and anonymizing entries in a version of the input data source by generalizing at least one entry value of the version of the input data source to yield an output data source, the output data source having field-structured data organized in fields and entries, wherein the generalization is such that a value of each entry within at least one field of the output data source occurs at least k times, and wherein a value of k is such that entries of the output data source match the specified anonymity requirement, wherein anonymizing entries in the input data source includes: determining whether each field in the version of the input data source requires one of an equivalent class substitution or a generalization; and replacing an entry value in a field of each entry with a replacement value determined according to a generalization hierarchy when a determination is made that the field requires a generalization.
Dependent claims: 18, 19, 20, 21, 22, 23
24. A system for deidentifying entries in an input data source, having field-structured data organized in fields and entries, comprising:
means for anonymizing entries in a version of the input data source by generalizing at least one entry value of the version of the input data source to yield an output data source having field-structured data organized in fields and entries, wherein the generalization is such that a value of each entry within at least one field of the output data source occurs at least k times, and wherein a value of k is such that entries of the output data source match a specified anonymity requirement, wherein the means for anonymizing entries in the input data source is further for: determining whether each field in the version of the input data source requires one of an equivalent class substitution or a generalization; and replacing an entry value in a field of each entry with a replacement value determined according to a generalization hierarchy when a determination is made that the field requires a generalization.
Dependent claims: 25, 26, 27, 28, 29, 30
Specification