Data source privacy screening systems and methods
Abstract
A de-identification method and an apparatus for performing it on electronic datasets are described. The method and system process input datasets or databases containing records about individual entities and produce an output dataset that retains as much information as possible while minimizing the risk that any individual in the input dataset could be re-identified from that output dataset. Individual entities may include patients in a hospital or served by an insurance carrier, as well as voters, subscribers, customers, companies, or any other entities represented as discrete records. Criteria for preventing re-identification can be selected based on the intended use of the output data and can be adjusted based on the content of reference databases. The method and system can also be associated with data acquisition equipment, such as a biologic data sampling device, so that patient or other confidential data acquired by the equipment is de-identified.
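The claims measure the output by its k-anonymity. In the standard formulation, a dataset is k-anonymous with respect to a set of fields when every combination of values on those fields is shared by at least k records, so k is the size of the smallest group of indistinguishable records. The Python sketch below computes that value; the function name `k_anonymity`, the field names, and the records are hypothetical and serve only to show that releasing more fields can lower k.

```python
from collections import Counter
from typing import Mapping, Sequence


def k_anonymity(records: Sequence[Mapping[str, object]], fields: Sequence[str]) -> int:
    """Return the k-anonymity of `records` over `fields` (hypothetical helper).

    Records with identical values on every field in `fields` form one
    equivalence class; k is the size of the smallest class.
    """
    if not records:
        raise ValueError("k-anonymity is undefined for an empty dataset")
    # Project each record onto the chosen fields and count identical projections.
    classes = Counter(tuple(r[f] for f in fields) for r in records)
    return min(classes.values())


# Hypothetical records, for illustration only.
patients = [
    {"zip": "02139", "birth_year": 1970, "sex": "F"},
    {"zip": "02139", "birth_year": 1970, "sex": "M"},
    {"zip": "02139", "birth_year": 1971, "sex": "M"},
    {"zip": "02139", "birth_year": 1971, "sex": "M"},
]
print(k_anonymity(patients, ["zip"]))                       # 4
print(k_anonymity(patients, ["zip", "birth_year"]))         # 2
print(k_anonymity(patients, ["zip", "birth_year", "sex"]))  # 1
```

Each added field splits the equivalence classes further: with all three fields, the 1970 female record is unique (k = 1), which is the trade-off between information retained and re-identification risk that the abstract describes.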
19 Claims
1. A method of record de-identification for use with a first data source having a plurality of first records having one or more first fields, said first fields having at least one corresponding first value, comprising:
prioritizing said first fields according to a user preference of a user;
using a second data source, wherein said second data source comprises a plurality of second records having one or more second fields, said second fields having at least one corresponding second value, comparing said first fields and said corresponding first values of each said first record to said second fields and said corresponding second values of all of said second records; and
based on said comparing, extracting said first records and said first corresponding values of the highest priority first fields from said first data source to a third data source, wherein said extracting results in a k-anonymity value for said third data source approximating a pre-defined k-anonymity value.
15. An apparatus for record de-identification, comprising:
a data capture system, wherein the data is placed in a first data source on capture, and wherein said first data source comprises a plurality of first records having one or more first fields, said first fields having at least one corresponding first value;
a reference data source comprising a plurality of second records having one or more second fields, said second fields having at least one corresponding second value;
comparison means for comparing said first fields and said corresponding first values of each said first record to said second fields and corresponding second values of all said second records;
a control interface to a user, operably coupled to said data capture system, said first data source, and said comparison means, whereby:
said user pre-defines a resulting k-anonymity value for an output data source; and
said user prioritizes said first fields according to said user's preference for preservation; and
extraction means, operably coupled to said control interface and said output data source, for extracting the highest priority first fields from said first data source to said output data source based on said comparing;
wherein said extracting results in a k-anonymity value for said output data source that approximates said pre-defined k-anonymity value.
17. An apparatus for record de-identification for use with a first data source having a plurality of first records having one or more first fields, said first fields having at least one corresponding first value, comprising:
means for prioritizing said first fields according to a user preference;
using a second data source, wherein said second data source comprises a plurality of second records having one or more second fields, said second fields having at least one corresponding second value, means for comparing said first fields and said corresponding first values of each said first record to said second fields and said corresponding second values of all of said second records; and
based on said comparing, means for extracting said first records and said first corresponding values of the highest priority first fields from said first data source to a third data source, wherein said extracting results in a k-anonymity value for said third data source approximating a pre-defined k-anonymity value.
18. A computer system for use in record de-identification for use with a first data source having a plurality of first records having one or more first fields, said first fields having at least one corresponding first value, comprising computer instructions for:
prioritizing said first fields according to a user preference;
using a second data source, wherein said second data source comprises a plurality of second records having one or more second fields, said second fields having at least one corresponding second value, comparing said first fields and said corresponding first values of each said first record to said second fields and said corresponding second values of all of said second records; and
based on said comparing, extracting said first records and said first corresponding values of the highest priority first fields from said first data source to a third data source, wherein said extracting results in a k-anonymity value for said third data source approximating a pre-defined k-anonymity value.
19. A computer-readable medium storing a computer program executable by a plurality of server computers for use with a first data source having a plurality of first records having one or more first fields, said first fields having at least one corresponding first value, the computer program comprising computer instructions for:
prioritizing said first fields according to a user preference;
using a second data source, wherein said second data source comprises a plurality of second records having one or more second fields, said second fields having at least one corresponding second value, comparing said first fields and said corresponding first values of each said first record to said second fields and said corresponding second values of all of said second records; and
based on said comparing, extracting said first records and said first corresponding values of the highest priority first fields from said first data source to a third data source, wherein said extracting results in a k-anonymity value for said third data source approximating a pre-defined k-anonymity value.
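Claims 1 and 15 through 19 share one procedure: rank the fields by user preference, compare each first record against a reference (second) data source, and extract the highest-priority fields so that the output's k-anonymity approximates a pre-defined target. The claims do not say exactly how k is computed against the reference source or how the field set is chosen, so the Python sketch below rests on two assumptions: k is taken as the smallest number of reference records that share any first record's values on the extracted fields, and fields are added greedily in priority order until the next one would push k below the target. The names `reference_k` and `extract` and all sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Mapping, Sequence, Tuple

Record = Mapping[str, object]


def reference_k(first: Sequence[Record], reference: Sequence[Record],
                fields: Sequence[str]) -> int:
    """Smallest number of reference records sharing a first record's values on `fields`.

    Assumption: this is how k-anonymity against a reference source is measured.
    """
    if not first:
        raise ValueError("no first records to compare")
    # Index the reference source once, so comparing a first record against
    # all reference records becomes a single hash lookup.
    index = Counter(tuple(s[f] for f in fields) for s in reference)
    # A first record with no reference match counts as 0, failing any positive target.
    return min(index[tuple(r[f] for f in fields)] for r in first)


def extract(first: Sequence[Record], reference: Sequence[Record],
            priorities: Sequence[str], target_k: int) -> Tuple[List[str], List[Dict[str, object]]]:
    """Keep the longest prefix of `priorities` whose reference k stays >= target_k.

    Returns the kept field names and the third data source, i.e. the first
    records projected onto those fields.
    """
    kept: List[str] = []
    for field in priorities:  # most preferred field first
        if reference_k(first, reference, kept + [field]) < target_k:
            break  # this field would let some record be singled out too easily
        kept.append(field)
    third = [{f: r[f] for f in kept} for r in first]
    return kept, third


# Hypothetical reference source (e.g. a public list) and first data source.
reference = [
    {"zip": "02139", "birth_year": 1970, "sex": "F"},
    {"zip": "02139", "birth_year": 1970, "sex": "F"},
    {"zip": "02139", "birth_year": 1970, "sex": "M"},
    {"zip": "02139", "birth_year": 1970, "sex": "M"},
    {"zip": "02139", "birth_year": 1971, "sex": "F"},
]
patients = [
    {"zip": "02139", "birth_year": 1970, "sex": "F"},
    {"zip": "02139", "birth_year": 1971, "sex": "F"},
]
kept, third = extract(patients, reference, ["sex", "birth_year", "zip"], target_k=2)
print(kept)                                    # ['sex']
print(third)                                   # [{'sex': 'F'}, {'sex': 'F'}]
print(reference_k(patients, reference, kept))  # 3
```

With a target of 2, `sex` is kept (k = 3), but adding `birth_year` would leave the 1971 record matched by a single reference record, so extraction stops there. Stopping at the first failing field keeps the output a strict priority prefix, as "the highest priority first fields" suggests; a variant could instead skip that field and try lower-priority ones. Indexing the reference source with `collections.Counter` keeps each pass linear in the sizes of the two sources rather than comparing every pair of records.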
Specification