Data source privacy screening systems and methods

US 20040199781A1
Filed: 08/30/2002
Published: 10/07/2004
Est. Priority Date: 08/30/2001
Status: Abandoned Application

4 Assignments

0 Petitions

Abstract

A de-identification method and an apparatus for performing same on electronic datasets are described. The method and system process input datasets or databases that contain records relating to individual entities to produce a resulting output dataset that contains as much information as possible while minimizing the risk that any individual in the input dataset could be re-identified from that output dataset. Individual entities may include patients in a hospital or served by an insurance carrier, as well as voters, subscribers, customers, companies, or any other organization of discrete records. Criteria for preventing re-identification can be selected based on intended use of the output data and can be adjusted based on the content of reference databases. The method and system can also be associated with data acquisition equipment, such as a biologic data sampling device, to prevent re-identification of patient or other confidential data acquired by the equipment.
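The re-identification risk described in the abstract is commonly quantified with k-anonymity: a released table is k-anonymous over a chosen set of fields if every combination of values on those fields is shared by at least k records. The sketch below is a hypothetical illustration, not the applicants' implementation; it assumes records are plain Python dicts, and the helper names `k_anonymity` and `unique_records` are invented for this example.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, str]


def k_anonymity(records: List[Record], fields: Sequence[str]) -> int:
    """Hypothetical helper: size of the smallest group of records that
    share identical values on `fields` (0 for an empty dataset)."""
    if not records:
        return 0
    groups = Counter(tuple(r[f] for f in fields) for r in records)
    return min(groups.values())


def unique_records(records: List[Record], fields: Sequence[str]) -> int:
    """Hypothetical helper: number of records whose value combination on
    `fields` occurs exactly once, i.e. records that are trivially re-identifiable."""
    groups = Counter(tuple(r[f] for f in fields) for r in records)
    return sum(1 for n in groups.values() if n == 1)


patients = [
    {"zip": "02139", "birth_year": "1970", "sex": "F"},
    {"zip": "02139", "birth_year": "1970", "sex": "F"},
    {"zip": "02139", "birth_year": "1971", "sex": "M"},
]
print(k_anonymity(patients, ["zip"]))                          # 3
print(k_anonymity(patients, ["zip", "birth_year", "sex"]))     # 1
print(unique_records(patients, ["zip", "birth_year", "sex"]))  # 1
```

Adding a field can only split existing groups, never merge them, so k can stay the same or fall as more fields are released but can never rise. That monotonicity is what turns field selection into a trade-off between retained information and anonymity.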

91 Citations

19 Claims

1. A method of record de-identification for use with a first data source having a plurality of first records having one or more first fields, said first fields having at least one corresponding first value, comprising:
- prioritizing said first fields according to a user preference of a user;
  
  using a second data source, wherein said second data source comprises a plurality of second records having one or more second fields, said second fields having at least one corresponding second value, comparing said first fields and said corresponding first values of each said first record to said second fields and said corresponding second values of all of said second records; and
  
  based on said comparing, extracting said first records and said first corresponding values of the highest priority first fields from said first data source to a third data source, wherein said extracting results in a k-anonymity value for said third data source approximating a pre-defined k-anonymity value.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
  - 2. The method of claim 1, wherein said pre-defined k-anonymity value is selected by said user.
  - 3. The method of claim 1, further comprising modifying said first data source prior to said comparing.
  - 4. The method of claim 1, wherein said prioritizing further comprises measuring record uniqueness in said first data source.
  - 5. The method of claim 1, further comprising measuring identification risk using said second data source and modifying said prioritizing accordingly.
  - 6. The method of claim 5, further comprising displaying the change in said risk as said pre-defined k-anonymity value is varied by said user.
  - 7. The method of claim 1, wherein said extracting is performed contemporaneously with said comparing.
  - 8. The method of claim 1, wherein said extracting further comprises copying said first records;
    - changing selected first corresponding values to form a plurality of modified records; and
      
      storing said modified records in said third data source.
  - 9. The method of claim 8, wherein said changing further comprises deleting one or more of said selected first values in one or more of said first fields and in one or more of said first records.
  - 10. The method of claim 8, wherein said changing further comprises encrypting one or more of said selected first values in one or more of said first fields and in one or more of said first records.
  - 11. The method of claim 1, wherein one or more of said prioritizing, comparing, and extracting are carried out over a computer network.
  - 12. The method of claim 1, further comprising delivering all or selected portions of said third data source in electronic form.
  - 13. The method of claim 1, wherein said pre-defined k-anonymity value is determined by measuring a re-identification risk using a reference database and modifying said pre-defined k-anonymity value accordingly.
  - 14. The method of claim 13, further comprising automatically checking said re-identification risk when more data are added to the first data source, and decreasing the pre-defined k-anonymity value, if the re-identification risk decreases after addition of the data.
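Claim 1 leaves the comparison and extraction steps abstract. One possible reading is sketched below under stated assumptions: records are dicts; the second (reference) data source is assumed to contain the individuals in the first data source; "k" is taken to be the smallest number of reference records that match any first record on the released fields; and extraction keeps the longest prefix of the user's priority list that does not push k below the pre-defined target. The names `reference_k` and `extract` are hypothetical and do not come from the application.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def reference_k(first: List[Record], reference: List[Record], fields: Sequence[str]) -> int:
    """Hypothetical helper: smallest number of reference records that match
    any single first record on `fields`.

    Every first record is compared with every reference record, mirroring the
    wording of claim 1; this is quadratic and meant only for illustration.
    """
    if not first:
        return 0
    return min(
        sum(all(ref.get(f) == rec.get(f) for f in fields) for ref in reference)
        for rec in first
    )


def extract(
    first: List[Record],
    reference: List[Record],
    priorities: Sequence[str],
    target_k: int,
) -> Tuple[List[str], List[Record]]:
    """Hypothetical helper: keep the longest prefix of `priorities` whose
    reference k stays at or above `target_k`.

    Returns the kept fields and the extracted records (the "third data source").
    Fields below the cut-off are simply dropped, one form of the deletion in claim 9.
    """
    kept: List[str] = []
    for field in priorities:
        if reference_k(first, reference, kept + [field]) < target_k:
            break  # releasing this field would push k below the user's target
        kept.append(field)
    third = [{f: rec[f] for f in kept} for rec in first]
    return kept, third


first = [
    {"zip": "02139", "sex": "F", "age": "34"},
    {"zip": "02139", "sex": "M", "age": "51"},
]
reference = [
    {"zip": "02139", "sex": "F", "age": "34"},
    {"zip": "02139", "sex": "F", "age": "29"},
    {"zip": "02139", "sex": "M", "age": "51"},
    {"zip": "02139", "sex": "M", "age": "47"},
]
kept, third = extract(first, reference, ["zip", "sex", "age"], target_k=2)
print(kept)                                # ['zip', 'sex']
print(reference_k(first, reference, kept))  # 2
```

Stopping at the first field that breaks the target preserves strict priority order, matching the claim's "highest priority first fields." A variant could skip that field and keep testing lower-priority ones, trading strict ordering for more retained columns.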

15. An apparatus for record de-identification, comprising:
- a data capture system, wherein the data is placed in a first data source on capture, and wherein said first data source comprises a plurality of first records having one or more first fields, said first fields having at least one corresponding first value;
  
  a reference data source comprising a plurality of second records having one or more second fields, said second fields having at least one corresponding second value;
  
  comparison means for comparing said first fields and said corresponding first values of each said first record to said second fields and corresponding second values of all said second records;
  
  a control interface to a user, operably coupled to said data capture system, said first data source, and said comparison means, whereby:
  
  said user pre-defines a resulting k-anonymity value for an output data source; and
  
  said user prioritizes said first fields according to said user's preference for preservation; and
  
  extraction means, operably coupled to said control interface and said output data source, for extracting the highest priority first fields from said first data source to said output data source based on said comparing;
  
  wherein said extracting results in a k-anonymity value for said output data source that approximates said pre-defined k-anonymity value.
- Dependent Claims (16)
  - 16. The apparatus of claim 15, further comprising a biochip device coupled to said data capture system and providing the data captured thereby.
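Claim 15 recasts the method as cooperating components: a data capture system feeding the first data source, a reference data source, comparison means, a user control interface, and extraction means. The class below is a hypothetical software model of those parts; `ScreeningApparatus`, `capture`, `k_for`, and `output` are illustrative names, not the applicants'. It assumes dict records and takes k to be the smallest number of reference records matching any captured record on the released fields. The comparison means here counts reference value combinations once with `collections.Counter` instead of scanning record pairs.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, str]


class ScreeningApparatus:
    """Hypothetical software model of the claim 15 components."""

    def __init__(self, reference: List[Record]) -> None:
        self.reference = reference        # reference data source (second records)
        self.first: List[Record] = []     # first data source, filled on capture
        self.target_k = 1                 # set by the user via the control interface
        self.priorities: List[str] = []   # user's field preference, highest first

    # Control interface: the user pre-defines k and prioritizes fields.
    def set_target_k(self, k: int) -> None:
        self.target_k = k

    def set_priorities(self, fields: Sequence[str]) -> None:
        self.priorities = list(fields)

    # Data capture system: each record lands in the first data source on capture.
    def capture(self, record: Record) -> None:
        self.first.append(dict(record))

    # Comparison means: count reference matches per captured record, report the minimum.
    def k_for(self, fields: Sequence[str]) -> int:
        counts = Counter(tuple(r.get(f) for f in fields) for r in self.reference)
        # Counter returns 0 for unseen combinations, so unmatched records give k = 0.
        return min((counts[tuple(r.get(f) for f in fields)] for r in self.first), default=0)

    # Extraction means: release the longest priority prefix that keeps k >= target.
    def output(self) -> List[Record]:
        kept: List[str] = []
        for field in self.priorities:
            if self.k_for(kept + [field]) < self.target_k:
                break
            kept.append(field)
        return [{f: r[f] for f in kept} for r in self.first]


reference = [
    {"ward": "A", "age_band": "30-39"},
    {"ward": "A", "age_band": "30-39"},
    {"ward": "A", "age_band": "40-49"},
    {"ward": "B", "age_band": "40-49"},
    {"ward": "B", "age_band": "40-49"},
]
apparatus = ScreeningApparatus(reference)
apparatus.set_target_k(2)
apparatus.set_priorities(["ward", "age_band"])
apparatus.capture({"ward": "A", "age_band": "30-39", "mrn": "0001"})
print(apparatus.output())  # [{'ward': 'A', 'age_band': '30-39'}]
apparatus.capture({"ward": "A", "age_band": "40-49", "mrn": "0002"})
print(apparatus.output())  # [{'ward': 'A'}, {'ward': 'A'}]
```

The second capture shows why extraction is worth re-running as data arrive: one additional patient in a sparsely populated ward and age-band combination forces `age_band` out of the output, while the unprioritized `mrn` field is never released.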

17. An apparatus for record de-identification for use with a first data source having a plurality of first records having one or more first fields, said first fields having at least one corresponding first value, comprising:
- means for prioritizing said first fields according to a user preference;
  
  using a second data source, wherein said second data source comprises a plurality of second records having one or more second fields, said second fields having at least one corresponding second value, means for comparing said first fields and said corresponding first values of each said first record to said second fields and said corresponding second values of all of said second records; and
  
  based on said comparing, means for extracting said first records and said first corresponding values of the highest priority first fields from said first data source to a third data source, wherein said extracting results in a k-anonymity value for said third data source approximating a pre-defined k-anonymity value.

18. A computer system for use in record de-identification for use with a first data source having a plurality of first records having one or more first fields, said first fields having at least one corresponding first value, comprising computer instructions for:
- prioritizing said first fields according to a user preference;
  
  using a second data source, wherein said second data source comprises a plurality of second records having one or more second fields, said second fields having at least one corresponding second value, comparing said first fields and said corresponding first values of each said first record to said second fields and said corresponding second values of all of said second records; and
  
  based on said comparing, extracting said first records and said first corresponding values of the highest priority first fields from said first data source to a third data source, wherein said extracting results in a k-anonymity value for said third data source approximating a pre-defined k-anonymity value.

19. A computer-readable medium storing a computer program executable by a plurality of server computers for use with a first data source having a plurality of first records having one or more first fields, said first fields having at least one corresponding first value, the computer program comprising computer instructions for:
- prioritizing said first fields according to a user preference;
  
  using a second data source, wherein said second data source comprises a plurality of second records having one or more second fields, said second fields having at least one corresponding second value, comparing said first fields and said corresponding first values of each said first record to said second fields and said corresponding second values of all of said second records; and
  
  based on said comparing, extracting said first records and said first corresponding values of the highest priority first fields from said first data source to a third data source, wherein said extracting results in a k-anonymity value for said third data source approximating a pre-defined k-anonymity value.

Current Assignee
PrivaSource, Inc.
Original Assignee
PrivaSource, Inc.
Inventors
Erickson, Lars Carl; Pettini, Don; Breitenstein, Agneta

Application Number
US10/232,772
Publication Number
US 20040199781A1
US Class Current
726/26
CPC Class Codes
G06F 16/284   Relational databases
G06F 21/6254   by anonymising data, e.g. d...
G16H 10/60   for patient-specific data, ...
G16Z 99/00   Subject matter not provided...