System and method for performing a similarity measure of anonymized data

US 20070239705A1
Filed: 03/29/2006
Published: 10/11/2007
Est. Priority Date: 03/29/2006
Status: Active Grant

1 Assignment

0 Petitions

Abstract

A similarity measure system selects a first value and a first context related to the first value, divides the first value into a first set of substrings in an order preserving way, and processes each of these substrings through an obfuscation function to produce a first set of obfuscated substrings. The system selects a second value and a second context related to the second value, and processes the second value to produce a second set of obfuscated substrings. The system calculates a context similarity measure for the first context and the second context. The system determines a value similarity measure from the first and second set of order preserved obfuscated substrings. The system determines a closeness degree between the first value and the second value and a closeness degree based on the context similarity measure.
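
The abstract's value-similarity path can be illustrated with a minimal Python sketch. It rests on choices that the patent text here does not specify. Overlapping two-character substrings (bigrams) serve as the order-preserving division, and a sequence number is kept beside each token as the order marker. HMAC-SHA256 under a hypothetical shared key `KEY` serves as the obfuscation function, and a longest-common-subsequence (LCS) ratio is the comparison. All helper names are illustrative only.

```python
import hashlib
import hmac

# Hypothetical shared secret; assumed to be held by the data owners, not the comparer.
KEY = b"example-shared-secret"


def order_preserving_substrings(value: str, n: int = 2) -> list[str]:
    """Divide a value into overlapping n-character substrings, left to right."""
    value = value.upper()
    if len(value) < n:
        return [value] if value else []
    return [value[i:i + n] for i in range(len(value) - n + 1)]


def obfuscate(substrings: list[str], key: bytes = KEY) -> list[tuple[int, str]]:
    """HMAC each substring; keep its sequence number beside the digest to preserve order."""
    return [
        (seq, hmac.new(key, s.encode("utf-8"), hashlib.sha256).hexdigest())
        for seq, s in enumerate(substrings)
    ]


def lcs_length(a: list[str], b: list[str]) -> int:
    """Dynamic-programming length of the longest common subsequence of two lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b):
            curr.append(prev[j] + 1 if x == y else max(prev[j + 1], curr[j]))
        prev = curr
    return prev[-1]


def value_similarity(tokens_a: list[tuple[int, str]], tokens_b: list[tuple[int, str]]) -> float:
    """Order-aware similarity in [0, 1] computed only from obfuscated tokens."""
    digests_a = [d for _, d in sorted(tokens_a)]
    digests_b = [d for _, d in sorted(tokens_b)]
    if not digests_a and not digests_b:
        return 1.0
    return 2 * lcs_length(digests_a, digests_b) / (len(digests_a) + len(digests_b))


a = obfuscate(order_preserving_substrings("Johnson"))
b = obfuscate(order_preserving_substrings("Jonson"))
print(round(value_similarity(a, b), 3))  # 0.727
```

Because HMAC is deterministic under a single key, equal substrings yield equal digests, so the comparison works without revealing the underlying characters. "Johnson" and "Jonson" share four bigrams in order (JO, NS, SO, ON), giving 2 * 4 / (6 + 5), or about 0.727. Values obfuscated under different keys share no digests and score 0.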

30 Claims

1. A processor-implemented method of performing a similarity measure of a plurality of privacy-protected data, comprising:
- selecting a first value and a first context related to the first value;
  
  dividing the first value into a first plurality of substrings in an order preserving way;
  
  processing each substring of the first plurality of substrings through an obfuscation function to produce a first plurality of order preserved obfuscated substrings;
  
  selecting a second value and a second context related to the second value;
  
  dividing the second value into a second plurality of substrings in the order preserving way;
  
  processing each substring of the second plurality of substrings through the obfuscation function to produce a second plurality of order preserved obfuscated substrings;
  
  comparing the first and second plurality of order preserved obfuscated substrings, and determining a value similarity measure for the first and second values based on the comparison of the first and second plurality of order preserved obfuscated substrings; and
  
  comparing the first context and the second context, and determining a context similarity measure for the first context and the second context based on the comparison of the first and second contexts.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)
- - 2. The method of claim 1, further comprising adding a sequence identifier to each of the first plurality of substrings to preserve the order of the first plurality of substrings.
  - 3. The method of claim 2, wherein the sequence identifier includes any one or more of:
    - a sequential number, a position specific assigned value, and a value used to designate an ordered substring.
  - 4. The method of claim 2, further comprising adding a sequence identifier to each of the second plurality of substrings to preserve the order of the second plurality of substrings.
  - 5. The method of claim 4, wherein the sequence identifier includes any one or more of:
    - a sequential number, a position specific assigned value, and a value used to designate an ordered substring.
  - 6. The method of claim 1, further comprising determining whether the first and second values are identical.
  - 7. The method of claim 1, further comprising determining whether the first and second contexts are identical.
  - 8. The method of claim 1, further comprising indicating whether the first and second values are similar based on a percentage of similarity.
  - 9. The method of claim 1, further comprising indicating whether the first and second contexts are similar.
  - 10. The method of claim 1, further comprising indicating whether the first and second values are unrelated based on a percentage of similarity.
  - 11. The method of claim 1, further comprising indicating whether the first and second contexts are unrelated.
  - 12. The method of claim 1, further comprising sending the value similarity result to a destination.
  - 13. The method of claim 1, further comprising sending the context similarity result to a destination.
  - 14. The method of claim 12, further comprising encrypting the value similarity result prior to sending the similarity result.
  - 15. The method of claim 12, further comprising encrypting the similarity result prior to sending the similarity result.
  - 16. The method of claim 12, wherein the destination is selected from any one or more of:
    - an application, a queue, an end-user, a communication device, a computer, a handheld appliance, a mobile device, a telephone, a pager, a file system, a database, an audit log and a persistent store.
  - 17. The method of claim 1, wherein the obfuscation function includes a cryptographic function.
  - 18. The method of claim 17, wherein the cryptographic function occurs on a secure hardware device.
  - 19. The method of claim 1, wherein processing each substring through the obfuscation function occurs on a secure hardware device.
  - 20. The method of claim 1, further comprising:
    - performing a pre-processing function to the first value before dividing the first value; and
      
      performing the pre-processing function to the second value before dividing the second value.
  - 21. The method of claim 20, wherein the pre-processing function is selected from any one of:
    - a standard reformatting function, a truncation function, a padding function, a byte alteration function, and a transliteration function.
  - 22. The method of claim 1, further comprising:
    - adding a secondary data to each substring of the first plurality of substrings before processing the first plurality of substrings through the obfuscation function; and
      
      adding the secondary data to each substring of the second plurality of substrings before processing the second plurality of substrings through the obfuscation function.
  - 23. The method of claim 22, wherein the secondary data is selected from any one or more of:
    - a SALT value, an order preserving value, a supplemental value, and a combination of the order preserving value and the supplemental value.
  - 24. The method of claim 1, wherein the first and second value are received, encrypted, and then decrypted for processing.
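
Dependent claims 6, 8, 10 and 20 through 23 add pre-processing, secondary data, and identical/similar/unrelated outcomes. A hedged Python sketch of how these could fit together follows, built on several stated assumptions. The pre-processing function is NFKD-based transliteration plus uppercase alphanumeric reformatting and truncation. The secondary data is a hypothetical `SALT` combined with each substring's position, which serves as the order-preserving value. The obfuscation function is salted SHA-256 (not HMAC). A hypothetical 60% cut-off separates "similar" from "unrelated".

```python
import hashlib
import unicodedata

# Hypothetical SALT value shared by the parties that obfuscate their data.
SALT = b"hypothetical-salt"


def preprocess(value: str, max_len: int = 20) -> str:
    """Transliterate, reformat and truncate a value before it is divided."""
    # Transliteration: decompose accented letters and drop the combining marks.
    ascii_text = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Standard reformatting: uppercase, keep only letters and digits.
    cleaned = "".join(ch for ch in ascii_text.upper() if ch.isalnum())
    # Truncation to a fixed maximum length.
    return cleaned[:max_len]


def obfuscate(value: str) -> list[str]:
    """Salted SHA-256 of each bigram, with its position folded into the hash input."""
    text = preprocess(value)
    bigrams = [text[i:i + 2] for i in range(len(text) - 1)] or [text]
    return [
        hashlib.sha256(SALT + f"{pos}|{gram}".encode("ascii")).hexdigest()
        for pos, gram in enumerate(bigrams)
    ]


def classify(tokens_a: list[str], tokens_b: list[str], similar_threshold: float = 0.6):
    """Label a pair identical, similar or unrelated from the share of matching tokens."""
    if tokens_a == tokens_b:
        return "identical", 1.0
    matches = len(set(tokens_a) & set(tokens_b))
    share = matches / max(len(tokens_a), len(tokens_b))
    return ("similar" if share >= similar_threshold else "unrelated"), share


print(classify(obfuscate("José"), obfuscate("JOSE")))            # ('identical', 1.0)
print(classify(obfuscate("Catherine"), obfuscate("Catharine")))  # ('similar', 0.75)
print(classify(obfuscate("Catherine"), obfuscate("Robert")))     # ('unrelated', 0.0)
```

Folding the position into the hash input means only same-position matches count. That tolerates substitutions (Catherine/Catharine match at 6 of 8 positions) but not insertions or deletions, since every later position shifts.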

26. A method of performing a similarity measure of a plurality of privacy preserved data, comprising:
- specifying an input data source;
  
  obfuscating values from the input data source;
  
  specifying similarity parameters;
  
  invoking a similarity measure module, wherein the obfuscated values and the similarity parameters are made available to the similarity measure module for consideration; and
  
  receiving a similarity result from the similarity measure module.
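
Claim 26 describes a workflow rather than a particular algorithm. One way to sketch it in Python uses only hypothetical names. The input data source is a dict mapping record IDs to values. Obfuscation happens at the source with HMAC-SHA256 over sets of bigrams. The similarity parameters carry only a reporting threshold. The module scores every pair with the Dice coefficient, an unordered set measure chosen here for brevity.

```python
import hashlib
import hmac
from dataclasses import dataclass
from itertools import combinations

# Hypothetical key held by the data owners; the similarity module never sees it.
KEY = b"hypothetical-key"


@dataclass
class SimilarityParameters:
    """Hypothetical similarity parameters handed to the module."""
    threshold: float = 0.5  # minimum Dice score for a pair to be reported


def obfuscate_source(records: dict[str, str]) -> dict[str, frozenset[str]]:
    """Runs at the input data source: only HMAC digests of bigrams leave it."""
    out = {}
    for record_id, value in records.items():
        text = value.upper()
        grams = {text[i:i + 2] for i in range(len(text) - 1)} or {text}
        out[record_id] = frozenset(
            hmac.new(KEY, g.encode("utf-8"), hashlib.sha256).hexdigest() for g in grams
        )
    return out


def similarity_measure_module(obfuscated: dict[str, frozenset[str]],
                              params: SimilarityParameters) -> list[tuple[str, str, float]]:
    """Compares every pair of obfuscated records and returns those at or above threshold."""
    results = []
    for (id_a, a), (id_b, b) in combinations(sorted(obfuscated.items()), 2):
        score = 2 * len(a & b) / (len(a) + len(b))  # Dice coefficient
        if score >= params.threshold:
            results.append((id_a, id_b, round(score, 3)))
    return results


source = {"r1": "Johnson", "r2": "Jonson", "r3": "Smith"}  # input data source
result = similarity_measure_module(obfuscate_source(source), SimilarityParameters(threshold=0.5))
print(result)  # [('r1', 'r2', 0.8)]
```

Keeping the key at the source and passing only digests and parameters to the module mirrors the separation in the claim: the module can report which records resemble each other without ever holding the plaintext values.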

27. A processor-implemented system for performing a similarity measure of a plurality of privacy-protected data, comprising:
- a similarity measure system for selecting a first value and a first context related to the first value;
  
  a substring generator module for dividing the first value into a first plurality of substrings in an order preserving way;
  
  an obfuscation module for processing each substring of the first plurality of substrings through an obfuscation function to produce a first plurality of order preserved obfuscated substrings;
  
  the similarity measure system selecting a second value and a second context related to the second value;
  
  the substring generator module dividing the second value into a second plurality of substrings in the order preserving way;
  
  the obfuscation module processing each substring of the second plurality of substrings through the obfuscation function to produce a second plurality of order preserved obfuscated substrings;
  
  a similarity evaluation module comparing the first and second plurality of order preserved obfuscated substrings, and determining a value similarity measure for the first and second values based on the comparison of the first and second plurality of order preserved obfuscated substrings; and
  
  the similarity evaluation module comparing the first context and the second context, and determining a context similarity measure for the first context and the second context based on the comparison of the first and second contexts.
- Dependent Claims (28)
- - 28. The system of claim 27, wherein the substring generator module further adds a sequence identifier to each of the first plurality of substrings to preserve the order of the first plurality of substrings.
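
Claim 27 leaves open how the similarity evaluation module compares contexts. A minimal Python sketch follows, with several assumptions. Each context is a small set of descriptive attributes, such as a field name and source system. Each attribute tag is HMAC-obfuscated under a hypothetical `KEY`, and contexts are compared with the Jaccard index. The closeness degree is simply the pair of value and context labels. The class and method names are illustrative, not the patent's.

```python
import hashlib
import hmac

# Hypothetical key; in this sketch the same secret would obfuscate values and contexts.
KEY = b"hypothetical-key"


def obfuscate_context(context: dict[str, str]) -> frozenset[str]:
    """HMAC each lower-cased 'attribute=value' tag so contexts are compared without disclosure."""
    return frozenset(
        hmac.new(KEY, f"{name}={val}".lower().encode("utf-8"), hashlib.sha256).hexdigest()
        for name, val in context.items()
    )


class SimilarityEvaluationModule:
    """Hypothetical evaluation module: context similarity plus a closeness label."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold  # minimum score counted as "similar"

    def context_similarity(self, a: frozenset[str], b: frozenset[str]) -> float:
        """Jaccard index of the two obfuscated tag sets."""
        if not a and not b:
            return 1.0
        return len(a & b) / len(a | b)

    def _label(self, score: float) -> str:
        if score == 1.0:
            return "identical"
        return "similar" if score >= self.threshold else "unrelated"

    def closeness_degree(self, value_score: float, context_score: float) -> tuple[str, str]:
        """Pair the value label and the context label into one closeness description."""
        return self._label(value_score), self._label(context_score)


m = SimilarityEvaluationModule()
c1 = obfuscate_context({"field": "surname", "country": "US", "source": "payroll"})
c2 = obfuscate_context({"field": "surname", "country": "US", "source": "crm"})
print(m.context_similarity(c1, c2))                              # 0.5
print(m.closeness_degree(1.0, m.context_similarity(c1, c2)))     # ('identical', 'similar')
```

Under these assumptions, the same surname seen in two systems would be reported as an identical value in a similar context. A match whose contexts share no tags would instead carry an "unrelated" context label.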

29. A computer program product having program codes stored on a computer-usable medium for performing a similarity measure of a plurality of privacy-protected data, comprising:
- a program code for selecting a first value and a first context related to the first value;
  
  a program code for dividing the first value into a first plurality of substrings in an order preserving way;
  
  a program code for processing each substring of the first plurality of substrings through an obfuscation function to produce a first plurality of order preserved obfuscated substrings;
  
  a program code for selecting a second value and a second context related to the second value;
  
  a program code for dividing the second value into a second plurality of substrings in the order preserving way;
  
  a program code for processing each substring of the second plurality of substrings through the obfuscation function to produce a second plurality of order preserved obfuscated substrings;
  
  a program code for comparing the first and second plurality of order preserved obfuscated substrings, and determining a value similarity measure for the first and second values based on the comparison of the first and second plurality of order preserved obfuscated substrings; and
  
  a program code for comparing the first context and the second context, and determining a context similarity measure for the first context and the second context based on the comparison of the first and second contexts.
- Dependent Claims (30)
- - 30. The system of claim 29, further comprising a program code for adding a sequence identifier to each of the first plurality of substrings to preserve the order of the first plurality of substrings.

Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Jonas, Jeffrey; Hunt, Brand

Granted Patent

US 8,204,213 B2
Time in Patent Office

Days
Field of Search
US Class Current

1/1
CPC Class Codes

G06F 16/90335   Query processing

G06F 21/6245   Protecting personal data, e...

G06F 21/6254   by anonymising data, e.g. d...

G06F 2221/2105   Dual mode as a secondary as...
