System and method for performing a similarity measure of anonymized data
Abstract
A similarity measure system selects a first value and a first context related to the first value, divides the first value into a first set of substrings in an order preserving way, and processes each of these substrings through an obfuscation function to produce a first set of obfuscated substrings. The system selects a second value and a second context related to the second value, and processes the second value to produce a second set of obfuscated substrings. The system calculates a context similarity measure for the first context and the second context. The system determines a value similarity measure from the first and second set of order preserved obfuscated substrings. The system determines a closeness degree between the first value and the second value and a closeness degree based on the context similarity measure.
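The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The following are assumptions made purely for illustration: overlapping bigrams as the order-preserving split, HMAC-SHA256 as the obfuscation function, the shared key `SECRET_KEY`, and a positional match ratio as the value similarity measure.

```python
import hashlib
import hmac
from typing import List

# Hypothetical shared secret: both values must be obfuscated with the same
# key so that equal substrings map to equal obfuscated tokens.
SECRET_KEY = b"example-key"


def split_ordered(value: str, n: int = 2) -> List[str]:
    """Divide a value into overlapping n-grams, kept in their original order."""
    if len(value) <= n:
        return [value]
    return [value[i:i + n] for i in range(len(value) - n + 1)]


def obfuscate(substring: str) -> str:
    """Keyed one-way hash standing in for the obfuscation function (assumption)."""
    return hmac.new(SECRET_KEY, substring.encode("utf-8"), hashlib.sha256).hexdigest()


def obfuscate_value(value: str) -> List[str]:
    """Obfuscate each substring separately; list order mirrors substring order."""
    return [obfuscate(s) for s in split_ordered(value)]


def value_similarity(a: List[str], b: List[str]) -> float:
    """Fraction of positions holding identical tokens, relative to the longer list."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / longest


first = obfuscate_value("jonathan")
second = obfuscate_value("jonathon")
score = value_similarity(first, second)  # compares tokens only, never plaintext
print(round(score, 3))
```

Because each substring is obfuscated on its own and the list order is preserved, two values that differ in a few characters still share most of their tokens, which is what makes a graded comparison possible without revealing the values.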
30 Claims
1. A processor-implemented method of performing a similarity measure of a plurality of privacy-protected data, comprising:
selecting a first value and a first context related to the first value;
dividing the first value into a first plurality of substrings in an order preserving way;
processing each substring of the first plurality of substrings through an obfuscation function to produce a first plurality of order preserved obfuscated substrings;
selecting a second value and a second context related to the second value;
dividing the second value into a second plurality of substrings in the order preserving way;
processing each substring of the second plurality of substrings through the obfuscation function to produce a second plurality of order preserved obfuscated substrings;
comparing the first and second plurality of order preserved obfuscated substrings, and determining a value similarity measure for the first and second values based on the comparison of the first and second plurality of order preserved obfuscated substrings; and
comparing the first context and the second context, and determining a context similarity measure for the first context and the second context based on the comparison of the first and second contexts.
Dependent claims: 2-24
26. A method of performing a similarity measure of a plurality of privacy preserved data, comprising:
specifying an input data source;
obfuscating values from the input data source;
specifying similarity parameters;
invoking a similarity measure module, wherein the obfuscated values and the similarity parameters are made available to the similarity measure module for consideration; and
receiving a similarity result from the similarity measure module.
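The five steps of claim 26 can be read as a caller-side workflow. The sketch below is one possible illustration in Python, not the claimed implementation. The names `SimilarityParameters`, `similarity_measure_module`, and the `threshold` parameter are hypothetical, since the claim does not enumerate the parameters. The per-character unkeyed hash is only a stand-in for an obfuscation function and would not be privacy-safe in practice.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SimilarityParameters:
    """Hypothetical parameter set; the claim does not list specific parameters."""
    threshold: float = 0.5


def obfuscate_value(value: str) -> List[str]:
    """Hash each character in order (illustrative stand-in, not privacy-safe)."""
    return [hashlib.sha256(ch.encode("utf-8")).hexdigest() for ch in value]


def similarity_measure_module(a: List[str], b: List[str],
                              params: SimilarityParameters) -> Dict[str, object]:
    """Receives obfuscated values and parameters only; plaintext never enters."""
    longest = max(len(a), len(b))
    score = sum(x == y for x, y in zip(a, b)) / longest if longest else 1.0
    return {"score": score, "similar": score >= params.threshold}


# 1. Specify an input data source (an in-memory list in this sketch).
input_source = ["smith", "smyth"]
# 2. Obfuscate the values from the input data source.
obfuscated = [obfuscate_value(v) for v in input_source]
# 3. Specify similarity parameters.
params = SimilarityParameters(threshold=0.75)
# 4. Invoke the similarity measure module with obfuscated values and parameters.
# 5. Receive the similarity result.
result = similarity_measure_module(obfuscated[0], obfuscated[1], params)
print(result)
```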
27. A processor-implemented system for performing a similarity measure of a plurality of privacy-protected data, comprising:
a similarity measure system for selecting a first value and a first context related to the first value;
a substring generator module for dividing the first value into a first plurality of substrings in an order preserving way;
an obfuscation module for processing each substring of the first plurality of substrings through an obfuscation function to produce a first plurality of order preserved obfuscated substrings;
the similarity measure system selecting a second value and a second context related to the second value;
the substring generator module dividing the second value into a second plurality of substrings in the order preserving way;
the obfuscation module processing each substring of the second plurality of substrings through the obfuscation function to produce a second plurality of order preserved obfuscated substrings;
a similarity evaluation module comparing the first and second plurality of order preserved obfuscated substrings, and determining a value similarity measure for the first and second values based on the comparison of the first and second plurality of order preserved obfuscated substrings; and
the similarity evaluation module comparing the first context and the second context, and determining a context similarity measure for the first context and the second context based on the comparison of the first and second contexts.
Dependent claims: 28
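The module decomposition in claim 27 could be laid out as three small classes. This is a sketch under stated assumptions, not the claimed system: fixed-size non-overlapping chunks, a salted SHA-256 hash, contexts modeled as attribute dictionaries, and a Jaccard ratio for the context similarity measure are all hypothetical choices. The class and method names are likewise illustrative.

```python
import hashlib
from typing import Dict, List


class SubstringGenerator:
    def __init__(self, size: int = 3):
        self.size = size

    def divide(self, value: str) -> List[str]:
        """Non-overlapping chunks taken left to right, so order is preserved."""
        return [value[i:i + self.size] for i in range(0, len(value), self.size)]


class ObfuscationModule:
    def __init__(self, salt: bytes):
        self.salt = salt  # hypothetical salt, shared by both sides of a comparison

    def process(self, substrings: List[str]) -> List[str]:
        """Hash each substring separately; output order mirrors input order."""
        return [hashlib.sha256(self.salt + s.encode("utf-8")).hexdigest()
                for s in substrings]


class SimilarityEvaluationModule:
    def value_similarity(self, a: List[str], b: List[str]) -> float:
        """Share of positions with equal tokens, relative to the longer list."""
        longest = max(len(a), len(b))
        if longest == 0:
            return 1.0
        return sum(x == y for x, y in zip(a, b)) / longest

    def context_similarity(self, a: Dict[str, str], b: Dict[str, str]) -> float:
        """Jaccard ratio over (attribute, value) pairs (an assumed measure)."""
        pairs_a, pairs_b = set(a.items()), set(b.items())
        union = pairs_a | pairs_b
        if not union:
            return 1.0
        return len(pairs_a & pairs_b) / len(union)


generator = SubstringGenerator(size=3)
obfuscator = ObfuscationModule(salt=b"example-salt")
evaluator = SimilarityEvaluationModule()

first = obfuscator.process(generator.divide("555-123-4567"))
second = obfuscator.process(generator.divide("555-123-4568"))
value_score = evaluator.value_similarity(first, second)
context_score = evaluator.context_similarity(
    {"field": "phone", "country": "US"},
    {"field": "phone", "country": "CA"},
)
print(value_score, round(context_score, 3))
```

Keeping the value and context scores separate, as the claim does, leaves it to the caller to decide how the two measures are weighed.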
29. A computer program product having program codes stored on a computer-usable medium for performing a similarity measure of a plurality of privacy-protected data, comprising:
a program code for selecting a first value and a first context related to the first value;
a program code for dividing the first value into a first plurality of substrings in an order preserving way;
a program code for processing each substring of the first plurality of substrings through an obfuscation function to produce a first plurality of order preserved obfuscated substrings;
a program code for selecting a second value and a second context related to the second value;
a program code for dividing the second value into a second plurality of substrings in the order preserving way;
a program code for processing each substring of the second plurality of substrings through the obfuscation function to produce a second plurality of order preserved obfuscated substrings;
a program code for comparing the first and second plurality of order preserved obfuscated substrings, and determining a value similarity measure for the first and second values based on the comparison of the first and second plurality of order preserved obfuscated substrings; and
a program code for comparing the first context and the second context, and determining a context similarity measure for the first context and the second context based on the comparison of the first and second contexts.
Dependent claims: 30
Specification