System and method for matching data sets while maintaining privacy of each data set
First Claim
1. A method for determining matching data elements of a first data set and a second data set without having to disclose the first data set and the second data set:
- the method comprising;
generating by a first processing device, a perfect hash function PHA from the first data set;
evaluating, by the first processing device, the perfect hash function, PHA(ai), for each element ai of the first data set;
encrypting, by the first processing device, each element ai of the first data set using a public key to form an encryption, E(ai);
sending, by the first processing device, the public key, the perfect hash function, PHA(ai), and the encryption, E(ai), for each element ai of the first data set to a second processing device;
evaluating, by the second processing device, the perfect hash function, PHA (bj) for each element bj of the second data set;
finding, by the second processing device, all i such that PHA(bj)=PHA(ai);
computing, by the second processing device, Zj=r(E(ai−
bj))+E(p), using the received public key and where r is a large random number and p is a predetermined variable that comprises a fixed portion k;
sending, by the second processing device, to the first processing device;
decrypting, by the first processing device using the private key, Zj; and
determining, by the first processing device, that element of the first data set matches element bj of the second data set if the decryption of Zj includes the fixed portion k of p and that element ai of the first data set does not match element bj of the second data set if the decryption of Zj does not include the fixed portion k of p.
2 Assignments
0 Petitions
Accused Products
Abstract
A system and method that allows two parties to find common records in their data sets without having to actually share the data sets with each other or a third party. Two primitives, perfect hash functions and public key cryptography, are combined in a unique way to obtain a secure and efficient private matching solution. Since the data that is exchanged is always encrypted during the match process, neither party reveals its data to the other party. This solution enables two parties to match sensitive data such as PII (Personally Identifiable Information) or PHI (Protected Health Information) without having to disclose the data to other or to any third party. One or both of the parties only learn of matching records without learning any information about the records that do not match.
14 Citations
2 Claims
-
1. A method for determining matching data elements of a first data set and a second data set without having to disclose the first data set and the second data set:
- the method comprising;
generating by a first processing device, a perfect hash function PHA from the first data set; evaluating, by the first processing device, the perfect hash function, PHA(ai), for each element ai of the first data set; encrypting, by the first processing device, each element ai of the first data set using a public key to form an encryption, E(ai); sending, by the first processing device, the public key, the perfect hash function, PHA(ai), and the encryption, E(ai), for each element ai of the first data set to a second processing device; evaluating, by the second processing device, the perfect hash function, PHA (bj) for each element bj of the second data set; finding, by the second processing device, all i such that PHA(bj)=PHA(ai); computing, by the second processing device, Zj=r(E(ai−
bj))+E(p), using the received public key and where r is a large random number and p is a predetermined variable that comprises a fixed portion k;sending, by the second processing device, to the first processing device; decrypting, by the first processing device using the private key, Zj; and determining, by the first processing device, that element of the first data set matches element bj of the second data set if the decryption of Zj includes the fixed portion k of p and that element ai of the first data set does not match element bj of the second data set if the decryption of Zj does not include the fixed portion k of p.
- the method comprising;
-
2. A method for a first party to determine matching data elements of a first data set maintained by the first party and a second data set maintained by a second party without having to disclose the first data set to the second party or receive the second data set from the second party, the method comprising:
-
generating, by a first processing device, a perfect hash function PHA from the first data set; evaluating, by the first processing device, the perfect hash function, PHA(ai), for each element ai of the first data set; encrypting, by the first processing device, each element ai of the first data set using a public key to form an encryption, E(ai); sending, by the first processing device, the public key, the perfect hash function, PHA(ai), and the encryption, E(ai), for each element ai of the first data set one or more second processing devices to evaluate the perfect hash function, PHA (bj) for each element bj of the second data set, find all i such that PHA(bj)=PHA(ai) and compute Zj=r(E(ai−
bj))+E(p) using the received public key, where r is a large random number and p is a predetermined variable that comprises a fixed portion k;receiving, from the one or more second processing devices, Zj=r(E(ai−
bj)) E(p);decrypting, by the first processing device, Zj; and determining, by the first processing device, that element ai of the first data set matches element bj of the second data set if the decryption of Zj includes the fixed portion k of p and that element ai of the first data set does not match element bj of the second data set if the decryption of Zj does not include the fixed portion k of p.
-
Specification