System and method for cascading token generation and data de-identification
Abstract
A computer-implemented method for de-identifying data by creating tokens through a cascading algorithm includes the steps of processing at least one record comprising a plurality of data elements to identify a subset of data elements comprising data identifying at least one individual; generating, with at least one processor, a first hash by hashing at least one first data element with at least one second data element of the subset of data elements; generating, with at least one processor, a second hash by hashing the first hash with at least one third data element of the subset of data elements; creating at least one token based at least partially on the second hash or a subsequent hash derived from the second hash, wherein the token identifies the at least one individual; and associating at least a portion of a remainder of the data elements with the at least one token.
42 Citations
20 Claims
1. A computer-implemented method for de-identifying data by creating tokens through a cascading algorithm, comprising:
processing at least one record comprising a plurality of data elements to identify a subset of data elements, each data element of the subset comprising identifying information for at least one individual;
generating, with at least one processor, a first hash by hashing at least one first data element with at least a client tag unique to a particular client system;
generating, with at least one processor, a second hash by hashing the first hash with at least one other data element of the subset of data elements;
creating at least one token based at least partially on the second hash or a subsequent hash derived from the second hash, wherein the token identifies the at least one individual; and
linking the at least one token and at least a portion of a remainder of the data elements of the plurality of data elements with at least one other record for the at least one individual based at least partially on the at least one token.
Dependent claims: 2, 3, 4, 5, 6, 7
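The two hashing steps of claim 1 can be illustrated with a short Python sketch. This is only one possible reading, not the patented implementation: the choice of SHA-256, the length-prefixed concatenation, the helper names `sha256_hex` and `cascade_token`, and the sample record fields are all assumptions made for illustration.

```python
import hashlib


def sha256_hex(*parts: str) -> str:
    """Hash several strings together (SHA-256 is an assumed choice)."""
    h = hashlib.sha256()
    for part in parts:
        data = part.encode("utf-8")
        # Length prefix keeps ("ab", "c") distinct from ("a", "bc").
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.hexdigest()


def cascade_token(first_element: str, other_element: str, client_tag: str) -> str:
    """Hypothetical two-stage cascade following the wording of claim 1."""
    # First hash: a first data element hashed with the client tag.
    first_hash = sha256_hex(first_element, client_tag)
    # Second hash: the first hash hashed with another identifying element.
    second_hash = sha256_hex(first_hash, other_element)
    # Here the token is simply the second hash.
    return second_hash


# Illustrative record; the field names are invented for this example.
record = {"last_name": "Lovelace", "dob": "1815-12-10", "diagnosis": "J45"}
token_a = cascade_token(record["last_name"], record["dob"], client_tag="client-A")
token_b = cascade_token(record["last_name"], record["dob"], client_tag="client-B")
```

Under these assumptions the same individual yields the same token within one client system, while `token_a` and `token_b` differ because the client tag enters the first hash.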
8. A system for de-identifying data, comprising:
(a) a data supplier computer comprising at least one processor and a de-identification engine, the de-identification engine configured to:
(i) process a data record comprising a plurality of data elements, wherein a subset of data elements of the plurality of data elements comprises personally identifying information for an individual;
(ii) generate a token based at least partially on a series of hashes of individual data elements of the subset of data elements, wherein a plurality of hashes in the series of hashes are generated by hashing a previous hash result in the series of hashes with a next individual data element, wherein the previous hash result is based on hashing at least a previous individual data element, and wherein the token is generated at least partially based on a client tag uniquely identifying at least one of a client system and a data supplier;
(iii) encrypt at least the token to generate an encrypted token;
(b) a data processing entity computer remote from the data supplier computer, the data processing computer comprising at least one processor configured to:
(i) receive the encrypted token and unencrypted data elements from the data supplier computer;
(ii) decrypt the encrypted token, resulting in the token;
(iii) link the token and unencrypted data elements with at least one other record for the individual based at least partially on the token.
Dependent claims: 9, 10, 11
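The linking step (b)(iii) of claim 8 can be sketched as grouping incoming records by token. The sketch below assumes the token has already been decrypted and leaves the encryption of steps (a)(iii) and (b)(ii) out entirely; the function name `link_by_token`, the record layout, and the sample values are hypothetical.

```python
from collections import defaultdict


def link_by_token(records):
    """Group de-identified data elements under their shared token."""
    linked = defaultdict(list)
    for rec in records:
        # Records carrying the same token are treated as the same individual.
        linked[rec["token"]].append(rec["data"])
    return dict(linked)


# Illustrative input: tokens stand in for decrypted token values.
incoming = [
    {"token": "tok-1", "data": {"visit": "2020-01-03", "code": "J45"}},
    {"token": "tok-2", "data": {"visit": "2020-02-11", "code": "E11"}},
    {"token": "tok-1", "data": {"visit": "2020-03-19", "code": "J45"}},
]
linked = link_by_token(incoming)
```

In this sketch the processing side never sees the identifying elements, only the token and the remaining unencrypted data elements.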
12. A de-identification system, comprising:
(a) a de-identification subsystem comprising at least one computer-readable medium containing program instructions which, when executed by at least one remote processor at a data supplier, causes the at least one remote processor to:
(i) create a token from at least one record, the token created by performing at least one hash operation on a client tag uniquely identifying a client system and at least one data element of at least one record, wherein the at least one data element comprises personally-identifying information;
(ii) encrypt the token with a randomly-generated encryption key, forming an encrypted token; and
(iii) encrypt the encrypted token and the randomly-generated encryption key with a public key, forming encrypted data;
(b) a record processing subsystem comprising a server and at least one computer-readable medium containing program instructions which, when executed by at least one processor, causes the at least one processor to:
(i) receive the encrypted data;
(ii) decrypt the encrypted data with a private key corresponding to the public key, resulting in the randomly-generated encryption key and the encrypted token; and
(iii) decrypt the encrypted token with the randomly-generated encryption key.
Dependent claims: 13, 14
15. A de-identification engine for de-identifying at least one record comprising a plurality of data elements, wherein a subset of the plurality of data elements comprise personally-identifying data for an individual, the de-identification engine comprising at least one computer-readable medium containing program instructions that, when executed by at least one processor of at least one computer, cause the at least one computer to:
(a) generate an initial hash by hashing at least one key and a first data element of the subset of data elements;
(b) generate a second hash by hashing a second data element of the subset of data elements with the initial hash;
(c) generate a next hash by hashing a previous hash with a next data element of the subset of data elements; and
(d) repeat step (c) for all remaining data elements of the subset of data elements, resulting in a final hash value,
wherein at least one of the initial hash, the second hash, the next hash, and the final hash value is generated based at least partially on a client tag uniquely identifying at least one of a client system and a data supplier, and wherein at least a portion of a remainder of the plurality of data elements are linked to at least one other record for the individual based on the final hash value.
Dependent claims: 16, 17, 18, 19, 20
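Steps (a) through (d) of claim 15 describe a fold over the identifying data elements, which might be sketched as follows. The use of HMAC-SHA-256 for the keyed initial hash, the point at which the client tag is mixed in, the `\x1f` separator, and the name `final_hash` are assumptions for illustration; the claim itself does not fix any of them.

```python
import hashlib
import hmac
from typing import Sequence


def final_hash(key: bytes, client_tag: str, elements: Sequence[str]) -> str:
    """Hypothetical cascade over all identifying elements, in order."""
    if not elements:
        raise ValueError("at least one identifying data element is required")
    # Step (a): initial hash from the key and the first data element.
    # Assumption: the client tag is mixed in at this stage.
    first = (client_tag + "\x1f" + elements[0]).encode("utf-8")
    current = hmac.new(key, first, hashlib.sha256).digest()
    # Steps (b) to (d): hash the previous hash with each next element.
    for element in elements[1:]:
        current = hashlib.sha256(current + element.encode("utf-8")).digest()
    return current.hex()


# Illustrative values only.
key = b"example-key"
h1 = final_hash(key, "client-A", ["Ada", "Lovelace", "1815-12-10"])
h2 = final_hash(key, "client-A", ["Lovelace", "Ada", "1815-12-10"])
h3 = final_hash(key, "client-B", ["Ada", "Lovelace", "1815-12-10"])
```

Because each stage consumes the previous digest, the result depends on the order of the elements as well as their values, so `h1` and `h2` differ in this sketch.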
Specification