Anonymizing user identifiable information
First Claim
1. A method performed by a computing system, comprising:
identifying, by the computing system, a non-indexed raw data set that is not indexed based on user identifiable information (UII) from computer memory in a data warehouse, wherein the raw data set meets an anonymization criteria and includes one or more instances of the UII;
generating randomly generated information (RGI) to be associated with the UII, wherein the RGI is generated to be independent of the UII;
associating, by the computing system, the UII with the RGI in an anonymization identification map;
generating, by the computing system, an anonymized data set using the anonymization identification map, wherein the anonymized data set is an anonymized version of the raw data set, the generating the anonymized data set includes;
determining that a portion of the non-indexed raw data set has a specified data structure,
identifying a key in the non-indexed raw data set that is associated with a primary UII associated with the specified data structure, and
replacing a value of the key with the RGI associated with the primary UII;
receiving, by the computing system, an indication to delete an account associated with a user of a social networking system;
identifying, by the computing system, a user identifier (UID) associated with the user, the UID being a specified UII of the user; and
disassociating, by the computing system in the anonymization identification map, the specified UII from a specified RGI associated with the specified UII to delete the account associated with the user.
Abstract
The disclosed techniques provide systems and methods for anonymizing various portions of information, action logs, end-user information, and/or other data sets that are stored in non-indexed storage systems. More specifically, various anonymization procedures are described for redacting UII and/or replacing UII in raw data with randomly generated information (RGI). The anonymization process is performed on a rolling basis as raw data is received. An anonymization mapping table maps (or associates) the replaced UII in the anonymized data to the RGI, and eventually all raw data can be deleted.
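The mapping described in the abstract can be illustrated with a minimal sketch. It assumes an in-memory dictionary in place of the anonymization mapping table and a hypothetical `uid` field on each record; the class and function names are illustrative and do not come from the patent. The random identifier is drawn from a secure random source rather than derived from the UII, which is one way to keep the RGI independent of the UII.

```python
import secrets


class AnonymizationMap:
    """Hypothetical in-memory stand-in for the anonymization identification map."""

    def __init__(self):
        self._uid_to_rid = {}

    def rid_for(self, uid):
        # The RID is random, not a hash of the UID, so it carries no
        # information about the UID it replaces.
        if uid not in self._uid_to_rid:
            self._uid_to_rid[uid] = secrets.token_hex(16)
        return self._uid_to_rid[uid]

    def disassociate(self, uid):
        # Removing the entry breaks the only link between the UID and the
        # RID that appears in anonymized data.
        return self._uid_to_rid.pop(uid, None)

    def __contains__(self, uid):
        return uid in self._uid_to_rid


def anonymize_rows(rows, amap, uid_field="uid"):
    """Return copies of the rows with the UID field replaced by its RID."""
    return [{**row, uid_field: amap.rid_for(row[uid_field])} for row in rows]


if __name__ == "__main__":
    amap = AnonymizationMap()
    raw = [{"uid": "u1", "action": "like"}, {"uid": "u2", "action": "post"}]
    print(anonymize_rows(raw, amap))
    amap.disassociate("u1")  # e.g. on an account deletion request
```

In this sketch, the same user always maps to the same RID while the entry exists, so anonymized records remain joinable with each other; after `disassociate`, the RID can no longer be traced back to the user through the map.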
25 Citations
17 Claims
1. A method performed by a computing system, comprising:
identifying, by the computing system, a non-indexed raw data set that is not indexed based on user identifiable information (UII) from computer memory in a data warehouse, wherein the raw data set meets an anonymization criteria and includes one or more instances of the UII;
generating randomly generated information (RGI) to be associated with the UII, wherein the RGI is generated to be independent of the UII;
associating, by the computing system, the UII with the RGI in an anonymization identification map;
generating, by the computing system, an anonymized data set using the anonymization identification map, wherein the anonymized data set is an anonymized version of the raw data set, the generating the anonymized data set includes;
determining that a portion of the non-indexed raw data set has a specified data structure,
identifying a key in the non-indexed raw data set that is associated with a primary UII associated with the specified data structure, and
replacing a value of the key with the RGI associated with the primary UII;
receiving, by the computing system, an indication to delete an account associated with a user of a social networking system;
identifying, by the computing system, a user identifier (UID) associated with the user, the UID being a specified UII of the user; and
disassociating, by the computing system in the anonymization identification map, the specified UII from a specified RGI associated with the specified UII to delete the account associated with the user.
2. The method of claim 1, further comprising:
storing, by the computing system, the anonymized data set in the data warehouse.
3. The method of claim 1, wherein generating the anonymized data set using the anonymization identification map comprises:
replacing, by the computing system, at least one of the one or more instances of UII in the raw data set with an associated RGI.
4. The method of claim 1, wherein the UII comprises user identifiers (UIDs) and the RGI comprises randomly generated user identifiers (RIDs).
5. The method of claim 4, wherein each UID uniquely identifies a user of a social networking system.
6. The method of claim 1, wherein generating the anonymized data set using the anonymization identification map comprises:
scanning, by the computing system, the raw data set to identify a complex structure;
determining, by the computing system, a primary UII associated with the complex structure;
parsing, by the computing system, the complex structure to identify a key associated with the UII;
identifying, by the computing system, a value associated with the key; and
replacing, by the computing system, the value with RGI associated with the primary UII.
7. The method of claim 6, further comprising:
determining, by the computing system, that the key is another complex structure;
parsing, by the computing system, the key to identify an additional key if a max depth threshold is not exceeded;
identifying, by the computing system, an additional value associated with the additional key; and
replacing, by the computing system, the additional value with RGI associated with the primary UII.
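The nested-structure handling in claims 6 and 7 can be sketched as a depth-limited recursive walk. This is an illustrative reading, not the patented implementation: it assumes the complex structure is a JSON-like value built from dicts and lists, that the set of keys holding UII (`uii_keys`) is known in advance, and that the RGI for the primary UII has already been looked up. All names are hypothetical.

```python
def anonymize_structure(node, rid, uii_keys, max_depth=3, depth=0):
    """Return a copy of node with values under UII keys replaced by rid.

    Nested dicts and lists are only parsed while depth < max_depth;
    anything deeper is copied through unchanged. List items are walked at
    the depth of the list itself.
    """
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            if key in uii_keys:
                # Key is associated with the UII: replace its value.
                result[key] = rid
            elif isinstance(value, (dict, list)) and depth < max_depth:
                # Value is itself a complex structure: parse one level down.
                result[key] = anonymize_structure(
                    value, rid, uii_keys, max_depth, depth + 1
                )
            else:
                result[key] = value
        return result
    if isinstance(node, list):
        return [
            anonymize_structure(item, rid, uii_keys, max_depth, depth)
            for item in node
        ]
    return node


if __name__ == "__main__":
    record = {"owner": "u1", "payload": {"actor": "u1", "count": 2}}
    print(anonymize_structure(record, "r9", {"owner", "actor"}))
```

Note that in this sketch content below the depth threshold is left untouched, so a deployment would need its own policy for such content (for example, redacting it entirely); the claims do not specify one.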
8. The method of claim 1, wherein generating the anonymized data set using the anonymization identification map comprises:
identifying, by the computing system, a type of data in a column of the raw data set based on a metadata tag associated with the column;
determining, by the computing system, an action associated with the metadata tag; and
performing, by the computing system, the action to anonymize the data in the column.
9. The method of claim 8, wherein performing the action to anonymize the data in the column comprises replacing the one or more instances of UII in the column with an associated RGI.
10. The method of claim 8, wherein performing the action to anonymize the data in the column comprises executing a computer script to sanitize the data.
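Claims 8 through 10 describe a tag-driven dispatch: a column's metadata tag selects the action that anonymizes it. A minimal sketch follows, assuming rows are dictionaries, tags are plain strings, and the "computer script" of claim 10 is represented by an ordinary function that scrubs e-mail addresses. The tag names, the action table, and the regular expression are all assumptions made for illustration.

```python
import re

# Deliberately simple e-mail pattern, used only for this illustration.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")


def replace_with_rid(value, uid_to_rid):
    # Claim 9 style action: swap the UII for its associated RGI.
    return uid_to_rid[value]


def scrub_emails(value, uid_to_rid):
    # Claim 10 style action: a sanitizing script run over free text.
    return EMAIL.sub("[redacted]", value)


def keep(value, uid_to_rid):
    return value


# Hypothetical mapping from metadata tag to anonymization action.
TAG_ACTIONS = {"uid": replace_with_rid, "free_text": scrub_emails, "none": keep}


def anonymize_table(rows, column_tags, uid_to_rid):
    """Apply the action selected by each column's tag to every row."""
    result = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            action = TAG_ACTIONS[column_tags.get(column, "none")]
            new_row[column] = action(value, uid_to_rid)
        result.append(new_row)
    return result


if __name__ == "__main__":
    rows = [{"user": "u1", "note": "mail me at a.b@example.com please"}]
    tags = {"user": "uid", "note": "free_text"}
    print(anonymize_table(rows, tags, {"u1": "r1"}))
```

Keeping the tag-to-action table separate from the table walk means a new kind of column only needs a new tag and action, without touching the loop.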
11. The method of claim 1, wherein the non-indexed raw data includes a plurality of tables and the raw data set meets the anonymization criteria if one or more of the plurality of tables meet or exceed a first age as determined from a date of origination in the data warehouse.
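The age-based criterion in claim 11 amounts to a date comparison per table. The sketch below assumes each table's date of origination is available as a `datetime.date` and uses a 90-day threshold purely as a placeholder; the claim does not state a value for the first age.

```python
from datetime import date, timedelta


def meets_anonymization_criteria(table_origins, today, first_age=timedelta(days=90)):
    """True if at least one table meets or exceeds first_age.

    table_origins maps a table name to its date of origination in the
    warehouse. The 90-day default is an illustrative assumption.
    """
    return any(today - origin >= first_age for origin in table_origins.values())


if __name__ == "__main__":
    origins = {"clicks": date(2024, 1, 1), "views": date(2024, 3, 1)}
    print(meets_anonymization_criteria(origins, today=date(2024, 4, 1)))
```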
12. The method of claim 1, further comprising:
removing, by the computing system, the raw data set at a second time subsequent to a first time, wherein the anonymized data set is generated at the first time.
13. The method of claim 1, further comprising:
maintaining, by the computing system, the anonymization identification map.
14. The method of claim 13, wherein maintaining the anonymization identification map comprises:
accessing, by the computing system, a new data set upon occurrence of a triggering event;
scanning, by the computing system, the new data set for instances of UII including a list of one or more scanned UIDs;
accessing, by the computing system, a list of active UIDs, wherein an active UID is associated with a corresponding RID in the anonymization identification map;
comparing, by the computing system, the list of scanned UIDs to the list of active UIDs to identify a list of new UIDs, wherein new UIDs are included in the list of scanned UIDs but not the list of active UIDs;
generating, by the computing system, an RID for each of the new UIDs;
associating, by the computing system, each generated RID with the corresponding new UID; and
adding, by the computing system, the new UIDs to the list of active UIDs.
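The map maintenance steps of claim 14 reduce to a set difference followed by RID generation. A minimal sketch, assuming the map is a plain dictionary whose keys double as the list of active UIDs and that RIDs are random hex strings; the function name and the RID format are illustrative only.

```python
import secrets


def update_map(scanned_uids, uid_to_rid):
    """Add an RID for every scanned UID that is not yet active.

    Returns the list of new UIDs in first-seen order. Existing entries in
    uid_to_rid are left unchanged.
    """
    active = set(uid_to_rid)
    # dict.fromkeys removes duplicates while preserving scan order.
    new_uids = [uid for uid in dict.fromkeys(scanned_uids) if uid not in active]
    for uid in new_uids:
        # Adding the key both associates the RID and marks the UID active.
        uid_to_rid[uid] = secrets.token_hex(16)
    return new_uids


if __name__ == "__main__":
    mapping = {"u1": "r1"}
    print(update_map(["u1", "u2", "u3", "u2"], mapping))
    print(sorted(mapping))
```

Running the function again on the same data set adds nothing, so it could be invoked on every triggering event without reassigning RIDs to known users.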
15. The method of claim 1, wherein generating the anonymized data set is initiated upon occurrence of a triggering event.
16. A system, comprising:
a processor;
a memory storing instructions, which when executed by the processor causes the processor to;
access a non-indexed raw data set that is not indexed based on user identifiable information (UII) from a data warehouse and an anonymization identification map, wherein the non-indexed raw data set meets an anonymization criteria and includes one or more instances of UII and the anonymization identification map associates the UII with randomly generated information (RGI), wherein the RGI is generated to be independent of the UII;
process the anonymization identification map and generate an anonymized data set using the anonymization identification map, wherein the anonymized data set is an anonymized version of the raw data set, wherein the anonymized data set is generated by;
determining that a portion of the non-indexed raw data set has a specified data structure,
identifying a key in the non-indexed raw data set that is associated with a primary UII associated with the specified data structure, and
replacing a value of the key with the RGI associated with the primary UII;
receive an indication to delete an account associated with a user of a social networking system;
identify a user identifier (UID) associated with the user, the UID being a specified UII of the user; and
disassociate, in the anonymization identification map, the specified UII from a specified RGI associated with the specified UII to delete the account associated with the user.
17. A non-transitory computer-readable storage medium storing computer-readable instructions, which when executed by a processor, causes the processor to perform a method comprising:
identifying a non-indexed raw data set that is not indexed based on user identifiable information (UII) from computer memory in a data warehouse, wherein the raw data set meets an anonymization criteria and includes one or more instances of the UII;
generating randomly generated information (RGI) to be associated with the UII, wherein the RGI is generated to be independent of the UII;
associating the UII with the RGI in an anonymization identification map;
generating an anonymized data set using the anonymization identification map, wherein the anonymized data set is an anonymized version of the raw data set, the generating the anonymized data set includes;
determining that a portion of the non-indexed raw data set has a specified data structure,
identifying a key in the non-indexed raw data set that is associated with a primary UII associated with the specified data structure, and
replacing a value of the key with the RGI associated with the primary UII;
receiving an indication to delete an account associated with a user of a social networking system;
identifying a user identifier (UID) associated with the user, the UID being a specified UII of the user; and
disassociating, in the anonymization identification map, the specified UII from a specified RGI associated with the specified UII to delete the account associated with the user.
Specification