PII IDENTIFICATION LEARNING AND INFERENCE ALGORITHM
Abstract
Techniques are described herein for determining whether data sets of real information in databases contain personally identifiable information (PII). Data about the items in each data set is stored in a first table, and the names of those data items are parsed into keywords. The keywords are stored in a second table in a many-to-many relationship with the related data items in the first table. The number of times each keyword is parsed from the data items is counted, as is the number of times each keyword is associated with a PII-designated data item. These counts are then used when analyzing new data sets to estimate the likelihood that the new data sets contain any PII data items.
20 Claims
1. One or more computer-readable media embodied with computer-executable instructions that, when executed by a processor, perform a computer-implemented method for creating and storing multiple tables detailing PII information about at least one data set, comprising:
accessing the at least one data set, the at least one data set comprising data items;
extracting data about each of the data items, wherein the data comprises a name for each of the data items;
storing the data in a first table;
parsing the names into one or more keywords;
storing each of the one or more keywords in a second table;
mapping each of the one or more keywords to each of the data items the one or more keywords was parsed from; and
determining a number of times each of the one or more keywords is associated with a data item specified as a PII.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
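The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the table names (`data_item`, `keyword`, `item_keyword`), the junction table used for the mapping, the camelCase/underscore splitting rule, and the helper names are all assumptions made for illustration, and SQLite stands in for whatever database the claim contemplates.

```python
import re
import sqlite3

def split_keywords(name):
    """Split a name like 'userEmailAddress' or 'user_email' into lowercase keywords.
    The splitting rule is an assumption; the claim only says names are parsed."""
    # Insert a space at camelCase boundaries, then split on non-alphanumerics.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return [w.lower() for w in re.split(r"[^A-Za-z0-9]+", spaced) if w]

def build_tables(data_items):
    """data_items: iterable of (name, is_pii) pairs. Returns an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE data_item (id INTEGER PRIMARY KEY, name TEXT, is_pii INTEGER);
        CREATE TABLE keyword (id INTEGER PRIMARY KEY, word TEXT UNIQUE);
        -- Hypothetical junction table holding the keyword-to-item mapping.
        CREATE TABLE item_keyword (item_id INTEGER, keyword_id INTEGER,
                                   PRIMARY KEY (item_id, keyword_id));
    """)
    for name, is_pii in data_items:
        # First table: one row of extracted data per data item.
        item_id = db.execute(
            "INSERT INTO data_item (name, is_pii) VALUES (?, ?)",
            (name, int(is_pii)),
        ).lastrowid
        for word in set(split_keywords(name)):
            # Second table: each distinct keyword is stored once.
            db.execute("INSERT OR IGNORE INTO keyword (word) VALUES (?)", (word,))
            kw_id = db.execute(
                "SELECT id FROM keyword WHERE word = ?", (word,)
            ).fetchone()[0]
            db.execute("INSERT INTO item_keyword VALUES (?, ?)", (item_id, kw_id))
    return db

def keyword_counts(db):
    """Return {keyword: (times parsed from an item, times that item was PII)}."""
    rows = db.execute("""
        SELECT k.word, COUNT(*), SUM(d.is_pii)
        FROM keyword k
        JOIN item_keyword ik ON ik.keyword_id = k.id
        JOIN data_item d ON d.id = ik.item_id
        GROUP BY k.word
    """)
    return {word: (total, pii) for word, total, pii in rows}
```

For example, after loading `userEmailAddress` (marked PII) and `email_opt_in` (not PII), `keyword_counts` reports `(2, 1)` for the keyword `email`: parsed from two data items, one of which was specified as PII.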
12. A computer-implemented method, comprising:
receiving a database table that includes at least one data set with data items, the data items each comprising a name, data type, and sanitization function;
determining one or more keywords associated with the data items;
determining whether the one or more keywords match any of a plurality of keywords in a first table;
calculating a probability that the one or more keywords are actually a PII based on at least data items associated with the plurality of keywords in the first table; and
storing the probability.
Dependent claims: 13, 14, 15, 16
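The inference steps of claim 12 might look roughly like the sketch below. The claim does not say how the probability is calculated, so the formula here is purely an assumption: each matched keyword contributes the ratio of its PII associations to its total occurrences, and the largest ratio is taken as the score. The `stats` layout, the column dictionaries, and all function names are hypothetical.

```python
import re

def split_keywords(name):
    """Split a column name into lowercase keywords (assumed parsing rule)."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return [w.lower() for w in re.split(r"[^A-Za-z0-9]+", spaced) if w]

def pii_probability(name, stats):
    """Estimate how likely a data item is PII from previously learned counts.

    stats maps keyword -> (total_count, pii_count).
    Returns None when no keyword of the name matches a learned keyword.
    Taking the maximum ratio is an illustrative choice, not the claimed method.
    """
    ratios = [
        stats[w][1] / stats[w][0]
        for w in split_keywords(name)
        if w in stats and stats[w][0] > 0
    ]
    return max(ratios) if ratios else None

def score_table(columns, stats):
    """columns: list of dicts with 'name', 'data_type', and 'sanitizer' keys.
    Returns {column name: probability}, i.e. the stored result."""
    return {col["name"]: pii_probability(col["name"], stats) for col in columns}
```

With learned counts of `{"email": (2, 1), "address": (2, 2)}`, a new column named `billing_address` scores 1.0 and `email_status` scores 0.5, while a column whose name shares no keyword with the learned table gets no score at all.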
17. A database server, comprising:
a processor;
one or more computer-readable media embodied with machine-executable instructions that, when executed by the processor, support:
(1) a learning application capable of:
a) analyzing data items stored in a data archive table,
b) determining keywords associated with the data items,
c) using the keywords to compute PII statistics, and
(2) an inference application capable of determining whether a new data set from a new database contains any PII entries.
Dependent claims: 18, 19, 20
Specification