Layered Masking of Content
Abstract
Methods, systems and computer program products for layered masking of data are described. A system receives content including personally identifiable information (PII). The system redacts the content by masking the PII. The system identifies the PII in multi-layer processing, where in each layer, the system determines a respective confidence score indicating a probability that a token is PII. If the confidence score is sufficiently high, the system masks the token. Otherwise, the system provides the token to a next layer for processing. The layers can include regular expression based processing, lookup table based processing, and machine learning based processing.
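The layered flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation described in the specification: the names `regex_layer`, `lookup_layer`, `model_layer` and `mask_content`, the sample pattern, the name list, the scores, and the 0.8 threshold are all hypothetical, and the "machine learning" layer is a trivial stand-in heuristic rather than a trained model.

```python
import re
from typing import Callable, List

# A layer maps a token to a confidence score in [0, 1] that the token is PII.
Layer = Callable[[str], float]

# Hypothetical data for illustration only.
SSN_LIKE = re.compile(r"\d{3}-\d{2}-\d{4}")
KNOWN_NAMES = {"alice", "bob"}


def regex_layer(token: str) -> float:
    """Regular expression based layer: high score on a full pattern match."""
    return 0.95 if SSN_LIKE.fullmatch(token) else 0.0


def lookup_layer(token: str) -> float:
    """Lookup table based layer: high score if the token is a known name."""
    return 0.9 if token.lower() in KNOWN_NAMES else 0.0


def model_layer(token: str) -> float:
    """Stand-in for a machine learning layer (assumption): digit ratio."""
    if not token:
        return 0.0
    return sum(ch.isdigit() for ch in token) / len(token)


def mask_content(content: str, layers: List[Layer],
                 threshold: float = 0.8, mask: str = "[MASKED]") -> str:
    """Mask each token as soon as one layer is sufficiently confident."""
    redacted = []
    for token in content.split():
        masked = False
        for layer in layers:
            if layer(token) >= threshold:
                masked = True  # confident enough: mask and stop
                break
            # otherwise the token falls through to the next layer
        redacted.append(mask if masked else token)
    return " ".join(redacted)


LAYERS = [regex_layer, lookup_layer, model_layer]
```

Ordering the layers from cheapest to most expensive means that a token matched by a regular expression never reaches the later layers; that ordering is a design choice of this sketch.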
40 Claims
1-20. (canceled)
21. A method comprising:
storing a plurality of regular expressions for identifying personally identifiable information (PII), wherein each regular expression of the plurality of regular expressions comprises a sequence of symbols and characters expressing a string or pattern;
storing for each regular expression of the plurality of regular expressions a corresponding probability that the regular expression identifies the PII;
receiving content that includes a token;
comparing data within the token with each of the plurality of regular expressions;
determining, based on comparing each of the plurality of regular expressions with the data within the token, whether the data within the token matches a regular expression of the plurality of regular expressions; and
based on determining that the data within the token matches the regular expression of the plurality of regular expressions, retrieving a probability associated with the regular expression that matches the data within the token;
calculating a first confidence score according to the probability associated with the regular expression that matches the data within the token; and
masking the token in accordance with the confidence score.
Dependent claims: 22, 23, 24, 25, 26, 27.
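The steps recited in claim 21 might look like the following in code. This is a sketch under assumptions rather than the claimed method itself: the patterns, their stored probabilities, the function names `confidence_score` and `mask_token`, the choice to use the highest matching probability as the confidence score, and the 0.8 masking threshold are all hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Stored regular expressions, each with a stored probability that a match
# is PII. Patterns and probabilities are hypothetical examples.
PATTERNS: List[Tuple[re.Pattern, float]] = [
    (re.compile(r"\d{3}-\d{2}-\d{4}"), 0.95),              # SSN-like
    (re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"), 0.90),  # email-like
    (re.compile(r"\d{5}"), 0.40),                          # five digits
]


def confidence_score(token: str) -> Optional[float]:
    """Compare the token with every stored pattern and derive a score.

    Assumption: the score is the highest probability among the matching
    patterns. Returns None when no pattern matches.
    """
    matched = [prob for pattern, prob in PATTERNS if pattern.fullmatch(token)]
    return max(matched) if matched else None


def mask_token(token: str, threshold: float = 0.8) -> str:
    """Mask the token when its confidence score reaches the threshold."""
    score = confidence_score(token)
    if score is not None and score >= threshold:
        return "*" * len(token)
    return token
```

With these sample values, `mask_token("123-45-6789")` replaces the token with asterisks, while `mask_token("90210")` returns the token unchanged because its stored probability of 0.40 is below the assumed threshold.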
28. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
storing a plurality of regular expressions for identifying personally identifiable information (PII), wherein each regular expression of the plurality of regular expressions comprises a sequence of symbols and characters expressing a string or pattern;
storing for each regular expression of the plurality of regular expressions a corresponding probability that the regular expression identifies the PII;
receiving content that includes a token;
comparing data within the token with each of the plurality of regular expressions;
determining, based on comparing each of the plurality of regular expressions with the data within the token, whether the data within the token matches a regular expression of the plurality of regular expressions; and
based on determining that the data within the token matches the regular expression of the plurality of regular expressions, retrieving a probability associated with the regular expression that matches the data within the token;
calculating a first confidence score according to the probability associated with the regular expression that matches the data within the token; and
masking the token in accordance with the confidence score.
Dependent claims: 29, 30, 31, 32, 33, 34.
35. A system comprising:
one or more processors; and
a non-transitory computer-readable medium storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
storing a plurality of regular expressions for identifying personally identifiable information (PII), wherein each regular expression of the plurality of regular expressions comprises a sequence of symbols and characters expressing a string or pattern;
storing for each regular expression of the plurality of regular expressions a corresponding probability that the regular expression identifies the PII;
receiving content that includes a token;
comparing data within the token with each of the plurality of regular expressions;
determining, based on comparing each of the plurality of regular expressions with the data within the token, whether the data within the token matches a regular expression of the plurality of regular expressions; and
based on determining that the data within the token matches the regular expression of the plurality of regular expressions, retrieving a probability associated with the regular expression that matches the data within the token;
calculating a first confidence score according to the probability associated with the regular expression that matches the data within the token; and
masking the token in accordance with the confidence score.
Dependent claims: 36, 37, 38, 39, 40.
Specification