Layered masking of content
Abstract
Methods, systems and computer program products for layered masking of data are described. A system receives content including personally identifiable information (PII). The system redacts the content by masking the PII. The system identifies the PII in multi-layer processing, where in each layer, the system determines a respective confidence score indicating a probability that a token is PII. If the confidence score is sufficiently high, the system masks the token. Otherwise, the system provides the token to a next layer for processing. The layers can include regular expression based processing, lookup table based processing, and machine learning based processing.
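The layered flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the patterns, the lookup terms, the per-layer scores, the 0.8 threshold, whitespace tokenization, and the digit-ratio stand-in for a trained model are all assumptions made for the example.

```python
import re

# Illustrative assumptions: patterns, scores, lookup terms, and threshold
# are invented for this sketch and do not come from the patent.
REGEX_SCORES = [
    (re.compile(r"\d{3}-\d{2}-\d{4}"), 0.95),                  # SSN-like pattern
    (re.compile(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", re.I), 0.90),   # email-like pattern
]
LOOKUP_SCORES = {"alice": 0.85, "bob": 0.85}  # hypothetical known-PII terms
THRESHOLD = 0.8


def regex_layer(token: str) -> float:
    """Return the highest score among patterns that match the whole token."""
    return max(
        (score for pattern, score in REGEX_SCORES if pattern.fullmatch(token)),
        default=0.0,
    )


def lookup_layer(token: str) -> float:
    """Return the score of an exact (case-insensitive) lookup-table match."""
    return LOOKUP_SCORES.get(token.lower(), 0.0)


def model_layer(token: str) -> float:
    """Stand-in for a trained model: the fraction of digits in the token."""
    if not token:
        return 0.0
    return sum(ch.isdigit() for ch in token) / len(token)


LAYERS = [regex_layer, lookup_layer, model_layer]


def mask_content(content: str) -> str:
    """Mask each token as soon as one layer is confident enough."""
    out = []
    for token in content.split():
        masked = token
        for layer in LAYERS:
            if layer(token) >= THRESHOLD:
                masked = "*" * len(token)
                break  # confident: later layers are skipped for this token
        out.append(masked)
    return " ".join(out)
```

A token that no layer scores at or above the threshold passes through unchanged, so the cheaper layers (regular expressions, lookup) act as filters before the model is consulted.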
17 Citations
20 Claims
1. A method comprising:
receiving content including a token;
storing one or more regular expressions for the content, wherein each regular expression comprises a sequence of symbols and characters expressing a string or pattern;
determining, by a computer system based on matching the one or more regular expressions with the token, a first confidence score indicating a probability that the token includes personally identifiable information (PII), the first confidence score being associated with the regular expression;
storing a lookup table that includes one or more tokens for known PII;
determining, by the computer system based on matching the token with tokens in the lookup table, a second confidence score indicating a probability that the token includes PII, the second confidence score being associated with a term in the lookup table that is an exact match of the token;
storing a model for determining a third confidence score indicating a probability that the token includes PII, wherein the model is generated using a machine learning training algorithm;
determining, by the computer system based on inputting the token into the model, the third confidence score;
masking the token by the computer system based on the first confidence score, the second confidence score and the third confidence score; and
providing, by the computer system as data of improved privacy, the content including the masked token to a content consuming device.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
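Claim 1 masks the token "based on" all three confidence scores but does not say how they are combined. The sketch below shows one possible policy, a noisy-OR combination compared against a threshold. The names `TokenScores`, `score_token`, and `mask_token`, the combination rule, and the default threshold are hypothetical choices for illustration, not the claimed method.

```python
from dataclasses import dataclass
from typing import Callable

# A scorer maps a token to a probability-like score in [0, 1].
Scorer = Callable[[str], float]


@dataclass(frozen=True)
class TokenScores:
    regex: float   # first confidence score (regular expression match)
    lookup: float  # second confidence score (exact lookup-table match)
    model: float   # third confidence score (machine-learned model)

    def combined(self) -> float:
        # Assumed policy (noisy-OR): probability that at least one
        # independent signal is correct. The claim does not specify this.
        return 1.0 - (1.0 - self.regex) * (1.0 - self.lookup) * (1.0 - self.model)


def score_token(token: str, regex: Scorer, lookup: Scorer, model: Scorer) -> TokenScores:
    """Run all three scorers on the token and collect their scores."""
    return TokenScores(regex(token), lookup(token), model(token))


def mask_token(
    token: str,
    regex: Scorer,
    lookup: Scorer,
    model: Scorer,
    threshold: float = 0.8,  # assumed cutoff
    mask_char: str = "*",
) -> str:
    """Return the masked token if the combined score reaches the threshold."""
    scores = score_token(token, regex, lookup, model)
    if scores.combined() >= threshold:
        return mask_char * len(token)
    return token
```

With three scorers that each return 0.5, the combined score is 0.875 and the token is masked, even though no single score reaches 0.8. A stricter policy, such as taking the maximum of the three scores, would leave that token unmasked.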
11. A system comprising:
one or more processors; and
a non-transitory computer-readable medium storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
receiving content including a token;
storing one or more regular expressions for the content, wherein each regular expression comprises a sequence of symbols and characters expressing a string or pattern;
determining, based on matching the one or more regular expressions with the token, a first confidence score indicating a probability that the token includes personally identifiable information (PII), the first confidence score being associated with the regular expression;
storing a lookup table that includes one or more tokens for known PII;
determining, based on matching the token with tokens in the lookup table, a second confidence score indicating a probability that the token includes PII, the second confidence score being associated with a term in the lookup table that is an exact match of the token;
storing a model for determining a third confidence score indicating a probability that the token includes PII, wherein the model is generated using a machine learning training algorithm;
determining, based on inputting the token into the model, the third confidence score;
masking the token based on the first confidence score, the second confidence score and the third confidence score; and
providing, as data of improved privacy, the content including the masked token to a content consuming device.
Dependent claims: 12, 13, 14, 15, 16
17. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
receiving content including a token;
storing one or more regular expressions for the content, wherein each regular expression comprises a sequence of symbols and characters expressing a string or pattern;
determining, based on matching the one or more regular expressions with the token, a first confidence score indicating a probability that the token includes personally identifiable information (PII), the first confidence score being associated with the regular expression;
storing a lookup table that includes one or more tokens for known PII;
determining, based on matching the token with tokens in the lookup table, a second confidence score indicating a probability that the token includes PII, the second confidence score being associated with a term in the lookup table that is an exact match of the token;
storing a model for determining a third confidence score indicating a probability that the token includes PII, wherein the model is generated using a machine learning training algorithm;
determining, based on inputting the token into the model, a third confidence score;
masking the token based on the first confidence score, the second confidence score and the third confidence score; and
providing, as data of improved privacy, the content including the masked token to a content consuming device.
Dependent claims: 18, 19, 20
Specification