Token-based encryption rule generation process
First Claim
1. A data storage system comprising:
- a computing system comprising one or more hardware processors programmed to:
access a set of data tokens derived from a plurality of files comprising a set of training files, at least some of the set of data tokens comprising content designated as sensitive information, each of the data tokens comprising a portion of content of at least one file from the plurality of files;
generate a prospective encryption rule based at least in part on an aggregated set of data tokens, the aggregated set of data tokens including data tokens that appear in more than one file from the plurality of files;
perform the prospective encryption rule on the plurality of files;
determine a number of files from the plurality of files identified for encryption by performance of the prospective encryption rule;
responsive, at least in part, to the number of files identified for encryption satisfying a threshold number of files, add the prospective encryption rule to a set of available encryption rules; and
responsive, at least in part, to the number of files identified for encryption not satisfying the threshold number of files, iteratively modify the prospective encryption rule until the threshold number of files of the plurality of files are identified for encryption by performance of the modified prospective encryption rule.
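The generate, perform, count, and modify steps recited in this claim can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the patented implementation: the claim does not say what form a rule takes or how it is modified, so the sketch assumes a rule is a set of tokens that flags any file containing at least one of them, and that "modifying" means adding the next most widely shared token. The names `tokenize`, `aggregate_tokens`, `files_identified`, and `generate_rule` are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Stand-in tokenizer (assumption): lowercase whitespace-separated words."""
    return set(text.lower().split())


def aggregate_tokens(files, sensitive_tokens):
    """Return sensitive tokens that appear in more than one file.

    Ordered by number of files containing the token (descending),
    then alphabetically so the order is deterministic.
    """
    counts = Counter()
    for text in files:
        counts.update(tokenize(text) & sensitive_tokens)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tok for tok, n in ranked if n > 1]


def files_identified(rule_tokens, files):
    """Count files that contain at least one token of the rule."""
    return sum(1 for text in files if tokenize(text) & rule_tokens)


def generate_rule(files, sensitive_tokens, threshold):
    """Widen a prospective rule one token at a time until it flags
    at least `threshold` files; return None if that never happens."""
    rule = set()
    for tok in aggregate_tokens(files, sensitive_tokens):
        rule.add(tok)  # first pass generates the rule, later passes modify it
        if files_identified(rule, files) >= threshold:
            return rule  # threshold satisfied: eligible for the available set
    return None


training_files = [
    "salary report for alice",
    "alice ssn on file",
    "bob salary review",
    "lunch menu",
]
available_rules = []
rule = generate_rule(training_files, {"salary", "ssn", "alice"}, threshold=3)
if rule is not None:
    available_rules.append(rule)
```

In this toy run, "ssn" is excluded from the aggregated set because it appears in only one file, and the rule has to be widened once before it reaches the threshold of three files.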
Abstract
Data storage systems are disclosed for automatically generating encryption rules based on a set of training files that are known to include sensitive information. The system may use a number of heuristic algorithms to generate one or more encryption rules for determining whether a file includes sensitive information. Further, the system may apply the heuristic algorithms to the content of the files, as determined by using natural language processing algorithms, to generate the encryption rules. Moreover, systems are disclosed that are capable of automatically determining whether to encrypt a file based on the generated encryption rules. The content of the file may be determined using natural language processing algorithms and then the encryption rules may be applied to the content of the file to determine whether to encrypt the file.
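The last step the abstract describes, applying generated rules to a file's content to decide whether to encrypt it, might look like the following sketch. Several things here are assumptions made for illustration: the `EncryptionRule` shape (a token set plus a minimum match count), the regex word split standing in for the natural language processing step, and the function names `extract_tokens` and `should_encrypt`. None of them come from the patent text.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class EncryptionRule:
    """Hypothetical rule shape: match when a file contains at least
    `min_matches` of the listed tokens."""
    tokens: frozenset
    min_matches: int = 1


def extract_tokens(text):
    """Stand-in for NLP content extraction (assumption):
    lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def should_encrypt(text, rules):
    """Return True if any available rule matches the file content."""
    content = extract_tokens(text)
    return any(len(content & rule.tokens) >= rule.min_matches for rule in rules)


rules = [
    EncryptionRule(frozenset({"ssn", "salary"})),
    EncryptionRule(frozenset({"confidential", "merger"}), min_matches=2),
]
flagged = should_encrypt("Q3 salary bands", rules)
skipped = should_encrypt("Confidential: lunch menu", rules)
```

Here `flagged` is true because the first rule needs only one of its tokens, while `skipped` is false because the second rule requires both "confidential" and "merger".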
20 Claims
1. A data storage system comprising: (independent claim, recited in full under First Claim above)
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method of automatically generating encryption rules using machine learning techniques, the method comprising:
- accessing, by a rules generation system comprising computer hardware, a set of data tokens derived from a plurality of files comprising a set of training files, at least some of the set of data tokens comprising content designated as sensitive information, each of the data tokens comprising a portion of content of at least one file from the plurality of files;
- generating a prospective encryption rule based at least in part on the set of data tokens;
- performing the prospective encryption rule with respect to the plurality of files;
- determining a percentage of files from the plurality of files identified for encryption using the prospective encryption rule;
- responsive, at least in part, to the percentage of files identified for encryption not satisfying a threshold, iteratively modifying the prospective encryption rule until performance of the modified prospective encryption rule results in a threshold percentage of files being identified for encryption; and
- subsequent to said iteratively modifying, responsive, at least in part, to the threshold percentage of files being identified for encryption, adding the prospective encryption rule to a set of encryption rules.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
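Claim 10 measures a percentage of files rather than a count, and adds the rule only after the iterative modification has reached the threshold. A minimal sketch of that variant follows. The modification strategy is an assumption, since the claim does not specify one: the rule starts strict (every token required) and is relaxed by requiring one fewer token on each pass. The names `percent_identified` and `tune_rule` are hypothetical.

```python
def percent_identified(tokens, min_matches, files):
    """Percentage of files containing at least `min_matches` of `tokens`."""
    if not files:
        return 0.0
    hits = sum(
        1 for text in files
        if len(set(text.lower().split()) & tokens) >= min_matches
    )
    return 100.0 * hits / len(files)


def tune_rule(tokens, files, threshold_pct):
    """Relax a strict rule until it flags at least `threshold_pct` percent
    of the files; return (tokens, min_matches) or None if it never does."""
    min_matches = len(tokens)  # strictest form: all tokens required
    while min_matches >= 1:
        if percent_identified(tokens, min_matches, files) >= threshold_pct:
            return (frozenset(tokens), min_matches)
        min_matches -= 1  # modify the prospective rule: require one fewer token
    return None


training_files = [
    "alice salary ssn",
    "alice salary",
    "bob salary",
    "lunch menu",
]
encryption_rules = []
tuned = tune_rule({"salary", "ssn", "alice"}, training_files, threshold_pct=50.0)
if tuned is not None:
    encryption_rules.append(tuned)  # added only after the threshold is reached
```

With these four files, requiring all three tokens flags 25 percent of the files, so the rule is relaxed once to two required tokens, which flags 50 percent and meets the threshold.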
Specification