Deriving encryption rules based on file content
First Claim
1. A data storage system comprising:
- a content analyzer comprising computer hardware, the content analyzer configured to:
access a set of training files that include content designated as sensitive information; and
use one or more processing algorithms with respect to the set of training files to obtain a set of data tokens for each training file, each of the data tokens from the set of data tokens comprising a portion of a training file from the set of training files, the portion of the training file comprising content included in the training file, at least some of the training files including at least some of the sensitive information;
an encryption rules generator comprising computer hardware, the encryption rules generator configured to:
use one or more algorithms to generate a set of encryption rules based on the set of data tokens obtained for each training file, wherein at least some of the set of encryption rules are configured to identify a file to encrypt based at least in part on a correspondence between portions of the file and at least some of the set of data tokens;
generate a prospective encryption rule based on an aggregated set of data tokens, the aggregated set of data tokens based on the set of data tokens for each training file;
perform the prospective encryption rule using the set of training files;
determine a number of training files from the set of training files identified for encryption based on the prospective encryption rule; and
responsive, at least in part, to the number of training files identified for encryption satisfying a threshold, adding the prospective encryption rule to the set of encryption rules; and
an encryption processor comprising computer hardware, the encryption processor configured to encrypt the file based at least in part on one of the encryption rules from the set of encryption rules.
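The rule-generation steps recited in claim 1 can be illustrated with a minimal sketch. It assumes a simple word-level tokenizer and a prospective rule made of a single token; neither is specified by the claim, and the names `tokenize`, `generate_rules`, and `min_matches` are hypothetical. The sketch keeps a prospective rule only when the number of training files it flags meets the threshold.

```python
import re


def tokenize(text):
    """Split content into lowercase word tokens.

    Assumption: a stand-in for the claimed "processing algorithms".
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def generate_rules(training_files, min_matches):
    """Return the tokens whose single-token rule flags enough training files.

    training_files: list of file contents (strings) designated as sensitive.
    min_matches: hypothetical threshold on the number of flagged files.
    """
    token_sets = [tokenize(text) for text in training_files]
    # Aggregate the per-file token sets into one candidate pool.
    aggregated = set().union(*token_sets)
    rules = []
    for token in sorted(aggregated):
        # "Perform" the prospective rule against every training file.
        matches = sum(1 for tokens in token_sets if token in tokens)
        if matches >= min_matches:
            rules.append(token)
    return rules


def should_encrypt(text, rules):
    """Flag a file when any rule token corresponds to a portion of its content."""
    file_tokens = tokenize(text)
    return any(token in file_tokens for token in rules)
```

A single-token rule will also admit common words that happen to appear in every training file, so a practical system would likely filter or weight tokens before accepting them; that step is outside this sketch.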
Abstract
Data storage systems are disclosed for automatically generating encryption rules based on a set of training files that are known to include sensitive information. The system may use a number of heuristic algorithms to generate one or more encryption rules for determining whether a file includes sensitive information. Further, the system may apply the heuristic algorithms to the content of the files, as determined by using natural language processing algorithms, to generate the encryption rules. Moreover, systems are disclosed that are capable of automatically determining whether to encrypt a file based on the generated encryption rules. The content of the file may be determined using natural language processing algorithms and then the encryption rules may be applied to the content of the file to determine whether to encrypt the file.
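As a rough illustration of the second half of the abstract (applying generated rules to a file's content to decide whether to encrypt it), the sketch below assumes a rule is a set of tokens plus a minimum overlap fraction. The `EncryptionRule` type, the `min_overlap` parameter, and the word-level tokenizer are all hypothetical choices made for the example, not details taken from the patent.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class EncryptionRule:
    tokens: frozenset   # tokens the rule looks for (hypothetical representation)
    min_overlap: float  # fraction of rule tokens that must appear in a file


def extract_tokens(text):
    """Assumed stand-in for natural language processing of file content."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rule_matches(rule, text):
    """Return True when enough of the rule's tokens occur in the text."""
    if not rule.tokens:
        return False
    found = len(rule.tokens & extract_tokens(text))
    return found / len(rule.tokens) >= rule.min_overlap


def files_to_encrypt(files, rules):
    """Return the sorted names of files flagged by at least one rule.

    files: mapping of file name to file content.
    """
    return sorted(
        name
        for name, text in files.items()
        if any(rule_matches(rule, text) for rule in rules)
    )
```

Flagged files would then be handed to whatever encryption component the system uses; the encryption step itself is deliberately left out here.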
18 Claims
9. A method of automatically generating encryption rules using machine learning techniques, the method comprising:
accessing, by a rules generation system comprising computer hardware, a set of one or more training files that include content designated as sensitive information;
applying, by the rules generation system, one or more processing algorithms to each training file included in the set of training files to obtain a set of data tokens for each training file, wherein each of the set of data tokens for a training file corresponds to a portion of the training file, the portion of the training file comprising content included in the training file, at least some of the training files including at least some of the sensitive information, wherein applying the one or more processing algorithms to the set of data tokens comprises:
generating a prospective encryption rule based on the set of data tokens;
performing the prospective encryption rule with respect to the set of training files;
determining a percentage of training files from the set of training files identified for encryption using the prospective encryption rule; and
responsive to the percentage of training files identified for encryption satisfying a threshold, adding the prospective encryption rule to the set of encryption rules;
applying, by the rules generation system, one or more algorithms to the set of data tokens for each training file to generate a set of encryption rules for identifying files with sensitive information, wherein at least some of the set of encryption rules are configured to identify a file to encrypt based at least in part on a correspondence between portions of the file and at least some of the set of data tokens; and
storing the set of encryption rules in an encryption rules repository accessible for one or more systems for determining whether to encrypt the file.
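Claim 9 differs from claim 1 in testing a percentage of flagged training files and in storing the accepted rules in a repository. A minimal sketch of those two steps follows, assuming single-token rules and a JSON file as the repository; the function names, the `min_percent` parameter, and the file layout are hypothetical and are not drawn from the patent.

```python
import json
import re
from pathlib import Path


def tokens_of(text):
    """Assumed word-level tokenizer standing in for the processing algorithms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def passes_percentage(rule_token, training_files, min_percent):
    """Check whether the rule flags at least min_percent of the training files."""
    if not training_files:
        return False
    flagged = sum(1 for text in training_files if rule_token in tokens_of(text))
    return 100.0 * flagged / len(training_files) >= min_percent


def build_and_store_rules(training_files, min_percent, repo_path):
    """Generate rules that pass the percentage test and write them to repo_path."""
    candidates = sorted(set().union(*(tokens_of(t) for t in training_files)))
    rules = [
        token
        for token in candidates
        if passes_percentage(token, training_files, min_percent)
    ]
    # Hypothetical repository format: one JSON document holding the rule list.
    Path(repo_path).write_text(json.dumps({"rules": rules}, indent=2))
    return rules


def load_rules(repo_path):
    """Read the stored rules back, as a consuming system might."""
    return json.loads(Path(repo_path).read_text())["rules"]
```

Any system that later decides whether to encrypt a file could call `load_rules` on the same path, which mirrors the claim's requirement that the repository be accessible to other systems.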
Specification