Data storage systems and methods
Abstract
Data storage systems are disclosed for automatically generating encryption rules based on a set of training files that are known to include sensitive information. The system may use a number of heuristic algorithms to generate one or more encryption rules for determining whether a file includes sensitive information. Further, the system may apply the heuristic algorithms to the content of the files, as determined by using natural language processing algorithms, to generate the encryption rules. Moreover, systems are disclosed that are capable of automatically determining whether to encrypt a file based on the generated encryption rules. The content of the file may be determined using natural language processing algorithms and then the encryption rules may be applied to the content of the file to determine whether to encrypt the file.
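The last step of the abstract, applying stored encryption rules to a file's content to decide whether to encrypt it, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the patent text above does not specify a rule format, so a rule is modeled as a set of terms that must all appear in the file, and a simple regular-expression tokenizer stands in for the natural language processing step. The names `tokenize` and `should_encrypt` are hypothetical.

```python
import re


def tokenize(text):
    """Stand-in for the NLP step: lowercase alphanumeric word tokens (assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def should_encrypt(text, rules):
    """Return True if any rule matches the file content.

    Each rule is modeled as a frozenset of terms (hypothetical format);
    a rule matches when every one of its terms occurs in the content.
    """
    tokens = tokenize(text)
    return any(rule <= tokens for rule in rules)


# Hypothetical rule set with a single rule.
rules = [frozenset({"patient", "ssn"})]
flag_record = should_encrypt("Patient SSN: 123-45-6789", rules)
flag_menu = should_encrypt("Lunch menu for Friday", rules)
```

Under these assumptions, the first file is flagged for encryption and the second is not. A real system would likely use richer rule types than exact term sets.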
20 Claims
1. A data storage system comprising:
a computing system comprising one or more hardware processors programmed to:
determine a set of file portions from a plurality of training files, at least some of the set of file portions comprising content designated as sensitive information, each of the file portions comprising a subset of content of at least one file from the plurality of training files;
generate a prospective encryption rule for addition to a set of available encryption rules based at least in part on an aggregated set of the file portions, the aggregated set of the file portions including file portions that appear in more than one file from the plurality of training files;
determine a number of files from the plurality of training files identified for encryption by performance of the prospective encryption rule; and
when the number of files identified for encryption does not satisfy a threshold number of files, iteratively modify the prospective encryption rule until the threshold number of files of the plurality of training files are identified for encryption by performance of the modified prospective encryption rule, and store the modified prospective encryption rule at a non-volatile repository.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method of automatically generating encryption rules, the method comprising:
by a rules generation system comprising one or more hardware processors, determining a set of file portions from a plurality of training files, at least some of the set of file portions comprising content designated as sensitive information, each of the file portions comprising a subset of content of at least one file from the plurality of training files;
generating a prospective encryption rule for addition to a set of available encryption rules based at least in part on an aggregated set of the file portions, the aggregated set of the file portions including at least one file portion that appears in more than one file from the plurality of training files;
determining that a number of files from the plurality of training files identified for encryption by performance of the prospective encryption rule does not satisfy a threshold number of files; and
in response to said determining, iteratively modifying the prospective encryption rule until the threshold number of files of the plurality of training files are identified for encryption by performance of the modified prospective encryption rule, and storing the modified prospective encryption rule at a non-volatile repository.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
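The steps recited in claims 1 and 10 can be sketched roughly as follows. This is one possible reading, not the patented implementation, and every concrete choice is an assumption: file portions are modeled as word tokens, a rule is a set of terms plus a `min_hits` count, "modifying" the rule means lowering `min_hits`, and the non-volatile repository is a JSON file on disk. All function names are hypothetical.

```python
import json
import re
from collections import Counter


def file_portions(text):
    """Model file portions as lowercase word tokens (assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def count_identified(rule, files):
    """Number of files the rule identifies for encryption."""
    return sum(
        len(rule["terms"] & file_portions(text)) >= rule["min_hits"]
        for text in files
    )


def generate_rule(training_files, sensitive_terms, threshold):
    # Count in how many training files each portion appears.
    counts = Counter()
    for text in training_files:
        counts.update(file_portions(text))

    # Aggregated set: sensitive portions that appear in more than one file.
    shared = {p for p, n in counts.items() if n > 1 and p in sensitive_terms}

    # Prospective rule: start strict, requiring every shared term.
    rule = {"terms": shared, "min_hits": max(len(shared), 1)}

    # Iteratively relax the rule until enough training files are identified.
    # The loop also stops if the rule cannot be relaxed further, in which
    # case the threshold may remain unmet (a limitation of this sketch).
    while count_identified(rule, training_files) < threshold and rule["min_hits"] > 1:
        rule = {"terms": rule["terms"], "min_hits": rule["min_hits"] - 1}
    return rule


def store_rule(rule, path):
    """Persist the rule as JSON (stand-in for the non-volatile repository)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"terms": sorted(rule["terms"]), "min_hits": rule["min_hits"]}, fh)


# Hypothetical training data for illustration.
training_files = [
    "patient ssn diagnosis report",
    "patient ssn invoice",
    "patient diagnosis notes",
]
sensitive = {"patient", "ssn", "diagnosis"}
rule = generate_rule(training_files, sensitive, threshold=3)
```

With this toy data, the initial rule requires all three shared terms and identifies only the first file. One relaxation step lowers `min_hits` to 2, at which point all three training files are identified and the loop ends. Calling `store_rule(rule, "rules.json")` would then write the result to a file whose name is likewise hypothetical.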
Specification