Techniques for detecting encrypted data
First Claim
1. A method to detect an encryption status of a data file stored in a data source and to selectively encrypt the data file based on the encryption status, the method comprising:
reading the data file from the data source;
comparing a data file type of the data file to a set of data file types, wherein the data file types are included in the set according to a likelihood that the encryption status is known;
in response to a determination that the encryption status of the data file type of the data file is unknown, calculating a value of a property of the data file read from the data source, including calculating a distribution of frequencies of occurrence of a plurality of values in the data file read from the data source;
comparing the calculated value with a threshold value to determine whether the data file read from the data source is encrypted or unencrypted, including comparing the distribution of frequencies of occurrence of the plurality of values in the data file to an average distribution of frequencies for a known reference distribution to determine whether the distribution of frequencies of occurrence of the plurality of values in the data file differs significantly from the average distribution of frequencies for the known reference distribution, wherein the known reference distribution is associated with a text file type;
in response to determining that the data file read from the data source is unencrypted as a result of the comparing, encrypting the data file read from the data source and storing the encrypted data file in a cache; and
in response to determining that the data file read from the data source is encrypted as a result of the comparing, storing the data file read from the data source in the cache without further encryption.
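The claim above can be sketched in Python: look up the file type first, and only when its status is unknown, compare the file's byte-frequency distribution with a text reference distribution. This is a minimal sketch under stated assumptions, not the patented implementation. The `KNOWN_STATUS` table is hypothetical. Using total variation distance as the measure of a "significant" difference is also an assumption, as is the default threshold of 0.5. A file whose distribution departs sharply from the text reference is treated as encrypted.

```python
from collections import Counter
from typing import Optional

# Hypothetical table of file types whose encryption status is assumed known in advance.
KNOWN_STATUS = {".gpg": True, ".pgp": True}


def byte_distribution(data: bytes) -> list[float]:
    """Relative frequency of each of the 256 possible byte values."""
    counts = Counter(data)
    total = len(data) or 1
    return [counts.get(b, 0) / total for b in range(256)]


def distance(p: list[float], q: list[float]) -> float:
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def is_encrypted(data: bytes, extension: str, reference: list[float],
                 threshold: float = 0.5) -> bool:
    """Classify data as encrypted (True) or unencrypted (False).

    The file-type lookup runs first; the frequency comparison against the
    text reference distribution is used only when the type's status is unknown.
    The 0.5 threshold is an assumed value, not one taken from the claim.
    """
    known: Optional[bool] = KNOWN_STATUS.get(extension.lower())
    if known is not None:
        return known
    return distance(byte_distribution(data), reference) > threshold
```

For example, with `reference = byte_distribution(b"The quick brown fox jumps over the lazy dog. " * 20)`, the uniformly distributed input `bytes(range(256)) * 16` is classified as encrypted, while text drawn from the same sample is not.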
3 Assignments
0 Petitions
Abstract
Techniques are described that generally relate to detecting the encryption status of a data file or data stream and selectively encrypting the data file or data stream based on that status. Example methods may include one or more of reading the data file or data stream from a data source, calculating a value of a property of the data file or data stream, comparing the calculated value with a threshold value to determine whether the file is encrypted or unencrypted, and encrypting files that are determined to be unencrypted.
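The abstract leaves the measured property open. One common choice, assumed here purely for illustration, is Shannon entropy in bits per byte, compared against a threshold. The function names and the 7.5-bit threshold are hypothetical. This is a sketch of the general "property value versus threshold" step, not the specific test recited in the claims.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Bits of entropy per byte: 0.0 for constant data, 8.0 for perfectly uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    # Sum of p * log2(1/p) over the byte values that actually occur.
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Hypothetical threshold test: entropy above the threshold is taken as a sign of encryption."""
    return shannon_entropy(data) > threshold
```

Well-compressed data also approaches 8 bits per byte, so a high-entropy result alone does not distinguish encrypted content from compressed content.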
54 Citations
26 Claims
1. A method to detect an encryption status of a data file stored in a data source and to selectively encrypt the data file based on the encryption status, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 23.
11. A computing system arranged to detect an encryption status of a data file and selectively encrypt the data file based on the encryption status, the computing system comprising:
a data source including the data file stored therein on a non-transitory computer-readable medium; and
a data processor implemented in hardware and configured to:
read the data file from the data source into system memory;
calculate a value of a property of the read data file, by calculation of a distribution of frequencies of occurrence of a plurality of data values in the data file read from the data source;
compare the calculated value with a threshold value to determine whether the read data file is encrypted or unencrypted, by comparison of the distribution of frequencies of occurrence of the plurality of data values in the data file to an average distribution of frequencies for a known reference distribution to determine whether the distribution of frequencies of occurrence of the plurality of data values differs significantly from the average distribution of frequencies for the known reference distribution;
in response to a determination that the read data file is unencrypted based on the comparison of the calculated value with the threshold value:
determine a type of the data file;
compare the type of the data file to a table of file extensions indicating compressed files, wherein the table of file extensions includes a .rar file extension;
in response to the type of the data file being a match to a file extension in the table of file extensions, determine that the data file is encrypted;
in response to the type of the data file not being a match to file extensions in the table of file extensions, run a compression routine on the data file to generate a compressed data file;
compare a size of the data file to a size of the compressed data file;
based on the comparison of the size of the data file to the size of the compressed data file, determine whether the data file is compressible;
in response to a determination that the data file is compressible, determine that the data file is unencrypted;
in response to a determination that the data file is not compressible, determine that the data file is encrypted;
in response to a determination that the read data file is unencrypted based on the comparison of the calculated value with the threshold value and the comparison of the size of the data file to the size of the compressed data file, encrypt the read data file and store the encrypted data file in a cache; and
in response to a determination that the read data file is encrypted based on the comparison of the calculated value with the threshold value, the comparison of the type of the data file to the table of file extensions, or the comparison of the size of the data file to the size of the compressed data file, store the read data file in the cache without further encryption.
Dependent claims: 12, 13, 14, 15, 16, 21, 24.
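The secondary checks in claim 11 run only after the frequency test reports "unencrypted". They can be sketched with Python's standard `zlib` module. The claim requires only that the extension table include `.rar`, so the other entries are assumptions. Choosing zlib as the compression routine is also an assumption, as is the 0.95 size ratio used to decide compressibility. The function name is hypothetical.

```python
import os
import zlib

# Hypothetical table of compressed-file extensions; the claim requires only ".rar".
COMPRESSED_EXTENSIONS = {".rar", ".zip", ".gz", ".7z"}


def secondary_check_encrypted(data: bytes, filename: str,
                              min_ratio: float = 0.95) -> bool:
    """Re-check a file the frequency test called unencrypted.

    Returns True if the file should be treated as encrypted.
    """
    extension = os.path.splitext(filename)[1].lower()
    if extension in COMPRESSED_EXTENSIONS:
        # A match in the extension table is treated as encrypted.
        return True
    compressed = zlib.compress(data)
    # Assumed rule: "compressible" means the output is under min_ratio of the input size.
    compressible = len(compressed) < min_ratio * len(data)
    # Compressible data is unencrypted; incompressible data is treated as encrypted.
    return not compressible
```

Following the claim, both a compressed-file extension and an incompressible payload cause the file to be cached without being encrypted again.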
17. A non-transitory computer accessible medium that includes computer executable instructions stored thereon to detect an encryption status of a data file stored in a data source and to selectively encrypt the data file based on the encryption status, when the computer executable instructions are executed by a processing unit the processing unit is configured to perform a procedure comprising:
reading the data file from the data source;
calculating a value of a property of the data file read from the data source;
comparing the calculated value with a threshold value to determine whether the data file read from the data source is encrypted or unencrypted, including comparing a distribution of frequencies of occurrence of the value of the property in the data file to an average distribution of frequencies for a known reference distribution, wherein the known reference distribution includes a distribution of frequencies of a text file;
in response to determining that the data file read from the data source is unencrypted as stored in the data source as a result of the comparing, encrypting the data file read from the data source and storing the encrypted data file in a cache; and
in response to determining that the data file read from the data source is encrypted as stored in the data source as a result of the comparing, storing the data file read from the data source in the cache without further encryption.
Dependent claims: 18, 19, 22, 25, 26.
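The overall flow of claim 17 can be sketched as: read the file, classify it, then either encrypt and cache it or cache it as-is. The claim does not fix a particular classifier or cipher, so both are injected as callables here. `cache_file` and the use of a dictionary as the cache are hypothetical choices made for illustration.

```python
from pathlib import Path
from typing import Callable, Dict


def cache_file(
    path: Path,
    cache: Dict[str, bytes],
    is_encrypted: Callable[[bytes], bool],
    encrypt: Callable[[bytes], bytes],
) -> bool:
    """Read a file, encrypt it only if it appears unencrypted, and store the result in the cache.

    Returns True if the file was encrypted on its way into the cache.
    """
    data = path.read_bytes()
    if is_encrypted(data):
        # Already encrypted as stored: cache it without further encryption.
        cache[str(path)] = data
        return False
    # Unencrypted as stored: encrypt before caching.
    cache[str(path)] = encrypt(data)
    return True
```

In practice `is_encrypted` would be a frequency-distribution test against a text reference, as the claim recites, and `encrypt` a real cipher from a vetted library. A stand-in such as byte reversal is useful only for testing the control flow and provides no security.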
Specification