Wavelet decomposition of software entropy to identify malware
First Claim
1. A method comprising:
analyzing, by at least one data processor, a data file to obtain characters contained in the data file, the characters split into a plurality of data file chunks;
representing, by the at least one data processor, the data file as a plurality of entropy values reflective of an amount of entropy across the plurality of file chunks;
applying, by the at least one data processor, a wavelet transform to the plurality of entropy values to generate a wavelet energy spectrum that represents an amount of entropic energy at multiple levels of resolution;
determining, by the at least one data processor and using at least one predictive model, which levels of resolution of the multiple levels of resolution exert the strongest influences on a probability of the data file being malicious and whether the entropic energy at such levels of resolution makes a likelihood of the data file being malicious larger or smaller;
calculating, by the at least one data processor, a suspiciously structured entropy score based on the wavelet energy spectrum and the determination, wherein the suspiciously structured entropy score represents a probability of whether or not the data file is likely to be malicious; and
incorporating, by the at least one data processor, the suspiciously structured entropy score within an existing malware model.
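The first three steps of the claim (splitting the file into chunks, computing one entropy value per chunk, and turning that series into a wavelet energy spectrum) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: entropy is assumed to be Shannon entropy over byte values (so each value lies between 0 and 8 bits), the 256-byte chunk size is a hypothetical choice, the Haar wavelet stands in for the unspecified "wavelet transform", and the series is truncated to a power-of-two length so every level halves cleanly. The function names chunk_entropies and haar_energy_spectrum are illustrative only.

```python
import math
from collections import Counter


def chunk_entropies(data: bytes, chunk_size: int = 256) -> list[float]:
    """Shannon entropy (bits per byte, 0..8) of each non-overlapping chunk.

    Assumption: byte-level Shannon entropy and a fixed, hypothetical chunk size.
    """
    entropies = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        n = len(chunk)
        counts = Counter(chunk)  # byte value -> number of occurrences
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        entropies.append(h)
    return entropies


def haar_energy_spectrum(signal: list[float]) -> list[float]:
    """Energy of the Haar detail coefficients at each level, finest level first.

    Assumption: Haar wavelet; the signal is truncated to the largest
    power-of-two length so each level splits evenly into pairs.
    """
    n = 1
    while n * 2 <= len(signal):
        n *= 2
    approx = list(signal[:n])
    energies = []
    while len(approx) > 1:
        half = len(approx) // 2
        # Orthonormal Haar step: differences -> details, sums -> next approximation
        details = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        energies.append(sum(d * d for d in details))
    return energies
```

For example, haar_energy_spectrum([0.0, 8.0, 0.0, 8.0]) places essentially all of its energy (about 64) at the finest level, because entropy alternates from one chunk to the next, whereas a constant entropy series has zero energy at every level.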
0 Assignments
0 Petitions
Abstract
A plurality of data files is received. Thereafter, each file is represented as an entropy time series that reflects an amount of entropy across locations in code for such file. A wavelet transform is applied, for each file, to the corresponding entropy time series to generate an energy spectrum characterizing, for the file, an amount of entropic energy at multiple scales of code resolution. It can then be determined, for each file, whether or not the file is likely to be malicious based on the energy spectrum. Related apparatus, systems, techniques and articles are also described.
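In one common formulation (our notation, assuming byte-level Shannon entropy and an orthonormal discrete wavelet transform on a series of length N = 2^J; the abstract itself fixes neither choice), the entropy time series of a file split into N chunks, its wavelet detail coefficients, and the per-level energy spectrum can be written as:

```latex
H_i = -\sum_{b=0}^{255} p_i(b)\,\log_2 p_i(b), \qquad i = 1, \dots, N

d_{j,k} = \sum_{i=1}^{N} H_i\,\psi_{j,k}(i), \qquad
E_j = \sum_{k} d_{j,k}^{2}, \qquad j = 1, \dots, J

\sum_{j=1}^{J} E_j + \sum_{k} a_{J,k}^{2} = \sum_{i=1}^{N} H_i^{2}
```

Here p_i(b) is the relative frequency of byte value b in chunk i (with 0 log 0 taken as 0), psi_{j,k} is the wavelet at level j and shift k, and a_{J,k} are the approximation coefficients left at the coarsest level. The last line is Parseval's identity: because the transform is orthonormal, the level energies plus the residual approximation energy account for all of the energy in the entropy series, so E_j measures how much of the file's entropy structure lives at scale j.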
72 Citations
30 Claims
1. A method comprising:
analyzing, by at least one data processor, a data file to obtain characters contained in the data file, the characters split into a plurality of data file chunks; representing, by the at least one data processor, the data file as a plurality of entropy values reflective of an amount of entropy across the plurality of file chunks; applying, by the at least one data processor, a wavelet transform to the plurality of entropy values to generate a wavelet energy spectrum that represents an amount of entropic energy at multiple levels of resolution; determining, by the at least one data processor and using at least one predictive model, which levels of resolution of the multiple levels of resolution exert the strongest influences on a probability of the data file being malicious and whether the entropic energy at such levels of resolution makes a likelihood of the data file being malicious larger or smaller; calculating, by the at least one data processor, a suspiciously structured entropy score based on the wavelet energy spectrum and the determination, wherein the suspiciously structured entropy score represents a probability of whether or not the data file is likely to be malicious; and incorporating, by the at least one data processor, the suspiciously structured entropy score within an existing malware model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
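The determining and calculating steps leave "at least one predictive model" open. As one hedged illustration, the sketch below assumes a logistic regression over the per-level wavelet energies, fitted by plain batch gradient descent: the magnitude of each weight ranks how strongly a level influences the probability of maliciousness, and its sign says whether energy at that level makes that probability larger or smaller. The fitted probability then plays the role of the suspiciously structured entropy score. The model choice, the names fit_logistic, level_influence and sse_score, and the learning-rate and epoch defaults are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: list[list[float]], y: list[int],
                 lr: float = 0.1, epochs: int = 2000) -> tuple[list[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss.

    Assumption: logistic regression is the predictive model; X holds one
    wavelet energy spectrum per file, y is 1 for malicious and 0 for benign.
    """
    n_features = len(X[0])
    m = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def level_influence(w: list[float]) -> list[tuple[int, str]]:
    """Levels ordered by |weight|, each tagged with the direction of its effect."""
    order = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    return [(j, "raises" if w[j] > 0 else "lowers") for j in order]


def sse_score(energies: list[float], w: list[float], b: float) -> float:
    """Hypothetical suspiciously structured entropy score: P(malicious | spectrum)."""
    return sigmoid(sum(wj * ej for wj, ej in zip(w, energies)) + b)
```

On a toy training set in which high finest-level energy marks malicious files, high second-level energy marks benign ones, and the third level is constant, level 0 comes out as raising the probability, level 1 as lowering it, and level 2 ranks last. With real spectra the energies would normally be standardized first, since raw weight magnitudes are only comparable across levels when the levels share a common scale.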
11. A system comprising:
at least one data processor; and at least one memory storing instructions which, when executed by the at least one data processor, cause operations comprising: analyzing, by at least one data processor, a data file to obtain characters contained in the data file, the characters split into a plurality of file chunks; representing, by the at least one data processor, the data file as a plurality of entropy values reflective of an amount of entropy across the plurality of file chunks; applying, by the at least one data processor, a wavelet transform to the plurality of entropy values to generate a wavelet energy spectrum that represents an amount of entropic energy at multiple levels of resolution; determining, by the at least one data processor and using at least one predictive model, which levels of resolution of the multiple levels of resolution exert the strongest influences on a probability of the data file being malicious and whether the entropic energy at such levels of resolution makes a likelihood of the data file being malicious larger or smaller; calculating, by the at least one data processor, a suspiciously structured entropy score based on the wavelet energy spectrum and the determination, wherein the suspiciously structured entropy score represents a probability of whether or not the data file is likely to be malicious; and incorporating, by the at least one data processor, the suspiciously structured entropy score within an existing malware model.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
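The final operation, incorporating the score within an existing malware model, is not spelled out further in the claims. One simple reading, sketched here purely as an assumption, treats the score as one more input feature: the existing feature vector is copied and extended, and the model only uses the new feature once it has a weight for it (for example after retraining). The feature names, the weights, and the linear-logistic form of the hypothetical existing_model_score are placeholders, not the patent's model.

```python
import math


def augment_features(features: dict[str, float], sse_score: float) -> dict[str, float]:
    """Copy an existing feature vector and add the SSE score as one extra feature."""
    augmented = dict(features)  # leave the caller's feature vector untouched
    augmented["sse_score"] = sse_score
    return augmented


def existing_model_score(features: dict[str, float],
                         weights: dict[str, float], bias: float) -> float:
    """Hypothetical linear-logistic malware model over named features.

    Features without a weight contribute nothing, so adding "sse_score"
    changes the output only once the model assigns it a weight.
    """
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))
```

Keeping the augmentation separate from the model makes the design choice explicit: the entropy-structure signal is added alongside whatever features the existing model already uses, rather than replacing them.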
21. A non-transitory computer program product storing instructions which, when executed by at least one data processor forming part of at least one computing system, result in operations comprising:
analyzing, by at least one data processor, a data file to obtain characters contained in the data file, the characters split into a plurality of file chunks; representing, by the at least one data processor, the data file as a plurality of entropy values reflective of an amount of entropy across the plurality of file chunks; applying, by the at least one data processor, a wavelet transform to the plurality of entropy values to generate a wavelet energy spectrum that represents an amount of entropic energy at multiple levels of resolution; determining, by the at least one data processor and using at least one predictive model, which levels of resolution of the multiple levels of resolution exert the strongest influences on a probability of the data file being malicious and whether the entropic energy at such levels of resolution makes a likelihood of the data file being malicious larger or smaller; calculating a suspiciously structured entropy score based on the wavelet energy spectrum and the determination, wherein the suspiciously structured entropy score represents a probability of whether or not the data file is likely to be malicious; and incorporating, by the at least one data processor, the suspiciously structured entropy score within an existing malware model.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30
Specification